# Movie Reviews Sentiment Analysis
## Import Libraries

In [1]:
import re
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification

## Preprocess Data

In [2]:
reviews_df = pd.read_csv('dataset.csv')

In [3]:
reviews_df.head()

Unnamed: 0,title,review_text,rating,date,user,movie_title
0,Felt Like I Was Seeing the Inside of My Own Mi...,I have trouble turning off my brain. Anxieties...,9/10,24 May 2022,evanston_dad,Everything Everywhere All at Once (2022)
1,best film of 2022,"Profoundly deep, genuinely moving, utterly hil...",9/10,2 May 2022,movieman_kev,Everything Everywhere All at Once (2022)
2,"Don't do drugs, watch this instead.",If you take drugs for the first time and imagi...,9/10,8 April 2022,AfricanBro,Everything Everywhere All at Once (2022)
3,Fantastic,"""Be kind, especially when you don't know what'...",10/10,20 April 2022,gbill-74877,Everything Everywhere All at Once (2022)
4,The most original film ever made. Period.,Everything Everywhere All At Once is even craz...,10/10,31 March 2022,benjaminskylerhill,Everything Everywhere All at Once (2022)


In [4]:
pattern_helpful = r'\d{1,3}(?:,\d{3})* out of \d{1,3}(?:,\d{3})* found this helpful\.? Was this review helpful\? Sign in to vote\.?'
pattern_permalink = r'Permalink'
patterns_to_check = [pattern_helpful, pattern_permalink]

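def check_unwanted_patterns(reviews, patterns):
    """Print every review that still contains one of the given regex patterns."""
    for review in reviews:
        for pattern in patterns:
            # str() guards against non-string entries such as NaN
            if re.search(pattern, str(review)):
                print(f"Unwanted pattern found in review: '{review}'")
                break  # report each review at most once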

In [5]:
check_unwanted_patterns(reviews_df['review_text'], patterns_to_check)

Unwanted pattern found in review: 'Profoundly deep, genuinely moving, utterly hilarious, highly imaginative and a visual feast. Haven't laughed this hard, cried this much or thought so deeply about any film in 2022 Much less all in the same viewing. This was indeed everything, everywhere all at once.
2,042 out of 3,449 found this helpful. Was this review helpful? Sign in to vote.
Permalink'
Unwanted pattern found in review: 'If you take drugs for the first time and imagined Jackie Chan was a female Dr. Strange in another universe this would be it. And the synopsis is basically an Asian woman trying to do her taxes. I thought the third act of the movie felt a little stretched out but otherwise I think it's the best movie I've seen all year because I haven't laughed this much in any recent one. From the short time I spent in China, it's also an accurate and hilarious view of Chinese parents 'cause they really do be like that. I can't recommend it enough, it's so chaotic and in the middle

In [6]:
def preprocess_review(review):
    """Strip scraped page residue from a review and normalize whitespace."""
    # Remove the "N out of M found this helpful. ..." voting footer
    review = re.sub(pattern_helpful, '', str(review))
    # Remove the "Permalink" link text
    review = re.sub(pattern_permalink, '', review)
    # Collapse runs of whitespace (including newlines) into single spaces
    review = ' '.join(review.split())
    return review
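
To make the effect of the two substitutions concrete, here is a small self-contained sketch that applies the same regular expressions to a shortened sample. The sample string and the `clean` helper name are made up for illustration; only the two patterns come from the cells above.

```python
import re

# Same patterns as in the notebook cells above.
PATTERN_HELPFUL = r'\d{1,3}(?:,\d{3})* out of \d{1,3}(?:,\d{3})* found this helpful\.? Was this review helpful\? Sign in to vote\.?'
PATTERN_PERMALINK = r'Permalink'

def clean(review):
    """Hypothetical helper mirroring preprocess_review."""
    review = re.sub(PATTERN_HELPFUL, '', str(review))
    review = re.sub(PATTERN_PERMALINK, '', review)
    return ' '.join(review.split())

# Made-up sample in the shape of the scraped reviews.
sample = (
    "This was indeed everything, everywhere all at once.\n"
    "2,042 out of 3,449 found this helpful. Was this review helpful? Sign in to vote.\n"
    "Permalink"
)
cleaned_sample = clean(sample)
print(cleaned_sample)
# → This was indeed everything, everywhere all at once.
```

Note that the `Permalink` pattern is not anchored, so it would also remove that word if a reviewer happened to use it in the review text itself.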

In [7]:
cleaned_reviews = [preprocess_review(review) for review in reviews_df['review_text']]

In [8]:
cleaned_reviews[:3]

['I have trouble turning off my brain. Anxieties, worries, mundane to-dos, even positive things, sometimes feel like they\'re swirling around in a chaotic funnel cloud and I would like nothing more than to sit in physical and mental silence. "Everything Everywhere All At Once" felt like the inside of my head. In a world of non-stop, 24/7 news, most of it bad, how is a person like me, who has trouble filtering out things that affect me directly from all of the other things that are just out there happening in general and over which I have no control, supposed to cope? One answer is to decide that nothing matters anyway and give up caring. But that means deciding that my wife doesn\'t matter. And that my kids don\'t matter. And that art, and nature, and things that bring joy to my life, don\'t matter. Another way is to decide that some things, ok maybe most things, don\'t matter, but that there are things that do, and those are the things that make it all worth it. I get to decide what t

In [9]:
check_unwanted_patterns(cleaned_reviews, patterns_to_check)

## Load Model

In [24]:
model = AutoModelForSequenceClassification.from_pretrained("siebert/sentiment-roberta-large-english")
tokenizer = AutoTokenizer.from_pretrained("siebert/sentiment-roberta-large-english")


## Sentiment Analysis

In [25]:
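# Move the model to the GPU once, if one is available, instead of on every chunk.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

results = []
for review in cleaned_reviews:
    # Tokenize the full review without truncation, then split it into
    # chunks of at most 512 tokens, the model's maximum sequence length.
    tokens = tokenizer(review, return_tensors="pt", truncation=False)["input_ids"][0]
    chunks = [tokens[i:i+512] for i in range(0, len(tokens), 512)]
    chunk_predictions = []

    for chunk in chunks:
        # Add a batch dimension and place the chunk on the model's device.
        input_ids = chunk.unsqueeze(0).to(device)

        # Inference only, so skip gradient tracking.
        with torch.no_grad():
            output = model(input_ids=input_ids).logits
        chunk_predictions.append(output)

    # Average the logits across chunks and take the highest-scoring label.
    avg_output = torch.mean(torch.stack(chunk_predictions), dim=0)
    predicted_label = model.config.id2label[avg_output.argmax(-1).item()]
    results.append(predicted_label)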

Token indices sequence length is longer than the specified maximum sequence length for this model (938 > 512). Running this sequence through the model will result in indexing errors
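
One caveat of slicing the full token sequence is that only the first chunk starts with `<s>` and only the last chunk ends with `</s>`. A possible variant, sketched below on plain Python lists, strips the two special tokens, splits the remainder into pieces of 510 tokens, and wraps each piece again. The function name `chunk_with_special_tokens` is hypothetical, and the ids 0 and 2 are assumed to be the `<s>` and `</s>` ids of this RoBERTa tokenizer; in practice they should be read from `tokenizer.bos_token_id` and `tokenizer.eos_token_id`.

```python
def chunk_with_special_tokens(token_ids, bos_id, eos_id, max_len=512):
    """Split token_ids (which already start with bos_id and end with eos_id)
    into chunks of at most max_len tokens, each wrapped in bos_id/eos_id."""
    body = token_ids[1:-1]   # drop the original <s> and </s>
    step = max_len - 2       # leave room for the two special tokens
    # Keep one empty piece for an empty review so a prediction is still made.
    pieces = [body[i:i + step] for i in range(0, len(body), step)] or [[]]
    return [[bos_id] + piece + [eos_id] for piece in pieces]

# Toy example: 938 token ids, matching the length reported in the warning above.
BOS, EOS = 0, 2  # assumed ids; read them from the tokenizer in practice
ids = [BOS] + list(range(100, 1036)) + [EOS]
chunks = chunk_with_special_tokens(ids, BOS, EOS)
print([len(c) for c in chunks])
# → [512, 428]
```

Each resulting list could then be turned into a tensor with `torch.tensor([chunk])` and passed to the model as `input_ids`. Whether this changes any predicted label for this dataset has not been checked here.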


In [26]:
reviews_df['sentiment'] = results

In [27]:
reviews_df

Unnamed: 0,title,review_text,rating,date,user,movie_title,sentiment
0,Felt Like I Was Seeing the Inside of My Own Mi...,I have trouble turning off my brain. Anxieties...,9/10,24 May 2022,evanston_dad,Everything Everywhere All at Once (2022),POSITIVE
1,best film of 2022,"Profoundly deep, genuinely moving, utterly hil...",9/10,2 May 2022,movieman_kev,Everything Everywhere All at Once (2022),POSITIVE
2,"Don't do drugs, watch this instead.",If you take drugs for the first time and imagi...,9/10,8 April 2022,AfricanBro,Everything Everywhere All at Once (2022),POSITIVE
3,Fantastic,"""Be kind, especially when you don't know what'...",10/10,20 April 2022,gbill-74877,Everything Everywhere All at Once (2022),POSITIVE
4,The most original film ever made. Period.,Everything Everywhere All At Once is even craz...,10/10,31 March 2022,benjaminskylerhill,Everything Everywhere All at Once (2022),POSITIVE
...,...,...,...,...,...,...,...
1995,Great one in the 70s,Can't believe the movie was made in 1970s. It'...,6/10,15 January 2024,DrDumb,Jaws (1975),POSITIVE
1996,The best film ever made.,I saw this film when I was about 8 years old. ...,10/10,26 June 1999,baumer,Jaws (1975),POSITIVE
1997,We're gonna need a bigger boat,"Yea, ""Jaws"" is considered a classic for many p...",7/10,9 July 2011,raulfaust,Jaws (1975),POSITIVE
1998,A potboiler of the 'slow-death' variety,A potboiler with grisly action scenes that bor...,5/10,15 January 2017,zafar142007,Jaws (1975),NEGATIVE
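
As a quick sanity check, the predicted labels can be compared with the star ratings. The sketch below does this for a few rows copied from the table above, using only the standard library. The helper name `rating_to_score` and the cut-off of 6 are assumptions made for illustration, not part of the original analysis, and a different cut-off would give different agreement figures.

```python
def rating_to_score(rating):
    """Parse a rating such as '9/10' into the integer 9."""
    return int(str(rating).split('/')[0])

# Rows copied from the table above: (rating, predicted sentiment).
rows = [
    ("9/10", "POSITIVE"),
    ("10/10", "POSITIVE"),
    ("6/10", "POSITIVE"),
    ("7/10", "POSITIVE"),
    ("5/10", "NEGATIVE"),
]

THRESHOLD = 6  # assumption: ratings of 6 or more count as positive

# A row "agrees" when the rating and the model point the same way.
agreement = [
    (rating_to_score(rating) >= THRESHOLD) == (label == "POSITIVE")
    for rating, label in rows
]
print(f"{sum(agreement)} of {len(agreement)} rows agree")
# → 5 of 5 rows agree
```

Applied to the full `reviews_df`, rows with a missing rating would need to be skipped first, since `rating_to_score` expects a string of the form `N/10`.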


## Save Result

In [28]:
reviews_df.to_csv('sentiment_results.csv', index=False)