# Importing the Libraries

We treat this as a multiclass classification problem: the model predicts a movie's rating from the comment left by the viewer.
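Under this framing, each distinct rating is its own class. That makes the spread of reviews across ratings important, and an uneven spread is the usual reason for an oversampling step such as SMOTE. In the notebook, `df.Ratings.value_counts()` shows the distribution directly. The sketch below does the equivalent with the standard library, using a short, hypothetical list of ratings rather than the real dataset:

```python
from collections import Counter

# Hypothetical ratings standing in for df.Ratings; the real dataset is not shown here.
ratings = [8, 8, 10, 9, 10, 10, 1, 7, 10, 8, 10, 9]

# Each distinct rating becomes one class label for the classifier.
classes = sorted(set(ratings))
counts = Counter(ratings)

for label in classes:
    share = counts[label] / len(ratings)
    print(f"rating {label:>2}: {counts[label]:>2} reviews ({share:.0%})")
```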

In [12]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from imblearn.over_sampling import SMOTE
import joblib

# Load dataset

In [2]:
df = pd.read_csv("data/kalki_movie_reviews.csv")
df.head(2)

| Unnamed: 0 | Comments | Ratings |
|---|---|---|
| 0 | I didnt go in with big hopes, but i was expect... | 8 |
| 1 | A unique genre, a well written story (script) ... | 8 |


In [3]:
X = df.Comments
y = df.Ratings

# Split the data

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=33)
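The split above is purely random. When some ratings are rare, a random 80/20 split can leave a rating with few or no test examples. scikit-learn's `train_test_split` accepts `stratify=y`, which keeps each rating's share roughly the same in both parts. The following is a minimal sketch of the idea. It uses a hypothetical `stratified_split` helper (not scikit-learn's implementation, whose per-class rounding may differ) and made-up labels:

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=33):
    """Hypothetical helper: return (train_idx, test_idx), keeping each label's share similar."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)                      # shuffle within each rating
        n_test = round(len(idxs) * test_size)  # same fraction from every rating
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up labels: ten 10s, five 8s, five 1s.
labels = [10] * 10 + [8] * 5 + [1] * 5
train_idx, test_idx = stratified_split(labels)
print([labels[i] for i in test_idx])
```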

In [5]:
tfIdf = TfidfVectorizer(
    lowercase=True,
    stop_words="english",
    ngram_range=(1,2),
    max_features=5000
)
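With its default settings, `TfidfVectorizer` weights each term by its raw count times a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is how many contain t. It then scales each row to unit L2 norm, and `ngram_range=(1, 2)` adds adjacent word pairs as extra terms. The sketch below reproduces that arithmetic on two tiny documents. It is a simplification, not the library's code: it splits on whitespace instead of using the default token pattern, and it skips both the English stop-word list and the `max_features=5000` cap (which keeps only the most frequent terms):

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Unigrams and bigrams joined by a space, like ngram_range=(1, 2)."""
    out = []
    for n in range(n_min, n_max + 1):
        out += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return out


def tfidf(docs):
    """Raw counts x smoothed idf, then L2 row normalisation (simplified sketch)."""
    tokenised = [ngrams(doc.lower().split()) for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    n = len(docs)
    doc_freq = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenised:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


vocab, rows = tfidf(["great story", "weak story"])
for term, weight in zip(vocab, rows[0]):
    print(f"{term:>12}: {weight:.3f}")
```

A term that appears in every document (here `story`) gets the minimum idf of 1, so it ends up with less weight than terms specific to one review.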

In [6]:
X_train = tfIdf.fit_transform(X_train)
X_test = tfIdf.transform(X_test)
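The vectorizer calls `fit_transform` on the training text and only `transform` on the test text. As a result, the vocabulary and idf weights come from the training reviews alone, and nothing about the test set leaks into the features. A side effect is that any word the vectorizer never saw during fitting is dropped when new text is transformed. The hypothetical helpers below mimic that behaviour with plain counts. The fitted `TfidfVectorizer` keeps its own term-to-column mapping in the `vocabulary_` attribute:

```python
def fit_vocabulary(train_docs):
    """Hypothetical helper: learn the term -> column mapping from training documents only."""
    terms = sorted({tok for doc in train_docs for tok in doc.lower().split()})
    return {term: col for col, term in enumerate(terms)}


def transform(doc, vocabulary):
    """Hypothetical helper: count known terms; words never seen during fitting are dropped."""
    row = [0] * len(vocabulary)
    for tok in doc.lower().split():
        if tok in vocabulary:
            row[vocabulary[tok]] += 1
    return row


vocabulary = fit_vocabulary(["great story", "weak story"])
print(vocabulary)
print(transform("great visuals weak plot", vocabulary))  # "visuals" and "plot" are ignored
```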

In [7]:
smote = SMOTE(random_state=42)
X_train, y_train = smote.fit_resample(X_train, y_train)
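SMOTE balances the classes by creating synthetic minority examples rather than duplicating existing ones. For a minority sample, it picks one of that sample's k nearest minority-class neighbours and places a new point at a random position on the line segment between the two. Because it runs after the split, only the training set is resampled and the test reviews stay real. The sketch below shows the interpolation step with the standard library. The `smote_samples` helper and its `k=2` are illustrative assumptions. imbalanced-learn's `SMOTE` defaults to `k_neighbors=5` and also decides how many points each class needs:

```python
import math
import random


def smote_samples(minority, n_new, k=2, seed=42):
    """Hypothetical helper: create n_new synthetic points between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding the point itself
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
for point in smote_samples(minority, n_new=3):
    print([round(v, 3) for v in point])
```

On TF-IDF rows, each synthetic "review" is a blend of two real review vectors rather than text anyone wrote. This is one reason to judge the model only on untouched test data.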

# Train and save the model

In [8]:
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
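For more than two classes, scikit-learn's `LogisticRegression` with its default `lbfgs` solver fits a multinomial model. It learns one weight vector and intercept per rating (stored in `coef_` and `intercept_`, with the labels in `classes_`), applies a softmax over the class scores, and `predict` returns the most probable class. The stdlib sketch below walks through that prediction step using made-up parameters for three hypothetical ratings and a two-term vocabulary:

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, weights, intercepts, classes):
    """Multinomial logistic regression: score each class, pick the most probable."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, intercepts)]
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best], probs


# Hypothetical parameters for three rating classes over a 2-term vocabulary.
classes = [1, 8, 10]
weights = [[-2.0, 1.5], [0.5, 0.0], [2.0, -1.5]]
intercepts = [0.0, 0.3, 0.1]

label, probs = predict([1.0, 0.0], weights, intercepts, classes)
print(label, [round(p, 3) for p in probs])
```

An all-zero feature vector, such as a review whose words all fall outside the fitted vocabulary, reduces every class score to its intercept. In that case `predict([0.0, 0.0], weights, intercepts, classes)` returns the class with the largest intercept (8 here).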

In [9]:
print(f"Accuracy Score: {model.score(X_test, y_test)*100:.2f}%")

Accuracy Score: 100.00%
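A single accuracy figure can hide classes the model never gets right. The notebook imports `classification_report` and `confusion_matrix` but never calls them. In the notebook, `print(classification_report(y_test, model.predict(X_test)))` would report precision and recall for each rating. The sketch below computes a confusion matrix and per-class recall by hand on hypothetical labels (not this notebook's results). It shows how a model that leans toward the most common rating can look acceptable on accuracy while missing rare ratings entirely:

```python
from collections import Counter


def confusion(y_true, y_pred, labels):
    """Rows are true ratings, columns are predicted ratings."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical true and predicted ratings, not the notebook's results.
y_true = [10, 10, 10, 8, 8, 1]
y_pred = [10, 10, 10, 10, 8, 10]
labels = [1, 8, 10]

matrix = confusion(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    recall = row[labels.index(label)] / sum(row)  # correct predictions / true examples
    print(f"rating {label:>2}: {row}  recall={recall:.2f}")

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}")
```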


# Test on new, unseen data

In [10]:
new_reviews = [
    "The visuals were stunning but the story felt weak.",
    "Best Indian sci-fi movie ever, masterpiece!",
    "Terrible acting, I regret watching this. I hate the movie",
    "It seems like the writer cum director only able to Imagine the KALKI in 2898 A. D. and got exhausted and then decided to copy all the top sci-fi movies of Hollywood from character to world building and mix it made the khichdi with Mahabharat tadka (which is the only good part).Copied hollywood ideas from :- Star wars, Black Panther, Alita Battle angel, Dune, Blade runner, Mortal Engines, you can also 1 or 2 more if you wish.Even if you convince yourself that copying ideas aren't that bad unless it's a ripoff unfortunately KALKI 2898 A. D. is a complete ripoff of those ideas and lacks essence of the story.On the other hand the special effects are also cheap levels it feels like a movie is made on low budget with unpaid interns at work and it's hard to ignore.It also feels like characters of the movie are seemingly directionless and unaware of their environment.Also the failed to place itself neither science fiction and nor in fantasy fiction both because it's distorted logic and mythological characters involved in dystopian society which tells audience that don't think too much just believe that it happens instead of explaining things.More over, the movie doesn't seem like it's crafted with passion or love rather in a capitalist POVAfter all kudos for the experiment."
]

new_tfidf = tfIdf.transform(new_reviews)
preds = model.predict(new_tfidf)

for review, rating in zip(new_reviews, preds):
    print(f"Review: {review}\n Predicted Rating: {rating}\n")

Review: The visuals were stunning but the story felt weak.
 Predicted Rating: 8

Review: Best Indian sci-fi movie ever, masterpiece!
 Predicted Rating: 10

Review: Terrible acting, I regret watching this. I hate the movie
 Predicted Rating: 10

Review: It seems like the writer cum director only able to Imagine the KALKI in 2898 A. D. and got exhausted and then decided to copy all the top sci-fi movies of Hollywood from character to world building and mix it made the khichdi with Mahabharat tadka (which is the only good part).Copied hollywood ideas from :- Star wars, Black Panther, Alita Battle angel, Dune, Blade runner, Mortal Engines, you can also 1 or 2 more if you wish.Even if you convince yourself that copying ideas aren't that bad unless it's a ripoff unfortunately KALKI 2898 A. D. is a complete ripoff of those ideas and lacks essence of the story.On the other hand the special effects are also cheap levels it feels like a movie is made on low budget with unpaid interns at work and

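With SMOTE oversampling the training set, the model reaches 100% accuracy on the held-out split. Treat that figure with caution. The notebook does not show a score without SMOTE to compare against, and a perfect score on held-out reviews is unusual enough to double-check, for example for duplicate comments shared between the training and test splits. The clearly negative third review above is also predicted as a 10, which suggests the model still struggles with negative sentiment.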
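An improvement claim is easier to judge next to reference numbers. Two useful ones are the same pipeline without SMOTE, and a trivial baseline that always predicts the most common training rating. scikit-learn offers the latter as `DummyClassifier(strategy="most_frequent")`. A stdlib sketch of that baseline on hypothetical labels follows. It should use the training labels from before resampling, since after SMOTE every class has the same count:

```python
from collections import Counter


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common training rating."""
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(label == majority for label in y_test) / len(y_test), majority


# Hypothetical label splits, standing in for y_train / y_test before resampling.
y_train = [10, 10, 10, 10, 8, 8, 9, 1]
y_test = [10, 10, 8, 1]
score, majority = majority_baseline_accuracy(y_train, y_test)
print(f"always predicting {majority}: {score:.0%}")
```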

In [14]:
import os

os.makedirs("model", exist_ok=True)  # joblib.dump does not create missing directories
joblib.dump(model, "model/rating_predictor_model.pkl")
joblib.dump(tfIdf, "model/tfidf_vectorizer.pkl")

['model/tfidf_vectorizer.pkl']