# **Movie Review Classification** 

**Importing Important Libraries**

pandas: Used for data manipulation and analysis.

TfidfVectorizer: Used to convert text data into numerical feature vectors.

train_test_split: Used to split the dataset into training and testing subsets.

SVC: Support Vector Classifier, a type of SVM model.

Pipeline: Used to chain multiple steps into a single estimator.

accuracy_score: Used to evaluate the accuracy of the model.

joblib: Used for saving and loading the trained model.

In [1]:
# Imports
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
import joblib

Here we read the training data from the TSV (tab-separated values) file mrc-train-data.tsv into a pandas DataFrame.

In [2]:
# Load training data
train_data = pd.read_csv("mrc-train-data.tsv", delimiter='\t')
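
As a minimal sketch of what this loading step expects, the snippet below parses a small in-memory TSV and checks for the columns used later in the notebook. The sample rows, the "pos"/"neg" label values, and the presence of an id column in the training file are assumptions made for illustration; only the review and sentiment column names come from the notebook itself.

```python
import io

import pandas as pd

# Hypothetical two-row sample: a header row, then tab-separated fields
sample_tsv = (
    "id\treview\tsentiment\n"
    "1\tA great movie\tpos\n"
    "2\tA terrible movie\tneg\n"
)

# sep="\t" tells pandas that fields are separated by tabs
sample_data = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Fail early if a column the later cells rely on is missing
missing = {"review", "sentiment"} - set(sample_data.columns)
if missing:
    raise ValueError(f"Missing columns: {missing}")

print(sample_data.head())
```

Running the same column check on the real DataFrame right after loading may catch a wrong delimiter or a renamed header before training starts.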

Here we separate the features (movie reviews) and labels (sentiments) in the training data.

In [3]:
# Split data into features and labels
X_train = train_data['review']
y_train = train_data['sentiment']

Here we create a pipeline that consists of two steps:

TfidfVectorizer: Converts text data into TF-IDF (Term Frequency-Inverse Document Frequency) numerical features.

SVC: Support Vector Classifier with a linear kernel.

In [4]:
# Define pipeline
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('svm', SVC(kernel='linear'))
])
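
To make the first pipeline step more concrete, here is a small sketch of what TfidfVectorizer produces on its own. The three toy reviews are hypothetical and not taken from the dataset; the vectorizer uses its default settings, as in the pipeline above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy reviews, not taken from the dataset
toy_reviews = [
    "a great movie",
    "a terrible movie",
    "great acting and a great story",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(toy_reviews)

# The learned vocabulary; the default tokenizer drops one-character tokens such as "a"
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)

# A sparse matrix with one row per review and one column per vocabulary term
print(tfidf_matrix.shape)
```

Each row of the matrix is the numerical feature vector that the SVC step receives for the corresponding review.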

Here we fit the pipeline on the training data to train the model.

In [5]:
# Train the model
pipeline.fit(X_train, y_train)
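
The notebook imports train_test_split and accuracy_score but does not call them. One way they could be used is to hold out part of the labeled data and measure accuracy before predicting on the test file. The sketch below assumes a hypothetical toy DataFrame with "pos"/"neg" labels in place of train_data; the split ratio and random seed are also illustrative choices, not values from the notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical stand-in for train_data
toy = pd.DataFrame({
    "review": [
        "a great movie", "loved the story", "wonderful acting", "great fun",
        "a terrible movie", "hated the story", "awful acting", "boring and dull",
    ],
    "sentiment": ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"],
})

# Hold out 25% of the labeled rows, keeping the class balance in both parts
X_fit, X_val, y_fit, y_val = train_test_split(
    toy["review"],
    toy["sentiment"],
    test_size=0.25,
    random_state=42,
    stratify=toy["sentiment"],
)

# Same two steps as the notebook's pipeline
val_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC(kernel="linear")),
])
val_pipeline.fit(X_fit, y_fit)

# Fraction of held-out reviews whose predicted label matches the true label
val_accuracy = accuracy_score(y_val, val_pipeline.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.2f}")
```

With real data, the same pattern would apply to train_data['review'] and train_data['sentiment']; the final model can then be refit on all labeled rows, as the cell above does.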

Here we save the trained model to a file named movie_review_classifier.pkl using joblib.

In [7]:
# Save the model
joblib.dump(pipeline, 'movie_review_classifier.pkl')

['movie_review_classifier.pkl']
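
The saved file can later be restored with joblib.load, which returns the fitted pipeline ready for prediction. The following sketch round-trips a hypothetical toy pipeline through a temporary file so that it does not touch movie_review_classifier.pkl; the toy reviews, labels, and file name are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy training data
toy_reviews = ["great movie", "loved it", "terrible movie", "hated it"]
toy_labels = ["pos", "pos", "neg", "neg"]

toy_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC(kernel="linear")),
])
toy_pipeline.fit(toy_reviews, toy_labels)

# Save to a temporary location, then load the pipeline back from disk
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "toy_classifier.pkl")
    joblib.dump(toy_pipeline, model_path)
    loaded_pipeline = joblib.load(model_path)

# The restored pipeline should predict exactly like the original
new_reviews = ["a great movie", "a terrible movie"]
same = list(loaded_pipeline.predict(new_reviews)) == list(toy_pipeline.predict(new_reviews))
print(same)
```

Because joblib files are pickle-based, they should only be loaded from trusted sources, preferably with the same scikit-learn version that produced them.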

Here we read the testing data from the TSV file mrc-test-data.tsv into a pandas DataFrame.

In [8]:
# Load testing data
test_data = pd.read_csv("mrc-test-data.tsv", delimiter='\t')

Here we use the trained model to predict sentiments for the testing data.

In [9]:
# Perform prediction
X_test = test_data['review']
predictions = pipeline.predict(X_test)

Here we create a DataFrame that pairs each prediction with the corresponding ID from the testing data, then save it to a CSV file named mrc-predictions.csv without the index column.

In [10]:
# Create dataframe for predictions
predictions_df = pd.DataFrame({'id': test_data['id'], 'sentiment': predictions})

# Save predictions to CSV
predictions_df.to_csv('mrc-predictions.csv', index=False)
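
A quick read-back check can confirm that the output file has the intended shape. This sketch uses hypothetical IDs and predictions and writes to a temporary file rather than mrc-predictions.csv.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for test_data['id'] and predictions
toy_ids = [101, 102, 103]
toy_predictions = ["pos", "neg", "pos"]

toy_predictions_df = pd.DataFrame({"id": toy_ids, "sentiment": toy_predictions})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "toy-predictions.csv")
    # index=False keeps the DataFrame's row index out of the file
    toy_predictions_df.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)

# Only the two named columns are present, with one row per input ID
print(list(reloaded.columns))
print(len(reloaded) == len(toy_ids))
```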