Sentiment Analysis of Movie Reviews

This project performs sentiment analysis on the Large Movie Review Dataset using various machine learning techniques.

About

This project is related to the machine learning course project. Here we are going to clean data, train, test and evaluate some models for sentiment analysis. We use a lot of traditional and new models for training and testing. Share any ideas or brainstorming if you have.

Contributors

Mohammad Javad Maheronnaghsh

Data

The dataset contains 50,000 reviews from IMDB. There are 25,000 training reviews and 25,000 testing reviews. The sentiment labels are balanced between positive and negative.

The data is loaded and preprocessed by:

Removing stopwords
Converting to lowercase
Removing punctuation
Tokenizing

Models

The following models are implemented and evaluated:

Traditional ML Models
Logistic Regression
Decision Tree
AdaBoost with Decision Tree base estimator
Neural Network Models
Simple feedforward network
Deep network with dropout and regularization

Word Embedding Models

TF-IDF vectors
FastText word embeddings trained in unsupervised mode

Training

Models are trained on 80% of the dataset and validated on 10% for hyperparameter tuning.

The TensorFlow models use the Adam optimizer and sparse categorical crossentropy loss. They are trained for 30 epochs with a batch size of 32.

The FastText model is trained with supervised labeling on the training set sentences.

Evaluation

All models are evaluated on the remaining 10% test set using accuracy and F1 score.

Classification reports are printed showing precision, recall and F1 for each sentiment class.

Results

The deep neural network achieves the best performance with 63% test accuracy. Logistic regression, decision tree, and FastText also perform reasonably well.

In general, the neural network models outperform the traditional ML models. The word embedding models achieve better performance compared to pure TF-IDF vectors.

There is still room for improvement by using more advanced architectures, pretrained embeddings, and regularization techniques.

Usage

The main scripts are:

train.py - Train a model
evaluate.py - Evaluate on test set
predict.py - Make predictions on new data

Example:

# Train FastText model
python train.py --model fasttext

# Evaluate LSTM model
python evaluate.py --model lstm

# Make predictions with ensemble
python predict.py --model ensemble

References

Link 1

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
Source Codes		Source Codes
Final Edition.ipynb		Final Edition.ipynb
LICENSE		LICENSE
README.md		README.md
project documentation.pdf		project documentation.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Source Codes

Source Codes

Final Edition.ipynb

Final Edition.ipynb

LICENSE

LICENSE

README.md

README.md

project documentation.pdf

project documentation.pdf

Repository files navigation

Sentiment Analysis of Movie Reviews

About

Contributors

Data

Models

Word Embedding Models

Training

Evaluation

Results

Usage

References

About

Releases 1

Packages

Languages

License

mjmaher987/Sentiment-Analysis-Project

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis of Movie Reviews

About

Contributors

Data

Models

Word Embedding Models

Training

Evaluation

Results

Usage

References

About

Topics

Resources

License

Stars

Watchers

Forks

Languages