This repository implements sentiment analysis on movie reviews using several deep learning approaches. It uses the IMDb dataset to compare neural network architectures for text classification. The best model, an LSTM network with GloVe word embeddings, reaches 86.5% accuracy.
- Text preprocessing pipeline for cleaning and preparing review text
- Implementation of three neural network architectures:
  - Simple Neural Network (SNN): 74.8% accuracy
  - Convolutional Neural Network (CNN): 85.5% accuracy
  - Long Short-Term Memory Network (LSTM): 86.5% accuracy
- GloVe word embeddings integration (100-dimensional)
- Web interface for real-time sentiment analysis
- Rating prediction on a scale of 1-10
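
The text preprocessing step listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in `b2_preprocessing_function.py`; the function name `clean_review` and the exact cleaning rules are assumptions, and the real pipeline may also use NLTK (for example, for stop-word removal).

```python
import re

# Matches HTML tags such as the <br /> markers common in IMDb reviews.
TAG_RE = re.compile(r"<[^>]+>")


def clean_review(text: str) -> str:
    """Hypothetical cleaner: lowercase, strip tags, non-letters, and stray single letters."""
    text = TAG_RE.sub(" ", text.lower())      # remove HTML tags
    text = re.sub(r"[^a-z]+", " ", text)      # keep letters only
    text = re.sub(r"\b[a-z]\b", " ", text)    # drop single letters left by e.g. "it's"
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace


cleaned = clean_review("Great movie!<br />It's a 10/10 classic.")
print(cleaned)  # great movie it classic
```

Cleaning before tokenization keeps the vocabulary small and consistent with the words found in pre-trained embeddings.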

```bash
# Clone the repository
git clone https://github.com/sarvkk/sentiment-analysis.git
cd sentiment-analysis

# Install dependencies
pip install -r requirements.txt

# Download GloVe embeddings
# Make sure the file a2_glove.6B.100d.txt is in the project directory
```

To start the web interface, run:

```bash
python app.py
```

Then navigate to http://localhost:5000 in your browser.

Open and run the SentimentAnalysis.ipynb notebook to:
- Train models from scratch
- Analyze model performance
- Make predictions on new reviews
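
When making predictions on new reviews, a binary sentiment model typically outputs a probability of positive sentiment. One way that probability could be turned into a 1-10 rating is sketched below. The mapping and the name `probability_to_rating` are assumptions for illustration; the notebook may derive ratings differently.

```python
import math


def probability_to_rating(p: float) -> int:
    """Map a positive-sentiment probability in [0, 1] to an integer rating in 1..10."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Scale to 0..10 and round up, clamping so the lowest rating is 1.
    return max(1, math.ceil(p * 10))


ratings = [probability_to_rating(p) for p in (0.0, 0.05, 0.5, 1.0)]
print(ratings)  # [1, 1, 5, 10]
```
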

The repository is organized as follows:

- `ipynb/` - Jupyter notebooks with model training and analysis
- `templates/` - HTML templates for the web interface
- `app.py` - Flask web application
- `b2_preprocessing_function.py` - Text preprocessing utilities
- `b3_tokenizer.json` - Saved tokenizer for input processing
- `lstm_model.h5` - Pre-trained LSTM model
- `c2_IMDb_Unseen_Predictions.csv` - Sample predictions on unseen data

The project requires:

- Python 3.10+
- TensorFlow 2.x
- Keras
- NLTK
- pandas
- NumPy
- Flask
- scikit-learn
- seaborn
- matplotlib

Accuracy by model:

| Model | Accuracy |
|---|---|
| Simple Neural Network | 74.8% |
| CNN | 85.5% |
| LSTM | 86.5% |

This project uses the IMDb movie review dataset for training and GloVe word embeddings for text representation.
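
As a rough illustration of how GloVe vectors can be turned into an embedding matrix, the sketch below parses the GloVe text format (one word followed by its vector components per line) and fills one row per vocabulary word. The helper names `load_glove` and `build_embedding_matrix` are hypothetical, the sample uses 3 dimensions instead of 100, and plain Python lists stand in for the NumPy array a Keras `Embedding` layer would normally receive.

```python
import io
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text lines ("word v1 v2 ... vN") into a word -> vector dict."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    word_index: Dict[str, int], vectors: Dict[str, List[float]], dim: int
) -> List[List[float]]:
    """Row i holds the vector for the word with index i; unknown words stay all-zero."""
    # Row 0 is left for padding, assuming word indices start at 1.
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
    return matrix


# Tiny in-memory stand-in for a2_glove.6B.100d.txt.
sample = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
glove = load_glove(sample)
matrix = build_embedding_matrix({"good": 1, "awful": 2}, glove, dim=3)
```

With the real file, `load_glove` could be given an open file handle instead of the `StringIO` sample, and `dim` would be 100.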