nimsala1234/sentiment-analysis-api

Sentiment Analysis API by Dinithi Wijayasinghe

This project implements a simple sentiment analysis service that predicts whether a piece of text has positive or negative sentiment. The model is trained on the IMDb movie reviews dataset and exposed through a REST API built with FastAPI.

The API allows developers to send text and receive a predicted sentiment along with a confidence score.


Project Structure

Sentiment-api/
│
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI application
│   ├── model.py                # Model loading and prediction logic
│   └── schemas.py              # Pydantic request/response schemas
│
├── model/
│   └── sentiment_model.pkl     # Saved model files
│
├── train.py                    # Training script
├── requirements.txt
└── README.md


Requirements

Python version used: Python 3.9.6

Required libraries are listed in requirements.txt.


Setup Instructions

1. Clone the repository

git clone https://github.com/nimsala1234/sentiment-analysis-api
cd sentiment-analysis-api


2. Install dependencies

pip install -r requirements.txt


3. Train the sentiment model

Run the training script:

python train.py

This will:

• Download the IMDb dataset
• Train the TF-IDF + Logistic Regression model
• Evaluate performance
• Save the trained model to: model/sentiment_model.pkl
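The steps above could be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's train.py: four toy reviews stand in for the IMDb data, the vectorizer and classifier use scikit-learn defaults, the model is serialized with pickle, and it is written to a temporary directory rather than to model/sentiment_model.pkl.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in for the IMDb reviews (assumption: the real script downloads the dataset).
texts = [
    "great movie loved it",
    "wonderful acting great story",
    "terrible film hated it",
    "awful plot terrible acting",
]
labels = ["positive", "positive", "negative", "negative"]

# TF-IDF features feeding a Logistic Regression classifier, both with default settings.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
pipeline.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save the fitted pipeline (assumption: pickle is the serialization format).
    model_path = Path(tmp_dir) / "sentiment_model.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(pipeline, f)

    # Reload the model, as an API process would do before serving requests.
    with open(model_path, "rb") as f:
        loaded = pickle.load(f)

sample = ["great wonderful story"]
prediction = loaded.predict(sample)[0]           # predicted class label
probabilities = loaded.predict_proba(sample)[0]  # one probability per class
```

Keeping the vectorizer and the classifier in one pipeline object means a single file holds everything needed at prediction time.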


4. Start the API server

Run the FastAPI server with:

python -m uvicorn app.main:app --reload

The server will start at: http://127.0.0.1:8000

You can also access interactive API documentation at: http://127.0.0.1:8000/docs


Example API Usage

Health Check

GET /health

Response: { "status": "ok" }


Predict Sentiment

Endpoint: POST /predict

Example using curl:

curl -X POST "http://127.0.0.1:8000/predict" -H "Content-Type: application/json" -d '{"text": "I absolutely loved this movie!"}'

Example response:

{
  "text": "I absolutely loved this movie!",
  "sentiment": "positive",
  "confidence": 0.93
}
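The same call can be made from Python using only the standard library. The sketch below assumes the server is running at the default address shown earlier; the function names are illustrative and are not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default address from the setup section


def build_predict_request(text, base_url=BASE_URL):
    """Build (but do not send) the POST /predict request."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=10.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, predict("I absolutely loved this movie!") should return a dictionary with the text, sentiment, and confidence keys shown in the example response.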


Batch Prediction (Bonus)

Endpoint: POST /predict/batch

Example request:

{
  "texts": [
    "I loved this movie",
    "This film was terrible"
  ]
}

Example response:

{
  "results": [
    { "text": "I loved this movie", "sentiment": "positive", "confidence": 0.92 },
    { "text": "This film was terrible", "sentiment": "negative", "confidence": 0.95 }
  ]
}
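One plausible way to assemble this response from per-class probabilities is sketched below. It rests on two assumptions that this README does not confirm: that the confidence is the probability of the predicted class, and that it is rounded to two decimals as in the examples. The function names are illustrative.

```python
from typing import Dict, List


def to_result(text: str, class_probabilities: Dict[str, float]) -> dict:
    """Pick the most probable class and report its probability as the confidence."""
    sentiment = max(class_probabilities, key=class_probabilities.get)
    return {
        "text": text,
        "sentiment": sentiment,
        # Assumption: confidence is the winning class probability, rounded to 2 places.
        "confidence": round(class_probabilities[sentiment], 2),
    }


def to_batch_response(texts: List[str], probabilities: List[Dict[str, float]]) -> dict:
    """Build the batch payload: one result per input text, in input order."""
    if len(texts) != len(probabilities):
        raise ValueError("texts and probabilities must have the same length")
    return {"results": [to_result(t, p) for t, p in zip(texts, probabilities)]}


response = to_batch_response(
    ["I loved this movie", "This film was terrible"],
    [
        {"positive": 0.92, "negative": 0.08},
        {"positive": 0.05, "negative": 0.95},
    ],
)
```

Reusing the single-item function for each entry keeps the batch results in the same shape as the /predict response.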


Model Approach

The sentiment classifier was trained on the IMDb movie reviews dataset, which contains 50,000 labeled reviews. The text was preprocessed by removing HTML tags, URLs, and punctuation and by converting it to lowercase.
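The preprocessing described above can be sketched with the standard library alone. The regular expressions and the order of the steps are assumptions; the project's actual cleaning rules may differ.

```python
import re
import string

# Assumed patterns: anything between angle brackets is a tag, and URLs start
# with http://, https://, or www.
_TAG_RE = re.compile(r"<[^>]+>")
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Remove HTML tags, URLs, and punctuation, then lowercase the text."""
    text = _TAG_RE.sub(" ", text)         # drop HTML tags such as <br />
    text = _URL_RE.sub(" ", text)         # drop URLs before punctuation is stripped
    text = text.translate(_PUNCT_TABLE)   # delete ASCII punctuation characters
    text = text.lower()
    return " ".join(text.split())         # collapse repeated whitespace


cleaned = clean_text("<br />I LOVED it! See https://example.com/x now.")
# cleaned == "i loved it see now"
```

URLs are removed before punctuation so that the pattern can still match the "://" and the dots inside them.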

A TF-IDF vectorizer was used to convert text into numerical feature vectors while capturing the importance of terms across documents. The classifier used was Logistic Regression, which performs well for high-dimensional sparse text data and provides interpretable probability outputs.

To estimate how well the model generalizes, 5-fold cross-validation was performed on the training data before the final evaluation on the test set.

Cross-validation scores: [0.8650, 0.8694, 0.8636, 0.8742, 0.8688]

Mean CV accuracy: 0.8682
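As a quick check of the arithmetic, the mean can be recomputed from the five fold scores. The standard deviation is an addition for illustration and is not reported in the original results.

```python
from statistics import mean, stdev

# Fold accuracies as reported above.
cv_scores = [0.865, 0.8694, 0.8636, 0.8742, 0.8688]

mean_accuracy = mean(cv_scores)  # average accuracy over the five folds
spread = stdev(cv_scores)        # sample standard deviation across folds

print(f"Mean CV accuracy: {mean_accuracy:.4f} (std {spread:.4f})")
```

A small spread relative to the mean suggests that the accuracy does not depend much on which fold is held out.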

With more time, I would experiment with transformer-based models such as BERT and perform hyperparameter tuning to further improve accuracy.


Evaluation Results

Example evaluation metrics on the test dataset:

Accuracy: 0.8878
Precision: 0.8878
Recall: 0.8878
F1 Score: 0.8878

The model classifies roughly 89% of the test reviews correctly, about two percentage points above the mean cross-validation accuracy of 0.8682.
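For reference, the four metrics can be computed from true and predicted labels as in the sketch below. It treats "positive" as the class of interest; this README does not state which averaging method produced the reported figures, so this is one common definition rather than the project's evaluation code.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Tiny illustrative example (made-up labels, not project data).
y_true = ["positive", "positive", "positive", "negative", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "negative", "positive"]
metrics = binary_metrics(y_true, y_pred)
```

In this made-up example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.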


Notes

The IMDb dataset contains binary sentiment labels (positive and negative). Therefore, the model predicts these two classes. A neutral sentiment class could be added by training on a dataset that includes neutral labels.

About

Sentiment Analysis API using FastAPI and scikit-learn
