Group integration project for the NLP course. The purpose of this project is to implement different models and preprocessing techniques to perform sentiment analysis on a series of short texts.
We implemented three model types:
- Classical ML models: Implemented using scikit-learn. The current models are BernoulliNB, SGDClassifier, LogisticRegression, and RandomForest.
- RNNs: Built and trained using Keras; the current models are a Bidirectional RNN, a CLSTM, and a GRU.
- BERT: A BERT model trained using Keras.
The predictions of all the models are then combined in an ensemble to produce a final prediction, with every model weighted equally.
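As an illustration, the equal-weight combination could look like the following minimal sketch (the `predict_proba` interface and function name here are assumptions for illustration, not the project's actual API):

```python
import numpy as np

def ensemble_predict(models, texts):
    """Average class probabilities from all models with equal weight.

    `models` is assumed to be a list of objects exposing a
    `predict_proba(texts) -> (n_texts, n_classes)` method; this is an
    illustrative interface, not the project's actual one.
    """
    # Stack per-model probabilities: shape (n_models, n_texts, n_classes).
    probs = np.stack([m.predict_proba(texts) for m in models])
    # Equal weighting reduces to a plain mean over the model axis.
    mean_probs = probs.mean(axis=0)
    # The final prediction is the highest-probability class per text.
    return mean_probs.argmax(axis=1)
```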
| Model | Accuracy | Precision | Recall |
|---|---|---|---|
| BernoulliNB | 69.36% | 66.51% | 70.69% |
| SGDClassifier | 75.44% | 74.87% | 75.85% |
| LogisticRegression | 75.32% | 73.57% | 76.38% |
| RandomForest | 69.50% | 65.05% | 71.56% |
| Bidirectional RNN | 81.42% | 50.15% | 56.35% |
| CLSTM | 79.13% | 62.41% | 55.25% |
| GRU | 82.41% | 65.02% | 59.36% |
| BERT | 85.38% | n/a | n/a |
- Install PortAudio. On a Debian-based Linux distribution, use:

  ```
  sudo apt-get install libportaudio2
  ```

- Optionally, create and source the Python virtual environment of your choice.
- Install pytorch>=1.7 following the instructions on the official page for your system.
- Run:

  ```
  pip install -r requirements.txt
  ```
The ensemble is used through the `src.inference` script. The Colab notebook `NLP-showcase.ipynb` demonstrates the different modes of operation.
To use this script, run it as a module and provide flags, which specify the input to the ensemble:

```
python3 -m src.inference --demo
```
The possible flags are:
- --demo: Predict the sentiment of a fixed, predefined set of sentences.
- --input: Predict over user-given sentences. The sentences should be passed as a single string argument, with the character sequence '&&' separating individual sentences, e.g. 'this is sentence one&&Sentence two' (see the splitting sketch after this list).
- --voice: Use a voice recognition model to perform inference over an audio transcription. If the string 'record' is given as an argument, it will prompt for a recording; otherwise, provide the path to an audio file as the argument.
- --twitter: Perform inference over tweets from the user specified as an argument.
- --reddit: Perform inference over comments from the user specified as an argument.
- --data_path: Base path to the directory where all the pretrained models are stored, default=data/.
This information can be seen at any point by using the --help flag.
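For reference, this is roughly how the '&&'-separated --input argument could be parsed (a minimal sketch; the function name is hypothetical):

```python
def split_sentences(raw: str) -> list[str]:
    """Split a '&&'-separated argument string into individual sentences."""
    # 'this is sentence one&&Sentence two' -> ['this is sentence one', 'Sentence two']
    return [s.strip() for s in raw.split("&&") if s.strip()]
```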
As mentioned in the flag descriptions, a 'data' directory is needed to run the ensemble. Its structure must match the one provided in the repository's data directory.
There are two APIs available: Twitter and Reddit. The "--twitter" and "--reddit" flags require API keys loaded as environment variables, and they take as an argument the name of a user whose tweets/comments will be analyzed.
To facilitate the management of credentials, we use the dotenv library, which allows the loading of environment variables from a .env file located in the root of the project.
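For example, loading the credentials could look like this minimal sketch (variable names as defined below):

```python
import os
from dotenv import load_dotenv

# Reads key=value pairs from the .env file in the project root
# into the process environment.
load_dotenv()

reddit_client_id = os.environ["REDDIT_CLIENT_ID"]
reddit_client_secret = os.environ["REDDIT_CLIENT_SECRET"]
```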
Follow the instructions at the reddit-archive to get the credentials needed to use the API. Once you have the information, set the following environment variables in the .env file:
```
REDDIT_EMAIL=your@mail.com
REDDIT_USER=your_user
REDDIT_PASSWORD=your_password
REDDIT_CLIENT_ID=the_app_client_id
REDDIT_CLIENT_SECRET=the_app_client_secret
```
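These credentials could then be used to fetch a user's comments, for instance with PRAW (an assumption; this README does not name the Reddit client library):

```python
import os
import praw

# Credentials loaded from .env as shown above; the user_agent is a
# hypothetical descriptive string required by the Reddit API.
reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    username=os.environ["REDDIT_USER"],
    password=os.environ["REDDIT_PASSWORD"],
    user_agent="sentiment-ensemble script",
)

# Most recent comments from the target user.
comments = [c.body for c in reddit.redditor("some_user").comments.new(limit=20)]
```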
To get this data you'll need a Twitter developer account. After acquiring one, go to the "Projects & Apps" section of the Developer Portal, create a Standalone App, and select "Keys and tokens"; there you'll find all of the following keys and tokens. Set them in the .env file:

```
TWITTER_CONSUMER_KEY="consumer_key"
TWITTER_CONSUMER_SECRET="consumer_secret"
TWITTER_ACCESS_TOKEN="access_token"
TWITTER_ACCESS_SECRET="access_secret"
```
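As with Reddit, here is a minimal sketch of fetching a user's tweets, assuming tweepy as the client library (also not stated in this README):

```python
import os
import tweepy

# OAuth 1.0a user-context authentication with the keys from .env.
auth = tweepy.OAuthHandler(
    os.environ["TWITTER_CONSUMER_KEY"],
    os.environ["TWITTER_CONSUMER_SECRET"],
)
auth.set_access_token(
    os.environ["TWITTER_ACCESS_TOKEN"],
    os.environ["TWITTER_ACCESS_SECRET"],
)
api = tweepy.API(auth)

# Most recent tweets from the target user.
tweets = [t.text for t in api.user_timeline(screen_name="some_user", count=20)]
```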

