# Predicting Sentiment from Tweets using BERT

This notebook takes a dataset of Tweets, pre-processes them, and then uses an ensemble BERT model to predict the sentiment of each Tweet. Follow the instructions below and run the cells in order.

## Installing libraries

To run this code natively on your own computer instead of in Docker, you'll have to install the following libraries:

* `pip install nltk`
* `pip install emoji`
* `pip install pandas`
* `pip install transformers`
* `pip install datasets`
* `pip install torch torchvision`
* `pip install -U scikit-learn`
* `pip install regex`
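
Before moving on, it can help to confirm that the libraries are importable. The snippet below is a minimal sketch and not part of the original notebook: `missing_packages` is a hypothetical helper, and the list uses the usual import names for the packages above (for example, scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages listed above (scikit-learn imports as `sklearn`).
REQUIRED = ["nltk", "emoji", "pandas", "transformers", "datasets",
            "torch", "torchvision", "sklearn", "regex"]

def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, install it with the matching command from the list above and run the check again.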

## Pre-processing

First, we'll pre-process your data to remove any URLs, emojis, or username mentions. Before running this code, put your dataset of Tweets in the same directory as this notebook and replace the placeholder names in the code below: `"TWEETSETNAME.csv"` with the file name of your dataset, and `"TWEETSCOLUMN"` with the name of the column in your CSV file that contains the text of the Tweets.

In [None]:
import pandas as pd
from TweetNormalizer import normalizeTweet

tweets = pd.read_csv("TWEETSETNAME.csv")  # Change TWEETSETNAME to the name of your CSV file
tweets_column = "TWEETSCOLUMN"  # Change TWEETSCOLUMN to the name of the column in your CSV with the text of the Tweets
tweets["label"] = 1  # Placeholder label; replaced by the predicted sentiment in the cells below
tweets[tweets_column] = tweets[tweets_column].apply(normalizeTweet)  # Clean the text of each Tweet
tweets.to_csv("tweets_to_predict.csv", index=False)
tweets.head(5).to_csv("tweets_to_predict_test.csv", index=False)  # Small sample for the test run
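
To make the cleaning step concrete, here is a rough sketch of the kind of removal described above, using only the standard library. It is not the implementation of `normalizeTweet`, whose exact behaviour is defined in `TweetNormalizer`. The function name `strip_noise`, the regular expressions, and the emoji code-point ranges are all illustrative assumptions and will miss some cases.

```python
import re

# Assumed patterns, for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Rough emoji coverage: common pictograph blocks plus misc symbols and dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def strip_noise(text):
    """Remove URLs, @mentions, and (most) emojis, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())

print(strip_noise("@bob check https://t.co/abc this 😀 out"))  # check this out
```

Each match is replaced with a space rather than an empty string so that neighbouring words are not glued together. The final `split`/`join` then collapses the extra whitespace.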

## Test Run

Running the full model on all of your Tweets may take a while, so first run the cell below to make sure that everything is set up correctly. If the output ends with the first few rows of a dataframe, you should be good!

In [None]:
from datasets import Dataset
from sentiment_analysis import get_sentiment_predictions

# Load the small sample written by the pre-processing cell
test_tweets = pd.read_csv("tweets_to_predict_test.csv")
test_dataset = Dataset.from_pandas(test_tweets)

# Replace the placeholder labels with the predicted sentiments
test_tweets["label"] = get_sentiment_predictions(test_dataset[tweets_column])
test_tweets[[tweets_column, "label"]].head()
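
Since the full run can be slow, one option is to time the test cell and extrapolate. The sketch below assumes runtime grows roughly linearly with the number of Tweets, which is only an approximation: one-off costs such as loading the model are included in a small sample's timing, so the estimate will tend to be too high. `estimate_total_seconds` is a hypothetical helper, not part of `sentiment_analysis`.

```python
def estimate_total_seconds(sample_seconds, sample_size, total_size):
    """Linearly extrapolate the time for total_size items from a timed sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return sample_seconds / sample_size * total_size

# Example: a 5-Tweet sample that took 10 seconds, scaled to 1,000 Tweets.
print(estimate_total_seconds(10.0, 5, 1000))  # 2000.0
```

To get `sample_seconds`, take a `time.perf_counter()` reading immediately before and after the prediction call in the test cell and subtract the two.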

## Actual Run

If that displayed correctly, run the following cell. It might take a while. When it finishes, your output will be saved as `tweets_predicted.csv`.

In [None]:
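# Relies on pd, Dataset, get_sentiment_predictions, and tweets_column from the cells above
tweets = pd.read_csv("tweets_to_predict.csv")
dataset = Dataset.from_pandas(tweets)

# Predict a sentiment for every Tweet and save the result
tweets["label"] = get_sentiment_predictions(dataset[tweets_column])
tweets.to_csv("tweets_predicted.csv", index=False)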
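
Once the file is written, a quick count of the predicted labels is a cheap sanity check, for example to spot a run where every Tweet received the same label. The sketch below uses only the standard library and assumes the output has a `label` column, as in the cell above. `label_counts` is a hypothetical helper, and this notebook does not say which label values `get_sentiment_predictions` returns, so none are assumed here.

```python
import csv
from collections import Counter

def label_counts(path, label_column="label"):
    """Count how often each value appears in label_column of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row[label_column] for row in csv.DictReader(f))
```

Calling `label_counts("tweets_predicted.csv")` returns a `Counter` keyed by the label values as strings, since the `csv` module does not parse types. With the dataframe still in memory, `tweets["label"].value_counts()` gives the same information.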