<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png"
     width="200px"
     height="auto"/>
</p>

# <h1 align="center" id="heading">Sentiment Analysis of Twitter Data</h1>

<hr>


### ☑️ Objectives
At the end of this session, you will be able to:
- [ ] Understand how to find and run pre-trained models
- [ ] Evaluate results from pre-trained models
- [ ] Run a pre-trained model using real Twitter data


### 🔨 Pre-Assignment

Create a new Conda environment for sentiment analysis (sa)

```bash
  conda create -n sa python=3.8 jupyter -y
```

Activate your new environment
```bash
  conda activate sa
```

Open Jupyter Notebook
```bash
  jupyter-notebook
```

Navigate through the repo in the notebook to find `imports.ipynb` for this week and open it.

Run all of the cells in the notebook.


### Background
Please review the weekly narrative [here](https://www.notion.so/Week-2-Data-Centric-AI-the-AI-Product-Lifecycle-72a84c1517b44fcbb3e6bd11d47477dc#2b73937612bb46559f5b91dc2bf55e7d)




<hr>

## 🚀 Let's Get Started

Let's start with our imports.

In [109]:
!pip install transformers tensorflow torch pandas numpy matplotlib seaborn



In [1]:
import csv # Allows us to read and write csv files
from pprint import pprint # Make our print functions easier to read

from transformers import pipeline # Hugging face pipeline to load online models



🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

These models can be applied on:
- 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages.

- 🖼️ Images, for tasks like image classification, object detection, and segmentation.
- 🗣️ Audio, for tasks like speech recognition and audio classification.

We'll use the `pipeline` function from 🤗 Transformers to analyze our sentiment data. Because we don't specify a pretrained model, the pipeline falls back to its default sentiment analysis model, [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

In [2]:
sentiment_pipeline = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
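
The warning above recommends naming the model and revision explicitly. Here is a minimal sketch of one way to do that, assuming the model ID and short revision hash printed in the warning are the ones you want to pin. `PINNED_SENTIMENT_MODEL` and `build_pinned_pipeline` are illustrative names, not part of the notebook.

```python
# Pin the checkpoint reported in the warning so reruns load the same weights.
PINNED_SENTIMENT_MODEL = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",  # short commit hash copied from the warning above
}


def build_pinned_pipeline():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily so the settings can be inspected without loading transformers.
    from transformers import pipeline

    return pipeline(**PINNED_SENTIMENT_MODEL)
```

Calling `build_pinned_pipeline()` should behave like `pipeline("sentiment-analysis")` in this notebook, but without depending on whatever default a future library version picks.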



In this example, we'll supply two polar sentiments and test out the model pipeline.

In [3]:
data = ["This is great!", "Oh no!"]
sentiment_pipeline(data)

[{'label': 'POSITIVE', 'score': 0.9998694658279419},
 {'label': 'NEGATIVE', 'score': 0.994263231754303}]

The `label` is the sentiment class the model predicts.

The `score` is the model's confidence in that prediction (between 0 and 1).

Because both sentences express strongly polar sentiment, the model classifies them with high confidence.
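
Because every prediction carries a `score`, one simple way to act on it is to separate confident predictions from borderline ones. The sketch below does this for the two results shown above. The `0.9` threshold and the helper name `split_by_confidence` are assumptions chosen for illustration.

```python
def split_by_confidence(results, threshold=0.9):
    """Partition pipeline-style results into confident and uncertain lists."""
    confident, uncertain = [], []
    for result in results:
        # Each result is a dict with a 'label' and a 'score' between 0 and 1.
        target = confident if result["score"] >= threshold else uncertain
        target.append(result)
    return confident, uncertain


# The two predictions shown above, copied verbatim.
results = [
    {"label": "POSITIVE", "score": 0.9998694658279419},
    {"label": "NEGATIVE", "score": 0.994263231754303},
]
confident, uncertain = split_by_confidence(results)
print(len(confident), len(uncertain))  # prints: 2 0
```

Keep in mind that a high score only reflects the model's own confidence; it does not guarantee the label is correct.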

Let's see what happens when we use a less clear example:

In [4]:
challenging_sentiments = ["I don't think freddriq should leave, he's been helpful.",
                          "Is that the lake we went to last month?"]
sentiment_pipeline(challenging_sentiments)

[{'label': 'NEGATIVE', 'score': 0.9955562949180603},
 {'label': 'NEGATIVE', 'score': 0.9860844016075134}]

<hr>

### Loading the Twitter Data

Let's play with some Twitter data. We'll be using a modified version of the [Elon Musk twitter dataset on Kaggle](https://www.kaggle.com/datasets/andradaolteanu/all-elon-musks-tweets).

In [5]:
with open('../data/elonmusk_tweets.csv', newline='', encoding='utf8') as f:
    reader = csv.reader(f)
    # Keep only the first column of each row, which holds the tweet text
    tweets = [row[0] for row in reader]

# pprint(tweets[:100])

['@vincent13031925 For now. Costs are decreasing rapidly.',
 'Love this beautiful shot',
 '@agnostoxxx @CathieDWood @ARKInvest Trust the shrub',
 'The art In Cyberpunk is incredible',
 '@itsALLrisky 🤣🤣',
 '@seinfeldguru @WholeMarsBlog Nope haha',
 '@WholeMarsBlog If you don’t say anything &amp; engage Autopilot, it will '
 'soon guess based on time of day, taking you home or to work or to what’s on '
 'your calendar',
 '@DeltavPhotos @PortCanaveral That rocket is a hardcore veteran of many '
 'missions',
 'Blimps rock  https://t.co/e8cu5FkNOI',
 '@engineers_feed Due to lower gravity, you can travel from surface of Mars to '
 'surface of Earth fairly easily with a single stage rocket. Earth to Mars is '
 'vastly harder.',
 '@DrPhiltill Good thread',
 '@alexellisuk Pretty much',
 '@tesla_adri @WholeMarsBlog These things are best thought of as '
 'probabilities. There are 5 forward-facing cameras. It is highly likely that '
 'at least one of them will see multiple cars ahead.',
 '@WholeMa

First things first - let's look at the sentiment as determined by the [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) (default model) in the pipeline.

In [6]:
distil_sentiment = sentiment_pipeline(tweets[0:100])

Let's check out the distribution of positive/negative Tweets and see the breakdown using Python's 🐍 standard library `collections.Counter`!

In [7]:
from collections import Counter

tweet_distro = Counter([x['label'] for x in distil_sentiment])
pos_sent_count = tweet_distro['POSITIVE']
neg_sent_count = tweet_distro['NEGATIVE']
total_sent_count = sum(tweet_distro.values())

print(f"{pos_sent_count} ({pos_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are positive.")
print(f"{neg_sent_count} ({neg_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are negative.")

49 (49.00%) of the tweets classified are positive.
51 (51.00%) of the tweets classified are negative.
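
The same counting logic can be wrapped in a small helper that works for any set of labels, which is convenient when switching to a model with different label names. This is only a sketch: `label_breakdown` is a hypothetical name and the sample input is made up.

```python
from collections import Counter


def label_breakdown(results):
    """Return {label: (count, percent)} for a list of pipeline-style results."""
    counts = Counter(result["label"] for result in results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: (n, n / total * 100) for label, n in counts.items()}


# Made-up input for illustration only, not real model output.
sample = [
    {"label": "POSITIVE"},
    {"label": "NEGATIVE"},
    {"label": "NEGATIVE"},
    {"label": "NEGATIVE"},
]
for label, (count, percent) in label_breakdown(sample).items():
    print(f"{count} ({percent:.2f}%) of the tweets classified are {label}.")
```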


Let's repeat that process with a model that adds a third possible label, neutral: [bertweet-base-sentiment-analysis](https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis).

To start, we'll build a pipeline with the new model using its 🤗 Hugging Face model ID: `finiteautomata/bertweet-base-sentiment-analysis`

In [8]:
bertweet_pipeline = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")


emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0


Next, as before, let's run the analysis on the first 100 of Elon's tweets.

In [9]:
bert_sentiment = bertweet_pipeline(tweets[0:100])

And then, let's check out the breakdown of positive, negative, AND neutral sentiments!

In [10]:
from collections import Counter

tweet_distro = Counter([x['label'] for x in bert_sentiment])
pos_sent_count = tweet_distro['POS']
neu_sent_count = tweet_distro['NEU']
neg_sent_count = tweet_distro['NEG']
total_sent_count = sum(tweet_distro.values())

print(f"{pos_sent_count} ({pos_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are positive.")
print(f"{neu_sent_count} ({neu_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are neutral.")
print(f"{neg_sent_count} ({neg_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are negative.")

29 (29.00%) of the tweets classified are positive.
64 (64.00%) of the tweets classified are neutral.
7 (7.00%) of the tweets classified are negative.


In [53]:
print(sentiment_pipeline('The movie was filmed in black and white.'))
print(sentiment_pipeline('The movie was filmed in color.'))

[{'label': 'NEGATIVE', 'score': 0.9857599139213562}]
[{'label': 'POSITIVE', 'score': 0.9532566666603088}]


In [49]:
bertweet_pipeline('The movie was filmed in black and white.')

[{'label': 'NEU', 'score': 0.9514453411102295}]

In [13]:
tweets_filtered = tweets[:100]

In [23]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

df = pd.DataFrame(tweets_filtered, columns=['tweet'])
df['label_distil'] = [x['label'] for x in distil_sentiment]
df['score_distil'] = [x['score'] for x in distil_sentiment]
df['label_bert'] = [x['label'] for x in bert_sentiment]
df['score_bert'] = [x['score'] for x in bert_sentiment]

In [24]:
df.loc[df['label_bert'] == 'NEG', :].sort_values(by=['score_bert'], ascending=False)

|  | tweet | label_distil | score_distil | label_bert | score_bert |
|---|---|---|---|---|---|
| 15 | @WholeMarsBlog This is a major problem! | NEGATIVE | 0.9996 | NEG | 0.980315 |
| 4 | @itsALLrisky 🤣🤣 | NEGATIVE | 0.98395 | NEG | 0.962732 |
| 20 | @itsALLrisky 💯 | NEGATIVE | 0.98395 | NEG | 0.962206 |
| 96 | @GerberKawasaki I fried a lot of neurons on that problem! | NEGATIVE | 0.992092 | NEG | 0.960584 |
| 18 | But wait how is the core of the earth lit by the sun? Stop asking questions!! | NEGATIVE | 0.998437 | NEG | 0.805101 |
| 19 | Kong vs Godzilla has record for most meth ever consumed in a writer’s room | POSITIVE | 0.621601 | NEG | 0.763041 |
| 51 | First @Neuralink product will enable someone with paralysis to use a smartphone with their mind faster than someone using thumbs | NEGATIVE | 0.995486 | NEG | 0.760998 |


In [26]:
df.loc[(df['label_bert'] == 'NEU')
       & (df['label_distil'] == 'NEGATIVE'), :].sort_values(by=['score_distil'], ascending=True)

|  | tweet | label_distil | score_distil | label_bert | score_bert |
|---|---|---|---|---|---|
| 74 | @ID_AA_Carmack Some kind of ELO level, updated once or twice a year based on what someone actually got done, might be most effective. Important that it go both up \*and\* down. | NEGATIVE | 0.539766 | NEU | 0.720872 |
| 56 | @jordanxmajel @WatchersTank @SpaceX Shock absorption is built into tower arms. Since tower is ground side, it can use a lot more mass to arrest booster downward momentum. | NEGATIVE | 0.679888 | NEU | 0.952736 |
| 82 | @EvaFoxU Last Kingdom vs Vikings | NEGATIVE | 0.761927 | NEU | 0.977231 |
| 80 | The Earth is not flat, it’s a hollow globe &amp; Donkey King lives there! | NEGATIVE | 0.809989 | NEU | 0.954809 |
| 2 | @agnostoxxx @CathieDWood @ARKInvest Trust the shrub | NEGATIVE | 0.849833 | NEU | 0.973386 |
| 60 | @MarkJam93765764 @IvanEscobosa A tidal wave of vaccine is being produced! | NEGATIVE | 0.898113 | NEU | 0.908238 |
| 12 | @tesla_adri @WholeMarsBlog These things are best thought of as probabilities. There are 5 forward-facing cameras. It is highly likely that at least one of them will see multiple cars ahead. | NEGATIVE | 0.914252 | NEU | 0.933398 |
| 62 | @IvanEscobosa Latter | NEGATIVE | 0.9268 | NEU | 0.978084 |
| 73 | @CathieDWood @wintonARK @ARKInvest What do you think of the unusually high ratio of S&amp;P market cap to GDP? | NEGATIVE | 0.940365 | NEU | 0.945433 |
| 27 | @Adamklotz_ @OwenSparks_ @WholeMarsBlog Yup | NEGATIVE | 0.949726 | NEU | 0.932738 |


❓ What do you notice about the difference in the results? 

![image](../deliverables/bert-distil-negative-mismatch.png "A title is required")

❓ Do the results for the `bertweet-base` model look better, or worse, than the results for the `distilbert-base` model? Why?

> The results from the `bertweet-base` model look better because `distilbert-base` was not trained on data specific to Twitter and the way people tweet. In short, we think Twitter has its own language, which differs from everyday English. Although both models are trained for the same task, `distilbert-base` appears to have been trained on data that can lead to context-biased evaluations, as the model's developers note.

<hr>

### Partner Exercise

With your partner, try to determine how the following tweets might be classified. Sort them into the same groups used by both of the model pipelines we saw today, and try adding a few of your own sentences/Tweets!

In [106]:
example_difficult_tweets = [
    "Kong vs Godzilla has record for most meth ever consumed in a writer's room",
    "@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.",
    "Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.",
    "We could replace people with robots.",
    "We should replace people with robots.",
    "We will replace people with robots.",
    "We will replace people with robots! 🤣",
    # "Elden Ring is not a copy of Skyrim.",
    # "Elden Ring is a copy of Skyrim.",
    # "Austin bats are wonderful.",
    # "Austin bats are worth watching.",
    # "Jeff Bezos is better than Elon Musk.",
    "Snowboarding is fun and risky.",
    "Snowboarding is fun but risky.",
    "Snowboarding is fun but dangerous.",
    # "Why liberal Washington can't quit Twitter",
    # "Until there is a viable alternative, I will be at Twitter and you will have to pry my fingers from my phone.",
    # "Humans are simply not built for email.",
]

The `distilbert-base` model:

In [107]:
for tweet in example_difficult_tweets:
    pprint(sentiment_pipeline(tweet))
    print(tweet + '\n')

[{'label': 'POSITIVE', 'score': 0.5429078936576843}]
Kong vs Godzilla has record for most meth ever consumed in a writer's room

[{'label': 'NEGATIVE', 'score': 0.634838342666626}]
@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.

[{'label': 'POSITIVE', 'score': 0.9419694542884827}]
Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.

[{'label': 'NEGATIVE', 'score': 0.9975510239601135}]
We could replace people with robots.

[{'label': 'NEGATIVE', 'score': 0.998619794845581}]
We should replace people with robots.

[{'label': 'NEGATIVE', 'score': 0.987613320350647}]
We will replace people with robots.

[{'label': 'NEGATIVE', 'score': 0.9685631394386292}]
We will replace people with robots! 🤣

[{'label': 'POSITIVE', 'score': 0.99973469972

The `bertweet-base` model:

In [108]:
for tweet in example_difficult_tweets:
    pprint(bertweet_pipeline(tweet))
    print(tweet + '\n')

[{'label': 'NEG', 'score': 0.7213016152381897}]
Kong vs Godzilla has record for most meth ever consumed in a writer's room

[{'label': 'NEU', 'score': 0.8023841977119446}]
@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.

[{'label': 'NEU', 'score': 0.8843539953231812}]
Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.

[{'label': 'NEG', 'score': 0.7293974161148071}]
We could replace people with robots.

[{'label': 'NEG', 'score': 0.6872692704200745}]
We should replace people with robots.

[{'label': 'NEG', 'score': 0.6957011222839355}]
We will replace people with robots.

[{'label': 'NEG', 'score': 0.8789107203483582}]
We will replace people with robots! 🤣

[{'label': 'POS', 'score': 0.9884705543518066}]
Snowboarding is fun and risky

❓ How did you do? Did you find any surprising results? 

> The word `but` is one of the strongest indicators of negative sentiment. Curiously, only the `bertweet-base` model picked this up.
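
One way to probe a claim like the one above about `but` is with minimal pairs: sentences that differ only in the connective. The sketch below builds such probes from a template modeled on the snowboarding examples. `make_minimal_pairs` is a hypothetical helper, and scoring the probes is left as a commented step because it needs one of the pipelines loaded earlier.

```python
def make_minimal_pairs(template, connectives):
    """Fill the '{}' slot in the template with each connective in turn."""
    return {word: template.format(word) for word in connectives}


# Template modeled on the snowboarding examples in the exercise list.
probes = make_minimal_pairs("Snowboarding is fun {} risky.", ["and", "but"])
for word, sentence in probes.items():
    print(f"{word!r}: {sentence}")
    # To score a probe, pass it to either pipeline, e.g. bertweet_pipeline(sentence)
```

If the label or score shifts between the two variants, the connective is the only thing that could have caused it.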

❓ Are there any instances where the two models gave different predictions for the same tweet?

> Definitely. We found different predictions for edge cases that combine neutral, positive, and negative connotations, e.g. "Snowboarding is fun but dangerous."
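
Disagreements like these can also be listed programmatically. The sketch below maps each model's label names onto a shared vocabulary and reports the texts where the two models differ. The mapping and the name `find_disagreements` are assumptions for illustration. The labels are copied from the first four outputs shown above, with the tweet text shortened.

```python
# Map each model's label names onto one shared vocabulary (an assumed mapping).
SHARED_LABEL = {
    "POSITIVE": "positive", "NEGATIVE": "negative",          # distilbert labels
    "POS": "positive", "NEU": "neutral", "NEG": "negative",  # bertweet labels
}


def find_disagreements(texts, labels_a, labels_b):
    """Return (text, label_a, label_b) for texts the two models label differently."""
    return [
        (text, a, b)
        for text, a, b in zip(texts, labels_a, labels_b)
        if SHARED_LABEL[a] != SHARED_LABEL[b]
    ]


# Labels copied from the first four outputs above; tweet text shortened for space.
texts = [
    "Kong vs Godzilla ...",
    "@ashleevance Battery ...",
    "Tesla's action ...",
    "We could replace people with robots.",
]
distil_labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]
bertweet_labels = ["NEG", "NEU", "NEU", "NEG"]

for text, a, b in find_disagreements(texts, distil_labels, bertweet_labels):
    print(f"{a:>8} vs {b}: {text}")
```

With a DataFrame like `df` from earlier, the same comparison could be applied to the `label_distil` and `label_bert` columns.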