<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png"
     width="200px"
     height="auto"/>
</p>

# <h1 align="center" id="heading">Sentiment Analysis of Twitter Data</h1>

<hr>


### ☑️ Objectives
At the end of this session, you will be able to:
- [ ] Understand how to find and run pre-trained models
- [ ] Evaluate results from pre-trained models
- [ ] Run a pre-trained model on real Twitter data


### 🔨 Pre-Assignment

Create a new Conda environment for sentiment analysis (sa)

```bash
  conda create -n sa python=3.8 jupyter -y
```

Activate your new environment
```bash
  conda activate sa
```

Launch Jupyter Notebook
```bash
  jupyter-notebook
```

In the Jupyter file browser, navigate through the repo to find this week's `imports.ipynb` and open it.

Run all of the cells in the notebook.


### Background
Please review the weekly narrative [here](https://www.notion.so/Week-2-Data-Centric-AI-the-AI-Product-Lifecycle-72a84c1517b44fcbb3e6bd11d47477dc#2b73937612bb46559f5b91dc2bf55e7d)




<hr>

## 🚀 Let's Get Started

Let's start with our imports.

```python
import csv  # Read and write CSV files
from pprint import pprint  # Pretty-print nested lists and dicts so they are easier to read

from transformers import pipeline  # Hugging Face pipeline for loading pretrained models from the Hub
```

🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

These models can be applied to:
- 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
- 🖼️ Images, for tasks like image classification, object detection, and segmentation.
- 🗣️ Audio, for tasks like speech recognition and audio classification.

We'll use the `pipeline` function from 🤗 Transformers to analyze our sentiment data. Because we don't specify a pretrained model, the pipeline falls back to its default sentiment-analysis model, [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

```python
sentiment_pipeline = pipeline("sentiment-analysis")
```

```
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
```


In this example, we'll supply two sentences with clearly opposite sentiments and test the pipeline.

```python
data = ["This is great!", "Oh no!"]
sentiment_pipeline(data)
```

```
[{'label': 'POSITIVE', 'score': 0.9998694658279419},
 {'label': 'NEGATIVE', 'score': 0.994263231754303}]
```

The `label` in this case indicates the prediction for the sentiment type.

The `score` indicates the confidence of the prediction (between 0 and 1).

Because these examples are so strongly polarized, the model finds them easy to classify, and does so with high confidence.
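Since each result carries a `score`, one simple way to use it is to route low-confidence predictions to a human for review. The sketch below assumes that policy; the helper name `flag_low_confidence` and the 0.9 cutoff are hypothetical choices, not part of 🤗 Transformers. It only relies on the list-of-dicts shape the pipeline returns, and the scores are copied from outputs shown in this notebook.

```python
def flag_low_confidence(results, threshold=0.9):
    """Return (index, label, score) for predictions whose score is below threshold.

    `results` is a list of {'label': ..., 'score': ...} dicts, the shape returned
    by a Hugging Face sentiment-analysis pipeline. The 0.9 default is an
    arbitrary, hypothetical cutoff.
    """
    return [
        (i, r["label"], r["score"])
        for i, r in enumerate(results)
        if r["score"] < threshold
    ]


# Scores copied from pipeline outputs shown in this notebook
results = [
    {"label": "POSITIVE", "score": 0.9998694658279419},
    {"label": "NEGATIVE", "score": 0.994263231754303},
    {"label": "NEGATIVE", "score": 0.8498326539993286},
]
print(flag_low_confidence(results))  # [(2, 'NEGATIVE', 0.8498326539993286)]
```

Keep in mind that a high score does not guarantee a sensible label, so a threshold like this catches uncertain predictions, not confidently wrong ones.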

Let's see what happens when we use a less clear example:

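```python
challenging_sentiments = ["I don't think freddriq should leave, he's been helpful.",
                          "Is that the lake we went to last month?"]
sentiment_pipeline(challenging_sentiments)
```

```
[{'label': 'NEGATIVE', 'score': 0.9955562949180603},
 {'label': 'NEGATIVE', 'score': 0.9860844016075134}]
```

Both sentences are labeled *NEGATIVE* with high confidence, even though the first one expresses support and the second is a neutral question.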

<hr>

### Loading the Twitter Data

Let's play with some Twitter data. We'll be using a modified version of the [Elon Musk Twitter dataset on Kaggle](https://www.kaggle.com/datasets/andradaolteanu/all-elon-musks-tweets).

```python
# Read the tweet text from the first column of each CSV row
with open('../data/elonmusk_tweets.csv', newline='', encoding='utf8') as f:
    reader = csv.reader(f)
    tweets = [row[0] for row in reader]

pprint(tweets[:100])
```

```
['@vincent13031925 For now. Costs are decreasing rapidly.',
 'Love this beautiful shot',
 '@agnostoxxx @CathieDWood @ARKInvest Trust the shrub',
 'The art In Cyberpunk is incredible',
 '@itsALLrisky 🤣🤣',
 '@seinfeldguru @WholeMarsBlog Nope haha',
 '@WholeMarsBlog If you don’t say anything &amp; engage Autopilot, it will '
 'soon guess based on time of day, taking you home or to work or to what’s on '
 'your calendar',
 '@DeltavPhotos @PortCanaveral That rocket is a hardcore veteran of many '
 'missions',
 'Blimps rock  https://t.co/e8cu5FkNOI',
 '@engineers_feed Due to lower gravity, you can travel from surface of Mars to '
 'surface of Earth fairly easily with a single stage rocket. Earth to Mars is '
 'vastly harder.',
 '@DrPhiltill Good thread',
 '@alexellisuk Pretty much',
 '@tesla_adri @WholeMarsBlog These things are best thought of as '
 'probabilities. There are 5 forward-facing cameras. It is highly likely that '
 'at least one of them will see multiple cars ahead.',
 '@WholeMa
```
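The loading code above assumes every CSV row has at least one column, but `csv.reader` yields an empty list for a blank line, which would make `row[0]` raise an `IndexError`. Below is a minimal sketch of a slightly more defensive loader, assuming you may also want to drop a header row; the function `read_tweets`, its `skip_header` option, and the sample data are all hypothetical. The sample also shows why the `csv` module is used rather than splitting on commas: a quoted field can contain commas and even newlines.

```python
import csv
import io


def read_tweets(file_obj, skip_header=False):
    """Return the first column of every non-empty CSV row as a list of strings.

    `file_obj` is any text file-like object, e.g. one opened with
    open(path, newline='', encoding='utf8'). `skip_header` is a hypothetical
    option for files whose first row holds column names.
    """
    reader = csv.reader(file_obj)
    if skip_header:
        next(reader, None)  # discard the header row, if there is one
    # csv.reader yields [] for blank lines; skip those instead of failing on row[0]
    return [row[0] for row in reader if row]


# Made-up sample: a header, a quoted field containing a comma and a newline,
# a blank line, and a plain row
sample = io.StringIO('text\n"Costs are decreasing, rapidly.\nFor now."\n\nBlimps rock\n')
print(read_tweets(sample, skip_header=True))
# ['Costs are decreasing, rapidly.\nFor now.', 'Blimps rock']
```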

First things first: let's look at the sentiment assigned by the default model in the pipeline, [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

```python
distil_sentiment = sentiment_pipeline(tweets[0:100])
```

Let's check out the distribution of positive and negative tweets and see the breakdown using `collections.Counter` from Python's 🐍 standard library!

```python
from collections import Counter

tweet_distro = Counter([x['label'] for x in distil_sentiment])
pos_sent_count = tweet_distro['POSITIVE']
neg_sent_count = tweet_distro['NEGATIVE']
total_sent_count = sum(tweet_distro.values())

print(f"{pos_sent_count} ({pos_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are positive.")
print(f"{neg_sent_count} ({neg_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are negative.")
```

```
49 (49.00%) of the tweets classified are positive.
51 (51.00%) of the tweets classified are negative.
```
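The counting code above hard-codes the `POSITIVE` and `NEGATIVE` label names. As a minimal sketch of a label-agnostic alternative (the helper `label_distribution` is hypothetical), the labels can be read from the results themselves, so the same function would summarize a model with a different label set without changes. The toy input below is made up to match the 49/51 split above.

```python
from collections import Counter


def label_distribution(results):
    """Map each label to (count, percentage) for a list of pipeline results.

    Labels are taken from the results, most common first, so this works for
    POSITIVE/NEGATIVE as well as any other label set.
    """
    counts = Counter(r["label"] for r in results)
    total = sum(counts.values())
    return {label: (n, n / total * 100) for label, n in counts.most_common()}


# Made-up toy input with the same label counts as the distilbert run (49 / 51)
toy = [{"label": "POSITIVE"}] * 49 + [{"label": "NEGATIVE"}] * 51
for label, (n, pct) in label_distribution(toy).items():
    print(f"{n} ({pct:.2f}%) of the tweets classified are {label}.")
```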


Let's repeat the process with a model that adds a third possible label, neutral: [bertweet-base-sentiment-analysis](https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis). Note that this model uses the short labels `POS`, `NEU`, and `NEG`.

To start, we'll build a pipeline with the new model using its 🤗 Hugging Face model ID: `finiteautomata/bertweet-base-sentiment-analysis`.

```python
bertweet_pipeline = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")
```

Next, as before, let's run the analysis on the first 100 of Elon's tweets.

```python
bert_sentiment = bertweet_pipeline(tweets[0:100])
```

And then, let's check out the breakdown of positive, negative, AND neutral sentiments!

```python
from collections import Counter

tweet_distro = Counter([x['label'] for x in bert_sentiment])
pos_sent_count = tweet_distro['POS']
neu_sent_count = tweet_distro['NEU']
neg_sent_count = tweet_distro['NEG']
total_sent_count = sum(tweet_distro.values())

print(f"{pos_sent_count} ({pos_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are positive.")
print(f"{neu_sent_count} ({neu_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are neutral.")
print(f"{neg_sent_count} ({neg_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are negative.")
```

```
29 (29.00%) of the tweets classified are positive.
64 (64.00%) of the tweets classified are neutral.
7 (7.00%) of the tweets classified are negative.
```


❓ What do you notice about the difference in the results? 

❓ Do the results for the `bertweet-base` model look better, or worse, than the results for the `distilbert-base` model? Why?

### *Answer:*
Switching from the default model, *distilbert-base-uncased-finetuned-sst-2-english*, to *bertweet-base-sentiment-analysis*, which adds a third *NEUTRAL* class, has the following notable effects:
- Most tweets (64%) are labeled *NEUTRAL*, a class that didn't exist in the first model.
- The share of tweets labeled *POSITIVE* drops from 49% with the first model to 29% with *bertweet-base*.
- The share of tweets labeled *NEGATIVE* drops sharply from 51% with the first model to a mere 7% with *bertweet*.

*Does bertweet-base look like it gave better results than distilbert-base?* To answer this, let's take a look at the labeled tweets:

```python
print(bert_sentiment[0:5])
```

```
[{'label': 'NEU', 'score': 0.9523929953575134}, {'label': 'POS', 'score': 0.9909942746162415}, {'label': 'NEU', 'score': 0.9733855128288269}, {'label': 'POS', 'score': 0.9824264049530029}, {'label': 'NEG', 'score': 0.9627320766448975}]
```


```python
print(distil_sentiment[0:5])
```

```
[{'label': 'NEGATIVE', 'score': 0.9963656663894653}, {'label': 'POSITIVE', 'score': 0.9998824596405029}, {'label': 'NEGATIVE', 'score': 0.8498326539993286}, {'label': 'POSITIVE', 'score': 0.9998857975006104}, {'label': 'NEGATIVE', 'score': 0.9839497804641724}]
```


```python
# Tweets as classified by distil_sentiment:
num_tweets = 100
pos_distil = []
neg_distil = []

for i in range(num_tweets):
    if distil_sentiment[i]['label'] == 'POSITIVE':
        pos_distil.append(f'{i:3d} POSITIVE {tweets[i]}')
    else:
        neg_distil.append(f'{i:3d} NEGATIVE {tweets[i]}')

print('# tweets distil_sentiment identified as POSITIVE: ', len(pos_distil),
      *pos_distil, sep='\n')
print('\n\n# tweets distil_sentiment identified as NEGATIVE: ', len(neg_distil),
      *neg_distil, sep='\n')
```

```
# tweets distil_sentiment identified as POSITIVE: 
49
  1 POSITIVE Love this beautiful shot
  3 POSITIVE The art In Cyberpunk is incredible
  7 POSITIVE @DeltavPhotos @PortCanaveral That rocket is a hardcore veteran of many missions
 10 POSITIVE @DrPhiltill Good thread
 11 POSITIVE @alexellisuk Pretty much
 19 POSITIVE Kong vs Godzilla has record for most meth ever consumed in a writer’s room
 21 POSITIVE … going to moon very soon
 23 POSITIVE @TimBirks1 @Erdayastronaut @SpaceX Pretty much
 24 POSITIVE @memescryptor !
 28 POSITIVE @teslaownersSV @neuralink Turns out 🐒 love video games &amp; snacks just like us!
 30 POSITIVE @chicago_glenn I feel like this sometimes
 31 POSITIVE @OwenSparks_ @WholeMarsBlog It will
 33 POSITIVE @w00ki33 @SpaceX @SuperclusterHQ Simulation is improving rendering resolution  …
 36 POSITIVE Thanks to all that helped SpaceX!
 37 POSITIVE Just read it. Book is accurate.
 38 POSITIVE @TeslaGong Yeah
 39 POSITIVE @mikevanbus @TrungTPhan @neuralink Pretty much
 4
```

```python
# Tweets as classified by bert_sentiment:
num_tweets = 100
pos_bert = []
neu_bert = []
neg_bert = []

for j in range(num_tweets):
    if bert_sentiment[j]['label'] == 'POS':
        pos_bert.append(f'{j:3d} POS {tweets[j]}')
    elif bert_sentiment[j]['label'] == 'NEU':
        neu_bert.append(f'{j:3d} NEU {tweets[j]}')
    else:
        neg_bert.append(f'{j:3d} NEG {tweets[j]}')

print('# tweets bert_sentiment identified as POS: ', len(pos_bert),
      *pos_bert, sep='\n')
print('\n\n# tweets bert_sentiment identified as NEU: ', len(neu_bert),
      *neu_bert, sep='\n')
print('\n\n# tweets bert_sentiment identified as NEG: ', len(neg_bert),
      *neg_bert, sep='\n')
```

```
# tweets bert_sentiment identified as POS: 
29
  1 POS Love this beautiful shot
  3 POS The art In Cyberpunk is incredible
  8 POS Blimps rock  https://t.co/e8cu5FkNOI
 10 POS @DrPhiltill Good thread
 24 POS @memescryptor !
 26 POS @AustinTeslaClub @OwenSparks_ @WholeMarsBlog Good point.   Next major software rev will do much better with automating wipers, seat heating &amp; defrost.   Probable seat settings just based on occupant mass distribution should be possible.
 28 POS @teslaownersSV @neuralink Turns out 🐒 love video games &amp; snacks just like us!
 33 POS @w00ki33 @SpaceX @SuperclusterHQ Simulation is improving rendering resolution  …
 34 POS @cleantechnica Congrats to NIO. That is a tough milestone.
 36 POS Thanks to all that helped SpaceX!
 37 POS Just read it. Book is accurate.
 41 POS Soon our monkey will be on twitch &amp; discord haha
 42 POS @thenewsoncnbc @contessabrewer Good piece!
 44 POS @TarekWaked @TechCrunch @etherington Pretty much 🤣🤣 Great episode!
 48 POS @Ihe
```
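Both loops above hard-code each model's label names in an `if`/`else` chain. As a minimal sketch of a label-agnostic alternative, tweets can be grouped with `collections.defaultdict`, so one function handles two labels or three. The helper `group_by_label` is hypothetical, and the labels in the toy data are made up for illustration, not actual model output.

```python
from collections import defaultdict


def group_by_label(results, texts):
    """Group (index, text) pairs by predicted label, preserving input order.

    `results[i]` must be the pipeline output for `texts[i]`.
    """
    if len(results) != len(texts):
        raise ValueError("results and texts must be aligned, one result per text")
    groups = defaultdict(list)
    for i, (result, text) in enumerate(zip(results, texts)):
        groups[result["label"]].append((i, text))
    return dict(groups)


# Toy labels for illustration only, not actual model output
texts = ["Love this beautiful shot", "@alexellisuk Pretty much", "Blimps rock"]
results = [{"label": "POS"}, {"label": "NEU"}, {"label": "POS"}]

for label, items in group_by_label(results, texts).items():
    print(f"# tweets identified as {label}: {len(items)}")
    for i, text in items:
        print(f"{i:3d} {label} {text}")
```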

### *Answer (continued)...*
A quick read of the results above, comparing *distil_sentiment* and *bert_sentiment*, suggests that *bertweet* did much better (I agreed with most of its labels):
- Adding a *NEUTRAL* class in *bertweet* moved ~80% of the tweets that *distilbert* labeled *NEGATIVE* into *NEU* (40/51), along with about 50% of the tweets *distilbert* labeled *POSITIVE* (24/49).
- Still, taking *bertweet* as the reference, the default *distilbert* model rarely got the polarity backwards: few re-classified tweets switched to the opposite sentiment, i.e. from *POSITIVE* to *NEG* (only 1/49) or from *NEGATIVE* to *POS* (only 5/51).
- Not surprisingly, *distilbert* struggled with neutral tweets, since it has no class for them. In this sample it assigned more of them to *NEGATIVE* (40) than to *POSITIVE* (24).
- ***Adding the NEUTRAL label appears to significantly improve results compared with forcing the restrictive binary output of distilbert.***
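Fractions like 40/51 come from cross-tabulating the two models' labels tweet by tweet. A minimal sketch of that tally, assuming two aligned result lists such as `distil_sentiment` and `bert_sentiment`: the helpers `label_transitions` and `transition_share` are hypothetical, and the toy labels below are made up rather than taken from the notebook's results.

```python
from collections import Counter


def label_transitions(results_a, results_b):
    """Count (label_a, label_b) pairs for two aligned lists of pipeline results."""
    if len(results_a) != len(results_b):
        raise ValueError("both result lists must cover the same texts")
    return Counter((a["label"], b["label"]) for a, b in zip(results_a, results_b))


def transition_share(transitions, src, dst):
    """Return (texts moved from src to dst, all texts model A labeled src)."""
    total = sum(n for (a, _), n in transitions.items() if a == src)
    return transitions[(src, dst)], total  # a missing pair counts as 0


# Made-up toy labels; in the notebook these would be distil_sentiment and bert_sentiment
distil = [{"label": label} for label in ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]]
bert = [{"label": label} for label in ["NEU", "NEU", "POS", "POS", "NEU"]]

transitions = label_transitions(distil, bert)
moved, total = transition_share(transitions, "NEGATIVE", "NEU")
print(f"NEGATIVE -> NEU: {moved}/{total}")  # NEGATIVE -> NEU: 2/3
```

Run on the notebook's two 100-tweet result lists, `transition_share(transitions, "NEGATIVE", "NEU")` would give the 40/51 figure quoted above, and the other bullet-point fractions follow the same way.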

For context, Hugging Face notes that *distilbert-base-uncased-finetuned-sst-2-english* was fine-tuned on SST-2, a corpus of >10K single sentences extracted from movie reviews. Its base model, *distilbert-base-uncased*, is intended to be fine-tuned on a downstream task, and here that task was movie-review sentiment rather than tweets. So although its precision, recall, accuracy, and F1 approach 99% when tested against subsets of the SST-2 data, it was never fine-tuned on Twitter text.

By contrast, the *bertweet-base-sentiment-analysis* model (published on Hugging Face by `finiteautomata`) was trained on the SemEval 2017 corpus of ~40K tweets, presumably much closer to our sample than *distilbert*'s movie reviews. Together with the third *NEUTRAL* class, this may help explain *bertweet*'s better performance here. (NB: When reviewing the data above, *bertweet* at times gives the impression of a rather un-machine-like, nuanced 'understanding' of Elon's tweets!)

<hr>

### Partner Exercise

With your partner, try to determine how the following tweets might be classified. Try to sort them into the same groups as both of the model pipelines we saw today, and try adding a few sentences or tweets of your own!

```python
example_difficult_tweets = [
    "Kong vs Godzilla has record for most meth ever consumed in a writer's room",
    "@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.",
    "Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.",
    "Don’t use a calculator, use your 🧠",
    "Is there an element that comes before hydrogen 🤨?",
    "According to the Mass Effect series, there is Element Zero, also called Eezo. It doesnt exist in real life.",
    "The Void",
    "Im Westen nichts neues.",
    "Dulce et decorum est pro patria mori.",
    "Je te serre la main.",
    "At the still point of the turning world. Neither flesh nor fleshless; Neither from nor towards; at the still point, where the dance is.",
    "Tweet your reply",
    "Value creation is proportional to impact, not difficulty or complexity.",
    "Hope you have a great day!",
    "...could be worse.",
    "I think it is worse.",
    "For our momentary suffering is producing for us an eternal weight of glory.",
    "All quiet on the western front.",
    "How sweet and fitting to die for one's country.",
    "With a firm handshake.",
    "Regards",
    "Warm regards",
    "Warmest regards",
    "Best regards",
    "All the best",
    "All the very best",
    "All the very best.",
    "All the very best!",
    "@All the very best!",
    "My very best,",
    "My very best!"
]
```

The `distilbert-base` model:

```python
for tweet in example_difficult_tweets:
    pprint(sentiment_pipeline(tweet))
    print(tweet + '\n')
```

```
[{'label': 'POSITIVE', 'score': 0.5429093837738037}]
Kong vs Godzilla has record for most meth ever consumed in a writer's room

[{'label': 'NEGATIVE', 'score': 0.6348387598991394}]
@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.

[{'label': 'POSITIVE', 'score': 0.9419690370559692}]
Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.

[{'label': 'NEGATIVE', 'score': 0.9979800581932068}]
Don’t use a calculator, use your 🧠

[{'label': 'NEGATIVE', 'score': 0.9768679141998291}]
Is there an element that comes before hydrogen 🤨?

[{'label': 'NEGATIVE', 'score': 0.9997283816337585}]
According to the Mass Effect series, there is Element Zero, also called Eezo. It doesnt exist in real life.

[{'label': 'NEGATIVE', 'score': 0.9991528987884521}]
```

The `bertweet-base` model:

```python
for tweet in example_difficult_tweets:
    pprint(bertweet_pipeline(tweet))
    print(tweet + '\n')
```

```
[{'label': 'NEG', 'score': 0.7213014364242554}]
Kong vs Godzilla has record for most meth ever consumed in a writer's room

[{'label': 'NEU', 'score': 0.8023841977119446}]
@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.

[{'label': 'NEU', 'score': 0.8843539357185364}]
Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.

[{'label': 'NEU', 'score': 0.8713212609291077}]
Don’t use a calculator, use your 🧠

[{'label': 'NEU', 'score': 0.9143614768981934}]
Is there an element that comes before hydrogen 🤨?

[{'label': 'NEU', 'score': 0.9245191812515259}]
According to the Mass Effect series, there is Element Zero, also called Eezo. It doesnt exist in real life.

[{'label': 'NEU', 'score': 0.9685735106468201}]
The Void

[{'label': 'NEU', 'score
```

❓ How did you do? Did you find any surprising results? 

❓ Are there any instances where the two models gave different predictions for the same tweet?
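To answer the second question systematically rather than by eye, you can run both pipelines on the same list (a pipeline called on a list returns one result dict per input, as in the first examples) and compare labels. A minimal sketch follows; the helper `find_disagreements` and the `TO_SHORT` label mapping are hypothetical, while the labels in the example are copied from the outputs above.

```python
# Hypothetical mapping from distilbert's long label names to bertweet's short ones
TO_SHORT = {"POSITIVE": "POS", "NEGATIVE": "NEG"}


def find_disagreements(texts, results_a, results_b, mapping=TO_SHORT):
    """Return (text, label_a, label_b) for each text whose mapped labels differ."""
    if not len(texts) == len(results_a) == len(results_b):
        raise ValueError("texts and both result lists must be aligned")
    rows = []
    for text, a, b in zip(texts, results_a, results_b):
        # Translate model A's label if it has a short equivalent, else keep it as-is
        if mapping.get(a["label"], a["label"]) != b["label"]:
            rows.append((text, a["label"], b["label"]))
    return rows


# Labels copied from the distilbert-base and bertweet-base outputs above
texts = [
    "Kong vs Godzilla has record for most meth ever consumed in a writer's room",
    "Don’t use a calculator, use your 🧠",
    "Is there an element that comes before hydrogen 🤨?",
]
distil = [{"label": "POSITIVE"}, {"label": "NEGATIVE"}, {"label": "NEGATIVE"}]
bert = [{"label": "NEG"}, {"label": "NEU"}, {"label": "NEU"}]

for text, label_a, label_b in find_disagreements(texts, distil, bert):
    print(f"{label_a:>8} vs {label_b}: {text}")
```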

### *Answer (continued)...*

Here again, *bertweet* seems to outperform *distilbert-base*: on the small sample above, it labels many of the difficult tweets *NEU*, a class *distilbert* does not have. The neutral label appears to matter more for sentiment analysis of tweets than one might have anticipated, which is a good reminder of the importance of looking at the data during model design.

As noted previously, the similarity between training and test data is a critical determinant of model performance: *bertweet* was trained on tweets (although arguably no amount of training can anticipate Elon!!), while the default *distilbert* was trained on movie reviews. Nonetheless, both models show some generalizability beyond their respective training domains, as suggested by the small sample of 'difficult tweets' above, which also included non-tweets such as short phrases in other languages, common email sign-offs (e.g. 'Best regards'), informational text, proverbs/aphorisms, and poetry.