<p align="center" draggable="false"><img src="https://user-images.githubusercontent.com/37101144/161836199-fdb0219d-0361-4988-bf26-48b0fad160a3.png"
     width="200px"
     height="auto"/>
</p>

<h1 align="center" id="heading">Sentiment Analysis of Twitter Data</h1>

<hr>


### ☑️ Objectives
At the end of this session, you will be able to:
- [ ] Understand how to find and run pre-trained models
- [ ] Evaluate results from pre-trained models
- [ ] Run a pre-trained model on real Twitter data


### 🔨 Pre-Assignment

Create a new Conda environment for sentiment analysis (`sa`):

```bash
  conda create -n sa python=3.8 jupyter -y
```

Activate your new environment
```bash
  conda activate sa
```

Launch Jupyter Notebook:
```bash
  jupyter-notebook
```

In the notebook's file browser, navigate through the repo to find this week's `imports.ipynb`, open it, and run all of its cells.


### Background
Please review the weekly narrative [here](https://www.notion.so/Week-2-Data-Centric-AI-the-AI-Product-Lifecycle-72a84c1517b44fcbb3e6bd11d47477dc#2b73937612bb46559f5b91dc2bf55e7d)




<hr>

## 🚀 Let's Get Started

Let's start with our imports.

```python
import csv  # read and write CSV files
from pprint import pprint  # print nested data structures more readably

from transformers import pipeline  # 🤗 Hugging Face pipeline for loading pretrained models
```

🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.

These models can be applied to:
- 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
- 🖼️ Images, for tasks like image classification, object detection, and segmentation.
- 🗣️ Audio, for tasks like speech recognition and audio classification.

We'll use the `pipeline` function from 🤗 Transformers to analyze our sentiment data. Because we don't specify a model, the `"sentiment-analysis"` pipeline falls back to its default, [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english). This is a DistilBERT model fine-tuned on SST-2, a dataset of movie-review sentences labeled positive or negative.

```python
sentiment_pipeline = pipeline("sentiment-analysis")
```

```text
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
```



In this example, we'll supply two strongly polarized sentences to test the pipeline.

```python
data = ["This is great!", "Oh no!"]
sentiment_pipeline(data)
```

```text
[{'label': 'POSITIVE', 'score': 0.9998694658279419},
 {'label': 'NEGATIVE', 'score': 0.994263231754303}]
```

The `label` is the model's predicted sentiment.

The `score` is the model's confidence in that label, expressed as a probability between 0 and 1.
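Where does the `score` come from? By default, a single-label text-classification pipeline converts the model's raw outputs (logits) into probabilities with a softmax. It then reports the most likely label together with its probability. The sketch below reproduces that last step in plain Python. The logits are hypothetical values made up for illustration, and the `NEGATIVE`, `POSITIVE` label order is an assumption.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    # Subtracting the largest logit avoids overflow and leaves the result unchanged.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits, labels):
    """Return the top label and its probability, in the pipeline's output format."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": labels[best], "score": probs[best]}


# Hypothetical logits, made up for illustration (not real model output).
labels = ["NEGATIVE", "POSITIVE"]  # assumed index-to-label order
print(to_prediction([-4.3, 4.6], labels))
print(to_prediction([1.2, -2.0], labels))
```

The score is the probability of the *winning* label. As a result, a two-label model like this one never reports a score below 0.5, and a three-label model never reports one below about 0.33.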

Because these sentences are so clearly polar, the model has an easy time labeling them.

Let's see what happens with less clear-cut examples:

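```python
challenging_sentiments = ["I don't think freddriq should leave, he's been helpful.",
                          "Is that the lake we went to last month?"]
sentiment_pipeline(challenging_sentiments)
```

```text
[{'label': 'NEGATIVE', 'score': 0.9955561757087708},
 {'label': 'NEGATIVE', 'score': 0.9860844016075134}]
```

Both sentences come back `NEGATIVE` with high confidence. Yet the first one supports freddriq, and the second is a neutral question. This model only knows `POSITIVE` and `NEGATIVE`, so it forces every input into one of the two. A high score therefore does not guarantee a correct label.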

<hr>

### Loading the Twitter Data

Let's play with some Twitter data. We'll be using a modified version of the [Elon Musk twitter dataset on Kaggle](https://www.kaggle.com/datasets/andradaolteanu/all-elon-musks-tweets).

```python
with open('../data/elonmusk_tweets.csv', newline='', encoding='utf8') as f:
    reader = csv.reader(f)
    # Keep only the first column of each row, which holds the tweet text
    tweets = [row[0] for row in reader]

pprint(tweets[:100])
```

```text
['@vincent13031925 For now. Costs are decreasing rapidly.',
 'Love this beautiful shot',
 '@agnostoxxx @CathieDWood @ARKInvest Trust the shrub',
 'The art In Cyberpunk is incredible',
 '@itsALLrisky 🤣🤣',
 '@seinfeldguru @WholeMarsBlog Nope haha',
 '@WholeMarsBlog If you don’t say anything &amp; engage Autopilot, it will '
 'soon guess based on time of day, taking you home or to work or to what’s on '
 'your calendar',
 '@DeltavPhotos @PortCanaveral That rocket is a hardcore veteran of many '
 'missions',
 'Blimps rock  https://t.co/e8cu5FkNOI',
 '@engineers_feed Due to lower gravity, you can travel from surface of Mars to '
 'surface of Earth fairly easily with a single stage rocket. Earth to Mars is '
 'vastly harder.',
 '@DrPhiltill Good thread',
 '@alexellisuk Pretty much',
 '@tesla_adri @WholeMarsBlog These things are best thought of as '
 'probabilities. There are 5 forward-facing cameras. It is highly likely that '
 'at least one of them will see multiple cars ahead.',
 '@WholeMa
```
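The loading cell can be made a little more defensive. Like the original cell, this sketch assumes there is no header row and that the tweet text sits in the first column. It additionally skips blank rows, which would otherwise raise an `IndexError`. It can also decode HTML entities such as the `&amp;` visible in the output above, using the standard library's `html.unescape`. Unescaping is an optional choice, not something the original notebook does, and `load_tweets` is a hypothetical helper name.

```python
import csv
import html
import io


def load_tweets(f, unescape=True):
    """Return the first column of every non-empty CSV row as a list of strings.

    Assumes no header row and tweet text in column 0, as in the notebook cell.
    """
    tweets = []
    for row in csv.reader(f):
        if not row:  # skip blank lines instead of raising IndexError
            continue
        text = row[0]
        tweets.append(html.unescape(text) if unescape else text)
    return tweets


# With the notebook's file (path taken from the cell above):
# with open('../data/elonmusk_tweets.csv', newline='', encoding='utf8') as f:
#     tweets = load_tweets(f)

# Self-contained demo using an in-memory CSV instead of the real file.
sample = io.StringIO('Love this beautiful shot\n\n"Autopilot &amp; you, soon"\n')
print(load_tweets(sample))
```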

First, let's look at the sentiment predicted by the pipeline's default model, [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

```python
distil_sentiment = sentiment_pipeline(tweets[0:100])
distil_sentiment
```

```text
[{'label': 'NEGATIVE', 'score': 0.9963656663894653},
 {'label': 'POSITIVE', 'score': 0.9998824596405029},
 {'label': 'NEGATIVE', 'score': 0.84983229637146},
 {'label': 'POSITIVE', 'score': 0.9998857975006104},
 {'label': 'NEGATIVE', 'score': 0.9839497804641724},
 {'label': 'NEGATIVE', 'score': 0.9933285713195801},
 {'label': 'NEGATIVE', 'score': 0.9917682409286499},
 {'label': 'POSITIVE', 'score': 0.9983181953430176},
 {'label': 'NEGATIVE', 'score': 0.9937851428985596},
 {'label': 'NEGATIVE', 'score': 0.9840983748435974},
 {'label': 'POSITIVE', 'score': 0.9970496892929077},
 {'label': 'POSITIVE', 'score': 0.996302604675293},
 {'label': 'NEGATIVE', 'score': 0.9142526388168335},
 {'label': 'NEGATIVE', 'score': 0.9978026747703552},
 {'label': 'NEGATIVE', 'score': 0.9946601986885071},
 {'label': 'NEGATIVE', 'score': 0.9995997548103333},
 {'label': 'NEGATIVE', 'score': 0.9987119436264038},
 {'label': 'NEGATIVE', 'score': 0.9935503005981445},
 {'label': 'NEGATIVE', 'score': 0.998436868190765
```

Let's check out the distribution of positive and negative tweets using `collections.Counter` from 🐍 Python's standard library!

```python
from collections import Counter

# Count how many predictions carry each label
tweet_distro = Counter([x['label'] for x in distil_sentiment])
pos_sent_count = tweet_distro['POSITIVE']
neg_sent_count = tweet_distro['NEGATIVE']
total_sent_count = sum(tweet_distro.values())

print(f"{pos_sent_count} ({pos_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are positive.")
print(f"{neg_sent_count} ({neg_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are negative.")
```

```text
49 (49.00%) of the tweets classified are positive.
51 (51.00%) of the tweets classified are negative.
```
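We'll need the same counting code again for the second model, so it could be factored into a small helper that works for any label set. The names `label_distribution` and `print_distribution` are hypothetical. In the notebook you would call `print_distribution(distil_sentiment)` or `print_distribution(bert_sentiment)`. The toy predictions below are made up for illustration.

```python
from collections import Counter


def label_distribution(predictions):
    """Map each label to (count, percent) for a list of pipeline outputs.

    Works for any label set, so it covers both the two-label and the
    three-label model. Labels are ordered from most to least common.
    """
    counts = Counter(p["label"] for p in predictions)
    total = sum(counts.values())
    return {label: (n, n / total * 100) for label, n in counts.most_common()}


def print_distribution(predictions):
    """Print one summary line per label, like the notebook cells do."""
    for label, (n, pct) in label_distribution(predictions).items():
        print(f"{n} ({pct:.2f}%) of the tweets classified are {label}.")


# Toy predictions in the pipeline's output format (made up for illustration).
toy = [{"label": "POSITIVE", "score": 0.99},
       {"label": "NEGATIVE", "score": 0.98},
       {"label": "NEGATIVE", "score": 0.84}]
print_distribution(toy)
```

Unlike the hard-coded cells, the helper never needs to know the label names in advance. A label the model never predicts simply does not appear in the output.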


Let's repeat the process with a model that adds a third, neutral label: [bertweet-base-sentiment-analysis](https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis).

First, we build a pipeline for the new model from its 🤗 Hugging Face model ID, `finiteautomata/bertweet-base-sentiment-analysis`:

```python
bertweet_pipeline = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")
```


As before, let's run the analysis on the first 100 of Elon's tweets.

```python
bert_sentiment = bertweet_pipeline(tweets[0:100])
```

And then, let's check out the breakdown of positive, negative, AND neutral sentiments!

```python
from collections import Counter

# This model uses the labels POS, NEU, and NEG
tweet_distro = Counter([x['label'] for x in bert_sentiment])
pos_sent_count = tweet_distro['POS']
neu_sent_count = tweet_distro['NEU']
neg_sent_count = tweet_distro['NEG']
total_sent_count = sum(tweet_distro.values())

print(f"{pos_sent_count} ({pos_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are positive.")
print(f"{neu_sent_count} ({neu_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are neutral.")
print(f"{neg_sent_count} ({neg_sent_count / total_sent_count * 100:.2f}%) of the tweets classified are negative.")
```

```text
29 (29.00%) of the tweets classified are positive.
64 (64.00%) of the tweets classified are neutral.
7 (7.00%) of the tweets classified are negative.
```
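The two models name their labels differently: `POSITIVE`/`NEGATIVE` versus `POS`/`NEU`/`NEG`. Comparing them tweet by tweet is easier after mapping both onto a shared vocabulary. `LABEL_MAP` and `normalize` below are hypothetical names, and the four predictions are the first two tweets' results copied from the outputs above.

```python
# Hypothetical mapping from each model's label names to one shared vocabulary.
LABEL_MAP = {
    "POSITIVE": "positive", "NEGATIVE": "negative",          # distilbert-base
    "POS": "positive", "NEU": "neutral", "NEG": "negative",  # bertweet-base
}


def normalize(predictions):
    """Return the shared label name for each prediction.

    An unknown label raises KeyError, which flags a model whose label
    names the mapping does not cover.
    """
    return [LABEL_MAP[p["label"]] for p in predictions]


# The first two tweets' predictions from each model, copied from the outputs above.
distil = [{"label": "NEGATIVE", "score": 0.9963656663894653},
          {"label": "POSITIVE", "score": 0.9998824596405029}]
bert = [{"label": "NEU", "score": 0.9523929953575134},
        {"label": "POS", "score": 0.9909942746162415}]

pairs = list(zip(normalize(distil), normalize(bert)))
agree = sum(a == b for a, b in pairs)
print(pairs)
print(f"{agree} of {len(pairs)} predictions agree")
```

In the notebook, the same lines would work on `distil_sentiment` and `bert_sentiment`. Any tweet BERTweet calls neutral can never match distilbert, which has no neutral label.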


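❓ What do you notice about the difference in the results?

The biggest change is the new neutral class: 64% of the tweets are labeled `NEU`. Distilbert has no neutral label, so it had to classify those same tweets as either positive or negative. As a result, the positive share falls from 49% to 29% and the negative share from 51% to 7%.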
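To see *where* the neutral tweets came from, you can tabulate each tweet's pair of labels (distilbert label, BERTweet label). `cross_tab` is a hypothetical helper. The three example pairs are the first three tweets' labels from the outputs in this notebook. In the notebook you would call `cross_tab(distil_sentiment, bert_sentiment)`.

```python
from collections import Counter


def cross_tab(preds_a, preds_b):
    """Count how often label X from model A co-occurs with label Y from model B.

    Both lists must hold predictions for the same inputs, in the same order.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must be the same length")
    return Counter((a["label"], b["label"]) for a, b in zip(preds_a, preds_b))


# First three tweets' labels, copied from the distilbert and BERTweet outputs.
distil = [{"label": "NEGATIVE"}, {"label": "POSITIVE"}, {"label": "NEGATIVE"}]
bert = [{"label": "NEU"}, {"label": "POS"}, {"label": "NEU"}]

for (a, b), n in cross_tab(distil, bert).most_common():
    print(f"{a:>8} -> {b}: {n}")
```

On the full 100 tweets, the rows ending in `NEU` show how many neutral tweets distilbert had labeled positive versus negative.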

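❓ Do the results for the `bertweet-base` model look better, or worse, than the results for the `distilbert-base` model? Why?

The `bertweet-base` results look better overall, for two reasons:
- The added neutral category fits the many conversational, matter-of-fact tweets in this sample.
- The model was built for tweets. BERTweet is pretrained on English tweets, and according to its model card this checkpoint is fine-tuned on the SemEval 2017 Twitter sentiment data. `distilbert-base`, by contrast, was fine-tuned on SST-2 movie-review sentences.

Some predictions are still debatable, though. The emoji-only tweet #5 (`@itsALLrisky 🤣🤣`) is classified as negative. So is "First @Neuralink product will enable someone with paralysis to use a smartphone with their mind faster than someone using thumbs".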

```python
# Print each prediction next to the tweet it was made for
for tweet, prediction in zip(tweets[0:100], bert_sentiment):
    pprint(prediction)
    print(tweet)
```

```text
{'label': 'NEU', 'score': 0.9523929953575134}
@vincent13031925 For now. Costs are decreasing rapidly.
{'label': 'POS', 'score': 0.9909942746162415}
Love this beautiful shot
{'label': 'NEU', 'score': 0.9733855128288269}
@agnostoxxx @CathieDWood @ARKInvest Trust the shrub
{'label': 'POS', 'score': 0.9824264049530029}
The art In Cyberpunk is incredible
{'label': 'NEG', 'score': 0.9627320766448975}
@itsALLrisky 🤣🤣
{'label': 'NEU', 'score': 0.8657803535461426}
@seinfeldguru @WholeMarsBlog Nope haha
{'label': 'NEU', 'score': 0.926353394985199}
@WholeMarsBlog If you don’t say anything &amp; engage Autopilot, it will soon guess based on time of day, taking you home or to work or to what’s on your calendar
{'label': 'NEU', 'score': 0.7412322759628296}
@DeltavPhotos @PortCanaveral That rocket is a hardcore veteran of many missions
{'label': 'POS', 'score': 0.6090269684791565}
Blimps rock  https://t.co/e8cu5FkNOI
{'label': 'NEU', 'score': 0.9455981254577637}
@engineers_feed Due to lower gravity, 
```
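Reading all 100 pairs is tedious. One shortcut is to review the least confident predictions first, since they are likely candidates for mistakes. `least_confident` is a hypothetical helper, and the example rows are copied from the output above. In the notebook you would call `least_confident(tweets[:100], bert_sentiment)`.

```python
def least_confident(texts, predictions, k=5):
    """Return the k predictions with the lowest scores as (score, label, text)."""
    rows = [(p["score"], p["label"], t) for t, p in zip(texts, predictions)]
    return sorted(rows, key=lambda row: row[0])[:k]


# Three tweets and their BERTweet predictions, copied from the output above.
texts = ["Love this beautiful shot",
         "@seinfeldguru @WholeMarsBlog Nope haha",
         "Blimps rock  https://t.co/e8cu5FkNOI"]
preds = [{"label": "POS", "score": 0.9909942746162415},
         {"label": "NEU", "score": 0.8657803535461426},
         {"label": "POS", "score": 0.6090269684791565}]

for score, label, text in least_confident(texts, preds, k=2):
    print(f"{score:.3f} {label:>3}  {text}")
```

This will not catch confident mistakes. For example, the emoji-only tweet is labeled `NEG` with a score of about 0.96, so it still needs a human eye.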

<hr>

### Partner Exercise

With your partner, try to predict how each of the following tweets will be classified. Use the same label sets as the two model pipelines we saw today, and try adding a few sentences or tweets of your own!

```python
example_difficult_tweets = [
    "Kong vs Godzilla has record for most meth ever consumed in a writer's room",
    "@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.",
    "Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.",
    "Google is training its robots to be more like humans",
    "U.S. is at 'effectively peak employment,' bringing hot wage growth into focus",
]
```

The `distilbert-base` model:

```python
for tweet in example_difficult_tweets:
    pprint(sentiment_pipeline(tweet))
    print(tweet + '\n')
```

```text
[{'label': 'POSITIVE', 'score': 0.5429084897041321}]
Kong vs Godzilla has record for most meth ever consumed in a writer's room

[{'label': 'NEGATIVE', 'score': 0.6348373889923096}]
@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.

[{'label': 'POSITIVE', 'score': 0.9419695138931274}]
Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.

[{'label': 'NEGATIVE', 'score': 0.9967387318611145}]
Google is training its robots to be more like humans

[{'label': 'POSITIVE', 'score': 0.9993935823440552}]
U.S. is at 'effectively peak employment,' bringing hot wage growth into focus
```



The `bertweet-base` model:

```python
for tweet in example_difficult_tweets:
    pprint(bertweet_pipeline(tweet))
    print(tweet + '\n')
```

```text
[{'label': 'NEG', 'score': 0.7213014960289001}]
Kong vs Godzilla has record for most meth ever consumed in a writer's room

[{'label': 'NEU', 'score': 0.8023842573165894}]
@ashleevance Battery energy density is the key to electric aircraft. Autonomy for aircraft could have been done a long time ago. Modern airliners are very close to autonomous.

[{'label': 'NEU', 'score': 0.8843539357185364}]
Tesla's action is not directly reflective of my opinion. Having some Bitcoin, which is simply a less dumb form of liquidity than cash, is adventurous enough for an S&P500 company.

[{'label': 'NEU', 'score': 0.8345044255256653}]
Google is training its robots to be more like humans

[{'label': 'NEG', 'score': 0.5765505433082581}]
U.S. is at 'effectively peak employment,' bringing hot wage growth into focus
```



❓ How did you do? Did you find any surprising results?

❓ Are there any instances where the two models gave different predictions for the same tweet?

Yes: in this small sample, the two models disagree on every one of the five tweets.
- Distilbert labeled the 'peak employment' tweet positive, while BERTweet labeled it negative, perhaps reacting to the 'hot wage growth' coming into focus.
- Distilbert rated "most meth ever consumed in a writer's room" as positive, though only barely (a score of 0.54). BERTweet read it as negative, which seems closer to the tweet's intent.

On these examples, BERTweet appears to understand the sentiment of tweets marginally better.
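To make the exercise concrete, you can write down your own label for each tweet and measure how often each model matches you. In the sketch below:
- The model labels are copied from the two outputs above.
- `my_labels` holds placeholder guesses made up for illustration; replace them with your and your partner's answers.
- `LABEL_MAP` and `agreement` are hypothetical names.

```python
# Hypothetical mapping from model label names to plain-English labels.
LABEL_MAP = {"POSITIVE": "positive", "NEGATIVE": "negative",
             "POS": "positive", "NEU": "neutral", "NEG": "negative"}

# Model labels for the five difficult tweets, copied from the outputs above.
distil_labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]
bert_labels = ["NEG", "NEU", "NEU", "NEU", "NEG"]

# Placeholder guesses (made up): replace with your and your partner's labels.
my_labels = ["negative", "neutral", "neutral", "neutral", "positive"]


def agreement(mine, model_labels):
    """Fraction of tweets where the model's (mapped) label matches yours."""
    if not mine or len(mine) != len(model_labels):
        raise ValueError("expected non-empty label lists of equal length")
    hits = sum(m == LABEL_MAP[label] for m, label in zip(mine, model_labels))
    return hits / len(mine)


# Count tweets where the two models' mapped labels differ.
disagreements = sum(LABEL_MAP[a] != LABEL_MAP[b]
                    for a, b in zip(distil_labels, bert_labels))

print(f"distilbert matches you on {agreement(my_labels, distil_labels):.0%} of tweets")
print(f"bertweet matches you on {agreement(my_labels, bert_labels):.0%} of tweets")
print(f"The two models disagree on {disagreements} of {len(bert_labels)} tweets")
```

The reference labels here are your own judgment on only five tweets. Treat the percentages as a conversation starter rather than an accuracy measurement.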