Scraping Twitter data with snscrape, then running sentiment analysis with Transformers on tweets about WWIII

### Installing the Snscrape Library

In [1]:
!pip install snscrape

Collecting snscrape
  Downloading snscrape-0.3.4-py3-none-any.whl (35 kB)
Installing collected packages: snscrape
Successfully installed snscrape-0.3.4

### Importing the necessary Libraries

In [3]:
!pip install seaborn

Collecting seaborn
  Using cached seaborn-0.12.1-py3-none-any.whl (288 kB)
Installing collected packages: seaborn
Successfully installed seaborn-0.12.1


You should consider upgrading via the 'C:\Users\jonch\code\scrape-social-medias\venv\Scripts\python.exe -m pip install --upgrade pip' command.


In [3]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import itertools
import snscrape.modules.twitter as sntwitter
import plotly.graph_objects as go
from datetime import datetime

In [4]:
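# Scrape the first 100 tweets matching '#stablediffusion' posted between
# 20 October 2020 and 26 October 2022, and time how long the scrape takes.
start_time = datetime.now()

# The since:/until: operators sit outside any quoted phrase so they act as date filters
query = '#stablediffusion since:2020-10-20 until:2022-10-26'
scraper = sntwitter.TwitterSearchScraper(query)
# islice stops the generator after 100 items instead of exhausting the search
data = pd.DataFrame(itertools.islice(scraper.get_items(), 100))

end_time = datetime.now()

# Print the time taken to scrape these tweets
print('Duration: {} '.format(end_time - start_time))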

Duration: 0:00:03.595367 
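
The search string above combines a term with `since:` and `until:` date operators. As a minimal sketch, a small helper can build that string and catch swapped dates before any request is made. The name `build_search_query` is hypothetical and not part of snscrape.

```python
from datetime import date


def build_search_query(term: str, since: date, until: date) -> str:
    """Return a search string of the form '<term> since:YYYY-MM-DD until:YYYY-MM-DD'.

    Hypothetical helper for illustration; it only formats text and never
    contacts Twitter.
    """
    if since >= until:
        raise ValueError("since must be earlier than until")
    return f"{term} since:{since.isoformat()} until:{until.isoformat()}"


query = build_search_query("#stablediffusion", date(2020, 10, 20), date(2022, 10, 26))
print(query)  # prints: #stablediffusion since:2020-10-20 until:2022-10-26
```

The resulting string can be passed to `sntwitter.TwitterSearchScraper` in place of the hand-written literal.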


In [5]:
data.to_csv('tweets_sentiment_hashtag.csv')
data.head()

Unnamed: 0,url,date,content,renderedContent,id,user,replyCount,retweetCount,likeCount,quoteCount,...,media,retweetedTweet,quotedTweet,inReplyToTweetId,inReplyToUser,mentionedUsers,coordinates,place,hashtags,cashtags
0,https://twitter.com/thecastledking/status/1585...,2022-10-25 23:58:18+00:00,"""Lovecraft's Treehouse of Horrors"" - made with...","""Lovecraft's Treehouse of Horrors"" - made with...",1585058200121851905,"{'username': 'thecastledking', 'id': 246475182...",0,0,0,0,...,[{'previewUrl': 'https://pbs.twimg.com/media/F...,,,,,"[{'username': 'NightcafeStudio', 'id': 1226012...",,,"[aiart, nightcafe, digitalart, art, artwork, a...",
1,https://twitter.com/harithra86/status/15850580...,2022-10-25 23:57:48+00:00,Hope these creatures spread the good vibes of ...,Hope these creatures spread the good vibes of ...,1585058074745790464,"{'username': 'harithra86', 'id': 425924082, 'd...",0,0,0,0,...,[{'previewUrl': 'https://pbs.twimg.com/media/F...,,,,,,,,"[aiart, generativeart, aigenerativeart, automa...",
2,https://twitter.com/arrbeesound/status/1585057...,2022-10-25 23:55:54+00:00,World's first Ai Music Video! Zombies in Littl...,World's first Ai Music Video! Zombies in Littl...,1585057595215466499,"{'username': 'arrbeesound', 'id': 862909217711...",2,0,7,4,...,,,,,,"[{'username': 'Sa1ntDenis', 'id': 102948349338...",,,"[deforum, stablediffusion, ai, aiart]",
3,https://twitter.com/thecastledking/status/1585...,2022-10-25 23:55:50+00:00,"""Cornhouse"" - made with @NightCafeStudio \n\nh...","""Cornhouse"" - made with @NightCafeStudio \n\nc...",1585057581315158016,"{'username': 'thecastledking', 'id': 246475182...",0,0,0,0,...,[{'previewUrl': 'https://pbs.twimg.com/media/F...,,,,,"[{'username': 'NightcafeStudio', 'id': 1226012...",,,"[aiart, nightcafe, digitalart, art, artwork, a...",
4,https://twitter.com/thecastledking/status/1585...,2022-10-25 23:53:28+00:00,"""A Technicolor Farmhouse"" - made with @NightCa...","""A Technicolor Farmhouse"" - made with @NightCa...",1585056983199404032,"{'username': 'thecastledking', 'id': 246475182...",0,0,0,0,...,[{'previewUrl': 'https://pbs.twimg.com/media/F...,,,,,"[{'username': 'NightcafeStudio', 'id': 1226012...",,,"[aiart, nightcafe, digitalart, art, artwork, a...",


In [6]:
data.shape, data.columns

((100, 27),
 Index(['url', 'date', 'content', 'renderedContent', 'id', 'user', 'replyCount',
        'retweetCount', 'likeCount', 'quoteCount', 'conversationId', 'lang',
        'source', 'sourceUrl', 'sourceLabel', 'outlinks', 'tcooutlinks',
        'media', 'retweetedTweet', 'quotedTweet', 'inReplyToTweetId',
        'inReplyToUser', 'mentionedUsers', 'coordinates', 'place', 'hashtags',
        'cashtags'],
       dtype='object'))

In [7]:
media_ls = data['media'].to_list()
media_ls

[[{'previewUrl': 'https://pbs.twimg.com/media/Ff9CCG8WAAA0NmB?format=jpg&name=small',
   'fullUrl': 'https://pbs.twimg.com/media/Ff9CCG8WAAA0NmB?format=jpg&name=large'}],
 [{'previewUrl': 'https://pbs.twimg.com/media/Ff9B6wLX0AAz29t?format=jpg&name=small',
   'fullUrl': 'https://pbs.twimg.com/media/Ff9B6wLX0AAz29t?format=jpg&name=large'}],
 None,
 [{'previewUrl': 'https://pbs.twimg.com/media/Ff9BeFdWYAAI85b?format=jpg&name=small',
   'fullUrl': 'https://pbs.twimg.com/media/Ff9BeFdWYAAI85b?format=jpg&name=large'}],
 [{'previewUrl': 'https://pbs.twimg.com/media/Ff9A7TRX0AA2pv-?format=jpg&name=small',
   'fullUrl': 'https://pbs.twimg.com/media/Ff9A7TRX0AA2pv-?format=jpg&name=large'}],
 [{'previewUrl': 'https://pbs.twimg.com/media/Ff9AuO-WIAEBGuJ?format=png&name=small',
   'fullUrl': 'https://pbs.twimg.com/media/Ff9AuO-WIAEBGuJ?format=png&name=large'},
  {'previewUrl': 'https://pbs.twimg.com/media/Ff9AuPAWAAIZyZE?format=png&name=small',
   'fullUrl': 'https://pbs.twimg.com/media/Ff9AuPAWAA
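
The `media` column holds, per tweet, either `None` or a list of dicts with `previewUrl` and `fullUrl` keys. A minimal sketch of flattening that structure into a plain list of full-size URLs follows. The function name `full_media_urls` is hypothetical, and skipping entries without a `fullUrl` key is an assumption about how other media types might appear.

```python
def full_media_urls(media_column):
    """Collect every 'fullUrl' from a list shaped like data['media'].to_list().

    Tweets without media (None) are skipped, as are entries that have no
    'fullUrl' key (assumed here to be non-photo media).
    """
    urls = []
    for items in media_column:
        if not items:  # None or an empty list: tweet has no media
            continue
        for item in items:
            url = item.get("fullUrl")
            if url:
                urls.append(url)
    return urls


# Two entries copied from the output above: one photo tweet and one without media
sample = [
    [{"previewUrl": "https://pbs.twimg.com/media/Ff9CCG8WAAA0NmB?format=jpg&name=small",
      "fullUrl": "https://pbs.twimg.com/media/Ff9CCG8WAAA0NmB?format=jpg&name=large"}],
    None,
]
print(full_media_urls(sample))
```

With the notebook's data, the call would be `full_media_urls(media_ls)`.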

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 27 columns):
 #   Column            Non-Null Count  Dtype              
---  ------            --------------  -----              
 0   url               100 non-null    object             
 1   date              100 non-null    datetime64[ns, UTC]
 2   content           100 non-null    object             
 3   renderedContent   100 non-null    object             
 4   id                100 non-null    int64              
 5   user              100 non-null    object             
 6   replyCount        100 non-null    int64              
 7   retweetCount      100 non-null    int64              
 8   likeCount         100 non-null    int64              
 9   quoteCount        100 non-null    int64              
 10  conversationId    100 non-null    int64              
 11  lang              100 non-null    object             
 12  source            100 non-null    object             
 13  source

In [10]:
# Keep a random 20% sample (20 of the 100 rows) of the date, id, content and user columns
tweets = data[['date', 'id', 'content', 'user']].sample(frac = 0.2, random_state=4097).reset_index(drop=True)

In [11]:
tweets.shape

(20, 4)

In [12]:
tweets.head()

Unnamed: 0,date,id,content,user
0,2022-10-25 23:50:54+00:00,1585056336567767041,すっかり秋めいて寒くなってきましたね、というわけで秋仕様オフスタイル川島さんです。\n#st...,"{'username': 'makura_tsumu', 'id': 10850922856..."
1,2022-10-25 23:41:09+00:00,1585053882421760000,#WaifuDiffusion \n#画像生成AI \n#stablediffusion ...,"{'username': 'futi8futi9', 'id': 316486459, 'd..."
2,2022-10-25 23:28:05+00:00,1585050597232812033,"""Insanely detailed treehouse castle overlookin...","{'username': 'GustaveDeresse', 'id': 134821803..."
3,2022-10-25 23:11:24+00:00,1585046396289548293,"""reality""\n#aiart #furry #stablediffusion http...","{'username': 'Luluuuuulynx', 'id': 89974691362..."
4,2022-10-25 23:23:40+00:00,1585049482819141632,@nousr_ Fantastic fine-tune \n#stablediffusion...,"{'username': 'GenX_ARG', 'id': 149148486441521..."
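
Each value in the `user` column is a dict rather than a plain string, which makes the column awkward to filter or group on. As a sketch, a small function can pull out just the username. `username_of` is a hypothetical name, and the sample dict keeps only the `username` key because the other fields are truncated in the output above.

```python
def username_of(user):
    """Return the 'username' field of a user dict, or None if it is unavailable."""
    if isinstance(user, dict):
        return user.get("username")
    return None


# Only the 'username' key is reproduced here; the remaining fields are elided above
sample_user = {"username": "makura_tsumu"}
print(username_of(sample_user))  # prints: makura_tsumu
```

In the notebook this could be applied as `tweets['username'] = tweets['user'].apply(username_of)`.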


## Estimating the sentiment of the tweet text

In [13]:
# We use a pretrained model from the Hugging Face Transformers library to perform sentiment analysis
# Install the library

!pip install transformers

Collecting transformers
  Downloading transformers-4.23.1-py3-none-any.whl (5.3 MB)
Collecting pyyaml>=5.1
  Using cached PyYAML-6.0-cp39-cp39-win_amd64.whl (151 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.1-cp39-cp39-win_amd64.whl (3.3 MB)
Collecting regex!=2019.12.17
  Using cached regex-2022.9.13-cp39-cp39-win_amd64.whl (267 kB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.10.1-py3-none-any.whl (163 kB)
Collecting typing-extensions>=3.7.4.3
  Using cached typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Installing collected packages: typing-extensions, pyyaml, tokenizers, regex, huggingface-hub, transformers
Successfully installed huggingface-hub-0.10.1 pyyaml-6.0 regex-2022.9.13 tokenizers-0.13.1 transformers-4.23.1 typing-extensions-4.4.0




## Sentiment Analysis

In [14]:
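# Import pipeline from Transformers. The pipeline needs a deep learning backend
# (PyTorch or TensorFlow) installed; without one it raises the RuntimeError shown below.
from transformers import pipeline
# Name the model explicitly; this is the default the log reports for 'sentiment-analysis'
sentiment_classifier = pipeline('sentiment-analysis',
                                model='distilbert-base-uncased-finetuned-sst-2-english')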

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


Moving 5 files to the new cache system


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
100%|██████████| 5/5 [00:01<00:00,  3.47it/s]
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%|██████████| 629/629 [00:00<00:00, 299kB/s]


RuntimeError: At least one of TensorFlow 2.0 or PyTorch should be installed. To install TensorFlow 2.0, read the instructions at https://www.tensorflow.org/install/ To install PyTorch, read the instructions at https://pytorch.org/.
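
The `RuntimeError` above means neither PyTorch nor TensorFlow was installed in the environment. As a minimal sketch, the standard library can check for a backend before the pipeline is built, so the problem surfaces with a clearer message. The function name `available_backends` is hypothetical; the module names `torch`, `tensorflow` and `flax` are the usual import names of the frameworks listed in the log.

```python
import importlib.util


def available_backends(candidates=("torch", "tensorflow", "flax")):
    """Return the names in `candidates` that can be imported in this environment.

    find_spec only locates the package; it does not import it, so the check
    is cheap even for large frameworks.
    """
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


backends = available_backends()
if backends:
    print("Found backend(s):", ", ".join(backends))
else:
    print("No deep learning backend found; install PyTorch or TensorFlow first.")
```

If the list is empty, installing one backend (for example with `pip install torch`) and restarting the kernel should let the pipeline cell run.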

In [11]:
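# Pass each tweet through the sentiment pipeline, then extract the label and score.
# The pipeline returns a list with one dict per input: [{'label': ..., 'score': ...}]
# truncation=True cuts inputs longer than the model's maximum length instead of raising an error
tweets = (tweets.assign(sentiment = lambda x: x['content'].apply(lambda s: sentiment_classifier(s, truncation=True)))
.assign(label = lambda x: x['sentiment'].apply(lambda s: s[0]['label']),
        score = lambda x: x['sentiment'].apply(lambda s: s[0]['score'])))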

In [12]:
# Show the first 20 rows of the new dataframe
tweets.head(20)

Unnamed: 0,date,id,content,username,sentiment,label,score
0,2022-05-24 03:52:59+00:00,1528947140021608448,If I was *intentionally* trying to END ALL LIF...,realseand,"[{'label': 'NEGATIVE', 'score': 0.998236060142...",NEGATIVE,0.998236
1,2022-08-02 14:41:49+00:00,1554477574272737287,#WWIII https://t.co/I0lB2kRx2V,FlawdaJittt,"[{'label': 'NEGATIVE', 'score': 0.995616555213...",NEGATIVE,0.995617
2,2022-08-01 17:47:13+00:00,1554161846155870209,I don't like China or Russia but there's no re...,Has1984,"[{'label': 'NEGATIVE', 'score': 0.998806834220...",NEGATIVE,0.998807
3,2022-08-03 16:31:59+00:00,1554867688291373057,#WWIII https://t.co/Wk1G9Q9gIu,online_prepper,"[{'label': 'NEGATIVE', 'score': 0.995367288589...",NEGATIVE,0.995367
4,2022-07-14 17:31:35+00:00,1547634929650921475,@HistoricTime @McFaul The West can unilaterall...,SalihTorgeir,"[{'label': 'NEGATIVE', 'score': 0.990689158439...",NEGATIVE,0.990689
5,2022-07-26 16:46:01+00:00,1551972115196563457,@RepRoKhanna @SpeakerPelosi Democrats starting...,bnweaver81,"[{'label': 'NEGATIVE', 'score': 0.991808950901...",NEGATIVE,0.991809
6,2022-06-26 09:29:21+00:00,1540990588044148736,😂😂😂😂 WWIII? https://t.co/xalBqP8gtU,kingdisking,"[{'label': 'NEGATIVE', 'score': 0.997159957885...",NEGATIVE,0.99716
7,2022-08-03 02:23:36+00:00,1554654184791625729,The US has 25% of the global monkeypox cases b...,larsupreme,"[{'label': 'NEGATIVE', 'score': 0.999324560165...",NEGATIVE,0.999325
8,2022-08-03 02:06:38+00:00,1554649914126307329,@MayaBijouXXX Man I still ain’t hit yet😒,1stWWIIIVeteran,"[{'label': 'NEGATIVE', 'score': 0.958495855331...",NEGATIVE,0.958496
9,2022-07-12 07:45:58+00:00,1546762776638619648,@iucunde_docet Facci capire “Raoul” anche per ...,AndreaCDami,"[{'label': 'NEGATIVE', 'score': 0.980673789978...",NEGATIVE,0.980674


## Checking the Tweets randomly & analyzing the sentiments

In [13]:
# Spot-check individual tweets against their predicted sentiment
tweets['content'][2]

"I don't like China or Russia but there's no reason to have WWIII.  If our political leaders drag us into another war, I hope the people refuse and revolt.  We have too many problems here to start others elsewhere"

In [14]:
tweets['content'][11]

"https://t.co/wO8zI68MC9 ⚠️~#GeorgeWashington’s Farewell Address, 1796--&gt;U SHOULD HAVE LISTENED TO GEORGE!~ @POTUS @SecBlinken @UnderSecStateP @UnderSecStateJ @DHSgov @NATO--&gt;NOW U'VE GOTTEN URSELVES IN2 A SITUATION THAT U CAN'T GET OUT OF!~ ⌛️ #WWIII ⚠️ https://t.co/SYvPSiwZuP"

In [15]:
tweets['content'][17]

'@Crary76 They are worried about economics, not us intervening militarily. The US military defending Taiwan if PRC invaded literally kicks off WWIII. And Taiwan can’t afford half measures like we’re giving in Ukraine, we go to all out war or they’re done.'

## Visualizing the sentiments

In [16]:
# One bar per tweet: x is the predicted label, y is the model's confidence score
fig = go.Figure()
fig.add_trace(go.Bar(x=tweets["label"], y=tweets["score"]))
fig.update_layout(plot_bgcolor="black")
fig.show()
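
A per-label summary may be easier to read than one bar per tweet. The sketch below counts tweets per label and averages their scores using only the standard library. `summarize_by_label` is a hypothetical helper, and the three label/score pairs are made-up illustrative values, not results from this notebook.

```python
from collections import defaultdict


def summarize_by_label(labels, scores):
    """Return {label: {'count': n, 'mean_score': m}} for paired labels and scores."""
    totals = defaultdict(lambda: [0, 0.0])  # label -> [count, sum of scores]
    for label, score in zip(labels, scores):
        totals[label][0] += 1
        totals[label][1] += score
    return {
        label: {"count": count, "mean_score": total / count}
        for label, (count, total) in totals.items()
    }


# Illustrative values only (assumed, not taken from the scraped tweets)
example_labels = ["NEGATIVE", "NEGATIVE", "POSITIVE"]
example_scores = [1.0, 0.5, 0.75]
summary = summarize_by_label(example_labels, example_scores)
print(summary)
```

With the notebook's dataframe, `summarize_by_label(tweets["label"], tweets["score"])` would give the counts and means, which could then be plotted as one bar per label.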