# Article Notebook for Scraping Twitter Using snscrape's CLI Commands With Python
<br>Package Github: https://github.com/JustAnotherArchivist/snscrape
<br>This notebook will be using the development version of snscrape

Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

### Notebook Author: Martin Beck
<b>Information current as of November 26th, 2020</b><br>

This notebook contains materials for scraping tweets from Twitter using snscrape's CLI commands with Python.

<b>Dependencies: </b> 
- Your <b>Python</b> version must be <b>3.8</b> or higher. The development version of snscrape will not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).
- <b>Development version of snscrape</b>. If you don't already have it, uncomment the pip install line in the cell below to install it from within the notebook.
- <b>Pandas</b>. Its dataframes allow easy manipulation and indexing of data. This is more of a preference, but it is what I follow in this notebook.

In [151]:
# Run the pip install command below if you don't already have the library
# !pip install git+https://github.com/JustAnotherArchivist/snscrape.git

# Run the below command if you don't already have Pandas
# !pip install pandas

# Imports
import os
import pandas as pd

# Query by Username
The code below scrapes 100 tweets posted by a given username, then exports them to a CSV file with Pandas.
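A minimal sketch of what a username query might look like, assuming the same `snscrape --jsonl --max-results ... twitter-search` form used in the text-search cell further down, combined with Twitter's `from:` search operator. The helper name `build_user_command`, the account name `example_user`, and the output file `user-tweets.json` are placeholders for illustration, not part of the original notebook.

```python
import shlex

def build_user_command(username, tweet_count, out_file="user-tweets.json"):
    """Return a shell command string that scrapes tweets from one account."""
    query = "from:{}".format(username)
    # shlex.quote protects the query and file name from shell interpretation
    return "snscrape --jsonl --max-results {} twitter-search {} > {}".format(
        int(tweet_count), shlex.quote(query), shlex.quote(out_file)
    )

command = build_user_command("example_user", 100)
print(command)
```

The resulting string could then be passed to `os.system(command)`, in the same way as the text-search cell.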

# Query by Text Search
The code below scrapes up to 30,000 tweets matching a text search between August 9th, 2021 and September 30th, 2021, then loads them into Pandas for export to a CSV file.
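Date strings are easy to mistype, so it can help to normalize them before building the query. The sketch below is one possible approach, assuming Twitter's `since:` and `until:` search operators; the helper names `normalize_date` and `build_search_query` are hypothetical. The cell that follows passes the lower bound through the `--since` flag instead, which is a different but related way to bound the range.

```python
from datetime import datetime

def normalize_date(text):
    """Parse a year-month-day string and return it zero-padded as YYYY-MM-DD."""
    return datetime.strptime(text, "%Y-%m-%d").strftime("%Y-%m-%d")

def build_search_query(text_query, since_date, until_date):
    """Combine the search text with since:/until: operators."""
    since, until = normalize_date(since_date), normalize_date(until_date)
    # Zero-padded ISO dates compare correctly as plain strings
    if since >= until:
        raise ValueError("since_date must be earlier than until_date")
    return "{} since:{} until:{}".format(text_query, since, until)

query = build_search_query("example search", "2021-08-9", "2021-9-30")
print(query)
```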

In [149]:
# Setting variables to be used in the format string command below
tweet_count = 30000
text_query = "illegal refugees"
#text_query = "Afghanistan filter:retweets"

# Dates are written zero-padded as YYYY-MM-DD
since_date = "2021-08-09"
until_date = "2021-09-30"

# Using the os library to call CLI commands in Python; the output file holds one JSON object per line
os.system('snscrape --jsonl --max-results {} --since {} twitter-search "{} until:{}" > text-query-tweets.json'.format(tweet_count, since_date, text_query, until_date))

0
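The `0` above is the status returned by `os.system`, which indicates that the shell command exited successfully. As an alternative sketch, `subprocess.run` accepts the command as an argument list, which avoids shell quoting issues and exposes the exit code directly. The helper `run_to_file` is hypothetical, and the demo uses a stand-in Python command so it runs without snscrape installed.

```python
import os
import subprocess
import sys
import tempfile

def run_to_file(args, out_path):
    """Run a command given as an argument list and write its stdout to out_path.

    Returns the process exit code (0 means success).
    """
    with open(out_path, "w", encoding="utf-8") as out_file:
        completed = subprocess.run(args, stdout=out_file, check=False)
    return completed.returncode

# Stand-in command that prints one JSON line, in place of the snscrape call
demo_args = [sys.executable, "-c", "print('{\"id\": 1}')"]
out_path = os.path.join(tempfile.gettempdir(), "demo-output.json")
exit_code = run_to_file(demo_args, out_path)
print(exit_code)
```

For the scrape itself, the argument list would mirror the command in the cell above, for example `["snscrape", "--jsonl", "--max-results", str(tweet_count), "--since", since_date, "twitter-search", "{} until:{}".format(text_query, until_date)]`.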

In [150]:
tweets_illegal = pd.read_json('text-query-tweets.json', lines=True)
tweets_illegal.shape

(9091, 28)
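The `lines=True` argument matters here because `--jsonl` output is in JSON Lines form, with one JSON object per line rather than a single JSON array. A small self-contained illustration with made-up records (the field values are invented for the example):

```python
import io
import pandas as pd

# Two made-up records in JSON Lines form: one JSON object per line
sample_jsonl = (
    '{"id": 1, "rawContent": "first example tweet"}\n'
    '{"id": 2, "rawContent": "second example tweet"}\n'
)

# lines=True tells pandas to parse each line as a separate record
sample_df = pd.read_json(io.StringIO(sample_jsonl), lines=True)
print(sample_df.shape)
```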

In [143]:
tweets_refugeeswelcome = pd.read_json('text-query-tweets.json', lines=True)
tweets_refugeeswelcome.shape

(16765, 28)

In [148]:
tweets_refugeeswelcome["rawContent"][2]

'Very important webinar! Don’t forget to tune in! @VCU @VCUHealth @VCUNeuro We welcome Afghan refugees into our communities! @AANMember #RefugeesNeedHelp #RefugeesWelcome'

In [141]:
tweets_refugeesnotwelcome = pd.read_json('text-query-tweets.json', lines=True)
tweets_refugeesnotwelcome.shape

(940, 28)

In [140]:
tweets_refugeesnotwelcome

'@FOX5Atlanta Who is making the decision to where they send these refugees and burden the state?  These folks are not translators and have not been vetted. #refugeesNotWelcome #COVID19'

In [131]:
df_tweets = tweets_df2[tweets_df2["renderedContent"].str.contains('immigr', regex=True, na=False)]
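Note that `str.contains` is case-sensitive by default, so the filter above would not match a tweet whose only occurrence is capitalised, such as "Immigration". A minimal sketch on made-up data showing the difference when `case=False` is passed; `regex=False` is used here as an assumption that a plain substring match is all that is needed.

```python
import pandas as pd

sample = pd.DataFrame({
    "renderedContent": [
        "Debate on immigration policy",
        "Stop Immigration now",
        "Weather update",
        None,
    ]
})

# Case-sensitive match: misses the capitalised "Immigration"
strict = sample[sample["renderedContent"].str.contains("immigr", regex=False, na=False)]

# case=False also matches "Immigration", "IMMIGRANTS", and so on
relaxed = sample[sample["renderedContent"].str.contains("immigr", case=False, regex=False, na=False)]

print(len(strict), len(relaxed))
```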

In [132]:
df_tweets

Unnamed: 0,_type,url,date,rawContent,renderedContent,id,user,replyCount,retweetCount,likeCount,...,retweetedTweet,quotedTweet,inReplyToTweetId,inReplyToUser,mentionedUsers,coordinates,place,hashtags,cashtags,card
3,snscrape.modules.twitter.Tweet,https://twitter.com/quilting_plans/status/1442...,2021-09-26 19:25:17+00:00,#refugeesNOTwelcome #immigration #BidenLiedPeo...,#refugeesNOTwelcome #immigration #BidenLiedPeo...,1442208675125473280,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,0,0,...,,"{'_type': 'snscrape.modules.twitter.Tweet', 'u...",,,,,,"[refugeesNOTwelcome, immigration, BidenLiedPeo...",,
31,snscrape.modules.twitter.Tweet,https://twitter.com/lindajaniebrou1/status/143...,2021-09-17 14:48:55+00:00,"@jayehm1958 @tedcruz Yeah, so says Wikipedia a...","@jayehm1958 @tedcruz Yeah, so says Wikipedia a...",1438877635073953792,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,0,0,...,,,1.438839e+18,"{'_type': 'snscrape.modules.twitter.User', 'us...","[{'_type': 'snscrape.modules.twitter.User', 'u...",,,"[cancuncruz, Haitian, refugeesnotwelcome, Texa...",,
32,snscrape.modules.twitter.Tweet,https://twitter.com/lindajaniebrou1/status/143...,2021-09-17 14:34:53+00:00,"@49ersfutbol @tedcruz Yeah, #Haitians horrify ...","@49ersfutbol @tedcruz Yeah, #Haitians horrify ...",1438874101402284032,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,1,1,...,,,1.438751e+18,"{'_type': 'snscrape.modules.twitter.User', 'us...","[{'_type': 'snscrape.modules.twitter.User', 'u...",,,"[Haitians, refugeesnotwelcome, cancuncruz, Tex...",,
49,snscrape.modules.twitter.Tweet,https://twitter.com/LiberalBullies/status/1436...,2021-09-11 22:49:48+00:00,All #Arabs are #terrorist \n#Islamic #terroris...,All #Arabs are #terrorist \n#Islamic #terroris...,1436824324258684928,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,0,0,...,,"{'_type': 'snscrape.modules.twitter.Tweet', 'u...",,,,,,"[Arabs, terrorist, Islamic, terrorism, Islam, ...",,
55,snscrape.modules.twitter.Tweet,https://twitter.com/Fr_Conservateur/status/143...,2021-09-10 12:05:59+00:00,"Ouvrez les yeux, Emmanuel #Macron. Ecoutez les...","Ouvrez les yeux, Emmanuel #Macron. Ecoutez les...",1436299915811041294,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,7,8,...,,,,,,,,"[Macron, Zemmour, StopImmigration, RefugeesNot...",,{'_type': 'snscrape.modules.twitter.SummaryCar...
60,snscrape.modules.twitter.Tweet,https://twitter.com/olivier_migette/status/143...,2021-09-09 14:51:46+00:00,#refugeesNOTwelcome • D’un côté un pays qui pr...,#refugeesNOTwelcome • D’un côté un pays qui pr...,1435979247034699780,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,0,0,...,,"{'_type': 'snscrape.modules.twitter.Tweet', 'u...",,,"[{'_type': 'snscrape.modules.twitter.User', 'u...",{'_type': 'snscrape.modules.twitter.Coordinate...,"{'_type': 'snscrape.modules.twitter.Place', 'f...",[refugeesNOTwelcome],,
73,snscrape.modules.twitter.Tweet,https://twitter.com/MattyLad/status/1435855845...,2021-09-09 06:41:25+00:00,"After Priti Patel says turn back bigger boats,...","After Priti Patel says turn back bigger boats,...",1435855845758226434,"{'_type': 'snscrape.modules.twitter.User', 'us...",1,0,0,...,,,,,"[{'_type': 'snscrape.modules.twitter.User', 'u...",,,"[Migrants, Calais, ILLEGALimmigrants, illegali...",,
74,snscrape.modules.twitter.Tweet,https://twitter.com/_Mr_Vivek_/status/14358298...,2021-09-09 04:58:10+00:00,बस ऐसे गद्दारों के वजह से हमारा भविष्य अंधकार ...,बस ऐसे गद्दारों के वजह से हमारा भविष्य अंधकार ...,1435829861881364483,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,0,2,...,,,,,,,,"[Migrants, illegalimmigration, Rohingya, Refug...",,
94,snscrape.modules.twitter.Tweet,https://twitter.com/LiberalBullies/status/1435...,2021-09-07 10:09:52+00:00,#BigTech #BigBrother #twitter #censure \nOPPRE...,#BigTech #BigBrother #twitter #censure \nOPPRE...,1435183528908492802,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,0,0,...,,"{'_type': 'snscrape.modules.twitter.Tweet', 'u...",,,,,,"[BigTech, BigBrother, twitter, censure, Workin...",,
152,snscrape.modules.twitter.Tweet,https://twitter.com/LiberalBullies/status/1432...,2021-08-31 21:10:58+00:00,#RefugeesNotWelcome #Afganisthan GO HOME\nSTOP...,#RefugeesNotWelcome #Afganisthan GO HOME\nSTOP...,1432813187708035074,"{'_type': 'snscrape.modules.twitter.User', 'us...",0,0,0,...,,,,,,,,"[RefugeesNotWelcome, Afganisthan, immigration,...",,{'_type': 'snscrape.modules.twitter.EventCard'...
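In the output above, `inReplyToTweetId` is displayed as `1.438839e+18`, which suggests the column was stored as a float, presumably because pandas falls back to float64 for integer columns that contain missing values. Tweet ids are too large to survive that conversion exactly, as this small check using one of the `id` values from the table illustrates:

```python
tweet_id = 1436299915811041294  # an id value from the table above

# A 64-bit float carries 53 bits of precision, fewer than this id needs
as_float = float(tweet_id)
round_tripped = int(as_float)

# The low digits are rounded away, so the round trip does not return the same id
print(round_tripped == tweet_id)

# Storing ids as strings avoids the problem
tweet_id_text = str(tweet_id)
print(tweet_id_text)
```

If reply ids are needed for joining tweets to their parents, it may be worth keeping them as strings or as a nullable integer type rather than relying on the float column.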


In [108]:
df_tweets["renderedContent"][4996]

'A diverse coalition will continue to engage communities on issues of immigrant rights, affordable housing, climate and criminal justice. trib.al/DByNWJX'

In [110]:
df_tweets.to_csv('taliban.csv', sep=',', index=False, header=True)

In [111]:
# Reads the JSON Lines file generated by the CLI command above and creates a pandas dataframe
tweets_df2 = pd.read_json('text-query-tweets.json', lines=True)

# Keeps only the columns of interest, then displays the resulting dataframe
df_tweets = tweets_df2[["date", "rawContent", "renderedContent", "id", "place"]]
df_tweets

Unnamed: 0,date,rawContent,renderedContent,id,place
0,2021-09-21 23:59:56+00:00,Kamala Harris: Support The Immigration For the...,Kamala Harris: Support The Immigration For the...,1440465853686771721,
1,2021-09-21 23:59:54+00:00,@peakcapitolism and the liberals have been muc...,@peakcapitolism and the liberals have been muc...,1440465843662454790,
2,2021-09-21 23:59:44+00:00,@cuttinchief @SS2_spacewrench @CNN America has...,@cuttinchief @SS2_spacewrench @CNN America has...,1440465801543225351,
3,2021-09-21 23:59:44+00:00,"@FoxNews Someone enters my house, threatens me...","@FoxNews Someone enters my house, threatens me...",1440465801052495881,
4,2021-09-21 23:59:20+00:00,@kfbk Feds Launch Investigation Into 'Horrific...,@kfbk Feds Launch Investigation Into 'Horrific...,1440465701425205260,
...,...,...,...,...,...
4995,2021-09-21 17:00:49+00:00,A diverse coalition will continue to engage co...,A diverse coalition will continue to engage co...,1440360379058589708,
4996,2021-09-21 17:00:49+00:00,A diverse coalition will continue to engage co...,A diverse coalition will continue to engage co...,1440360378618167304,
4997,2021-09-21 17:00:49+00:00,A diverse coalition will continue to engage co...,A diverse coalition will continue to engage co...,1440360377619943429,
4998,2021-09-21 17:00:48+00:00,A diverse coalition will continue to engage co...,A diverse coalition will continue to engage co...,1440360376353243152,
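The last rows above show the same truncated text under different ids, which suggests the same message was posted several times (the text is truncated, so this is not certain). If repeated text should count only once, one option is `drop_duplicates` on the text column. A sketch on made-up data:

```python
import pandas as pd

sample = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "renderedContent": [
        "Same announcement text",
        "Same announcement text",
        "A different tweet",
        "Same announcement text",
    ],
})

# Keep the first row for each distinct text; ids stay attached to the kept rows
unique_text = sample.drop_duplicates(subset="renderedContent", keep="first")
print(unique_text["id"].tolist())
```

Whether to deduplicate depends on the analysis: repeated posts can be a signal in themselves, so dropping them is a choice rather than a requirement.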


In [112]:
df_tweets = pd.read_csv("afghanistan3.csv")
df_tweets

Unnamed: 0,date,rawContent,renderedContent,id,place
0,2021-09-04 23:59:57+00:00,"@KP24 If India 🇮🇳, Australia 🇦🇺 and England 🏴󠁧...","@KP24 If India 🇮🇳, Australia 🇦🇺 and England 🏴󠁧...",1434305262119276551,
1,2021-09-04 23:59:56+00:00,"Overall, do you think the U.S. role in Afghani...","Overall, do you think the U.S. role in Afghani...",1434305260596596736,
2,2021-09-04 23:59:56+00:00,Afghanistan: Life for those left behind https:...,Afghanistan: Life for those left behind bbc.in...,1434305260521205764,
3,2021-09-04 23:59:56+00:00,Afghanistan: Life for those left behind https:...,Afghanistan: Life for those left behind bbc.in...,1434305260428873728,
4,2021-09-04 23:59:48+00:00,@Hrushik06039597 @NA2NRF It’s terror game over...,@Hrushik06039597 @NA2NRF It’s terror game over...,1434305225284784129,
...,...,...,...,...,...
89995,2021-09-03 14:10:25+00:00,I reckon it's the Haqqani side wanting to figh...,I reckon it's the Haqqani side wanting to figh...,1433794513571549184,
89996,2021-09-03 14:10:23+00:00,Am concerned about the Taliban-China nexus par...,Am concerned about the Taliban-China nexus par...,1433794506088980481,
89997,2021-09-03 14:10:22+00:00,Does anyone see or hear from the Wealthy about...,Does anyone see or hear from the Wealthy about...,1433794499973722115,
89998,2021-09-03 14:10:22+00:00,"#maas Der Aussenminister vor 2 Wochen: ""Wenn d...","#maas Der Aussenminister vor 2 Wochen: ""Wenn d...",1433794499822735371,
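When a dataframe is written to CSV and read back, the `date` column comes back as plain strings unless pandas is asked to parse it. A minimal sketch using `parse_dates`, on made-up CSV text that follows the same column layout as the export above:

```python
import io
import pandas as pd

# Made-up CSV text in the same style as the exported file
sample_csv = (
    "date,rawContent,id\n"
    "2021-09-04 23:59:57+00:00,first example,1\n"
    "2021-09-03 14:10:25+00:00,second example,2\n"
)

# Without parse_dates the column would come back as plain strings
sample_df = pd.read_csv(io.StringIO(sample_csv), parse_dates=["date"])

# With real datetimes, the .dt accessor is available for grouping by day, hour, etc.
print(sample_df["date"].dt.day.tolist())
```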


In [26]:
df_tweets[["renderedContent"]].head()
df_tweets.to_csv('user-tweets.csv', sep=',', index=False)


In [10]:
# Export dataframe into a CSV
tweets_df2.to_csv('text-query-tweets.csv', sep=',', index=False)