# Article Notebook for Scraping Twitter Using snscrape's Python Wrapper
<br>Package Github: https://github.com/JustAnotherArchivist/snscrape
<br>This notebook will be using the development version of snscrape

Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

### Notebook Author: Martin Beck
<b>Information current as of November 28th, 2020</b><br>

This notebook contains materials for scraping tweets from Twitter using snscrape's Python wrapper.

<b>Dependencies: </b> 
- <b>Python 3.8</b> or higher. The development version of snscrape will not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).
- <b>Development version of snscrape</b>. If you don't already have it, uncomment the pip install line in the cell below to install it from within the notebook.
- <b>Pandas</b>. Its DataFrames allow easy manipulation and indexing of data. This is more of a preference, but it is what I use in this notebook.
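
A quick way to confirm the interpreter meets the Python requirement listed above is a check like the following. This is a minimal sketch; the helper name `meets_minimum` is my own and is not part of snscrape.

```python
import sys

# Minimum version stated in the dependency list above.
MIN_PYTHON = (3, 8)

def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

if not meets_minimum():
    wanted = ".".join(map(str, MIN_PYTHON))
    print("Python", wanted, "or newer is needed; found", sys.version.split()[0])
```

Running it in the first cell prints a message only when the interpreter is too old.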

In [None]:
!pip install jupyterthemes
!jt -t chesterish

In [None]:
!pip install xlrd

In [None]:
# Imports
import snscrape.modules.twitter as sntwitter
import pandas as pd
import xlrd

In [None]:
help(sntwitter) # replyCount: int, retweetCount: int, likeCount: int, quoteCount: int

# Query by Username
The code below scrapes up to 50 tweets (`maxTweets`) from each account listed in `EnergyTwitterAccounts.xls`, then exports the results to a CSV file with Pandas.

In [None]:
book = xlrd.open_workbook('EnergyTwitterAccounts.xls')
sheet = book.sheet_by_name('Sheet1')
accounts1 = [sheet.cell_value(r, 0) for r in range(sheet.nrows)]
accounts2 = [sheet.cell_value(r, 1) for r in range(sheet.nrows)]
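
The cell above reads every row of the sheet into two parallel lists. If the spreadsheet might contain blank rows or handles written with a leading `@`, a small clean-up step can pair the columns and drop unusable rows before scraping. The sketch below is only an illustration: `pair_accounts` is a hypothetical helper, and the sample values are made up rather than taken from the real file.

```python
def pair_accounts(usernames, account_types):
    """Pair each username with its type, dropping rows with an empty username."""
    pairs = []
    for name, kind in zip(usernames, account_types):
        # Normalise the handle: trim whitespace and any leading "@".
        name = str(name).strip().lstrip("@")
        if name:
            pairs.append((name, str(kind).strip()))
    return pairs

# Made-up sample values standing in for the two spreadsheet columns.
sample_names = ["@example_writer", "example_agency ", ""]
sample_types = ["Writer/Podcaster", "Fed Agency", ""]
pairs = pair_accounts(sample_names, sample_types)
print(pairs)
```

With the real data this would be called as `pair_accounts(accounts1, accounts2)`.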

In [None]:
# Setting variables to be used below
maxTweets = 50

# Creating list to append tweet data to
tweets_list = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for account, account_type in zip(accounts1, accounts2):
    for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:'+ account).get_items()):
        if i>=maxTweets:
            break
        tweets_list.append([tweet.date, tweet.id, tweet.user.displayname, tweet.content, tweet.likeCount, tweet.replyCount, tweet.retweetCount, tweet.tcooutlinks, account_type])
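
The loop above caps each account at `maxTweets` by counting with `enumerate` and breaking. The same cap can be expressed with `itertools.islice`, which stops pulling from a generator after a fixed number of items. In this sketch `fake_tweets` is a hypothetical stand-in for `get_items()` so the example runs without network access, and `take` is a name of my own.

```python
from itertools import islice

def take(items, limit):
    """Return at most `limit` items from any iterable, consuming it lazily."""
    return list(islice(items, limit))

def fake_tweets():
    # Hypothetical stand-in for scraper.get_items(); yields forever.
    n = 0
    while True:
        yield {"id": n}
        n += 1

first_three = take(fake_tweets(), 3)
print(first_three)
```

Because `islice` is lazy, the generator is never asked for more than `limit` items, which matters when each item costs a network request.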

In [None]:
# Creating a dataframe from the tweets list above
df = pd.DataFrame(tweets_list, columns=['Datetime', 'Tweet Id', 'Account', 'Text', 'Likes', 'Replies', 'Retweets', 'Links','Type'])

# Display first 5 entries from dataframe
df.head()

In [None]:
# Join every tweet's text, then split on whitespace into one list of words
wordsoup = " ".join(df.Text).split()

In [None]:
# Build a separate word list for each account type
df_writers = df[df.Type=='Writer/Podcaster'].copy()
wordsoup_writers = " ".join(df_writers.Text).split()

df_thinktanks = df[df.Type=='Think Tank/Activism'].copy()
wordsoup_thinktanks = " ".join(df_thinktanks.Text).split()

df_Fedempl = df[df.Type=='Politician/Fed Employee'].copy()
wordsoup_Fedempl = " ".join(df_Fedempl.Text).split()

In [None]:
df_Fedagcy = df[df.Type=='Fed Agency'].copy()
wordsoup_Fedagcy = " ".join(df_Fedagcy.Text).split()
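
The cells above repeat the same steps once per account type. As a possible alternative, the word lists for all types can be built in a single pass by grouping on the type label. This is a sketch using only the standard library; `words_by_type` is a hypothetical helper and the rows are made-up examples, not scraped tweets.

```python
from collections import defaultdict

def words_by_type(rows):
    """Map each account type to the whitespace-split words of its tweets."""
    grouped = defaultdict(list)
    for text, account_type in rows:
        grouped[account_type].extend(text.split())
    return dict(grouped)

# Made-up (text, type) rows standing in for the Text and Type columns.
rows = [
    ("solar is growing", "Writer/Podcaster"),
    ("grid update today", "Fed Agency"),
    ("wind is growing too", "Writer/Podcaster"),
]
soups = words_by_type(rows)
print(soups["Writer/Podcaster"])
```

With the dataframe from this notebook, the call would be `words_by_type(zip(df.Text, df.Type))`, and a new account type in the spreadsheet would need no extra code.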

In [None]:
# stop words taken from git forum. Plus a few of my own added.
removal_list = ["As","poll", "United", "opportunity", "Minister", "@FERC", "virtual", "more:", "Panelists:", "join", "Symposium", "@ENERGY", "Our", "Join", "time", "support", "here:", "But", "@TIME", "US", "And", "@GreenBiz", "w/","How", "--", "If", "@amywestervelt", "Watch","New", "learn", "things", "join", "webinar", "Register","A", "&amp" ,"&amp;","I","here","|", "role", "discuss","It's", "including", "it's","today", "hear", "key","report","The","We","This", "-","Read", "What", "full", "In", "May", "For", "Thank", "—","U.S.", "–", "great", "protect", "here", "create", "latest", "lead", "It", "Learn","good", "build", "create", "a","about","above","after","again","against","ain","all","am","an","and","any","are","aren","aren't","as","at","be","because","been","before","being","below","between","both","but","by","can","couldn","couldn't","d","did","didn","didn't","do","does","doesn","doesn't","doing","don","don't","down","during","each","few","for","from","further","had","hadn","hadn't","has","hasn","hasn't","have","haven","haven't","having","he","her","here","hers","herself","him","himself","his","how","i","if","in","into","is","isn","isn't","it","it's","its","itself","just","ll","m","ma","me","mightn","mightn't","more","most","mustn","mustn't","my","myself","needn","needn't","no","nor","not","now","o","of","off","on","once","only","or","other","our","ours","ourselves","out","over","own","re","s","same","shan","shan't","she","she's","should","should've","shouldn","shouldn't","so","some","such","t","than","that","that'll","the","their","theirs","them","themselves","then","there","these","they","this","those","through","to","too","under","until","up","ve","very","was","wasn","wasn't","we","were","weren","weren't","what","when","where","which","while","who","whom","why","will","with","won","won't","wouldn","wouldn't","y","you","you'd","you'll","you're","you've","your","yours","yourself","yourselves","could","he'd","he'll","he's","here's","how's","i'd","i'll","i'm","i've",
"let's","ought","she'd","she'll","that's","there's","they'd","they'll","they're","they've","we'd","we'll","we're","we've","what's","when's","where's","who's","why's","would","able","abst","accordance","according","accordingly","across","act","actually","added","adj","affected","affecting","affects","afterwards","ah","almost","alone","along","already","also","although","always","among","amongst","announce","another","anybody","anyhow","anymore","anyone","anything","anyway","anyways","anywhere","apparently","approximately","arent","arise","around","aside","ask","asking","auth","available","away","awfully","b","back","became","become","becomes","becoming","beforehand","begin","beginning","beginnings","begins","behind","believe","beside","besides","beyond","biol","brief","briefly","c","ca","came","cannot","can't","cause","causes","certain","certainly","co","com","come","comes","contain","containing","contains","couldnt","date","different","done","downwards","due","e","ed","edu","effect","eg","eight","eighty","either","else","elsewhere","end","ending","enough","especially","et","etc","even","ever","every","everybody","everyone","everything","everywhere","ex","except","f","far","ff","fifth","first","five","fix","followed","following","follows","former","formerly","forth","found","four","furthermore","g","gave","get","gets","getting","give","given","gives","giving","go","goes","gone","got","gotten","h","happens","hardly","hed","hence","hereafter","hereby","herein","heres","hereupon","hes","hi","hid","hither","home","howbeit","however","hundred","id","ie","im","immediate","immediately","importance","important","inc","indeed","index","information","instead","invention","inward","itd","it'll","j","k","keep","keeps","kept","kg","km","know","known","knows","l","largely","last","lately","later","latter","latterly","least","less","lest","let","lets","like","liked","likely","line","little","'ll","look","looking","looks","ltd","made","mainly","make","makes","many","may","maybe",
"mean","means","meantime","meanwhile","merely","mg","might","million","miss","ml","moreover","mostly","mr","mrs","much","mug","must","n","na","name","namely","nay","nd","near","nearly","necessarily","necessary","need","needs","neither","never","nevertheless","new","next","nine","ninety","nobody","non","none","nonetheless","noone","normally","nos","noted","nothing","nowhere","obtain","obtained","obviously","often","oh","ok","okay","old","omitted","one","ones","onto","ord","others","otherwise","outside","overall","owing","p","page","pages","part","particular","particularly","past","per","perhaps","placed","please","plus","poorly","possible","possibly","potentially","pp","predominantly","present","previously","primarily","probably","promptly","proud","provides","put","q","que","quickly","quite","qv","r","ran","rather","rd","readily","really","recent","recently","ref","refs","regarding","regardless","regards","related","relatively","research","respectively","resulted","resulting","results","right","run","said","saw","say","saying","says","sec","section","see","seeing","seem","seemed","seeming","seems","seen","self","selves","sent","seven","several","shall","shed","shes","show","showed","shown","showns","shows","significant","significantly","similar","similarly","since","six","slightly","somebody","somehow","someone","somethan","something","sometime","sometimes","somewhat","somewhere","soon","sorry","specifically","specified","specify","specifying","still","stop","strongly","sub","substantially","successfully","sufficiently","suggest","sup","sure","take","taken","taking","tell","tends","th","thank","thanks","thanx","thats","that've","thence","thereafter","thereby","thered","therefore","therein","there'll","thereof","therere","theres","thereto","thereupon","there've","theyd","theyre","think","thou","though","thoughh","thousand","throug","throughout","thru","thus","til","tip","together","took","toward","towards","tried","tries","truly","try","trying","ts","twice","two","u"
,"un","unfortunately","unless","unlike","unlikely","unto","upon","ups","us","use","used","useful","usefully","usefulness","uses","using","usually","v","value","various","'ve","via","viz","vol","vols","vs","w","want","wants","wasnt","way","wed","welcome","went","werent","whatever","what'll","whats","whence","whenever","whereafter","whereas","whereby","wherein","wheres","whereupon","wherever","whether","whim","whither","whod","whoever","whole","who'll","whomever","whos","whose","widely","willing","wish","within","without","wont","words","world","wouldnt","www","x","yes","yet","youd","youre","z","zero","a's","ain't","allow","allows","apart","appear","appreciate","appropriate","associated","best","better","c'mon","c's","cant","changes","clearly","concerning","consequently","consider","considering","corresponding","course","currently","definitely","described","despite","entirely","exactly","example","going","greetings","hello","help","hopefully","ignored","inasmuch","indicate","indicated","indicates","inner","insofar","it'd","keep","keeps","novel","presumably","reasonably","second","secondly","sensible","serious","seriously","sure","t's","third","thorough","thoroughly","three","well","wonder"]
final_soup = [word for word in wordsoup if word not in removal_list]

In [229]:
#stop words removal
final_writers = [word for word in wordsoup_writers if word not in removal_list]
final_thinktanks = [word for word in wordsoup_thinktanks if word not in removal_list]
final_Fedempl = [word for word in wordsoup_Fedempl if word not in removal_list]
final_Fedagcy = [word for word in wordsoup_Fedagcy if word not in removal_list]
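
The filters above compare each word against a list, which is case-sensitive and scans the whole list for every word. One possible variant lowercases both sides and uses a set for faster lookups. Note that this changes behaviour compared with the cell above: a capitalised word such as "The" is removed even if only "the" is listed. `remove_stop_words` is a hypothetical helper name and the sample words are invented.

```python
def remove_stop_words(words, stop_words):
    """Drop words whose lowercase form matches a stop word, ignoring case."""
    lowered = {w.lower() for w in stop_words}  # set gives constant-time lookups
    return [w for w in words if w.lower() not in lowered]

sample_words = ["The", "grid", "is", "Changing", "the", "market"]
sample_stops = ["the", "is"]
filtered = remove_stop_words(sample_words, sample_stops)
print(filtered)
```

Applied to this notebook it would read `remove_stop_words(wordsoup_writers, removal_list)`, and the capitalised duplicates in `removal_list` would no longer be needed.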

In [230]:
#shared words removal
shared_list = ["climate", "energy", "Energy", "Climate", "change"]

final_writers = [word for word in final_writers if word not in shared_list]
final_thinktanks = [word for word in final_thinktanks if word not in shared_list]
final_Fedempl = [word for word in final_Fedempl if word not in shared_list]
final_Fedagcy = [word for word in final_Fedagcy if word not in shared_list]

In [None]:
#peek at results
from collections import Counter
import numpy
numpy.transpose(Counter(final_writers).most_common(20)), numpy.transpose(Counter(final_thinktanks).most_common(20)), numpy.transpose(Counter(final_Fedempl).most_common(20)), numpy.transpose(Counter(final_Fedagcy).most_common(20))
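
For reference, `Counter.most_common(n)` returns a list of `(word, count)` tuples sorted from most to least frequent. Passing that list through `numpy.transpose`, as above, coerces the counts to strings, so it can be simpler to print the pairs directly. The word list below is invented for illustration.

```python
from collections import Counter

words = ["solar", "wind", "solar", "grid", "solar", "wind"]

# The two most frequent words, each paired with its count.
top = Counter(words).most_common(2)
print(top)

# Print one word per row with its count aligned in a second column.
for word, count in top:
    print(f"{word:<10}{count}")
```

The counts stay integers here, so they can be sorted, summed, or plotted without conversion.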

In [None]:
!pip install wordcloud

In [223]:
#word clouds
from wordcloud import WordCloud

In [215]:
import PIL
from PIL import Image
print('Pillow Version:', PIL.__version__)

Pillow Version: 8.1.0


In [200]:
wc_writers = " ".join(final_writers)
wc_thinktanks = " ".join(final_thinktanks)
wc_Fedempl = " ".join(final_Fedempl)
wc_Fedagcy = " ".join(final_Fedagcy)

In [235]:
wc_writers



In [236]:
def create_wordcloud(text):
    mask = numpy.array(Image.open("cloud.jpg"))
 
    # create wordcloud object
    wc = WordCloud(background_color="white",
                    max_words=20, 
                    mask=mask,
                    stopwords = ["https://t.co/", "http", "t", "co", "s", ])
 
    wc.generate(text)
 
    # save wordcloud
    wc.to_file("output.jpg")

In [237]:
create_wordcloud(wc_writers)
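
The stop-word list passed to `WordCloud` above targets fragments of links such as `http`, `t`, and `co`. Another option, sketched here under the assumption that links in tweets start with `http://` or `https://`, is to strip them from the text before generating the cloud. `strip_urls` is a hypothetical helper and the sample sentence is made up.

```python
import re

# Matches http/https links, such as the t.co short links found in tweets.
URL_PATTERN = re.compile(r"https?://\S+")

def strip_urls(text):
    """Remove links and collapse the whitespace left behind."""
    return " ".join(URL_PATTERN.sub(" ", text).split())

sample = "New report out https://t.co/abc123 on grid storage"
cleaned = strip_urls(sample)
print(cleaned)
```

It could then be used as `create_wordcloud(strip_urls(wc_writers))`, which keeps link fragments out of the cloud without relying on the stop-word list.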

In [None]:
# Export dataframe into a CSV
df.to_csv('user-tweets.csv', sep=',', index=False)

# Query by Text Search
The code below scrapes up to 500 tweets matching a text search, posted between June 1st, 2020 and July 31st, 2020, then exports them to a CSV file with Pandas.

In [None]:
# Setting variables to be used below
maxTweets = 500

# Creating list to append tweet data to
tweets_list2 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper('its the elephant since:2020-06-01 until:2020-07-31').get_items()):
    if i>=maxTweets:
        break
    tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
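
The search string above is written by hand. A small helper can assemble it from `date` objects instead. This sketch assumes that the `until:` operator excludes the date it names, so it adds one day to make the last day inclusive; treat that as an assumption to verify against your own results. `build_query` is a hypothetical name, not an snscrape function.

```python
from datetime import date, timedelta

def build_query(text, first_day, last_day):
    """Build a search string covering first_day through last_day inclusive.

    Assumes `until:` excludes the date it names, so one day is added.
    """
    until = last_day + timedelta(days=1)
    return f"{text} since:{first_day.isoformat()} until:{until.isoformat()}"

query = build_query("its the elephant", date(2020, 6, 1), date(2020, 7, 31))
print(query)
```

The resulting string can be passed to `sntwitter.TwitterSearchScraper(query)` in place of the literal used above.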

In [None]:
# Creating a dataframe from the tweets list above
tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

# Display first 5 entries from dataframe
tweets_df2.head()

In [None]:
# Export dataframe into a CSV
tweets_df2.to_csv('text-query-tweets.csv', sep=',', index=False)