# Article Notebook for Scraping Twitter Using snscrape's Python Wrapper
<br>Package GitHub: https://github.com/JustAnotherArchivist/snscrape
<br>This notebook uses the development version of snscrape.

Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

### Notebook Author: Martin Beck
<b>Information current as of November 28th, 2020</b><br>

This notebook contains materials for scraping tweets from Twitter using snscrape's Python wrapper.

<b>Dependencies: </b> 
- Your <b>Python</b> version must be <b>3.8</b> or higher. The development version of snscrape will not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).
- <b>Development version of snscrape</b>. If you don't already have it, uncomment the pip install line in the cell below to install it from within the notebook.
- <b>Pandas</b>. Its dataframes make it easy to manipulate and index data. This is more of a preference, but it is what I use in this notebook.
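
If you are unsure which interpreter your notebook kernel is running, a quick check along these lines may help before installing anything. This is only a sketch: the helper name `python_is_supported` and the constant `MIN_PYTHON` are made up for illustration and are not part of snscrape.

```python
import sys

# Minimum version taken from the dependency list above.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Warn early instead of failing later during the snscrape install or import.
if not python_is_supported():
    found = sys.version.split()[0]
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is needed, found {found}")
```

The comparison uses tuples, so `(3, 10)` correctly counts as newer than `(3, 8)`, which a string comparison would get wrong.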

In [1]:
# Run the pip install command below if you don't already have the library
# !pip install git+https://github.com/JustAnotherArchivist/snscrape.git

# Run the below command if you don't already have Pandas
# !pip install pandas

# Imports
import snscrape.modules.twitter as sntwitter
import pandas as pd

# Query by Username
The code below scrapes 100 tweets from a single username, then exports them to a CSV file with Pandas.

In [35]:
# Setting variables to be used below
maxTweets = 100

# Creating list to append tweet data to
tweets_list1 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
# i starts at 0, so stopping at i >= maxTweets keeps exactly maxTweets tweets
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:jack').get_items()):
    if i >= maxTweets:
        break
    tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
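
Another way to cap the number of results is `itertools.islice`, which takes a fixed number of items from any generator and avoids having to reason about the index comparison. The sketch below uses a hypothetical stand-in generator, `fake_items`, instead of a real scraper so that it runs without network access. It assumes that `get_items()` behaves like an ordinary Python generator, as the loop above suggests.

```python
from itertools import islice


def fake_items():
    """Hypothetical stand-in for a scraper's get_items() generator."""
    n = 0
    while True:  # endless, like a long stream of search results
        yield {"id": n}
        n += 1


max_items = 100

# islice stops after exactly max_items items have been produced.
first_items = list(islice(fake_items(), max_items))
print(len(first_items))
```

With a real scraper, the same pattern would wrap the `get_items()` call in place of `fake_items()`.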

In [36]:
# Creating a dataframe from the tweets list above
tweets_df1 = pd.DataFrame(tweets_list1, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

# Display first 5 entries from dataframe
tweets_df1.head()

| | Datetime | Tweet Id | Text | Username |
|---|---|---|---|---|
| 0 | 2020-11-27 21:25:36+00:00 | 1332435430801690624 | @JesseDorogusker @Square ❤️ | jack |
| 1 | 2020-11-18 19:49:02+00:00 | 1329149637006041088 | @NeerajKA Welcome! | jack |
| 2 | 2020-11-18 18:59:50+00:00 | 1329137255026311168 | Join @CashApp! #Bitcoin https://t.co/SbYANIZyix | jack |
| 3 | 2020-11-18 18:57:29+00:00 | 1329136665684705280 | @kateconger @sarahintampa Nah | jack |
| 4 | 2020-11-18 18:54:05+00:00 | 1329135806192107521 | @mmasnick Terrible idea! And terribly false. | jack |


In [37]:
# Export dataframe into a CSV
tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)
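
To see what `index=False` changes, the sketch below writes a tiny made-up dataframe to an in-memory buffer both ways and reads it back. The sample rows are invented for illustration. When the index is written (the pandas default), it comes back as an extra unnamed column.

```python
import io

import pandas as pd

# Small made-up dataframe standing in for the scraped tweets.
sample_df = pd.DataFrame({"Tweet Id": [1, 2], "Username": ["a", "b"]})

# Default behaviour: the row index is written as an unnamed first column.
with_index = io.StringIO()
sample_df.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the real columns are written.
without_index = io.StringIO()
sample_df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```

Exporting without the index keeps the CSV limited to the columns you actually defined, which is usually what you want when the file will be reloaded later.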

# Query by Text Search
The code below scrapes up to 5000 tweets matching the text search `xlhome`, using the date range `since:2020-01-01 until:2020-12-31`, then exports them to a CSV file with Pandas.

In [17]:
# Setting variables to be used below
maxTweets = 5000

# Creating list to append tweet data to
tweets_list2 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
# i starts at 0, so stopping at i >= maxTweets keeps exactly maxTweets tweets
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('xlhome since:2020-01-01 until:2020-12-31').get_items()):
    if i >= maxTweets:
        break
    tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
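
The query string above combines a search term with `since:` and `until:` dates. If you change the date range often, a small helper can build the string and catch a reversed range before any scraping starts. This is a sketch only: `build_search_query` is a hypothetical name, not an snscrape function.

```python
from datetime import date


def build_search_query(term, since, until):
    """Compose a 'term since:YYYY-MM-DD until:YYYY-MM-DD' search string."""
    if since > until:
        raise ValueError("since must not be later than until")
    return f"{term} since:{since.isoformat()} until:{until.isoformat()}"


# Reproduces the query string used in the cell above.
query = build_search_query("xlhome", date(2020, 1, 1), date(2020, 12, 31))
print(query)
```

The resulting string can then be passed to `sntwitter.TwitterSearchScraper` in place of the hard-coded literal.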

In [18]:
# Creating a dataframe from the tweets list above
# Column order must match the order the fields were appended: date, id, content, username
tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

# Display first 5 entries from dataframe
tweets_df2.head()

| | Datetime | Tweet Id | Text | Username |
|---|---|---|---|---|
| 0 | 2020-12-30 17:11:33+00:00 | 1344330294376034307 | @HalomoanIkhsan Gasken pindah xlhome 1gbps | MaulanaHasyim |
| 1 | 2020-12-30 03:31:18+00:00 | 1344123870899400706 | @collegemenfess xlhome 280k/bulan | imma_aggie |
| 2 | 2020-12-30 02:50:24+00:00 | 1344113578278719488 | @tovan_hnd Buat wifi? \nIndihome msh kenceng s... | izzaty2011 |
| 3 | 2020-12-29 17:33:04+00:00 | 1343973323680583680 | xlhome klo tiap pagi buta gini error mulu deh | maybooya |
| 4 | 2020-12-29 17:07:54+00:00 | 1343966989442633729 | @makeitpaw xlhome 😔💔 | strawbright |


In [19]:
# Export dataframe into a CSV
tweets_df2.to_csv('text-query-tweets-v3.csv', sep=',', index=False)