# Article Notebook for Scraping Twitter Using snscrape's Python Wrapper
<br>Package GitHub: https://github.com/JustAnotherArchivist/snscrape
<br>This notebook uses the development version of snscrape.

Article Read-Along: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af

### Notebook Author: Martin Beck
<b>Information current as of November 28th, 2020</b><br>

This notebook contains materials for scraping tweets from Twitter using snscrape's Python Wrapper

<b>Dependencies: </b> 
- Your <b>Python</b> version must be <b>3.8</b> or higher. The development version of snscrape will not work with Python 3.7 or lower. You can download the latest Python version [here](https://www.python.org/downloads/).
- <b>Development version of snscrape</b>: if you don't already have it, uncomment the pip install line in the cell below to install it from within the notebook.
- <b>Pandas</b>: DataFrames make it easy to manipulate and index the scraped data. Using Pandas is a matter of preference, but it is what this notebook follows.
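
If you want the notebook to fail early on an unsupported interpreter, a small guard like the one below could go at the top of the first code cell. This is a minimal sketch; `meets_min_python` is a hypothetical helper, not part of snscrape, and the 3.8 minimum is taken from the requirement above.

```python
import sys

# Hypothetical helper: compares (major, minor) of a version tuple to a minimum.
def meets_min_python(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if version_info is at least `minimum` (major, minor only)."""
    return tuple(version_info[:2]) >= tuple(minimum)

# Stop early with a clear message instead of an obscure import error later.
if not meets_min_python():
    raise RuntimeError(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "the development version of snscrape needs Python 3.8 or higher"
    )
```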

In [4]:
# Run the pip install command below if you don't already have the library
# !pip install git+https://github.com/JustAnotherArchivist/snscrape.git

# Run the below command if you don't already have Pandas
# !pip install pandas

# Imports
import snscrape.modules.twitter as sntwitter
import pandas as pd

# Query by Username
The code below scrapes 100 tweets from a single user (here, @jack) and then saves them to a CSV file with Pandas.

In [35]:
# Setting variables to be used below
maxTweets = 100

# Creating list to append tweet data to
tweets_list1 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('from:jack').get_items()):
    # Stop once maxTweets tweets have been collected (i counts from 0)
    if i >= maxTweets:
        break
    tweets_list1.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
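
The `enumerate`/`break` pattern above can also be written with `itertools.islice`, which yields at most `max_tweets` items and then stops pulling from the scraper's generator. This is a minimal sketch, assuming the same tweet attributes used in the cell above (`date`, `id`, `content`, `user.username`); `take_tweets` is a hypothetical helper, and `SimpleNamespace` objects stand in for real tweets so the example runs offline.

```python
from itertools import islice
from types import SimpleNamespace  # only used for the stand-in tweets below

def take_tweets(items, max_tweets):
    """Collect [date, id, content, username] rows from at most max_tweets items."""
    rows = []
    # islice stops after max_tweets items without consuming any extra ones
    for tweet in islice(items, max_tweets):
        rows.append([tweet.date, tweet.id, tweet.content, tweet.user.username])
    return rows

# Stand-in generator of fake tweets, shaped like the attributes used above
fake = (
    SimpleNamespace(date=f"2020-11-{d:02d}", id=d, content=f"tweet {d}",
                    user=SimpleNamespace(username="jack"))
    for d in range(1, 31)
)
rows = take_tweets(fake, 5)
```

With snscrape installed, the equivalent call would be `take_tweets(sntwitter.TwitterSearchScraper('from:jack').get_items(), maxTweets)`.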

In [36]:
# Creating a dataframe from the tweets list above
tweets_df1 = pd.DataFrame(tweets_list1, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

# Display first 5 entries from dataframe
tweets_df1.head()

| | Datetime | Tweet Id | Text | Username |
|---|---|---|---|---|
| 0 | 2020-11-27 21:25:36+00:00 | 1332435430801690624 | @JesseDorogusker @Square ❤️ | jack |
| 1 | 2020-11-18 19:49:02+00:00 | 1329149637006041088 | @NeerajKA Welcome! | jack |
| 2 | 2020-11-18 18:59:50+00:00 | 1329137255026311168 | Join @CashApp! #Bitcoin https://t.co/SbYANIZyix | jack |
| 3 | 2020-11-18 18:57:29+00:00 | 1329136665684705280 | @kateconger @sarahintampa Nah | jack |
| 4 | 2020-11-18 18:54:05+00:00 | 1329135806192107521 | @mmasnick Terrible idea! And terribly false. | jack |
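
Once the tweets are in a DataFrame, simple aggregations take only a line or two. As a sketch, the snippet below counts tweets per UTC calendar day. The three-row `tweets_df` is a stand-in copied from the first rows shown above so the example runs without scraping; with live data you would use `tweets_df1` instead.

```python
import pandas as pd

# Stand-in for tweets_df1, using a few rows from the output above
tweets_df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2020-11-27 21:25:36+00:00",
        "2020-11-18 19:49:02+00:00",
        "2020-11-18 18:59:50+00:00",
    ]),
    "Tweet Id": [1332435430801690624, 1329149637006041088, 1329137255026311168],
    "Text": ["@JesseDorogusker @Square ❤️", "@NeerajKA Welcome!",
             "Join @CashApp! #Bitcoin https://t.co/SbYANIZyix"],
    "Username": ["jack", "jack", "jack"],
})

# Count tweets per calendar day (timestamps are UTC), oldest day first
per_day = tweets_df["Datetime"].dt.date.value_counts().sort_index()
print(per_day)
```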


In [37]:
# Export dataframe into a CSV
tweets_df1.to_csv('user-tweets.csv', sep=',', index=False)
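
`index=False` matters when the CSV is read back later: without it, Pandas writes the row labels as an unnamed first column, which reappears as `Unnamed: 0` on reload. The sketch below round-trips a small stand-in DataFrame through an in-memory buffer (`io.StringIO` in place of a real file, an assumption made so the example runs anywhere) and uses `parse_dates` to restore the timestamps as datetimes rather than strings.

```python
import io
import pandas as pd

# Stand-in for tweets_df1 with the same column layout
tweets_df = pd.DataFrame({
    "Datetime": pd.to_datetime(["2020-11-27 21:25:36+00:00", "2020-11-18 19:49:02+00:00"]),
    "Tweet Id": [1332435430801690624, 1329149637006041088],
    "Text": ["@JesseDorogusker @Square ❤️", "@NeerajKA Welcome!"],
    "Username": ["jack", "jack"],
})

# Write without the row index, then rewind the buffer to read it back
buffer = io.StringIO()
tweets_df.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates converts the ISO timestamps back into datetime values
restored = pd.read_csv(buffer, parse_dates=["Datetime"])
print(restored.dtypes)
```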

# Query by Text Search
The code below runs a text search and scrapes up to 5,000 tweets containing "xlhome" posted between January 1st, 2022 and January 30th, 2022, then saves them to a CSV file with Pandas.
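
Both queries in this notebook use Twitter's advanced-search operators: `from:` restricts results to one account, and `since:`/`until:` bound the date range. When the search terms change often, assembling the string from parts can help avoid typos. `build_search_query` is a hypothetical helper, a minimal sketch that simply joins the operators with spaces.

```python
from datetime import date

def build_search_query(text="", username=None, since=None, until=None):
    """Assemble a Twitter search string from optional parts (hypothetical helper)."""
    parts = []
    if text:
        parts.append(text)
    if username:
        parts.append(f"from:{username}")
    if since:
        parts.append(f"since:{since.isoformat()}")  # dates render as YYYY-MM-DD
    if until:
        parts.append(f"until:{until.isoformat()}")
    return " ".join(parts)

query = build_search_query("xlhome", since=date(2022, 1, 1), until=date(2022, 1, 30))
print(query)  # xlhome since:2022-01-01 until:2022-01-30
```

The resulting string can be passed directly to `sntwitter.TwitterSearchScraper(query)`.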

In [5]:
# Setting variables to be used below
maxTweets = 5000

# Creating list to append tweet data to
tweets_list2 = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('xlhome since:2022-01-01 until:2022-01-30').get_items()):
    # Stop once maxTweets tweets have been collected (i counts from 0)
    if i >= maxTweets:
        break
    tweets_list2.append([tweet.date, tweet.id, tweet.content, tweet.user.username])

In [6]:
# Creating a dataframe from the tweets list above
# Column order must match the order of fields appended above: date, id, content, username
tweets_df2 = pd.DataFrame(tweets_list2, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])

# Display first 5 entries from dataframe
tweets_df2.head()

| | Datetime | Tweet Id | Text | Username |
|---|---|---|---|---|
| 0 | 2022-01-29 12:57:57+00:00 | 1487409682327404546 | ini xlhome ngedown lagi kah loading mulu pas b... | kkareomi |
| 1 | 2022-01-28 16:36:51+00:00 | 1487102383427514369 | @nawelshs i love xlhome | Atzer_ |
| 2 | 2022-01-28 11:03:22+00:00 | 1487018457803542528 | @waguwagusuk ganti xlhome kakkkk, lancar jaya ... | yoshicat_ |
| 3 | 2022-01-28 06:41:23+00:00 | 1486952527882907648 | @xlhomeid Duuh 4 hari internet mati tidak ada ... | FloraW44274560 |
| 4 | 2022-01-27 13:39:32+00:00 | 1486695368628731910 | Sah! XL Axiata Akuisisi Link Net Senilai Rp 8,... | OniDewono |
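
Because the column names passed to `pd.DataFrame` must line up with the order in which fields are appended to each row, a mismatch silently mislabels columns. Building each row as a dict keyed by column name keeps every value attached to its label. This is a minimal sketch, with `tweet_to_row` as a hypothetical helper and a `SimpleNamespace` standing in for a real snscrape tweet (same attribute names as the cells above).

```python
from types import SimpleNamespace
import pandas as pd

def tweet_to_row(tweet):
    """Map one tweet-like object to a dict keyed by this notebook's column names."""
    return {
        "Datetime": tweet.date,
        "Tweet Id": tweet.id,
        "Text": tweet.content,
        "Username": tweet.user.username,
    }

# Stand-in tweet; with snscrape these would come from get_items()
tweets = [
    SimpleNamespace(date="2022-01-28 16:36:51+00:00", id=1487102383427514369,
                    content="@nawelshs i love xlhome",
                    user=SimpleNamespace(username="Atzer_")),
]

# Column order follows the dict keys, so labels cannot drift from values
tweets_df = pd.DataFrame([tweet_to_row(t) for t in tweets])
```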


In [7]:
# Export dataframe into a CSV
tweets_df2.to_csv('text-query-tweets-v5.csv', sep=',', index=False)