## <div class="alert alert-success" align="center"><h2>Webscraping Tweets without Twitter API</h2></div>

<div style="border-radius:10px;border:#254E58 solid;padding: 15px;background-color:white;font-size:110%;text-align:left">

<h3 align="left"><font color=#254E58>Details:</font></h3>   
 

The code is written with the help of the Python library snscrape.

**snscrape** is a Python library that scrapes tweets without going through Twitter's official API, so the API's request limits do not apply and you don't need a Twitter developer account.

Check out this [easy-to-follow tutorial on how to scrape tweets using snscrape](https://www.youtube.com/watch?v=jtIMnmbnOFo&t=307s) on YouTube.

Also check out [snscrape on GitHub](https://github.com/JustAnotherArchivist/snscrape).

In [None]:
!pip install snscrape
import snscrape.modules.twitter as snstwitter
import pandas as pd
from tqdm import tqdm

In [None]:
start_date = pd.Timestamp('2012-01-01')
end_date = pd.Timestamp('now').floor('D')

<div style="border-radius:10px;border:#254E58 solid;padding: 15px;background-color:white;font-size:110%;text-align:left">
These lines define the start and end dates for the Twitter search query. The start date is set to January 1st, 2012 and the end date is set to the current date, rounded down to the nearest day.
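
As a small illustration of the rounding step, the sketch below uses a fixed timestamp in place of `pd.Timestamp('now')` so the result is reproducible. The fixed value and the `date_window` name are assumptions made for the example only.

```python
import pandas as pd

# A fixed timestamp stands in for pd.Timestamp('now') so the output is stable.
now = pd.Timestamp('2023-03-15 14:37:52')

start_date = pd.Timestamp('2012-01-01')
end_date = now.floor('D')  # drops the time of day, leaving midnight of that date

# Format both dates the way the search query expects them (YYYY-MM-DD).
date_window = (start_date.strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d'))
print(date_window)  # ('2012-01-01', '2023-03-15')
```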

In [None]:
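# Quote the multi-word phrase so it is matched as a whole, and group the
# OR terms so the date operators apply to both of them
query = '("Fantasy Premier League" OR FPL) since:{} until:{}'.format(
    start_date.strftime('%Y-%m-%d'), end_date.strftime('%Y-%m-%d'))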

<div style="border-radius:10px;border:#254E58 solid;padding: 15px;background-color:white;font-size:110%;text-align:left">
This line defines the search query to be used in the Twitter search. It searches for tweets containing either "Fantasy Premier League" or "FPL" between the start and end dates defined above.
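
One way to keep such query strings consistent is a small helper that quotes multi-word phrases and appends the date range. The function `build_query` below is a hypothetical name, not part of snscrape, and the sketch assumes Twitter's search syntax treats quoted text as an exact phrase and parentheses as grouping.

```python
def build_query(terms, since, until):
    """Join search terms with OR, quoting multi-word phrases, then add a date range."""
    quoted = ['"{}"'.format(term) if ' ' in term else term for term in terms]
    return "({}) since:{} until:{}".format(" OR ".join(quoted), since, until)

# Hypothetical usage with the two search terms from this notebook.
example_query = build_query(["Fantasy Premier League", "FPL"],
                            "2012-01-01", "2013-01-01")
print(example_query)
# ("Fantasy Premier League" OR FPL) since:2012-01-01 until:2013-01-01
```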

In [None]:
limit = 10000

<div style="border-radius:10px;border:#254E58 solid;padding: 15px;background-color:white;font-size:110%;text-align:left">
This line sets the maximum number of tweets to collect per year: 10,000.
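
A per-year cap can also be expressed with `itertools.islice`, which stops reading from a generator after a fixed number of items. In the sketch below, `fake_search` is a hypothetical stand-in for a scraper's result stream, and the cap is set to 3 only so the example finishes instantly.

```python
from itertools import islice

def fake_search(year):
    """Hypothetical stand-in for a scraper: yields an endless stream of tweet ids."""
    n = 0
    while True:
        yield "{}-{}".format(year, n)
        n += 1

limit = 3  # small cap for the example; the notebook uses 10,000

# islice takes at most `limit` items from each year's stream, then stops.
collected = {year: list(islice(fake_search(year), limit)) for year in (2012, 2013)}
print(collected)
```

Because the cap is applied to each year's stream separately, a year with few results cannot hand its unused quota to a later year.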

In [None]:
tweets = []

In [None]:
# Loop over each year
for year in range(start_date.year, end_date.year+1):
    # Define the date window for the current year. Twitter's `until:` operator
    # excludes the given date, so the window ends on January 1st of the next year.
    year_start_date = pd.Timestamp('{}-01-01'.format(year))
    year_end_date = pd.Timestamp('{}-01-01'.format(year+1))

    # `query` already carries since:/until: operators; strip them so the
    # per-year window is the only date filter in the search string
    base_query = query.split(' since:')[0]
    year_query = "{} since:{} until:{}".format(base_query, year_start_date.strftime('%Y-%m-%d'), year_end_date.strftime('%Y-%m-%d'))

    # Loop over each tweet for the current year, stopping after `limit` tweets
    year_count = 0
    for tweet in tqdm(snstwitter.TwitterSearchScraper(year_query).get_items()):
        if year_count >= limit:
            break
        tweets.append([tweet.id,tweet.date,tweet.username, tweet.content,
                       tweet.hashtags,tweet.retweetCount,tweet.likeCount,
                       tweet.replyCount,tweet.source,
                       tweet.user.location,tweet.user.verified,
                       tweet.user.followersCount,tweet.user.friendsCount])
        year_count += 1
            

<div style="border-radius:10px;border:#254E58 solid;padding: 15px;background-color:white;font-size:110%;text-align:left">

Each iteration of the inner loop appends a list of tweet attributes to the tweets list, one entry per tweet returned by the search query. The tweet attributes include the ID, timestamp, username, text content, hashtags, retweet count, like count, reply count, source, user location, verified account status, number of followers, and number of accounts followed.
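
To make the nesting of these attributes easier to see, here is a minimal sketch that flattens a tweet-like object into a row. The helper `tweet_to_row` and the `fake_tweet` object are hypothetical and cover only a subset of the attributes; real tweet objects come from the scraper.

```python
from types import SimpleNamespace

def tweet_to_row(tweet):
    """Flatten one tweet-like object into a list of selected attributes."""
    return [tweet.id, tweet.date, tweet.user.username, tweet.content,
            tweet.likeCount, tweet.user.location, tweet.user.followersCount]

# Hypothetical stand-in: user-level fields live on the nested `user` object.
fake_tweet = SimpleNamespace(
    id=1, date="2012-08-18", content="FPL deadline today", likeCount=4,
    user=SimpleNamespace(username="user_a", location="London", followersCount=120),
)

row = tweet_to_row(fake_tweet)
print(row)  # [1, '2012-08-18', 'user_a', 'FPL deadline today', 4, 'London', 120]
```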

In [None]:
df = pd.DataFrame(tweets, columns=['ID','Timestamp','User','Text',
                                   'Hashtag','Retweets','Likes',
                                   'Replies','Source',
                                   'Location','Verified_Account',
                                   'Followers','Following'])

<div style="border-radius:10px;border:#254E58 solid;padding: 15px;background-color:white;font-size:110%;text-align:left">
This line creates a pandas DataFrame from the tweets list with columns labeled ID, Timestamp, User, Text, Hashtag, Retweets, Likes, Replies, Source, Location, Verified_Account, Followers, and Following. Each row of the DataFrame corresponds to one scraped tweet.
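
A scraped result set may contain the same tweet more than once, for example if two date windows overlap. The sketch below shows one possible cleanup step on a toy DataFrame; the sample rows and the names `sample_df` and `clean_df` are made up for illustration and use only a few of the columns.

```python
import pandas as pd

# Toy rows standing in for scraped tweets; the last two share the same ID.
rows = [
    [1, pd.Timestamp('2012-08-18 12:00', tz='UTC'), 'user_a', 'FPL is back', 5],
    [2, pd.Timestamp('2013-01-01 00:00', tz='UTC'), 'user_b', 'New year, new FPL team', 2],
    [2, pd.Timestamp('2013-01-01 00:00', tz='UTC'), 'user_b', 'New year, new FPL team', 2],
]
sample_df = pd.DataFrame(rows, columns=['ID', 'Timestamp', 'User', 'Text', 'Likes'])

# Keep the first occurrence of each tweet ID and renumber the index.
clean_df = sample_df.drop_duplicates(subset='ID').reset_index(drop=True)
print(len(sample_df), len(clean_df))  # 3 2
```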

<div style="border-radius:10px;border:#254E58 solid;padding: 15px;background-color:white;font-size:110%;text-align:left">
The screenshot below shows how long each chunk of 10,000 tweets per year took to scrape.


![image.png](attachment:e9c68272-6b36-443c-a702-9a1b315cb519.png)

<a id="1"></a>
# <div style="padding:20px;color:white;margin:0;font-size:18px;font-family:Georgia;text-align:center;display:fill;border-radius:30px;background-color:#254E58;overflow:hidden"><b>I was able to scrape 114,466 tweets in roughly 90 minutes. The dataset is available here: [Dataset Link](https://www.kaggle.com/datasets/prasad22/fpl-tweets-dataset)</b></div>
 