# **Collecting tweets**

### Installing SNScrape library

In [1]:
# Development version of snscrape, installed from GitHub (the leading "!" shell syntax works only in Jupyter/IPython)

!pip install git+https://github.com/JustAnotherArchivist/snscrape.git

Collecting git+https://github.com/JustAnotherArchivist/snscrape.git
  Cloning https://github.com/JustAnotherArchivist/snscrape.git to /private/var/folders/k0/vjj5ssxx6gv9smwjt_3pmlgw0000gp/T/pip-req-build-e_4o_pud
  Running command git clone --filter=blob:none --quiet https://github.com/JustAnotherArchivist/snscrape.git /private/var/folders/k0/vjj5ssxx6gv9smwjt_3pmlgw0000gp/T/pip-req-build-e_4o_pud
  Resolved https://github.com/JustAnotherArchivist/snscrape.git to commit 285d5874fc20a9d9463ed261f45c5f4118277d05
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: snscrape
  Building wheel for snscrape (pyproject.toml) ... done
  Created wheel for snscrape: filename=snscrape-0.6.2.20230320-py3-none-any.whl size=71808 sha256=92bbd080923bd34d3c50597b66327c6aede88b6ff09a2476
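
After installing, it can help to confirm which build the notebook kernel actually sees. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name introduced here, not part of snscrape.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if snscrape is installed in this environment, otherwise None
print("snscrape version:", installed_version("snscrape"))
```

If this prints `None`, the install most likely went to a different Python environment than the one running the notebook.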

### SNScrape guides:

- SNScrape's GitHub: https://github.com/JustAnotherArchivist/snscrape

- Medium article and tutorial: https://medium.com/dataseries/how-to-scrape-millions-of-tweets-using-snscrape-195ee3594721

### Importing fundamental libraries

In [2]:
import snscrape.modules.twitter as sntwitter
import pandas as pd
from datetime import date, timedelta
from time import sleep
from tqdm.auto import tqdm

#Ignore warnings
import warnings
warnings.filterwarnings("ignore")

### Building a function to collect the tweets and store them in a DataFrame

In [3]:
def tweet_collector(name, initial_month, final_month, initial_day, final_day, number_daily_tweets, total):
    """Collect up to `total` Portuguese-language tweets mentioning `name` and save them to a CSV file.

    At most `number_daily_tweets` tweets are kept per search window; the end of the
    window then moves back one day until `total` is reached or the window is empty.
    """
    # List to store tweets
    tweets_list = []

    # Progress bar
    progress_bar = tqdm(total=total)

    # Search window as date objects (the year is fixed at 2022)
    initial_date = date(2022, initial_month, initial_day)
    final_date = date(2022, final_month, final_day)

    # Number of tweets collected so far
    count = 0

    # Stop at `total` tweets or when the window is exhausted, so the loop always terminates
    while count < total and final_date > initial_date:
        query = f'{name} since:{initial_date.isoformat()} until:{final_date.isoformat()} lang:pt -filter:replies'
        for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
            if i >= number_daily_tweets or count >= total:
                break
            # `tweet.content` is deprecated in recent snscrape releases in favor of `tweet.rawContent`
            tweets_list.append([tweet.date, tweet.id, tweet.content, tweet.user.username,
                                tweet.replyCount, tweet.retweetCount, tweet.likeCount, tweet.quoteCount])
            sleep(0.01)
            count += 1
            progress_bar.update(1)
        # Move the end of the window back one day, even if fewer tweets than requested were found
        final_date -= timedelta(days=1)

    progress_bar.close()

    # Creating a DataFrame
    df = pd.DataFrame(tweets_list, columns=['date', 'tweet_id', 'tweet', 'username', 'replies_score',
                                            'retweet_score', 'likes_score', 'quotes_score'])

    # Storing as CSV
    df.to_csv(f'{name}_tweets_{initial_month}.csv', encoding='utf-8', index=False)

    print("\n Congratulations! All tweets were collected!")
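
The function samples tweets by moving the `until:` date back one day at a time while `since:` stays fixed. The sketch below reproduces only that windowing logic, with no network access, so the generated queries can be inspected before a long scraping run. `daily_queries` is a hypothetical helper introduced for illustration; the query string mirrors the one used in `tweet_collector`.

```python
from datetime import date, timedelta


def daily_queries(name, start, end):
    """Yield one search query per day, moving the `until:` date back from `end` towards `start`."""
    current_end = end
    while current_end > start:
        yield f"{name} since:{start.isoformat()} until:{current_end.isoformat()} lang:pt -filter:replies"
        current_end -= timedelta(days=1)


# Same window as the August run: 16 August to 1 September 2022
queries = list(daily_queries("lula", date(2022, 8, 16), date(2022, 9, 1)))
print(len(queries))
print(queries[0])
print(queries[-1])
```

For this window the generator produces 16 queries, the first ending at `until:2022-09-01` and the last at `until:2022-08-17`.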

## **Lula**

### August

In [5]:
tweet_collector('lula', 8, 9, 16, 1, 125, 2000)
#tweet_collector('lula', 8, 9, 16, 1, 2000, 32000)

  0%|          | 0/2000 [00:00<?, ?it/s]


 Congratulations! All tweets were collected!
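
The `total` argument in these calls appears to be the number of days in the window multiplied by `number_daily_tweets`. A minimal sketch of that arithmetic, assuming this is how the totals were chosen (`expected_total` is a hypothetical helper, not part of the notebook):

```python
from datetime import date


def expected_total(start, end, tweets_per_day):
    """Number of tweets to request: days between `start` and `end` times the daily quota."""
    return (end - start).days * tweets_per_day


# 16 August to 1 September 2022 spans 16 days
print(expected_total(date(2022, 8, 16), date(2022, 9, 1), 125))   # 2000
print(expected_total(date(2022, 8, 16), date(2022, 9, 1), 2000))  # 32000
```

Passing a `total` that is not a multiple of the daily quota means the last window is cut short once the total is reached.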


### September

In [None]:
tweet_collector('lula', 9, 10, 1, 1, 2000, 60000)

  0%|          | 0/60000 [00:00<?, ?it/s]

### October

In [None]:
tweet_collector('lula', 10, 10, 1, 31, 2000, 60000)

  0%|          | 0/60000 [00:00<?, ?it/s]



Done!


## **Bolsonaro**

### August

In [None]:
tweet_collector('bolsonaro', 8, 9, 16, 1, 2000, 32000)

  0%|          | 0/16000 [00:00<?, ?it/s]

100%|██████████| 16000/16000 [17:27<00:00, 20.67it/s] 

Done!


### September

In [None]:
tweet_collector('bolsonaro', 9, 10, 1, 1, 2000, 60000)

  0%|          | 0/30000 [00:00<?, ?it/s]

100%|██████████| 30000/30000 [33:29<00:00, 11.78it/s] 

Done!


### October

In [None]:
tweet_collector('bolsonaro', 10, 10, 1, 31, 2000, 60000)

  0%|          | 0/60000 [00:00<?, ?it/s]



Done!
