# Twitter Scraping Tool

## Resources

- [Analyzing Tweets with NLP in minutes with Spark, Optimus and Twint](https://towardsdatascience.com/analyzing-tweets-with-nlp-in-minutes-with-spark-optimus-and-twint-a0c96084995f)

## Install Twint and Requirements

In [1]:
%%capture

# Install "twint" and requirements
!pip install -e ./twint -r ./twint/requirements.txt
!pip install -r requirements.txt

# Enable the ipywidgets notebook extension (used by the tqdm progress bar)
!jupyter nbextension enable --py widgetsnbextension

## Import and Setup Libraries

In [2]:
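# Patch asyncio so event loops can be nested. Jupyter already runs a loop, and
# without this patch twint fails with "RuntimeError: This event loop is already running".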
import nest_asyncio
nest_asyncio.apply()
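
The error that `nest_asyncio` works around can be reproduced with the standard library alone. The sketch below is only an illustration: `inner` and `outer` are hypothetical coroutines standing in for a library call and for the already-running notebook loop, and it assumes twint drives its search by running a coroutine on the current loop.

```python
import asyncio

async def inner():
    # Hypothetical stand-in for the coroutine a library wants to run.
    return 42

async def outer():
    # Hypothetical stand-in for code executing inside an already-running loop,
    # which is the situation in a Jupyter kernel.
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        # Asking a running loop to run another coroutine to completion fails.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)

message = asyncio.run(outer())
print(message)
```

Calling `nest_asyncio.apply()` before the search patches asyncio so that this kind of nested call is allowed.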

In [3]:
import twint
import pandas as pd
from datetime import datetime, timedelta
from tqdm.notebook import tqdm

## Scrape Twitter according to Config

In [4]:
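def scrape_twit(config, until, since=None):
    """Run one twint search per day between `since` and `until`; return de-duplicated tweets."""
    end = datetime.strptime(until, '%Y-%m-%d')
    # Without `since`, scrape a single day ending at `until`
    # (defaulting `since` to `until` would give zero iterations).
    start = datetime.strptime(since, '%Y-%m-%d') if since else end - timedelta(days=1)
    delta = end - start

    frames = []
    for i in tqdm(range(1, delta.days + 1)):
        # Move the upper bound forward one day at a time; config.Limit applies to each run.
        itr = start + timedelta(days=i)
        config.Until = itr.strftime('%Y-%m-%d')

        twint.run.Search(config)

        # With config.Pandas = True, the latest results are exposed here.
        frames.append(twint.storage.panda.Tweets_df)

    if not frames:
        return pd.DataFrame()
    # DataFrame.append was removed in pandas 2.0, so concatenate once instead.
    return pd.concat(frames, ignore_index=True).drop_duplicates(subset='id')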
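
The day-by-day stepping in `scrape_twit` can be checked without touching Twitter. The following is a minimal sketch, assuming a hypothetical helper `daily_until_dates` that only reproduces the date arithmetic of the loop and returns the values that would be assigned to `config.Until`.

```python
from datetime import datetime, timedelta

def daily_until_dates(since, until, fmt='%Y-%m-%d'):
    """Return the successive upper-bound dates a day-by-day loop would use."""
    start = datetime.strptime(since, fmt)
    end = datetime.strptime(until, fmt)
    # One entry per day after `since`, up to and including `until`.
    return [(start + timedelta(days=i)).strftime(fmt)
            for i in range(1, (end - start).days + 1)]

dates = daily_until_dates('2021-10-29', '2022-01-29')
print(len(dates), dates[0], dates[-1])  # 92 2021-10-30 2022-01-29
```

The 92 steps match the total shown by the progress bar for this interval. When the two dates are equal the list is empty, so no search would run.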

In [5]:
## Initialize search configuration
config = twint.Config()

## Search keys
# config.Username = "elonmusk"
config.Search = "btc"

## Search Settings
config.Lang = "en"
config.Limit = 500
config.Verified = True

## Search interval
# config.Since = '2015-01-01'
# config.Until = '2017-01-01'
# config.Timedelta = 10

## Output settings
config.Hide_output = True
config.Store_csv = True
config.Output = "Output/btc_3_month.csv"
config.Pandas = True

## Run search
# twint.run.Search(config)
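
Because `config.Output` points into an `Output/` folder, it may be worth making sure that folder exists before the search runs. This is a sketch under the assumption that twint does not create missing directories itself, which is not verified here; `ensure_parent_dir` is a hypothetical helper, not part of twint.

```python
from pathlib import Path

def ensure_parent_dir(output_path):
    """Create the directory that will hold `output_path` if it is missing."""
    parent = Path(output_path).parent
    # parents=True builds intermediate folders; exist_ok=True makes repeat calls safe.
    parent.mkdir(parents=True, exist_ok=True)
    return parent

# Usage in the notebook (assumes `config` from the cell above):
# ensure_parent_dir(config.Output)
```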

In [6]:
df = scrape_twit(config, since='2021-10-29', until='2022-01-29')

  0%|          | 0/92 [00:00<?, ?it/s]

[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.


In [7]:
df

| Unnamed: 0 | id | conversation_id | created_at | date | timezone | place | tweet | language | hashtags | cashtags | ... | geo | source | user_rt_id | user_rt | retweet_id | reply_to | retweet_date | translate | trans_src | trans_dest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1454232278092107780 | 1454229634648793094 | 1.635551e+12 | 2021-10-29 19:42:48 | -0500 | | @Croesus_BTC @skwp Interestingly, you can also... | en | [] | [] | ... | | | | | | [{'screen_name': 'Croesus_BTC', 'name': 'Croes... | | | | |
| 1 | 1454227990963970049 | 1454227990963970049 | 1.635550e+12 | 2021-10-29 19:25:45 | -0500 | | Yes - I’m allowed to get into #cryptocurrencie... | en | [cryptocurrencies] | [btc, kishu] | ... | | | | | | [] | | | | |
| 2 | 1454224457799815172 | 1454224457799815172 | 1.635549e+12 | 2021-10-29 19:11:43 | -0500 | | Why WOULDNT a #btc maxi take advantage of the ... | en | [btc, bitcoin] | [] | ... | | | | | | [] | | | | |
| 3 | 1454220449265508359 | 1454220446723821569 | 1.635548e+12 | 2021-10-29 18:55:47 | -0500 | | a fight we must remember is that BTC &gt; FB... | en | [] | [] | ... | | | | | | [] | | | | |
| 4 | 1454206491498516483 | 1454206491498516483 | 1.635545e+12 | 2021-10-29 18:00:20 | -0500 | | Hindsight is always 20/20 am i right folks 🤪 ... | en | [] | [btc] | ... | | | | | | [] | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15957 | 1486874424364183556 | 1486859727908843520 | 1.643333e+12 | 2022-01-27 20:31:02 | -0500 | | @WellbourneM Once the $btc price stabilizes, A... | en | [] | [btc] | ... | | | | | | [{'screen_name': 'WellbourneM', 'name': '🇺🇸 Ma... | | | | |
| 15958 | 1486871174164324352 | 1486867784734810113 | 1.643333e+12 | 2022-01-27 20:18:07 | -0500 | | @JuanSGuarnizo TO THE MOON $BTC $ETH $USDT $EL... | en | [] | [btc, eth, usdt] | ... | | | | | | [{'screen_name': 'JuanSGuarnizo', 'name': 'ElJ... | | | | |
| 15959 | 1486868220413984771 | 1486856307328098307 | 1.643332e+12 | 2022-01-27 20:06:23 | -0500 | | @BosRec I fought this kind of FUD from BTC max... | en | [] | [] | ... | | | | | | [{'screen_name': 'BosRec', 'name': 'Rec_Bos', ... | | | | |
| 15960 | 1486865585774678016 | 1486865585774678016 | 1.643331e+12 | 2022-01-27 19:55:54 | -0500 | | BREAKING: Candidate for Governor of #Texas lay... | en | [texas, bitcoin, crypto, cryptocurrency, crypt... | [] | ... | | | | | | [] | | | | |
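
Since several searches reported "No more data", a quick per-day count helps show which days are covered by the result. This is a minimal sketch, assuming the `date` values are strings (or timestamps) that print as `YYYY-MM-DD HH:MM:SS`, as in the table above; `tweets_per_day` is a hypothetical helper and the sample values are taken from the displayed rows.

```python
from collections import Counter

def tweets_per_day(dates):
    """Count tweets per calendar day from 'YYYY-MM-DD HH:MM:SS' values."""
    # The first ten characters of each value are the calendar date.
    return Counter(str(d)[:10] for d in dates)

sample = ['2021-10-29 19:42:48', '2021-10-29 19:25:45', '2022-01-27 20:31:02']
counts = tweets_per_day(sample)
print(counts['2021-10-29'], counts['2022-01-27'], counts['2021-12-25'])  # 2 1 0
```

In the notebook this could be called as `tweets_per_day(df['date'])`; days missing from the counter returned no tweets.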
