# Twitter Scraping Tool

## Resources

- [Analyzing Tweets with NLP in minutes with Spark, Optimus and Twint](https://towardsdatascience.com/analyzing-tweets-with-nlp-in-minutes-with-spark-optimus-and-twint-a0c96084995f)

## Import and Setup Libraries

In [2]:
# Allow nested asyncio event loops so twint can run inside a notebook
# (avoids "RuntimeError: This event loop is already running").
import nest_asyncio
nest_asyncio.apply()

In [3]:
import twint
import pandas as pd
from datetime import datetime, timedelta
from tqdm.notebook import tqdm

## Scrape Twitter according to Config

### Scraping Function

In [4]:
# Run one search per day over a date interval and collect the results
def scrape_twitter(search, limit, since, until=None, output="../data/output.csv"):
    """Scrape tweets matching `search` for each day from `since` to `until` (inclusive).

    Dates are 'YYYY-MM-DD' strings; `limit` applies to each daily search.
    """
    # Initialize search configuration
    config = twint.Config()

    # Search keys
    config.Search = search

    # Search settings
    config.Lang = "en"
    config.Limit = limit
    config.Verified = False

    # Output settings
    config.Hide_output = True
    config.Store_csv = True
    config.Output = output
    config.Pandas = True

    # Default to a single-day search when no end date is given
    if not until:
        until = since

    # Get start and end dates
    start = datetime.strptime(since, '%Y-%m-%d')
    end = datetime.strptime(until, '%Y-%m-%d')
    delta = end - start

    frames = []

    # Loop over each day: Since is the day itself, Until the following day
    for i in tqdm(range(delta.days + 1)):
        day = start + timedelta(days=i)
        config.Since = day.strftime('%Y-%m-%d')
        config.Until = (day + timedelta(days=1)).strftime('%Y-%m-%d')

        twint.run.Search(config)
        frames.append(twint.storage.panda.Tweets_df)

    # Nothing to combine if `until` is earlier than `since`
    if not frames:
        return pd.DataFrame()

    df = pd.concat(frames, ignore_index=True)
    return df.drop_duplicates(subset='id')
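
The date handling in `scrape_twitter` steps through the interval one day at a time. As a minimal sketch of that idea in isolation, the hypothetical helper `daily_windows` below (not part of twint or of this notebook) pairs each day with the following day, which could serve as lower and upper search bounds. It uses only the standard library, so it can be checked without running a scrape.

```python
from datetime import datetime, timedelta


def daily_windows(since, until=None, fmt="%Y-%m-%d"):
    """Yield (day, next_day) string pairs covering since..until inclusive.

    Hypothetical helper for illustration; dates are 'YYYY-MM-DD' strings.
    """
    until = until or since  # single-day interval when no end date is given
    start = datetime.strptime(since, fmt)
    end = datetime.strptime(until, fmt)
    for i in range((end - start).days + 1):
        day = start + timedelta(days=i)
        yield day.strftime(fmt), (day + timedelta(days=1)).strftime(fmt)


windows = list(daily_windows("2021-10-29", "2021-10-30"))
print(windows)
# [('2021-10-29', '2021-10-30'), ('2021-10-30', '2021-10-31')]
```

Keeping the date arithmetic separate from the network calls makes edge cases such as a missing `until` or a month boundary easy to test on their own.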

### Example Search

In [5]:
df = scrape_twitter(
    search="bitcoin",
    limit=500,
    since='2021-10-29',
    until='2021-10-30'
)

In [6]:
df

Unnamed: 0,id,conversation_id,created_at,date,timezone,place,tweet,language,hashtags,cashtags,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
0,1454236589744443397,1454236589744443397,1.635552e+12,2021-10-29 19:59:56,-0500,,#Investing 109.5% in this optimal #crypto port...,en,"[investing, crypto, cash, bitcoin, risk]",[],...,,,,,,[],,,,
1,1454236579896172545,1454127372198977537,1.635552e+12,2021-10-29 19:59:53,-0500,,@coachjnietz Train them up in the way they sho...,en,[bitcoin],[],...,,,,,,"[{'screen_name': 'coachjnietz', 'name': 'Joel ...",,,,
2,1454236559381782532,1454236559381782532,1.635552e+12,2021-10-29 19:59:48,-0500,,Bitcoin price index https://t.co/o7UcHJUhC6 #...,en,"[usd, eur, cny, gbp, rub]",[],...,,,,,,[],,,,
3,1454236511247994880,1454236511247994880,1.635552e+12,2021-10-29 19:59:37,-0500,,MASS Adoption: - @hbb_kp Superbowl Ads - @cry...,en,[bitcoin],[],...,,,,,,[],,,,
4,1454236495863111683,1454236495863111683,1.635552e+12,2021-10-29 19:59:33,-0500,,"ASIC gives trading of Bitcoin, Ethereum ETFs g...",en,[],[],...,,,,,,[],,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
997,1454593278481997830,1454593278481997830,1.635637e+12,2021-10-30 19:37:17,-0500,,#25% Profit on #TVK – FREE Crypto Bot in 2021 ...,en,[tvk],[],...,,,,,,[],,,,
998,1454593273922719747,1454593273922719747,1.635637e+12,2021-10-30 19:37:16,-0500,,El pan sold out in 30 mins! Big thanks to @Hol...,en,[bitcoin],[],...,,,,,,[],,,,
999,1454593271762726914,1454580456184180742,1.635637e+12,2021-10-30 19:37:15,-0500,,@WhaleStats @chainlink @Bitcoin @ethereum @fan...,en,[],[],...,,,,,,"[{'screen_name': 'WhaleStats', 'name': 'WhaleS...",,,,
1000,1454593266561732614,1454591515477651456,1.635637e+12,2021-10-30 19:37:14,-0500,,@FEDB0Y @zerohedge smarty ? if you can just ...,en,[],[],...,,,,,,"[{'screen_name': 'FEDB0Y', 'name': 'hood rich ...",,,,
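
The `hashtags` column in the output above appears to hold one list of tags per tweet. A minimal sketch of tallying them follows, using the hashtag lists from the first five rows shown as sample data. The helper name `count_hashtags` is hypothetical, and passing `df["hashtags"]` from the real DataFrame is an assumption that the column still contains Python lists (after a round trip through CSV it would contain strings instead).

```python
from collections import Counter


def count_hashtags(hashtag_lists):
    """Count how often each hashtag occurs across an iterable of tag lists."""
    counts = Counter()
    for tags in hashtag_lists:
        counts.update(tags)
    return counts


# Hashtag lists copied from rows 0-4 of the output above
sample = [
    ["investing", "crypto", "cash", "bitcoin", "risk"],
    ["bitcoin"],
    ["usd", "eur", "cny", "gbp", "rub"],
    ["bitcoin"],
    [],
]

counts = count_hashtags(sample)
print(counts.most_common(1))
# [('bitcoin', 3)]
```

With the scraped data, the call would look like `count_hashtags(df["hashtags"])` under the assumption stated above.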
