[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/luisar/tweets-retrieval-tweepy/blob/main/get_tweets.ipynb)

# Get Tweets

This script retrieves the tweets that match whichever hashtag is given to it, restricted to the window between yesterday and the current day (today). The results are then saved to a `.csv` file.

This is done using the `tweepy` library. The `twitter_config.py` file, which holds the credentials needed to access the Twitter API, is not included; create it in the same directory as this file.
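For reference, here is a minimal sketch of what `twitter_config.py` might contain. The four variable names are the ones this notebook reads when it sets up the connection; the values are placeholders, to be replaced with your own credentials.

```python
# twitter_config.py (sketch: the values below are placeholders, not real keys)
TWITTER_CONSUMER_KEY = "your-consumer-key"
TWITTER_CONSUMER_SECRET = "your-consumer-secret"
TWITTER_ACCESS_TOKEN = "your-access-token"
TWITTER_ACCESS_TOKEN_SECRET = "your-access-token-secret"
```

Keeping the credentials in a separate file like this makes it easier to leave them out of version control.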

To import it from Google Colab, define a variable holding the path to that directory and then call `sys.path.append(os.path.abspath(py_file_location))`.

In [14]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
!pip install tweepy

In [30]:
import tweepy
import os, sys, datetime
import pandas as pd

In [11]:
py_file_location = "path/to/colab"
sys.path.append(os.path.abspath(py_file_location))

In [15]:
from twitter_config import *

# Setup a Connection

In [17]:
auth = tweepy.OAuthHandler(TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET)
auth.set_access_token(TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_TOKEN_SECRET)

api = tweepy.API(auth, wait_on_rate_limit=True)

# Create Variables + Search Tweets

Tweets are searched with the [Cursor()](https://docs.tweepy.org/en/v3.8.0/cursor_tutorial.html) function.

The `api.search` method is passed to `Cursor()`, which takes care of paging through the results. From the docs, these are some of the available params:

* from: Restrict the results to tweets sent by a specific user
* since: The date from which the search should start
* until: The date at which the search should stop
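These operators can also be written inside the query string itself, with no space after the colon. As a rough sketch, a small helper could assemble that string; `build_query` is a hypothetical name and not part of `tweepy`.

```python
def build_query(term, from_user=None, since=None, until=None):
    """Join a search term with optional operators into one query string."""
    parts = [term]
    if from_user:
        parts.append(f"from:{from_user}")
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    return " ".join(parts)


query = build_query("#covid", from_user="finkd", since="2022-03-26")
print(query)  # #covid from:finkd since:2022-03-26
```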

Other params can be used alongside these, such as `lang` (the language of the tweets) and `tweet_mode='extended'`, which returns the whole tweet instead of just the first 140 characters.

You can also pass `api.user_timeline` instead of `api.search` to get a particular user's timeline.

You can also restrict the search by location using these operators (taken from the [Twitter Docs](https://developer.twitter.com/en/docs/tutorials/filtering-tweets-by-location)):

* place - the place name or the place ID
* place_country - the two-letter ISO country code
* point_radius - the circular geographic area within which to search
* bounding_box - the four-sided geographic area within which to search

Like:

```
place = 'Mexico'
# The operator and its value must not be separated by a space
tweets_list = tweepy.Cursor(api.search, q="place:" + place, tweet_mode='extended', lang='es').items()
```
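As an alternative sketch for the circular-area case, the standard search endpoint also takes a `geocode` parameter of the form `latitude,longitude,radius`, with the radius given in `km` or `mi`. The helper name `geocode_param` and the coordinates (roughly Mexico City) are illustrative assumptions, not something taken from this notebook.

```python
def geocode_param(latitude, longitude, radius_km):
    """Format a circular search area as 'latitude,longitude,radius'."""
    return f"{latitude},{longitude},{radius_km}km"


geocode = geocode_param(19.4326, -99.1332, 50)
print(geocode)  # 19.4326,-99.1332,50km

# It could then be passed alongside the query, for example:
# tweets_list = tweepy.Cursor(api.search, q="#covid", geocode=geocode,
#                             tweet_mode='extended', lang='es').items()
```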

In [44]:
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
hashtag = "#covid"
userHandle = "finkd"
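Converting a `datetime.date` with `str()` gives the `YYYY-MM-DD` form used for the `since` and `until` values. The sketch below uses a fixed date, chosen only for illustration, to make the resulting window visible; `search_window` is a hypothetical helper.

```python
import datetime


def search_window(today):
    """Return (yesterday, today) as 'YYYY-MM-DD' strings."""
    yesterday = today - datetime.timedelta(days=1)
    return str(yesterday), str(today)


since, until = search_window(datetime.date(2022, 3, 27))
print(since, until)  # 2022-03-26 2022-03-27
```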

In [45]:
tweets_list = tweepy.Cursor(api.search,
                            q=hashtag,
                            since=str(yesterday),
                            until=str(today),
                            tweet_mode='extended',
                            lang='es').items()
# Other arguments, left disabled here:
#   id = "<user_id>"
#   verify = False

# Alternative: search the tweets of a single user (no space after "from:")
#tweets_list = tweepy.Cursor(api.search, q="from:" + userHandle, tweet_mode='extended', lang='en').items()

# Scanning the Tweets

Iterate over the list of tweets with a `for ... in` loop and, from each one, extract the text, creation date, retweet count, and favorite count.

Everything is stored in a new list called `to_write`.

In [None]:
to_write = []

for tweet in tweets_list:
    text = tweet.full_text  # available because tweet_mode='extended'
    print(text)
    favourite_count = tweet.favorite_count
    retweet_count = tweet.retweet_count
    created_at = tweet.created_at

    # One dictionary per tweet; each becomes a row of the DataFrame
    line = {
        'text': text,
        'favourite_count': favourite_count,
        'retweet_count': retweet_count,
        'created_at': created_at,
    }
    to_write.append(line)
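The extraction step can also be pulled into a function, which makes it possible to try out without calling the API. The sketch below uses `types.SimpleNamespace` objects as stand-ins for `tweepy` statuses; `tweet_to_row` and the sample values are hypothetical. It also assumes that, for retweets, the untruncated text is found under `retweeted_status.full_text`, which is worth checking against your own results.

```python
from datetime import datetime
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Build one output row from a status-like object."""
    # Assumption: a retweet carries the original tweet in `retweeted_status`
    original = getattr(tweet, "retweeted_status", None)
    text = original.full_text if original is not None else tweet.full_text
    return {
        "text": text,
        "favourite_count": tweet.favorite_count,
        "retweet_count": tweet.retweet_count,
        "created_at": tweet.created_at,
    }


# Made-up stand-in for a retweet returned by the search
fake_tweet = SimpleNamespace(
    full_text="RT @someone: hello wor...",
    favorite_count=0,
    retweet_count=3,
    created_at=datetime(2022, 3, 26, 23, 48),
    retweeted_status=SimpleNamespace(full_text="hello world, in full"),
)

row = tweet_to_row(fake_tweet)
print(row["text"])  # hello world, in full
```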

# Pandas and Save

Finally, convert the list to a Pandas DataFrame and save the output.

In [47]:
df = pd.DataFrame(to_write)

In [48]:
df.head(10)

Unnamed: 0,text,favourite_count,retweet_count,created_at
0,RT @sandralopezleon: 🦠 #LongCovid \nTuviste #C...,0,737,2022-03-26 23:58:28
1,RT @sandralopezleon: 🦠 #LongCovid \nTuviste #C...,0,737,2022-03-26 23:58:02
2,Exposición de la Biblia 🌪️🔥 Gratis\nhttps://t....,0,0,2022-03-26 23:57:46
3,Exposición de la Biblia 🌪️🔥 Gratis\nhttps://t....,0,0,2022-03-26 23:56:52
4,RT @sandralopezleon: 🦠 #LongCovid \nTuviste #C...,0,737,2022-03-26 23:56:50
5,RT @sandralopezleon: 🦠 #LongCovid \nTuviste #C...,0,737,2022-03-26 23:52:39
6,RT @IAIM_VE: #LaPrevenciónEsLaClave || Conoce ...,0,13,2022-03-26 23:52:15
7,RT @sandralopezleon: 🦠 #LongCovid \nTuviste #C...,0,737,2022-03-26 23:50:47
8,RT @sandralopezleon: 🦠 #LongCovid \nTuviste #C...,0,737,2022-03-26 23:48:18
9,RT @ElFinanciero_Mx: Algunas mujeres han tenid...,0,3,2022-03-26 23:48:00


In [49]:
# os.path.join adds the separator that plain string concatenation would omit
df.to_csv(os.path.join(py_file_location, 'tweets.csv'), mode='a', header=False)
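Because the cell above appends with `header=False`, the file never gets a header row. One possible variation, sketched here with a temporary directory and made-up rows, writes the header only when the file does not exist yet; `append_rows` is a hypothetical helper, and unlike the cell above it drops the index column.

```python
import os
import tempfile

import pandas as pd


def append_rows(rows, csv_path):
    """Append rows to a CSV file, writing the header only on first use."""
    df = pd.DataFrame(rows)
    write_header = not os.path.exists(csv_path)
    # index=False leaves out the DataFrame index (a change from the cell above)
    df.to_csv(csv_path, mode="a", header=write_header, index=False)


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "tweets.csv")
    append_rows([{"text": "first", "retweet_count": 1}], csv_path)
    append_rows([{"text": "second", "retweet_count": 2}], csv_path)
    result = pd.read_csv(csv_path)

print(len(result))  # 2
```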