# tweepy

The `tweepy` library has very good [documentation](http://docs.tweepy.org/en/latest/), which is worth consulting before you start using it.

## Import libraries

In [None]:
import tweepy
import pandas as pd

## Authentication

To use `tweepy`, you need a registered Twitter application. Once you have one, you can find the required credentials in the *Keys & Tokens* menu for your app. If you need a reminder of where to find them, have a look at the documentation of the `rtweet` package, which has a [section on this topic](https://rtweet.info/articles/auth.html). Keep in mind that the layout of the Twitter developer pages might change. **NB**: You should treat all information relating to your API key like a password and never share it or post it publicly anywhere.

For the purpose of this notebook, we will store the keys in a separate file. To do this, open the file [config_twitter.py](./config_twitter.py), enter the information for your app, and save the file by pressing CTRL + S (Windows) or CMD + S (macOS). After that, you can run the following cell to import the file and authenticate.

In [None]:
import config_twitter

auth = tweepy.OAuthHandler(config_twitter.API_KEY, config_twitter.API_KEY_SECRET)
auth.set_access_token(config_twitter.ACCESS_TOKEN, config_twitter.ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)

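The arguments `wait_on_rate_limit=True` and `wait_on_rate_limit_notify=True` tell `tweepy` to pause automatically when a Twitter API rate limit is reached and to print a notice while it waits, so that we stay within the limits. These arguments follow the `tweepy` 3.x interface; if you have a later major version installed, check its documentation, as the `API` constructor has changed.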
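The file `config_twitter.py` only needs to define the four names used in the authentication cell. As an alternative to typing the credentials into the file, the sketch below reads them from environment variables. The `TWITTER_` prefix, the helper `load_credentials`, and the choice to fall back to empty strings are assumptions made for this illustration, not part of the workshop files.

```python
# Hypothetical contents of config_twitter.py (a sketch, not the workshop's file)
import os

# The four names the notebook reads from config_twitter
CREDENTIAL_NAMES = ["API_KEY", "API_KEY_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"]


def load_credentials(env, prefix="TWITTER_"):
    """Return the four credentials from a mapping such as os.environ.

    The prefix is an assumption of this sketch. Missing variables come
    back as empty strings, so importing the file never fails by itself.
    """
    return {name: env.get(prefix + name, "") for name in CREDENTIAL_NAMES}


_creds = load_credentials(os.environ)
API_KEY = _creds["API_KEY"]
API_KEY_SECRET = _creds["API_KEY_SECRET"]
ACCESS_TOKEN = _creds["ACCESS_TOKEN"]
ACCESS_TOKEN_SECRET = _creds["ACCESS_TOKEN_SECRET"]
```

With this variant the secrets never appear in a file that could be shared or committed by accident.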

## The `REST API`

The `data` folder in this directory contains a `csv` file with a few Twitter screen names that we will use in the following examples.

In [None]:
accounts = pd.read_csv('data/twitter_accounts.csv')
accounts = accounts['Screen Name'].tolist()
accounts
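Hand-maintained lists of screen names sometimes contain a leading `@`, stray whitespace, or the same handle in different capitalisation. The following sketch shows one way to normalise such a list before sending it to the API. The helper `clean_screen_names` and the example handles are made up for illustration.

```python
def clean_screen_names(raw_names):
    """Strip whitespace and a leading '@', then drop empty entries and
    case-insensitive duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for raw in raw_names:
        name = str(raw).strip().lstrip("@")
        key = name.lower()  # Twitter screen names are case-insensitive
        if name and key not in seen:
            seen.add(key)
            cleaned.append(name)
    return cleaned


example = [" @ExampleOrg", "exampleorg", "", "AnotherOrg "]  # made-up handles
print(clean_screen_names(example))  # ['ExampleOrg', 'AnotherOrg']
```

In the notebook this could be applied as `accounts = clean_screen_names(accounts)`.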

### Account information

Before we collect information about the accounts, we first look up each screen name with `api.get_user`. The returned user objects include the numeric user ID, which is worth keeping: screen names can change, but user IDs remain the same.

In [None]:
ids = [api.get_user(screen_name=i) for i in accounts]  # one user object per screen name
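A single renamed, suspended, or misspelled account makes the list comprehension above fail as a whole. The sketch below collects successful lookups and failures separately. The helper `lookup_users` and the stand-in `fake_lookup` are hypothetical; the helper catches `Exception` broadly only to stay independent of the installed library, and with `tweepy` you would normally narrow this to the library's own error class (`tweepy.TweepError` in the 3.x series).

```python
def lookup_users(lookup, screen_names):
    """Call lookup(name) for every screen name.

    Returns (found, failed): a list of results and a dict that maps each
    failing screen name to its error message.
    """
    found, failed = [], {}
    for name in screen_names:
        try:
            found.append(lookup(name))
        except Exception as err:  # assumption: narrow this to tweepy's error class in real use
            failed[name] = str(err)
    return found, failed


# Stand-in for api.get_user so that the sketch runs without network access
def fake_lookup(name):
    if name == "missing_account":
        raise LookupError("user not found")
    return {"screen_name": name}


found, failed = lookup_users(fake_lookup, ["some_account", "missing_account"])
print(found)   # [{'screen_name': 'some_account'}]
print(failed)  # {'missing_account': 'user not found'}
```

In the notebook the call could look like `ids, failed = lookup_users(lambda n: api.get_user(screen_name=n), accounts)`.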

Now we can use these user objects to gather the account information, including the user IDs. We will store the results as a `pandas` dataframe.

In [None]:
account_info = [[i.name, i.screen_name, i.id, i.description, i.location, i.followers_count, i.friends_count, i.protected] for i in ids]
account_info = pd.DataFrame(account_info, columns = ['Name', 'Handle', 'Twitter ID Number', 'Description', 'Location', 'Number of Followers', 'Number of Friends', 'Protected'])
account_info

If you want to, you can store the result as a `CSV` file by executing the following cell.

In [None]:
account_info.to_csv('output/twitter_accounts_workshop_orgs.csv', index = False)
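Writing the file fails if the `output` folder does not exist yet. A small helper can create the folder first; `ensure_parent_dir` is a hypothetical name used for this sketch, and the demonstration runs in a temporary directory so that it does not touch your working folder.

```python
import os
import tempfile
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the folder that will contain file_path (if needed) and return the path."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(os.path.join(tmp, "output", "accounts.csv"))
    created = target.parent.is_dir()

print(created)  # True
```

In the notebook, the path returned by `ensure_parent_dir('output/twitter_accounts_workshop_orgs.csv')` can be passed directly to `account_info.to_csv`.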

### Historical tweets & metadata

The following cell defines the function `get_tweet_data` that takes a screen name and requests the tweets from the `user_timeline` via the Twitter REST API. It also collects some of the metadata for each tweet.

In [None]:
def get_tweet_data(user, user_meta=False):
    """Collect the timeline tweets of `user` as a list of dicts, one dict per tweet."""
    tweets = []

    # Cursor handles the pagination; count=200 is the maximum page size for user_timeline
    for tw in tweepy.Cursor(api.user_timeline, screen_name=user, exclude_replies=False, count=200, tweet_mode='extended').items():
        tdict = {}

        tdict['text'] = tw.full_text.replace('\n', ' ').strip()  # replace newlines with spaces + strip leading and trailing whitespace
        tdict['tweet_id'] = tw.id
        tdict['retweet_count'] = tw.retweet_count
        tdict['fav_count'] = tw.favorite_count
        tdict['user_id'] = tw.user.id
        tdict['user_screen_name'] = tw.user.screen_name
        tdict['time'] = tw.created_at
        tdict['hashtags'] = [hashtag['text'] for hashtag in tw.entities['hashtags']]
        tdict['user_mentions'] = [mention['screen_name'] for mention in tw.entities['user_mentions']]

        # optionally add profile fields of the tweeting account
        if user_meta:
            tdict['location'] = tw.user.location
            tdict['user_description'] = tw.user.description
            tdict['user_url'] = tw.user.url

        tweets.append(tdict)

    return tweets
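`tweepy.Cursor` hides the pagination of the timeline endpoint. Conceptually, timelines are paged by ID: each request asks for tweets at or below a `max_id`, and the next request continues just below the oldest tweet received so far. The sketch below illustrates that idea with a stand-in for the API. It is not `tweepy`'s actual implementation, and `fetch_all`, `fake_fetch_page`, and the fake timeline are invented for this example.

```python
def fetch_all(fetch_page, page_size=200):
    """Collect items page by page until an empty page comes back.

    fetch_page(count, max_id) is a hypothetical callable that returns up to
    `count` items with id <= max_id, newest first (max_id=None: start at the newest).
    """
    items = []
    max_id = None
    while True:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            break
        items.extend(page)
        max_id = page[-1]["id"] - 1  # continue just below the oldest item seen
    return items


# Stand-in for the API: 450 fake tweets with ids 450..1, newest first
FAKE_TIMELINE = [{"id": i} for i in range(450, 0, -1)]


def fake_fetch_page(count, max_id):
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:count]


all_items = fetch_all(fake_fetch_page)
print(len(all_items))  # 450, collected in three pages of 200, 200 and 50
```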

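We can now use this function to collect historical tweets (+ some metadata) for the accounts from our list. Note that running the next cell can take a while, since each account may require several requests, and that the Twitter API returns at most roughly the 3,200 most recent tweets per timeline.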

In [None]:
account_tweets = [get_tweet_data(i) for i in accounts]

account_tweets = [y for x in account_tweets for y in x]
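The nested list comprehension turns the list of per-account lists into one flat list of tweets. An equivalent formulation with the standard library is sketched below; the nested sample list is made up and only stands in for the real results.

```python
from itertools import chain

# Made-up stand-in for the per-account results: one inner list per account
nested = [[{"tweet_id": 1}, {"tweet_id": 2}], [], [{"tweet_id": 3}]]

flat = list(chain.from_iterable(nested))
print(flat)  # [{'tweet_id': 1}, {'tweet_id': 2}, {'tweet_id': 3}]
```

Both variants keep the order of the accounts and of the tweets within each account.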

As before, we can convert the result to a `pandas` dataframe.

In [None]:
account_tweets_df = pd.DataFrame(account_tweets)
account_tweets_df.head()
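Because each tweet stores its hashtags as a list, a quick frequency count is easiest on the list of dictionaries itself. The sketch below assumes the `hashtags` key produced by `get_tweet_data`; the helper `count_hashtags` and the sample tweets are invented for illustration.

```python
from collections import Counter


def count_hashtags(tweets):
    """Count hashtags across tweets, ignoring capitalisation."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in tweet["hashtags"])
    return counts


# Made-up sample in the shape produced by get_tweet_data (other keys omitted)
sample = [
    {"hashtags": ["OpenData", "python"]},
    {"hashtags": []},
    {"hashtags": ["Python"]},
]
print(count_hashtags(sample).most_common(1))  # [('python', 2)]
```

For the collected data, the call would be `count_hashtags(account_tweets)`.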