# tweepy

The `tweepy` library for `Python` has extensive [documentation](http://docs.tweepy.org/en/latest/) that is worth consulting if you want to use it.

The brief examples in this notebook are based on blog posts by [Suhem Parack](https://dev.to/twitterdev/a-comprehensive-guide-for-using-the-twitter-api-v2-using-tweepy-in-python-15d9) from the *Twitter* developer team and [Jan Kirenz](https://www.kirenz.com/post/2021-12-10-twitter-api-v2-tweepy-and-pandas-in-python/twitter-api-v2-tweepy-and-pandas-in-python/) from HdM Stuttgart.

## Import libraries

In addition to `tweepy`, we need to import [`pandas`](https://pandas.pydata.org/) for some (light) data wrangling.

In [None]:
import tweepy
import pandas as pd

## Authentication

Before we can collect data via the Twitter API v2, we need to set up our credentials.

**NB**: You should treat all information relating to your API access like a password and never share it or post it publicly anywhere. Although nobody except you should be able to access your personal instance of this notebook, if you want to be extra cautious, you can delete your API access information from the following cell after running it once (and save the notebook again after that).

In [None]:
client = tweepy.Client(bearer_token='REPLACE_ME')
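One way to keep the token out of the notebook entirely is to read it from an environment variable. The following is a minimal sketch of that idea; the helper `load_bearer_token` and the variable name `TWITTER_BEARER_TOKEN` are assumptions made for illustration and are not part of `tweepy`.

```python
import os


def load_bearer_token(env_var="TWITTER_BEARER_TOKEN", environ=None):
    """Return the bearer token stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    pick whatever name fits your own setup.
    """
    # Fall back to the real process environment unless a mapping is passed in
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set.")
    return token
```

With the variable set before the notebook is started, the client could then be created with `tweepy.Client(bearer_token=load_bearer_token())`, so the token never appears in a saved cell.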

## Collecting tweets from specific users

The file [twitter_accounts.csv](./data/twitter_accounts.csv) in the `data` folder of this repository contains the Twitter screen names of three accounts that we will use in the following examples: [*GESIS - Leibniz Institute for the Social Sciences*](https://twitter.com/gesis_org), [*GESIS Training*](https://twitter.com/gesistraining/), and the [*Social Data Science Lab*](https://twitter.com/socdatalab) at *Cardiff University*.

In [None]:
accounts = pd.read_csv('data/twitter_accounts.csv')
accounts = accounts['Screen_Name'].tolist()
accounts

For some functions we need the user ID (instead of the screen name). We can get it with the `get_user()` method of the client.

In [None]:
users = []

# Look up each account by its screen name and keep the returned user object
for screen_name in accounts:
    user = client.get_user(username=screen_name)
    users.append(user.data)

In [None]:
user_df = pd.DataFrame(users)

In [None]:
user_df
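As an alternative to looking up one account per request, `tweepy` also provides `Client.get_users()`, which accepts a list of usernames. The sketch below wraps it in a small helper that returns a dictionary of screen names and user IDs; the helper name `lookup_user_ids` is hypothetical, and the code assumes that `client` is an authenticated `tweepy.Client`.

```python
def lookup_user_ids(client, usernames):
    """Map screen names to user IDs with a single lookup request.

    Hypothetical helper: `client` is assumed to be an authenticated
    tweepy.Client (or any object with a compatible get_users method).
    """
    response = client.get_users(usernames=usernames)
    # response.data is None when no account matched, so fall back to an empty list.
    # Keys use the spelling returned by the API, which may differ in case
    # from the names that were passed in.
    return {user.username: user.id for user in (response.data or [])}
```

Calling `lookup_user_ids(client, accounts)` would then give a mapping that can be used wherever a user ID is required.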

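In the following example, we collect tweets from the *GESIS* account that are not retweets and were posted between January 1st and June 22nd, 2022. The `limit` argument of `flatten()` caps the collection at 1,000 tweets. Note that `search_all_tweets` uses the full-archive search endpoint, which, according to the `tweepy` documentation, is only available with Academic Research access.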

In [None]:
tweet_list = []

# Tweets posted by the GESIS account, excluding retweets
query = 'from:gesis_org -is:retweet'

# Time window in ISO 8601 format (UTC)
start_time = '2022-01-01T00:00:00Z'
end_time = '2022-06-22T00:00:00Z'

# Page through the results and stop after at most 1000 tweets
for tweet in tweepy.Paginator(client.search_all_tweets, query=query,
                              tweet_fields=['author_id', 'created_at', 'text', 'source', 'lang'],
                              start_time=start_time,
                              end_time=end_time,
                              max_results=100).flatten(limit=1000):
    tweet_list.append(tweet.data)
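The query above covers a single account. To search several accounts at once, the `from:` operators can be combined with `OR` and grouped in parentheses. The following sketch builds such a query string from a list of screen names; the helper `build_accounts_query` is hypothetical and only assembles the string, it does not contact the API.

```python
def build_accounts_query(screen_names, include_retweets=False):
    """Build a search query matching tweets from any of the given accounts.

    Hypothetical helper for illustration; it only formats the query string.
    """
    if not screen_names:
        raise ValueError("At least one screen name is required.")
    from_clauses = [f"from:{name}" for name in screen_names]
    if len(from_clauses) == 1:
        query = from_clauses[0]
    else:
        # Parentheses keep the OR group separate from the retweet filter
        query = "(" + " OR ".join(from_clauses) + ")"
    if not include_retweets:
        query += " -is:retweet"
    return query


print(build_accounts_query(["gesis_org", "gesistraining"]))
```

For a single account, the helper reproduces the query used above, so `build_accounts_query(accounts)` could be passed as the `query` argument to collect tweets from all accounts in the list.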

We can then turn the result into a [`pandas` dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) (as we already did for the user information before).

In [None]:
gesis_tweets_df = pd.DataFrame(tweet_list)

In [None]:
gesis_tweets_df

## Saving the results

If we want to, we can save the results as a `.csv` file.

In [None]:
gesis_tweets_df.to_csv("./data/gesis_tweets_tweepy.csv")
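By default, `to_csv()` also writes the row index of the dataframe as an extra first column. If that column is not wanted, `index=False` leaves it out. The sketch below shows the round trip with a small, made-up dataframe and a temporary file; the sample rows and the file name are placeholders and not real collected data.

```python
import os
import tempfile

import pandas as pd

# Made-up sample rows standing in for the collected tweets
sample_df = pd.DataFrame(
    {"id": [1, 2], "text": ["first tweet", "second tweet"], "lang": ["en", "de"]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_tweets.csv")
    # index=False omits the row index, so no extra unnamed column is written
    sample_df.to_csv(path, index=False)
    # Read the file back to check which columns were stored
    restored_df = pd.read_csv(path)

print(list(restored_df.columns))
```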