# Working with Twitter data

This notebook can be downloaded via [https://edu.nl/83a7b](https://edu.nl/83a7b)


## Getting keys and tokens

1. If you don’t have a Twitter account yet, sign up at https://twitter.com/ 

2. If you do have a Twitter account, you should be able to access the Twitter developer portal at https://developer.twitter.com/. 

3. Create a new App. 
    - Under “Projects & Apps” > “Overview”, click on “Create App”. 
    - Enter a name for your app. Note that the name of this app must be unique. 
        
4. If you managed to create an App, you should be able to find your credentials in the project folder created for this new App, under “Keys and Tokens”. To work with the Twitter API, you need four values:  
    - API Key 
    - API Secret 
    - Access Token 
    - Access Secret
    
    
5. Note that these keys and tokens will only be shown once, immediately after you have created them. Make sure that you copy these values right away. It is useful to create a Python file on your computer, named ‘credentials.py’, in which you store these values, as follows:

    ```
    #Twitter API credentials
    consumer_key = 'xxx'
    consumer_secret = 'xxx'
    access_key = 'xxx'
    access_secret = 'xxx'
    ```

If, for some reason, you are unable to find the keys and tokens again, you can always regenerate these values in the Twitter developer portal.
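
Before moving on, it can help to check that none of the four values was left empty or still has the placeholder value. The sketch below is only an illustration: the function name `missing_credentials` and the dictionary layout are assumptions, not part of Tweepy or the Twitter API.

```python
# Hypothetical helper (not part of Tweepy): report which credentials
# are empty or still contain the 'xxx' placeholder used above.
REQUIRED = ("consumer_key", "consumer_secret", "access_key", "access_secret")

def missing_credentials(values, placeholder="xxx"):
    """Return the names of required credentials that are not filled in."""
    missing = []
    for name in REQUIRED:
        value = values.get(name, "")
        if not value or value == placeholder:
            missing.append(name)
    return missing

# Example with made-up values; real keys should never be shared or committed.
example = {
    "consumer_key": "abc123",
    "consumer_secret": "xxx",
    "access_key": "def456",
    "access_secret": "",
}
print(missing_credentials(example))
```

An empty list means that all four values have been filled in.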


## Twitter API V1.1

If you have obtained credentials (keys and tokens) using the steps discussed above, you should have access to the Twitter API V1.1. This basic version of the API is useful for getting started and for testing certain solutions. Using the V1.1 API, you can download tweets, post new tweets, and request user information. There are a number of limitations, however. For example, you can only search for tweets that have been posted within the last 7 days.

There is currently a new V2 Twitter API. If you have access to the V1.1 API, you can [easily migrate to this new version](https://developer.twitter.com/en/docs/twitter-api/migrate/overview). The V2 API offers a number of [new possibilities](https://developer.twitter.com/en/docs/twitter-api/migrate/whats-new), but there are still some limitations with respect to the number of tweets you can download. If you still need more options, you can apply for a premium, an enterprise or an academic account at Twitter.

## Tweepy

There are several Python packages that let you work with the Twitter API. Examples include `Twython` and `Tweepy`. This notebook explains the use of `Tweepy`. The package needs to be installed first.

In [None]:
!pip install -U tweepy

After a successful installation, you should be able to import the package. 

In [1]:
# import module
import tweepy

If you want to see an overview of all the functions that have been defined in the `tweepy` package, you can consult [the documentation](https://docs.tweepy.org/en/stable/client.html). 

## Authentication

The process of getting authenticated can be somewhat challenging, but `Tweepy` fortunately makes it much easier. In short, you need to instantiate an `OAuthHandler` object using your keys and tokens.

If you have saved these values in a file named 'credentials.py', as was recommended above, you should be able to import these values using the code below.

In [None]:
from credentials import *

print( consumer_key )

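With the relevant values available in your code, you can authenticate yourself using the `OAuthHandler` object. If you have a bearer token instead, you can pass it to `OAuth2BearerHandler`, as in the second cell below.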

In [None]:
# auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# auth.set_access_token(access_key, access_secret)
# api = tweepy.API(auth)

In [2]:
auth = tweepy.OAuth2BearerHandler("")
api = tweepy.API(auth)

The `API()` constructor in turn creates a `tweepy.api.API` object, which you can use to communicate with the Twitter API.

In [3]:
print(type(api))

<class 'tweepy.api.API'>


## Getting data about a specific user

In [4]:
user = api.get_user(screen_name = 'UniLeidenNews')
print(type(user))


<class 'tweepy.models.User'>


You can request values for a [wide range of attributes](https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/user).

In [5]:
print(user.screen_name)
print(user.followers_count)
print(user.description)
print(user.created_at)

UniLeidenNews
18986
{English, for Dutch: @UniLeiden}
Leiden University is the oldest university in The Netherlands, offering bachelor's, master's and PhD programmes.
2010-03-23 10:03:22+00:00


## Displaying tweets from your own timeline

Using the `home_timeline()` method, you can request the 20 most recent items from your own timeline. 

In [None]:
# The variable name matches the one used in the cells below
my_timeline = api.home_timeline()
print(len(my_timeline))

The tweets that are returned are of the type [`tweepy.models.Status`](https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet). You can convert the JSON data associated with these tweets into a Python dictionary using the `_json` property. 

In [None]:
for tweet in my_timeline:
    tweet_data = tweet._json
    print( tweet_data['user']['name'] , tweet_data['created_at'] )

The following properties are available for each of these tweets: 

```
"created_at" (<class 'str'>)
"id" (<class 'int'>)
"id_str" (<class 'str'>)
"full_text" (<class 'str'>)
"truncated" (<class 'bool'>)
"display_text_range" (<class 'list'>)
"entities" (<class 'dict'>)
"metadata" (<class 'dict'>)
"source" (<class 'str'>)
"in_reply_to_status_id" (<class 'NoneType'>)
"in_reply_to_status_id_str" (<class 'NoneType'>)
"in_reply_to_user_id" (<class 'NoneType'>)
"in_reply_to_user_id_str" (<class 'NoneType'>)
"in_reply_to_screen_name" (<class 'NoneType'>)
"user" (<class 'dict'>)
"geo" (<class 'NoneType'>)
"coordinates" (<class 'NoneType'>)
"place" (<class 'NoneType'>)
"contributors" (<class 'NoneType'>)
"retweeted_status" (<class 'dict'>)
"is_quote_status" (<class 'bool'>)
"retweet_count" (<class 'int'>)
"favorite_count" (<class 'int'>)
"favorited" (<class 'bool'>)
"retweeted" (<class 'bool'>)
"lang" (<class 'str'>)
```

As you can see, some of these items are dictionaries themselves. "entities", for example, is a dictionary with the following items.

```
"hashtags" (<class 'list'>)
"symbols" (<class 'list'>)
"user_mentions" (<class 'list'>)
"urls" (<class 'list'>)
```

The same goes for "metadata". This dictionary has the following items. 

```
"iso_language_code" (<class 'str'>)
"result_type" (<class 'str'>)
```
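
Nested values such as these can be reached by chaining dictionary keys. As a minimal sketch, the function below collects the hashtags of a tweet from its "entities" dictionary. The sample dictionary is made up for illustration and contains only the keys the example needs; the helper name `get_hashtags` is likewise an assumption.

```python
def get_hashtags(tweet_data):
    """Return the hashtag texts of a tweet dictionary, without the '#' sign."""
    # .get() with a default avoids a KeyError when a key is absent
    entities = tweet_data.get("entities", {})
    return [tag["text"] for tag in entities.get("hashtags", [])]

# Made-up sample in the shape of the V1.1 tweet JSON
sample_tweet = {
    "id": 1,
    "entities": {
        "hashtags": [
            {"text": "Leiden", "indices": [10, 17]},
            {"text": "OpenScience", "indices": [18, 30]},
        ],
        "symbols": [],
        "user_mentions": [],
        "urls": [],
    },
}

print(get_hashtags(sample_tweet))
```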

It can be useful to create a separate function which specifies the properties to be shown for each of the tweets. This function selects specific options from the long list that was given above.

In [None]:
def display_tweet(status):
    return_value = f'{status["id"]}'
    return_value += f'\t{status["text"]}'
    return_value += f'\t{status["created_at"]}'
    return_value += f'\t{status["user"]["screen_name"]}'
    return return_value + '\n'
    


This function can be used as follows:

In [None]:
for tweet in my_timeline:
    tweet_data = tweet._json
    print( display_tweet(tweet_data) )
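
In the JSON data, `created_at` is a plain string. If you want to sort or filter tweets by date, it can be converted into a `datetime` object. The sketch below assumes the V1.1 date format shown in the comment; the helper name `parse_created_at` and the sample timestamp are made up for illustration. Note that `%a` and `%b` expect English day and month names, which may fail under a non-English locale.

```python
from datetime import datetime

# Format of 'created_at' strings in the V1.1 JSON,
# e.g. 'Wed Oct 10 20:19:24 +0000 2018'
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def parse_created_at(value):
    """Convert a V1.1 'created_at' string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_DATE_FORMAT)

# Made-up timestamp for illustration
posted = parse_created_at("Wed Oct 10 20:19:24 +0000 2018")
print(posted.isoformat())
```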

## Getting tweets containing a search term

Using the `search_tweets()` method, you can search for tweets containing a search term, posted during the last 7 days. 

The Twitter API V1.1 has [a limitation of 3000 tweets per minute](https://developer.twitter.com/en/docs/twitter-api/rate-limits). If you exceed this number, the API will return a 429 error response.

The `search_tweets()` method can be used with the following parameters:

- **q**: a search query string, of 500 characters maximum. 
- **lang**: an ISO 639-1 code of the language of the tweets.
- **result_type**: The Twitter V1.1 search service does not make all tweets available. The API only returns a sample. This parameter specifies the nature of this sample. 'recent' returns the most recent results, and 'popular' returns the most popular tweets. 
- **count**: The number of results to retrieve.
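
As an illustration of the **q** parameter, the sketch below combines several search terms into one query string and checks it against the 500-character limit mentioned above. The helper name `build_query` is hypothetical. `OR` is an operator of the standard search syntax, but it is worth checking the Twitter documentation before relying on more complex queries.

```python
MAX_QUERY_LENGTH = 500  # maximum length of 'q' mentioned above

def build_query(terms):
    """Join search terms with OR and check the length of the result."""
    query = " OR ".join(terms)
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(
            f"Query has {len(query)} characters; the maximum is {MAX_QUERY_LENGTH}"
        )
    return query

print(build_query(["#Ukraine", "#Kyiv"]))
```

The resulting string could then be passed to `search_tweets()` as the value of `q`.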

The API returns at most 100 tweets per call for a search. The code below also makes use of a `Cursor`, which automatically requests the next set if the number of tweets to be retrieved is higher than that.

In [None]:
search_term = '#Ukraine'
nr_tweets = 500

list_tweets = []

tweets = tweepy.Cursor(api.search_tweets , 
            q= search_term , lang="en" ).items(nr_tweets)

for tweet in tweets:
    list_tweets.append(tweet)
        
print(len(list_tweets))
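
Longer downloads like the one above may run into the rate limit mentioned earlier. Tweepy can wait automatically when the `API` object is created with `wait_on_rate_limit=True`. To show the idea behind such waiting, the sketch below retries a function after a pause. All names in it (`call_with_retry`, `FakeRateLimitError`, `flaky_request`) are invented for this illustration; with Tweepy 4, the exception to retry on would presumably be `tweepy.TooManyRequests`.

```python
import time

def call_with_retry(func, retry_on, max_attempts=3, wait_seconds=60, sleep=time.sleep):
    """Call func(); if it raises retry_on, wait and try again.

    The exception is re-raised once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)

# Stand-in for an API call: fails twice, then succeeds
class FakeRateLimitError(Exception):
    pass

calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("429 Too Many Requests")
    return "ok"

# sleep is replaced by a no-op so that the demonstration does not actually wait
result = call_with_retry(flaky_request, FakeRateLimitError, sleep=lambda seconds: None)
print(result, calls["count"])
```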



The tweets that were retrieved can be displayed using the `display_tweet()` function that was defined earlier. 

In [None]:
for tweet in list_tweets:
    tweet_data = tweet._json
    print( display_tweet(tweet_data) )

## Getting tweets from a specific user


`user_timeline()` returns a collection of the most recent tweets posted by the user specified via the `screen_name` parameter. The method only returns the 3200 most recent tweets. The `count` parameter indicates the number of tweets to be downloaded per call. The `max_id` parameter can be used to specify that you only want tweets whose ID is lower than or equal to the one you mention in this parameter.

The code below tries to download as many tweets as possible. 

In [None]:
user_name = "UniLeidenNews"

# list to capture all the tweets
all_tweets = []

new_tweets = api.user_timeline(screen_name = user_name)

all_tweets.extend(new_tweets)

# Find the ID of the oldest tweet 
oldest = all_tweets[-1].id - 1

# Download more tweets, all older than 'oldest' in previous set

while len(new_tweets) > 0:
    print(f"Getting tweets before {oldest}")
    new_tweets = api.user_timeline(screen_name = user_name, count=200, max_id=oldest)

    # Add these tweets to the list
    all_tweets.extend(new_tweets)

    #update the id of the oldest tweet less one
    oldest = all_tweets[-1].id - 1

    print(f"{len(all_tweets)} tweets downloaded so far ...")
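
To make the role of `max_id` more concrete, the sketch below runs a similar loop against a small made-up list of tweet IDs instead of the real API. The function `fake_user_timeline` is a stand-in invented for this example. It assumes that `max_id` also includes the tweet with that exact ID, which is why the loop subtracts 1.

```python
# Made-up tweet IDs, newest first, standing in for a user's timeline
FAKE_IDS = list(range(110, 100, -1))  # 110, 109, ..., 101

def fake_user_timeline(count=3, max_id=None):
    """Stand-in for api.user_timeline: return up to `count` IDs <= max_id."""
    candidates = [i for i in FAKE_IDS if max_id is None or i <= max_id]
    return candidates[:count]

all_ids = []
new_ids = fake_user_timeline()

while len(new_ids) > 0:
    all_ids.extend(new_ids)
    # The oldest ID so far, minus one, so that it is not returned again
    oldest = all_ids[-1] - 1
    new_ids = fake_user_timeline(max_id=oldest)

print(len(all_ids), all_ids[0], all_ids[-1])
```

Without the subtraction, each request would return the oldest tweet of the previous set again, and the loop would never end.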




The code below writes the data about these tweets to a CSV file.

In [60]:
import re

# Write the CSV file
with open(f'{user_name}_tweets.csv', 'w', encoding='utf-8') as f:

    f.write("id,created_at,retweets,likes,text\n")
    for tweet in all_tweets:
        # Replace line breaks by spaces and remove commas,
        # so that each tweet stays on a single CSV row
        text = re.sub( r'\n+' , ' ' , tweet.text )
        text = re.sub( r',' , '' , text )
        f.write( f'{tweet.id_str},{tweet.created_at},{tweet.retweet_count},{tweet.favorite_count},{text}\n' )
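
Removing commas changes the text of the tweets. As an alternative, Python's built-in `csv` module quotes fields that contain commas or line breaks, so the text can be stored unchanged. The sketch below uses made-up rows and writes to an in-memory buffer. With real data, the buffer would be replaced by a file opened with `newline=''`, and the rows would be built from `all_tweets`.

```python
import csv
import io

# Made-up rows in the column order used above
rows = [
    ("1", "2022-03-01 10:00:00+00:00", 5, 12, "Hello, world"),
    ("2", "2022-03-02 11:30:00+00:00", 0, 3, "Two\nlines"),
]

# Stands in for open('tweets.csv', 'w', newline='', encoding='utf-8')
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "created_at", "retweets", "likes", "text"])
writer.writerows(rows)

# Reading the data back shows that commas and line breaks survive
buffer.seek(0)
for record in csv.DictReader(buffer):
    print(record["id"], repr(record["text"]))
```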
