In [None]:
%load_ext lab_black

# The mother of all APIs: Twitter

## What is an API?

**API** stands for **Application Programming Interface**. It defines how multiple software components interact.

An API simplifies programming by abstracting the underlying implementation and exposing only the functions a developer actually needs.

It can thus also hide information from developers.
On the one hand, it can hide functions that an outside developer should not have access to; on the other hand, it can bundle several complicated functions into one simple API call.
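
As a toy illustration of both points (this is not Twitter code, and every name in it is made up for the example), the class below exposes one public method while the helper steps stay internal:

```python
class TemperatureService:
    """Toy 'API': one public call that hides several internal steps."""

    def __init__(self, readings_celsius):
        # Internal state that callers are not expected to touch directly.
        self._readings = list(readings_celsius)

    def _drop_invalid(self):
        # Internal helper: discard readings below absolute zero.
        return [r for r in self._readings if r >= -273.15]

    def _mean(self, values):
        # Internal helper: arithmetic mean, 0.0 for an empty list.
        return sum(values) / len(values) if values else 0.0

    def average_fahrenheit(self):
        """The single call a developer actually needs."""
        celsius = self._mean(self._drop_invalid())
        return celsius * 9 / 5 + 32


service = TemperatureService([20.0, 30.0, -999.0])
print(service.average_fahrenheit())
```

The caller never sees the filtering or averaging steps, only `average_fahrenheit()`. By Python convention, the leading underscore marks the helpers as internal.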

## Example: Getting to know the Twitter Web API

Twitter offers an API allowing developers to easily extract and push data from/to Twitter.

To get access, you need to register as a developer at https://developer.twitter.com/ and apply for API access.

_Note: As a safeguard you can find the data we will extract in `../data/twitter.p`._

You can look up the possible API commands at https://developer.twitter.com/en/docs/twitter-api
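
Requests to the API are authorised with the bearer token of your developer account, sent in the `Authorization` header. One way to keep that token out of the notebook itself is to read it from an environment variable. The sketch below assumes a variable named `TWITTER_BEARER_TOKEN`; that name is chosen for this example and is not prescribed by Twitter.

```python
import os


def build_headers(env_var="TWITTER_BEARER_TOKEN"):
    """Return the Authorization header for a bearer token stored in env_var.

    The variable name is an assumption made for this example.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {token}"}
```

With the variable set in the shell that starts Jupyter, `headers = build_headers()` yields a dictionary that can be passed to `requests.get(..., headers=headers)`.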

## Requesting Barack Obama's Twitter Profile

You can retrieve basic information about Twitter users using the following API endpoint: `https://api.twitter.com/2/users/by/username/<USERNAME>`

In [None]:
import pandas as pd

# import matplotlib.pyplot as plt
import requests

# the Twitter API endpoint
twitter = "https://api.twitter.com/2/"
# Include your API token in the HTTP header
headers = {
    # replace the placeholder with the bearer token of your own developer account
    "Authorization": "Bearer <YOUR_BEARER_TOKEN>"
}
# Send an HTTP GET request to retrieve the user "BarackObama"
resp = requests.get(twitter + "users/by/username/BarackObama", headers=headers)
print(resp.json())

## Requesting information of more than one Twitter user

Besides looking up a single user, the API also accepts queries for lists of users:
`https://api.twitter.com/2/users/by?usernames=<USER1>,<USER2>,<..>`

In [None]:
resp = requests.get(
    twitter + "users/by?usernames=BarackObama,elonmusk,katyperry", headers=headers
)
print(resp.json())

The API can also return more than the three default fields (`id`, `name`, `username`): append key-value pairs of the form `&key=value` to the end of the request.

Examples:

| key | value | returned fields |
| --- | --- | --- |
| `user.fields` | `created_at` | `user.created_at` |
| `expansions` | `pinned_tweet_id` | `tweet.id`, `tweet.text` |
| `tweet.fields` | `created_at` | `includes.tweets.created_at` |

Thus, requesting `https://api.twitter.com/2/users/by?usernames=katyperry&user.fields=created_at&expansions=pinned_tweet_id` will additionally return the date Katy Perry's account was created and the ID of her currently pinned Tweet.
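
Long query strings are easier to maintain when they are assembled from a dictionary. The following is a minimal sketch using only the standard library; the helper name `build_user_lookup_url` and the constant `BASE_URL` are made up for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.twitter.com/2/"


def build_user_lookup_url(usernames, extra=None):
    """Build a users/by URL from a list of usernames and extra key-value pairs."""
    params = {"usernames": ",".join(usernames)}
    params.update(extra or {})
    # safe="," keeps the comma separators readable instead of encoding them as %2C
    return BASE_URL + "users/by?" + urlencode(params, safe=",")


url = build_user_lookup_url(
    ["katyperry"],
    extra={"user.fields": "created_at", "expansions": "pinned_tweet_id"},
)
print(url)
```

The printed URL is the same as the one spelled out by hand above. `requests.get` also accepts such a dictionary directly through its `params` argument and then performs the encoding itself.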

In [None]:
resp = requests.get(
    twitter
    + "users/by?usernames=katyperry&user.fields=created_at&expansions=pinned_tweet_id",
    headers=headers,
)
print(resp.json())

## Exercise: Let's try to retrieve the content of the Twitter messages from the last 7 days for the top 20 most-followed Twitter users (excluding brands)

### 1. Find the top 20 Twitter users

The list of top Twitter users can be found here: https://en.wikipedia.org/wiki/List_of_most-followed_Twitter_accounts

In [None]:
from IPython.display import IFrame

IFrame(
    src="https://en.wikipedia.org/wiki/List_of_most-followed_Twitter_accounts",
    width="100%",
    height="500px",
)

In [None]:
most_followed_users = [
    "BarackObama",
    "justinbieber",
    "katyperry",
    "elonmusk",
    "rihanna",
    "Cristiano",
    "taylorswift13",
    "ladygaga",
    # "narendramodi",
    "TheEllenShow",
    "KimKardashian",
    "selenagomez",
    "jtimberlake",
    "BillGates",
    "neymarjr",
    "britneyspears",
    "ddlovato",
    "shakira",
    "KingJames",
    "jimmyfallon",
]
most_followed_users_str = ",".join(most_followed_users)

### 2. Query the most recent Tweets of a set of Twitter users


The "recent search endpoint" returns Tweets from the last seven days that match a search query. The request `https://api.twitter.com/2/tweets/search/recent?query=from:BarackObama` returns at most 10 (the default) of Barack Obama's Tweets from the last 7 days.

For our example, we add `tweet.fields=public_metrics,created_at` to retrieve additional information and ask for a maximum of 15 Tweets by adding `max_results=15` to the URL.
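
The same parameters can be kept in a dictionary, which also makes it easy to check `max_results` before sending anything. The helper below is a sketch with a made-up name (`recent_search_params`). It relies on the endpoint accepting between 10 and 100 results per request, as described in the API documentation.

```python
def recent_search_params(user, max_results=10):
    """Return query parameters for the recent search endpoint for one user."""
    # The endpoint accepts between 10 and 100 results per request.
    if not 10 <= max_results <= 100:
        raise ValueError("max_results must be between 10 and 100")
    return {
        "query": f"from:{user}",
        "tweet.fields": "public_metrics,created_at",
        "max_results": max_results,
    }


params = recent_search_params("BarackObama", max_results=15)
print(params)
```

A dictionary like this can be passed as `requests.get(url, params=params, headers=headers)`, which takes care of the URL encoding.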

In [None]:
tweet_dict = {}
# retrieve one user after the other
for user in most_followed_users:
    resp = requests.get(
        twitter
        + f"tweets/search/recent?query=from:{user}&tweet.fields=public_metrics,created_at&max_results=15",
        headers=headers,
    )
    # extract the data
    data = resp.json()
    # Print status update
    if data.get("data") is not None:
        print(f"{user}: No. of tweets: {len(data.get('data'))}")
    else:
        print(f"{user}: No. of tweets: 0")
    tweet_dict[user] = data

In [None]:
# # UNCOMMENT TO SAVE DATA TO DISK
# import pickle

# fname = "../data/twitter.p"
# pickle.dump(tweet_dict, open(fname, "wb"))

In [None]:
# # UNCOMMENT TO LOAD DATA FROM DISK
# import pickle

# fname = "../data/twitter.p"
# tweet_dict = pickle.load(open(fname, "rb"))

### 3. Extract text from the tweets

In [None]:
# load default libraries
import numpy as np
import pandas as pd

In [None]:
tweet_dict["BarackObama"].get("data")

In [None]:
tweet_dict["BarackObama"].get("data")[0]

In [None]:
tweet_dict["BarackObama"].get("data")[0]["text"]

#### Extract text of all tweets

In [None]:
text = []
for user in tweet_dict.keys():
    data = tweet_dict[user].get("data")
    if data is not None:
        for tweet in data:
            text.append(tweet["text"])
# join with a space so that words of neighbouring tweets do not run together
text = " ".join(text)
text
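
The extraction step can also be wrapped in a small function and tried out on a hand-written payload, without calling the API. The sample data below is made up and only mimics the shape of the stored responses: a `data` list of tweets, or no `data` key at all when nothing matched.

```python
def collect_texts(tweets_by_user):
    """Return the text of every tweet in a {user: response} dictionary."""
    texts = []
    for response in tweets_by_user.values():
        # Responses without matching tweets carry no "data" key.
        for tweet in response.get("data") or []:
            texts.append(tweet["text"])
    return texts


# Made-up payloads, for illustration only.
sample = {
    "user_a": {
        "data": [
            {"id": "1", "text": "Hello world"},
            {"id": "2", "text": "Second tweet"},
        ]
    },
    "user_b": {"meta": {"result_count": 0}},
}
print(" ".join(collect_texts(sample)))
```

Joining with a space rather than an empty string keeps the last word of one tweet from fusing with the first word of the next, which matters once the words are counted.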

### 4. Visualise the text in the form of a word cloud

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

%matplotlib inline

#### Preprocessing - Add stopwords

In [None]:
from wordcloud import STOPWORDS

fn = "../data/twitter_stopwords.txt"
with open(fn, "r") as f:
    twitter_stopwords = f.readlines()
twitter_stopwords = [x.replace("\n", "") for x in twitter_stopwords]
STOPWORDS = list(STOPWORDS) + twitter_stopwords + ["MannKiBaat", "KimKardashian", "RT"]
# STOPWORDS

#### Generate Wordcloud

In [None]:
# Create and generate a word cloud image:
wordcloud = WordCloud(stopwords=STOPWORDS).generate(text)

# Display the generated image:
fig, ax = plt.subplots(figsize=(12, 6))
ax.imshow(wordcloud, interpolation="bilinear")
ax.axis("off")
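
A word cloud sizes each word roughly by how often it occurs once the stopwords have been removed. The sketch below is a simplified stand-in for that counting step using only the standard library. It is not the tokenizer the `wordcloud` package actually uses, and the text and stopword list are made up.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords):
    """Count words in text, ignoring case and skipping stopwords."""
    stop = {word.lower() for word in stopwords}
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stop)


# Made-up text and stopword list, for illustration only.
counts = word_frequencies("Vote today. Vote early, vote often!", ["today", "RT"])
print(counts.most_common(2))
```

Inspecting such counts is a quick way to spot further candidates for the stopword list before regenerating the image.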