# 1. What is an API?

**API** stands for **Application Programming Interface**, it defines interactions between multiple software components.

An API simplifies programming by abstracting the underlying implementation by only exposing functions a developer might actually need. 

It can thus also hide informations from developers.
On one hand it can hide functions a outside developer shall have no access to, on the other hand it can hide multiple complicated functions inside one simple API call.

#### *There are multiple kinds of APIs*

- **Libraries**:
    - Libraries we already got to know are basically APIs, since they abstract and simplify the underlying implementation making it really simple to use them.
        - Examples: `pandas`, `matplotlib`, `sklearn`
        - E.g. You can find the `sklearn` API Reference at: https://scikit-learn.org/stable/modules/classes.html
- **Interface between programs**:
    - An API might also act as a middleman between two programs.
    - E.g. you want to use a library in Python which was implemented for R. An API might translate the Python commands into R to make the library run.
- **In Operating Systems**
    - Abstract the underlying implementation of the Operating System to make sure the outcome is generic.
        - E.g. POSIX is an API standard utilized (mostly) by UNIX systems such as Linux, BSD, macOS
        - It makes sure POSIX-conform API calls work the same way on each Operating System
        - E.g. the `echo` command shall work the same way independent of the Operating System you use
    - Backwards Compatibility:
        - E.g. translating old system calls into those utilized in newer OS versions.
        - Examples: Win32API (allowing old Windows Programs to be run in `Compatibility Mode` under new versions), being able to play original XBOX/Playstation games on the current Console iteration
- **Web API**
    - Allow to access functions through Web Protocols, such as HTTP (Hypertext Transfer Protocol)
    - API defines endpoints (e.g. `api.twitter.com/2/users/by/username/<USERNAME>`) to fetch or push data, or even trigger code execution
    - Responses are usually defined in `JSON` (JavaScript Object Notation) or `XML` (Extensible Markup Language) (somewhat similar to Python dictionaries)

# 2. Example: Getting to know the Twitter Web API

Twitter offers an API allowing developers to easily extract and push data from/to Twitter.

To get access you need to register as a developer at https://developer.twitter.com/ and apply for API acess.

_Note: Not needed for this lecture as we only want to show an example of an API in the real world. You can find the data we will extract in `../data/twitter.p`._

_Although we absolutely recommend to register for any API access in the wild to play around. Good alternatives to Twitter would be any major site that you are well familiar with. E.g. Google, YouTube, Spotify, Facebook_

You can look up the possible API commands at https://developer.twitter.com/en/docs/twitter-api

### Requesting Barack Obamas Twitter Profile:

You can retrieve basic information about Twitter Users using the following API endpoint: `https://api.twitter.com/2/users/by/username/<USERNAME>`

>```python
>import pandas as pd
>import matplotlib.pyplot as plt
>import requests
>
># the Twitter API endpoint
>twitter = "https://api.twitter.com/2/"
># Include your API token into the HTTP Header
>headers = {"Authorization": "Bearer <YOUR API TOKEN>"}
># Send a HTTP-GET Request to retrieve the user "BarackObama"
>resp = requests.get(twitter + "users/by/username/BarackObama", headers=headers)
>print(resp.json())
>```

#### Output
>```python
>{'data': {'id': '813286', 'name': 'Barack Obama', 'username': 'BarackObama'}}
```

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import requests

# the Twitter API endpoint
twitter = "https://api.twitter.com/2/"
# Include your API token into the HTTP Header
headers = {"Authorization": "Bearer AAAAAAAAAAAAAAAAAAAAAGFXMwEAAAAAsIEoxSy%2B%2BkvkOY2q6%2Fi2KFPLm7Q%3DtZ5Ul8yDXkDOpVhLoR5Iw1AjP7Wr8sc3jEVhrzuvnZaCLB4tm8"}
# Send a HTTP-GET Request to retrieve the user "BarackObama"
resp = requests.get(twitter + "users/by/username/BarackObama", headers=headers)
print(resp.json())

{'data': {'id': '813286', 'name': 'Barack Obama', 'username': 'BarackObama'}}


### Requesting the Top Ten most followed Twitter users

Besides the generic retrieval per user, the API also allows to pass queries for lists of users.
`https://api.twitter.com/2/users/by?usernames=<USER1>,<USER2>,<..>`

In [2]:
resp = requests.get(twitter + "users/by?usernames=BarackObama,justinbieber,katyperry", headers=headers)
print(resp.json())

{'data': [{'id': '813286', 'name': 'Barack Obama', 'username': 'BarackObama'}, {'id': '27260086', 'name': 'Justin Bieber', 'username': 'justinbieber'}, {'id': '21447363', 'name': 'KATY PERRY', 'username': 'katyperry'}]}


It also allows to retrieve more than the three default fields (`id, name, username`) by requesting a key value pair by adding `&key=value` at the end of the request.

Examples:

| key | value | returned fields |
| --- | --- | --- |
| `user.fields` | `created_at` | `user.created_at` |
| `expansions` | `pinned_tweet_id` | `tweet.id`, `tweet.text` |
| `tweet.fields` | `created_at` | `includes.users.created_at` |

Thus requesting `https://api.twitter.com/2/users/by?usernames=BarackObama&user.fields=created_at&expansions=pinned_tweet_id` will additionally return the data Obamas Account has been created at and the id of his currently pinned Tweet.

In [3]:
resp = requests.get(twitter + "users/by?usernames=BarackObama,justinbieber,katyperry&user.fields=created_at&expansions=pinned_tweet_id", headers=headers)
print(resp.json())

{'data': [{'username': 'BarackObama', 'created_at': '2007-03-05T22:08:25.000Z', 'name': 'Barack Obama', 'id': '813286'}, {'username': 'justinbieber', 'created_at': '2009-03-28T16:41:22.000Z', 'name': 'Justin Bieber', 'pinned_tweet_id': '1344890442576490496', 'id': '27260086'}, {'username': 'katyperry', 'created_at': '2009-02-20T23:45:56.000Z', 'name': 'KATY PERRY', 'pinned_tweet_id': '1299195688719249409', 'id': '21447363'}], 'includes': {'tweets': [{'id': '1344890442576490496', 'text': '#ANYONE out now\nhttps://t.co/Uh4E6vUzZv https://t.co/UjiStYfwrz'}, {'id': '1299195688719249409', 'text': 'IT’S HERE! IT’S REALLY HERE! 🙃 I finally got back my smile! Hope this record puts one on your face 🙂 #SMILE 🙂 IS OUT EVERYWHERE NOW! LOVE YOU GUYS SO MUCH ENJOY 🤡♥️ (sent from my hospital bed lol) https://t.co/BImXF3kEcw https://t.co/2UmVajDoyn'}]}}


### Lets try it out for the top 10 most followed Twitter users:

>```python
    most_followed_users = [
                "BarackObama",
                "justinbieber",
                "katyperry",
                "rihanna",
                "Cristiano",
                "taylorswift13",
                "ladygaga",
                "ArianaGrande",
                "TheEllenShow",
                "YouTube"
                ]
    most_followed_users_str = ",".join(most_followed_users)
    resp = requests.get(twitter + f"users/by?usernames={most_followed_users_string}", headers=headers)
>```

In [4]:
most_followed_users = [
            "BarackObama",
            "justinbieber",
            "katyperry",
            "rihanna",
            "Cristiano",
            "taylorswift13",
            "ladygaga",
            "ArianaGrande",
            "TheEllenShow",
            "YouTube"
            ]
most_followed_users_str = ",".join(most_followed_users)
resp = requests.get(twitter + f"users/by?usernames={most_followed_users_str}&user.fields=created_at&expansions=pinned_tweet_id", headers=headers)
resp.json()

{'data': [{'username': 'BarackObama',
   'id': '813286',
   'name': 'Barack Obama',
   'created_at': '2007-03-05T22:08:25.000Z'},
  {'username': 'justinbieber',
   'id': '27260086',
   'name': 'Justin Bieber',
   'pinned_tweet_id': '1344890442576490496',
   'created_at': '2009-03-28T16:41:22.000Z'},
  {'username': 'katyperry',
   'id': '21447363',
   'name': 'KATY PERRY',
   'pinned_tweet_id': '1299195688719249409',
   'created_at': '2009-02-20T23:45:56.000Z'},
  {'username': 'rihanna',
   'id': '79293791',
   'name': 'Rihanna',
   'created_at': '2009-10-02T21:37:33.000Z'},
  {'username': 'Cristiano',
   'id': '155659213',
   'name': 'Cristiano Ronaldo',
   'created_at': '2010-06-14T19:09:20.000Z'},
  {'username': 'taylorswift13',
   'id': '17919972',
   'name': 'Taylor Swift',
   'pinned_tweet_id': '1360093736932429828',
   'created_at': '2008-12-06T10:10:54.000Z'},
  {'username': 'ladygaga',
   'id': '14230524',
   'name': 'Lady Gaga',
   'pinned_tweet_id': '1266218931028549632',
   

#### Output
>```python
{'data': [
      {'id': '813286',
       'username': 'BarackObama',
       'name': 'Barack Obama',
       'created_at': '2007-03-05T22:08:25.000Z'}
       ,
      {'pinned_tweet_id': '1344890442576490496',
       'id': '27260086',
       'username': 'justinbieber',
       'name': 'Justin Bieber',
       'created_at': '2009-03-28T16:41:22.000Z'}
       ,
      {'pinned_tweet_id': '1299195688719249409',
       'id': '21447363',
       'username': 'katyperry',
       'name': 'KATY PERRY',
       'created_at': '2009-02-20T23:45:56.000Z'}
       ,
      {'id': '79293791',
       'username': 'rihanna',
       'name': 'Rihanna',
       'created_at': '2009-10-02T21:37:33.000Z'}
       ,
      {'id': '155659213',
       'username': 'Cristiano',
       'name': 'Cristiano Ronaldo',
       'created_at': '2010-06-14T19:09:20.000Z'}
       ,
      {'pinned_tweet_id': '1360093736932429828',
       'id': '17919972',
       'username': 'taylorswift13',
       'name': 'Taylor Swift',
       'created_at': '2008-12-06T10:10:54.000Z'}
       ,
      {'pinned_tweet_id': '1266218931028549632',
       'id': '14230524',
       'username': 'ladygaga',
       'name': 'Lady Gaga',
       'created_at': '2008-03-26T22:37:48.000Z'}
       ,
      {'pinned_tweet_id': '1322025587368742912',
       'id': '34507480',
       'username': 'ArianaGrande',
       'name': 'Ariana Grande',
       'created_at': '2009-04-23T02:56:31.000Z'}
       ,
      {'pinned_tweet_id': '1260696876149506048',
       'id': '15846407',
       'username': 'TheEllenShow',
       'name': 'Ellen DeGeneres',
       'created_at': '2008-08-14T03:50:42.000Z'}
       ,
      {'id': '10228272',
       'username': 'YouTube',
       'name': 'YouTube',
       'created_at': '2007-11-13T21:43:46.000Z'}]
       ,
     'includes': {'tweets': [
       {'id': '1344890442576490496',
        'text': '#ANYONE out now\nhttps://t.co/Uh4E6vUzZv https://t.co/UjiStYfwrz'},
       {'id': '1299195688719249409',
        'text': 'IT’S HERE! IT’S REALLY HERE! 🙃 I finally got back my smile! Hope this record puts one on your face 🙂 #SMILE 🙂 IS OUT EVERYWHERE NOW! LOVE YOU GUYS SO MUCH ENJOY 🤡♥️ (sent from my hospital bed lol) https://t.co/BImXF3kEcw https://t.co/2UmVajDoyn'},
       {'id': '1360093736932429828',
        'text': 'My new version of Love Story (Taylor’s Version) is out now  💛💛 Get it instantly when you pre-order Fearless (Taylor’s Version)  https://t.co/NqBDS6cGFl https://t.co/KdHdZXnWbP'},
       {'id': '1266218931028549632',
        'text': 'Now dance motherf💕ckers!!!!!!! #Chromatica https://t.co/GjJUC3PRWz'},
       {'id': '1322025587368742912',
        'text': '🤍 positions (the album) is out now 🤍 https://t.co/FpkiHYLFqt https://t.co/J33o6KMTmo'}]},
     'errors': [
       {'detail': 'Could not find tweet with pinned_tweet_id: [1260696876149506048].',
       'title': 'Not Found Error',
       'resource_type': 'tweet',
       'parameter': 'pinned_tweet_id',
       'value': '1260696876149506048',
       'type': 'https://api.twitter.com/2/problems/resource-not-found'}]}
```

In [5]:
a = 1

In [6]:
f"Hello {a}"

'Hello 1'

# 3. Analyzing the data

We extracted for you the user- and most recent tweet- (last 7 days) data. For both we only used the `created_at` and `public_metrics` queries. 

>```python
tweet_dict = {}
for user in most_followed_users:
    resp = requests.get(twitter + f"tweets/search/recent?query=from:{user}&tweet.fields=public_metrics,created_at", headers=headers, proxies=proxies)
    tweet_dict[user] = resp.json()
>```

>```python
user_dict = {}
for user in most_followed_users:
    resp = requests.get(twitter + f"users/by?usernames={user}&user.fields=public_metrics,created_at", headers=headers, proxies=proxies)
    user_dict[user] = resp.json()
>```

>```python
># this is the data made available to you
twitter_data = {
    'user_info': user_dict,
    'tweets': tweet_dict
}
>```

Lets see how much we can find out about their Twitter behavior just from requesting these two keys through the API!

In [7]:
tweet_dict = {}
for user in most_followed_users:
    resp = requests.get(twitter + f"tweets/search/recent?query=from:{user}&tweet.fields=public_metrics,created_at", headers=headers)
    tweet_dict[user] = resp.json()

user_dict = {}
for user in most_followed_users:
    resp = requests.get(twitter + f"users/by?usernames={user}&user.fields=public_metrics,created_at", headers=headers)
    user_dict[user] = resp.json()

# this is the data made available to you
twitter_data = {
    'user_info': user_dict,
    'tweets': tweet_dict
}

In [8]:
# load default libraries
import numpy as np
import pandas as pd
import pickle

# load the data
twitter_data = pickle.load( open( "../data/twitter.p", "rb" ) )

## 3.1 Inspect the dictionary

In [10]:
twitter_data['user_info'].keys()

dict_keys(['BarackObama', 'justinbieber', 'katyperry', 'rihanna', 'Cristiano', 'taylorswift13', 'ladygaga', 'ArianaGrande', 'TheEllenShow', 'YouTube'])

In [11]:
twitter_data['user_info']['BarackObama'].keys()
#twitter_data['user_info']['BarackObama']['data'][0]

dict_keys(['data'])

The dictionary consists of two parts:
- `user_info` containing the metadata (num followers, etc.) of each of the top ten users
- `tweets` containing tweets of the corresponding users and its metadata (num likes, etc.)

Each subdictionary contains for each user key a list of dictionaries hidden behind the `data` key

## 3.2 Building a DataFrame

To better access individual values we want to turn the dictionary into a DataFrame.

In [12]:
twitter_data['user_info'].keys()

dict_keys(['BarackObama', 'justinbieber', 'katyperry', 'rihanna', 'Cristiano', 'taylorswift13', 'ladygaga', 'ArianaGrande', 'TheEllenShow', 'YouTube'])

In [13]:
df = pd.DataFrame()
for user in twitter_data['user_info'].keys():
    # user data
    data = twitter_data['user_info'][user]['data'][0]
    # public metrics is a dictionary inside user data
    # with dict.pop(key) we can remove a key from a dictionary
    # with dict.update(other_dict) we can merge two dictionaries together
    data.update(data.pop('public_metrics'))
    # when adding a row from a dictionary into a pandas DataFrame you need to pass 'ignore_index=True'
    # Note: pd.DataFrame.append() doesn't work inplace like the list.append()! 
    # Thus we need to overwrite our DataFrame with the appended one
    df = df.append(data, ignore_index = True)    
                
df

Unnamed: 0,created_at,followers_count,following_count,id,listed_count,name,tweet_count,username
0,2007-03-05T22:08:25.000Z,129471046.0,594725.0,813286,223136.0,Barack Obama,16101.0,BarackObama
1,2009-03-28T16:41:22.000Z,113972402.0,291827.0,27260086,545458.0,Justin Bieber,31253.0,justinbieber
2,2009-02-20T23:45:56.000Z,109476487.0,233.0,21447363,129705.0,KATY PERRY,11074.0,katyperry
3,2009-10-02T21:37:33.000Z,101953106.0,1019.0,79293791,93979.0,Rihanna,10546.0,rihanna
4,2010-06-14T19:09:20.000Z,91167759.0,56.0,155659213,83774.0,Cristiano Ronaldo,3651.0,Cristiano
5,2008-12-06T10:10:54.000Z,88411875.0,0.0,17919972,113035.0,Taylor Swift,608.0,taylorswift13
6,2008-03-26T22:37:48.000Z,83932413.0,120512.0,14230524,205857.0,Lady Gaga,9441.0,ladygaga
7,2009-04-23T02:56:31.000Z,82048983.0,57010.0,34507480,55024.0,Ariana Grande,46856.0,ArianaGrande
8,2008-08-14T03:50:42.000Z,79243895.0,26893.0,15846407,97845.0,Ellen DeGeneres,22541.0,TheEllenShow
9,2007-11-13T21:43:46.000Z,72961127.0,1200.0,10228272,78925.0,YouTube,31287.0,YouTube


In [14]:
# Set the index to a meaningful unique identifier
df = df.set_index('username')

In [15]:
df

Unnamed: 0_level_0,created_at,followers_count,following_count,id,listed_count,name,tweet_count
username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
BarackObama,2007-03-05T22:08:25.000Z,129471046.0,594725.0,813286,223136.0,Barack Obama,16101.0
justinbieber,2009-03-28T16:41:22.000Z,113972402.0,291827.0,27260086,545458.0,Justin Bieber,31253.0
katyperry,2009-02-20T23:45:56.000Z,109476487.0,233.0,21447363,129705.0,KATY PERRY,11074.0
rihanna,2009-10-02T21:37:33.000Z,101953106.0,1019.0,79293791,93979.0,Rihanna,10546.0
Cristiano,2010-06-14T19:09:20.000Z,91167759.0,56.0,155659213,83774.0,Cristiano Ronaldo,3651.0
taylorswift13,2008-12-06T10:10:54.000Z,88411875.0,0.0,17919972,113035.0,Taylor Swift,608.0
ladygaga,2008-03-26T22:37:48.000Z,83932413.0,120512.0,14230524,205857.0,Lady Gaga,9441.0
ArianaGrande,2009-04-23T02:56:31.000Z,82048983.0,57010.0,34507480,55024.0,Ariana Grande,46856.0
TheEllenShow,2008-08-14T03:50:42.000Z,79243895.0,26893.0,15846407,97845.0,Ellen DeGeneres,22541.0
YouTube,2007-11-13T21:43:46.000Z,72961127.0,1200.0,10228272,78925.0,YouTube,31287.0


## 3.3 Analyze Tweets

We got the user metadata into a DataFrame, but we're still left with the tweets part of the dictionary untouched.
Lets do something with it ;-)

The tweet subdictionary contains all(*) Tweets a user has posted in the last 7 days (Fetched at 18.02.2021).

(*) _Note: Twitter only allows to fetch 100 Tweets at once. Which is why the number of Tweets the "YouTube" Account has posted is incorrect_

### 3.3.1. Number of Tweets posted

> **Task:** Insert the number of Tweets posted by each user in the last 7 days into our DataFrame.

In [None]:
## your code here 

In [17]:
# %load ../src/_solutions/number_of_tweets.py
# Build a dictionary containing ("User": "number_of_tweets") pairs
n_recent_tweets = {}

for user in twitter_data['tweets'].keys():
    if 'data' in twitter_data['tweets'][user].keys():
        n_recent_tweets[user] = len(twitter_data['tweets'][user]['data'])
    else:
        # Lady Gaga seems to be inactive for at least the last 7 days
        n_recent_tweets[user] = 0

# our dictionary makes it simple to put our fresh extracted data into our DataFrame
# Since the DataFrame Index and the dictionary keys fit
df['n_recent_tweets'] = pd.Series(n_recent_tweets)
df

Unnamed: 0_level_0,created_at,followers_count,following_count,id,listed_count,name,tweet_count,n_recent_tweets
username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
BarackObama,2007-03-05T22:08:25.000Z,129471046.0,594725.0,813286,223136.0,Barack Obama,16101.0,4
justinbieber,2009-03-28T16:41:22.000Z,113972402.0,291827.0,27260086,545458.0,Justin Bieber,31253.0,10
katyperry,2009-02-20T23:45:56.000Z,109476487.0,233.0,21447363,129705.0,KATY PERRY,11074.0,34
rihanna,2009-10-02T21:37:33.000Z,101953106.0,1019.0,79293791,93979.0,Rihanna,10546.0,6
Cristiano,2010-06-14T19:09:20.000Z,91167759.0,56.0,155659213,83774.0,Cristiano Ronaldo,3651.0,3
taylorswift13,2008-12-06T10:10:54.000Z,88411875.0,0.0,17919972,113035.0,Taylor Swift,608.0,5
ladygaga,2008-03-26T22:37:48.000Z,83932413.0,120512.0,14230524,205857.0,Lady Gaga,9441.0,0
ArianaGrande,2009-04-23T02:56:31.000Z,82048983.0,57010.0,34507480,55024.0,Ariana Grande,46856.0,24
TheEllenShow,2008-08-14T03:50:42.000Z,79243895.0,26893.0,15846407,97845.0,Ellen DeGeneres,22541.0,25
YouTube,2007-11-13T21:43:46.000Z,72961127.0,1200.0,10228272,78925.0,YouTube,31287.0,100


### 3.3.2 Most likes on a tweet
> **Task:** Insert the number of most liked Tweets each user in the last 7 days into our DataFrame.

In [None]:
## your code here 

In [19]:
# %load ../src/_solutions/max_number_of_tweets.py
# Build a dictionary containing ("User": "maximum likes on tweets") pairs
most_likes = {}

for user in twitter_data['tweets'].keys():
    maxx = 0
    if 'data' in twitter_data['tweets'][user]:
        for i in range(len(twitter_data['tweets'][user]['data'])):
            maxx = max(maxx, twitter_data['tweets'][user]['data'][i]['public_metrics']['like_count'])
    most_likes[user] = maxx
    
# our dictionary makes it simple to put our fresh extracted data into our DataFrame
# Since the DataFrame Index and the dictionary keys fit
df['most_likes'] = pd.Series(most_likes)
df

Unnamed: 0_level_0,created_at,followers_count,following_count,id,listed_count,name,tweet_count,n_recent_tweets,most_likes
username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
BarackObama,2007-03-05T22:08:25.000Z,129471046.0,594725.0,813286,223136.0,Barack Obama,16101.0,4,921190
justinbieber,2009-03-28T16:41:22.000Z,113972402.0,291827.0,27260086,545458.0,Justin Bieber,31253.0,10,572138
katyperry,2009-02-20T23:45:56.000Z,109476487.0,233.0,21447363,129705.0,KATY PERRY,11074.0,34,49759
rihanna,2009-10-02T21:37:33.000Z,101953106.0,1019.0,79293791,93979.0,Rihanna,10546.0,6,290653
Cristiano,2010-06-14T19:09:20.000Z,91167759.0,56.0,155659213,83774.0,Cristiano Ronaldo,3651.0,3,535425
taylorswift13,2008-12-06T10:10:54.000Z,88411875.0,0.0,17919972,113035.0,Taylor Swift,608.0,5,894547
ladygaga,2008-03-26T22:37:48.000Z,83932413.0,120512.0,14230524,205857.0,Lady Gaga,9441.0,0,0
ArianaGrande,2009-04-23T02:56:31.000Z,82048983.0,57010.0,34507480,55024.0,Ariana Grande,46856.0,24,293151
TheEllenShow,2008-08-14T03:50:42.000Z,79243895.0,26893.0,15846407,97845.0,Ellen DeGeneres,22541.0,25,10444
YouTube,2007-11-13T21:43:46.000Z,72961127.0,1200.0,10228272,78925.0,YouTube,31287.0,100,9742


At this point we should notice a pattern in the way our code finds out the maximum value for some kind of metric (keyword: code refactoring).
> **Task:** Define a function which does the following:    
> **INPUT**: metric, function (min, max, std, ...)  
> **OUTPUT**: dictionary where for each user the function was applied to all values related to the metric

In [None]:
## your code here 

In [21]:
# %load ../src/_solutions/tweets_metric_func.py
def tweets_metric_eval(metric, func):
    dict_metric = {}
    for user in twitter_data['tweets'].keys():
        val_list = []
        if 'data' in twitter_data['tweets'][user]:
            for tweet in twitter_data['tweets'][user]['data']:
                val_list.append(tweet['public_metrics'][metric])
        try:
            dict_metric[user] = func(val_list)
        except Exception as e:
            dict_metric[user] = np.nan
    return dict_metric

Now, using the function, we can easily calculate the minimum/maximum/... values with respect to a metric. Try it out!

> **Task:**  
> - Insert for each user the maximum number of replys on a Tweet into our DataFrame.  
> - Insert for each user the maximum number of retweets on a Tweet into our DataFrame. 
> - Insert for each user the minimal number of like_count on a Tweet into our DataFrame.  
> - Insert for each user the avarage number of likes on a Tweet into our Dataframe.

In [None]:
## your code here 

In [23]:
# %load ../src/_solutions/metric_tests.py
df['reply_count_max'] = pd.Series(tweets_metric_eval('reply_count', np.max))
df['like_count_min'] = pd.Series(tweets_metric_eval('like_count', np.min))
df['retweets_max'] = pd.Series(tweets_metric_eval('retweet_count', np.max))
df['like_count_std'] = pd.Series(tweets_metric_eval('like_count', np.std))

  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
  ret = ret.dtype.type(ret / rcount)


In [None]:
df

## 3.4 Analyze DataFrame

Now let's look at our built DataFrame and read the minimum or maximum values from a column.

> **Task:**  
> **Q1)** Who tweeted the most in the last 7 days?  
> **Q2)** Who has the most likes on a tweet?  
> **Q3)** Who has the most retweets on a tweet?    

In [None]:
## your code here 

In [None]:
# %load ../src/_solutions/max_tweeted.py

In [None]:
# %load ../src/_solutions/max_likes.py

In [None]:
# %load ../src/_solutions/max_retweets.py

> **Task**: Compute the daily mean of posted tweets since the creation of the account for a single user.

> _Hint: You can use `pd.datetime.now()` to retrieve the current date. The columns of interest are: `created_at` and `tweet_count`._

In [None]:
df['created_at'] = pd.to_datetime(df['created_at']).dt.tz_convert(tz=None)

In [None]:
## your code here

In [None]:
# %load ../src/_solutions/daily_mean_posted_tweets.py

# 4. Building our own API

## FastAPI

FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints.

- Fast: Very high performance, on par with NodeJS and Go (thanks to Starlette and Pydantic). One of the fastest Python frameworks available.
- Fast to code: Increase the speed to develop features by about 200% to 300%. *
- Fewer bugs: Reduce about 40% of human (developer) induced errors. *
- Intuitive: Great editor support. Completion everywhere. Less time debugging.
- Easy: Designed to be easy to use and learn. Less time reading docs.
- Short: Minimize code duplication. Multiple features from each parameter declaration. Fewer bugs.
- Robust: Get production-ready code. With automatic interactive documentation.
- Standards-based: Based on (and fully compatible with) the open standards for APIs: OpenAPI (previously known as Swagger) and JSON Schema.

Source: https://fastapi.tiangolo.com/


### **Very simple example of an Web-API**

#### **Underlying implementation**
The underlying implementation consists of seperated functions.
```python
model = sklearn.linear_model.LinearRegression()
xs, y, valid_xs, valid_y = train_test_split(dataset)

def fit_model(model, xs, y):
    return model.fit(xs,y)

def predict(trained_model, xs):
    return trained_model.predict(xs)

def eval_prediction(y_orig, y_predicted):
    return sklearn.metrics.mean_squared_error(y_orig, y_predicted)

def something_secret():
    # do secret stuff
```

#### **API**
The Web-API allows by accessing a website at `/train_and_eval` to chain all of the above functions together to train and evaluate a model.

It doesn't expose the `something_secret()` function to the outside world though!

```python
@api.get("/train_and_eval")
def train_and_eval():
    model_trained = fit_model(xs, y)
    valid_y_pred = predict(model_trained, valid_xs)
    rmse = eval_prediction(valid_y, valid_y_pred)
    
    return {"rmse": rmse, "model_params": model_trained.get_params()}
```

## Lets try it out!

We have implemented a very rudimentary API, which you can find in `../src/api_01.py`.
The only thing it does is returning "Hello World" when accessing it.

In [24]:
# %load ../src/api_01.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    return {"message": "Hello World"}

You can launch it the following way:
- Open your Terminal (Linux, macOS) or Anaconda Prompt (Windows)
- Activate class14 in `conda`
- Head over to the `src` directory of todays lesson (use `cd`, `ls` and `pwd` to orientate yourself)
- Run `uvicorn api_01:app --debug`

- _Further Info_
    - `uvicorn` is a  minimal low-level server/application interface allowing asynchronous executions
    - Similarly to `jupyter lab` it hosts some kind of website on our host
    - Syntax: `uvicorn <python file>:<variable or function to run>`
    - The `--debug` flag makes sure the server restarts each time we change the source file.

>After starting the server head over to https://127.0.0.1:8000/.
>You should see the "Hello World" message

>_Note: `127.0.0.1` is the same as `localhost` which is the IP-Address/Hostname of your own PC_

FastAPI also offers a Docpage at http://127.0.0.1:8000/docs which:
- Shows you all available API commands
- Allows you to experiment with them
- Prints you the `curl` command you should use from the terminal to receive the same result as in the browser
    - _Note: We limit ourselves to the HTTP-GET Method, which is used to retrieve data. (Your browser does per default) If we wanted to send something to the server we should use the HTTP-POST Method, which is easier to do with `curl` than manually through your browser._

## Try something a bit more complex

>**Goal**: Make a model accessible through an API

### The Iris Dataset

The Iris Dataset is one of the most well known Classification datasets.
Based on the length and width of the sepal and petal we want to classify iris flowers correctly.

![_img/iris.png](_img/iris.png)
Taken from [Machine Learning for Beginners](https://www.datacamp.com/community/tutorials/machine-learning-in-r)

The sepal is the small/lower blossom, petal the upper/main blossom

In [25]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [26]:
iris = datasets.load_iris()

In [27]:
print(iris['DESCR'])

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [28]:
X = iris['data']
y = iris['target']

In [30]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [31]:
iris['target_names']

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [32]:
# Split dataset into training and test data
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

Lets train a Random Forest Classifier to classify our data.

Reminder: Random Forests consist of multiple Decision Trees. 

In [33]:
# n_jobs = -1 allows to use all our available computing power
rf = RandomForestClassifier(n_jobs=-1, random_state=1)

In [34]:
# train the Random Forest
rf.fit(x_train, y_train)

RandomForestClassifier(n_jobs=-1, random_state=1)

In [35]:
print(rf)

RandomForestClassifier(n_jobs=-1, random_state=1)


In [36]:
# Evaluate performance
accuracy_score(y_test, rf.predict(x_test))

0.9666666666666667

>We have already implemented this into an API, so don't worry. You can launch it by again going into the `src` directory and executing `uvicorn api_02:app --debug`. 

>Play around with it and feel free to add some extra functionality (doesn't necessarily have something to do with the model. Be creative)