# Accessing data on the web through APIs

by Koenraad De Smedt at UiB

---
An *Application Programming Interface* (API) on the web is a service that accepts HTTP/HTTPS requests and sends back a response. If the request is valid, the response has a successful status code and the program can extract data from it. Many web APIs provide data in the JSON format, which can easily be converted to a Python *dict*.

This notebook shows how to:

1.  Access a remote API with parameters
2.  Get a dict from JSON in an API response
3.  Select information from parts of the dict
4.  Convert a dict to a *pandas* series
5.  Sort a series

Warning: The external websites in these examples are regularly updated, so the data in the responses may differ from what you saw earlier. The APIs themselves may also change.

---

In [None]:
import requests

---
## Gender of names

The following example accesses an API on a website that provides the gender of a name. The example asks for the gender of *Alexa*. This could be useful in social media analysis, for instance.


In [None]:
requests.get('https://api.genderize.io', params={'name':'Alexa'}).json()['gender']

Let us break this example down into steps. We send a *get* request and provide the parameters as a *dict*.
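
The parameter dict ends up as a query string appended to the URL. As a minimal sketch of that encoding, the standard library function `urllib.parse.urlencode` can build the same kind of string without sending a request. The variable names are only for illustration, and `requests` may normalise the final URL slightly differently.

```python
from urllib.parse import urlencode

base_url = 'https://api.genderize.io'
params = {'name': 'Alexa'}

# Encode the dict as key=value pairs (joined by '&' if there are several).
query_string = urlencode(params)
full_url = f'{base_url}?{query_string}'
print(full_url)
```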

If the `status_code` of the response is 200, the request was successful and a valid response is obtained.

In [None]:
response = requests.get('https://api.genderize.io', params={'name':'Alexa'})
print(response)
print(response.status_code)
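
Status codes other than 200 signal other outcomes. The following sketch uses the standard library's `http.HTTPStatus` to look up the standard phrase for a code and to check whether the code lies in the 2xx success range. `describe_status` is a hypothetical helper written for this illustration, not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the standard phrase for an HTTP status code and whether it signals success."""
    phrase = HTTPStatus(code).phrase
    is_success = 200 <= code < 300
    return phrase, is_success

for code in (200, 404, 429):
    print(code, describe_status(code))
```

On a real response, `response.raise_for_status()` raises an exception for 4xx and 5xx codes, which is a convenient way to stop early when a request fails.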

The response contains data as a JSON object. By calling the `.json()` method, we can decode this object into a dict.

In [None]:
data = response.json()
print(data)
type(data)
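
The decoding step can also be tried offline with the standard `json` module. The text below is a hand-written sample in the shape of a genderize.io response, not actual API output; the `count` value in particular is made up.

```python
import json

# Hand-written sample; the 'count' value is made up for illustration.
sample_text = '{"count": 1000, "name": "Alexa", "gender": "female", "probability": 0.93}'

# json.loads turns JSON text into the corresponding Python object, here a dict.
sample_data = json.loads(sample_text)
print(type(sample_data))
print(sample_data['gender'], sample_data['probability'])
```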

There are various pieces of information in this dict which can be used for further processing.

In [None]:
print(data['gender'])
print(data['name'], 'is', data['gender'])

### Exercise 1

Define a function `find_gender` with one argument, a name, that prints the gender and the probability as in the following example.

```
>>> find_gender('Alexa')
Alexa is female with probability 0.93
```

Then change your function definition so that the probability is printed as a percentage.

```
>>> find_gender('Alexa')
Alexa is female (93%)
```

### Exercise 2

According to the [documentation](https://genderize.io/), this API accepts an optional extra parameter `country_id`, which is a [two-letter country code](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2). Extend the function with an optional parameter for the country. If the country is given, it is passed as a parameter to the API and is included in the output as well. Example:

```
>>> find_gender('Kim', 'KR')
Kim is male in KR (80%)
>>> find_gender('Kim')
Kim is female (56%)
```



---
## Countries

Another API is that of https://restcountries.com/ which returns information about countries. This JSON response has a more complicated structure than the one above.

Check out this example in a browser window: https://restcountries.com/v3.1/alpha?codes=be which returns a lot of information about Belgium. Observe that the result appears as a list containing a dict.
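
To make the "list containing a dict" structure concrete, here is a simplified, hand-written stand-in for such a result. It keeps only two keys, whereas the real response has many more, and the variable names are chosen for illustration.

```python
# Simplified stand-in for the API result: a list holding one dict,
# which in turn holds a nested dict under 'name'.
sample_info = [{'name': {'common': 'Belgium'}, 'population': 11555997}]

first_country = sample_info[0]            # index into the list first
print(len(sample_info))
print(first_country['name']['common'])    # then follow the keys inward
print(first_country['population'])
```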

In [None]:
countries_url = 'https://restcountries.com/v3.1/alpha'


Let's say that we are only interested in the country name and the population. These pieces of information can be 'mined' from the API result. So we make a function that takes a country code as its argument and returns two values: the common country name (in English) and the population.

In [None]:
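def get_country_info(country_code):
    # The API returns a list with one dict per matching country.
    info = requests.get(countries_url, params={'codes': country_code}).json()
    #print(info)  # uncomment to inspect the full response
    return info[0]['name']['common'], info[0]['population']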

Let's test. Because the function returns two values, we need two variables to receive these values.

In [None]:
country_name, population = get_country_info('be')
print(f'{country_name} (pop. {population})')
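
Returning two values really means returning one tuple, which can then be unpacked into separate variables. A small sketch with a hypothetical function `min_and_max`, unrelated to the API:

```python
def min_and_max(numbers):
    """Return the smallest and the largest number as a tuple."""
    return min(numbers), max(numbers)

result = min_and_max([3, 1, 4])
print(result)              # a single tuple holding both values

smallest, largest = min_and_max([3, 1, 4])
print(smallest, largest)   # the tuple unpacked into two variables
```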

### Exercise 3

Extend the function `find_gender` from the previous exercise by printing the 
common name and population of the country, instead of the percentage. Example:

```
>>> find_gender('Kim', 'BE')
Kim is female in Belgium (pop. 11555997)
```

As a slightly more complex variant, also print the number of people recorded with that name in the given country.

```
>>> find_gender('Kim', 'BE')
Kim is female in Belgium (3373 recorded out of pop. 11555997)
```

---
## Covid-19

There is an API with information about the Covid-19 situation in all countries.
It is described in https://covid19api.com/.

The following will give a list of dicts, one for each country. Let's show the first two of these.

In [None]:
covid_url = 'https://api.covid19api.com/summary'
covid_list = requests.get(covid_url).json()['Countries']
covid_list[:2]

Transform this list of dicts into one dict with countries as keys and numbers of deaths as values.

In [None]:
covid_death_dict = {c['Country']:c['TotalDeaths'] for c in covid_list}
covid_death_dict
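
A dict comprehension is shorthand for filling a dict in a loop. The sketch below shows the equivalence on a small list that uses the same keys as above; the country names and numbers are made up.

```python
# Made-up sample in the same shape as the list of country dicts.
sample_list = [
    {'Country': 'Country A', 'TotalDeaths': 10},
    {'Country': 'Country B', 'TotalDeaths': 25},
]

# Explicit loop version.
loop_dict = {}
for c in sample_list:
    loop_dict[c['Country']] = c['TotalDeaths']

# Comprehension version.
comprehension_dict = {c['Country']: c['TotalDeaths'] for c in sample_list}

print(loop_dict == comprehension_dict)
```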

### Exercise 4


Write simple code to find the total number of deaths for Norway in `covid_death_dict`.

---
## Dict to pandas series

The dict can be converted into a *pandas* series. A series is a one-dimensional labeled array capable of holding data of any type. In this case it is a series of numbers.

In [None]:
import pandas as pd
covid_series = pd.Series(covid_death_dict)
covid_series.index.name = 'country'
covid_series
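
What "labeled" means can be seen on a small series built from made-up numbers: each value can be reached either by its label or by its position. The names `sample_dict` and `sample_series` are only for this illustration.

```python
import pandas as pd

# Made-up numbers, for illustration only.
sample_dict = {'Country A': 10, 'Country B': 25, 'Country C': 3}
sample_series = pd.Series(sample_dict)
sample_series.index.name = 'country'

print(sample_series['Country B'])   # look up by label
print(sample_series.iloc[0])        # look up by position
print(sample_series.dtype)          # all values share one numeric type
```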

The series can be sorted and plotted. The y axis may use exponential notation, in which `1e6` means `10 ** 6`.

In [None]:
covid_series = covid_series.sort_values(ascending=False)
covid_series.plot(use_index=False)
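
The exponential notation used on the axis is also valid Python, as this short check illustrates (the number 2500000 is an arbitrary example):

```python
million = 1e6
print(million == 10 ** 6)   # the float 1e6 equals the integer 10 ** 6
print(type(million))        # literals written with 'e' are floats

# Format an ordinary number in exponential notation with one decimal.
formatted = f'{2500000:.1e}'
print(formatted)
```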

The y axis can be logarithmic by adding the argument `logy=True`.

In [None]:
covid_series.plot(use_index=False, logy=True)
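
A logarithmic axis places values according to their logarithm, so equal ratios take up equal distances. A minimal sketch with made-up values, each 100 times the previous one:

```python
import math

values = [10, 1000, 100000]   # made-up values spanning several orders of magnitude

# On a log10 scale these values are evenly spaced, two units apart.
log_values = [round(math.log10(v), 6) for v in values]
print(log_values)
```

This is why a few very large values no longer squeeze the smaller ones against the bottom of the plot.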

### Exercise 5

1.  Print the first 5 items of the sorted `covid_series` to see the top countries with the most deaths.

2.  (optional) Extend by looking up the population of countries to find the deaths per capita.

### Exercise 6

(optional) This is a slightly bigger project for those who want to try some more APIs. There is a Digital Humanities Course Registry (a cooperation between CLARINO and DARIAH) which has an [API](https://dhcr.clarin-dariah.eu/api/v1/). Make a program to get information from the API. For instance, define a function which retrieves all courses given in certain languages, together with their institutions, such as the following:

```
>>> find_courses_lang(['Norwegian','Swedish','Danish'])
Masterprogram i digital kultur - Universitetet i Bergen
Digitala Humaniora - Åbo Akademi University
IT and Cognition - Copenhagen University
```

Other possible exercises are plotting the number of courses by language, country, institution, discipline, etc.