> **Note:** In most sessions you will be solving exercises posed in a Jupyter notebook that looks like this one. Because you are cloning a GitHub repository that only we can push to, you should **NEVER EDIT** any of the files you pull from GitHub. Instead, either make a new notebook and write your solutions there, or **make a copy of this notebook and save it somewhere else** on your computer, outside the `isds2020` folder that you cloned, and write your answers in the copy. If you edit the notebook you pulled from GitHub, those edits (possibly your solutions to the exercises) may be overwritten and lost the next time you pull from GitHub. This is important, so don't hesitate to ask if anything is unclear.

# Session 2: Strings, requests and APIs

In this combined teaching module and exercise set you will be collecting data from the web. We will start out with some basic string operations and build on those to construct a query for fetching data.

*Alternative sources*: If you get lost, you might find [this page](https://pythonprogramming.net/string-concatenation-formatting-intermediate-python-tutorial/) on pythonprogramming.net useful. [This page](https://www.python-course.eu/python3_sequential_data_types.php) also gives an introduction to the basics of strings and their related data types. 

# Basic Python (continued)

In Assignment 0 we covered a lot of basic Python. We only scratched the surface of many topics, and in this session we will take a deeper look at some of them.

Start out by watching the introduction video below, which describes the usefulness of text data and dictionaries.

In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('QU3AfoezgTE', width=640, height=360)

## Strings

You might be wondering what exactly strings are. Strings are sequential containers of characters. Two character sets are worth knowing about:
-  American Standard Code for Information Interchange (`ascii`)
    - Characters from the English alphabet, digits and common symbols
    - 7 bits per character (128 characters in total), usually stored in one 8-bit byte

- Unicode (typically encoded as `UTF-8`)  
    - Characters from European and Asian languages and much more
    - Variable width in `UTF-8`: between 1 and 4 bytes per character
    - Also available in URLs nowadays, e.g. [møn.dk](https://møn.dk)
    
    
Note that while Unicode can be a little heavier and take up more space, it is far more flexible, and we get fewer errors when importing data from languages with non-ASCII characters. This is also why Python has used Unicode as the standard for strings since Python 3.
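To make the difference concrete, the small sketch below encodes the word from the URL example with UTF-8 and compares the number of characters with the number of bytes. The variable names are chosen for illustration only.

```python
word = 'møn'                       # contains one non-ASCII character (ø)

utf8_bytes = word.encode('utf-8')  # convert the string to raw bytes
print(len(word))                   # number of characters
print(len(utf8_bytes))             # number of bytes; ø needs more than one
print(utf8_bytes.decode('utf-8') == word)  # decoding restores the string

# Encoding to ASCII fails because ø is not one of the 128 ASCII characters
try:
    word.encode('ascii')
except UnicodeEncodeError:
    ascii_ok = False
else:
    ascii_ok = True
print(ascii_ok)
```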

To see that strings are sequential containers, see the example below where we slice them like a list or dataframe/array:

In [2]:
str1 = 'police'
str1[2:]

'lice'

#### Common string operations 

Strings have multiple operations and functions associated. In this exercise we investigate a few of these. We also explore the sequence form of a string and how it can be sliced and accessed via indices. In the following we provide a small tour of some of the most important ones.

You can alter the case of a string with the string methods `upper`, `lower` and `capitalize`. Example:

In [3]:
str1.upper()

'POLICE'

We can also use the `replace` method to substitute parts of the strings:

In [29]:
str1.replace('po', 'ma')

'malice'

We can also check whether a substring is within a given string and much more. The general syntax is 
```python 
T in S
``` 
which checks whether a string `S` contains the substring `T`. See two applications below: 

In [7]:
print('ice' in str1, 'mice' in str1)

True False


Another procedure is to add strings together like below:

In [8]:
str2 = 'officer'
str1 + ' ' + str2

'police officer'

In the first couple of exercises you should use the examples above to find the answers.

> **Ex. 2.1.1**: Let `s1='Chameleon'` and `s2='ham'`. Check whether the string `s2` is a substring of `s1`. Is `'hello'` a substring of `'goodbye'`?



In [11]:
s1 = 'Chameleon'
s2 = 'ham'

s2 in s1, 'hello' in 'goodbye'

(True, False)

> **Ex. 2.1.2**: From the string `s1` select the last four characters. What is the index of the character `a` in `s1`?

> *Hint*: We can select a substring by slicing with the `[]` notation, from start to end, where start is included and end is excluded. Recall that Python has zero-based indexing, see the explanation [here](https://softwareengineering.stackexchange.com/questions/110804/why-are-zero-based-arrays-the-norm).


In [12]:
s1[-4:]  # a negative start index counts from the end, so this selects the last four characters

# Because Python starts counting from 0, the index of the character a in s1 is 2

'leon'
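The indexing rules from the hint can be tried out on any string. The sketch below uses an arbitrary example word (not part of the exercise) to show positive and negative indices, slices, and the lookup methods `index` and `find`.

```python
word = 'notebook'

print(word[0])          # first character
print(word[-1])         # last character (negative indices count from the end)
print(word[:4])         # characters 0-3; the end index 4 is excluded
print(word[-4:])        # the last four characters
print(word.index('t'))  # position of the first 't'
print(word.find('z'))   # find returns -1 when the substring is absent
```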

#### More string operations 
In addition to the techniques above, strings are equipped with an array of _methods_ for solving more complex tasks. For example, the `str.join(list)` method inserts a string in between the elements of a list. Conversely, `str1.split(str2)` splits `str1` into a list at each occurrence of `str2`. `.strip()` removes whitespace at the beginning and end of a string, and an f-string fills in specified blanks in a string. Below we illustrate the use of each:

```python
>>> " ".join(['Hello', 'World!']) 
'Hello World!'

>>> ' Hello World!   '.strip() 
'Hello World!'

>>> w = 'World'
>>> f'Hello {w}!' 
'Hello World!'

>>> 'a,b,c'.split(',') 
['a', 'b', 'c']
```

> **Ex. 2.1.3:** Use the `join()` and `strip()` functions to retrieve the sentence `The quick brown fox jumps over the lazy dog` from the list  `list_of_words` in the code cell below.

In [32]:
list_of_words = ['       The        ', '   quick   ', '     brown      ',
                 ' fox          ', '          jumps     ', '   over ',
                 '          the   ', '  lazy     ', '          dog     ']

# strip the surrounding whitespace from each word, then join with single spaces
" ".join(word.strip() for word in list_of_words)

'The quick brown fox jumps over the lazy dog'

> **Ex. 2.1.4:** Let `l1 = ['r ', 'Is', '>', ' < ', 'g ', '?']`. Create from `l1` the sentence "Is r > g?" using your knowledge about string formatting. Make sure there is only one space in between words.
>
>> _Hint:_ You should be able to combine the information above to solve this exercise.

In [55]:
l1 = ['r ', 'Is', '>', ' < ', 'g ', '?']
string = l1[1]+l1[0]+l1[2]+l1[4]+l1[5]

print(string)

Isr >g ?


In [59]:
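l1 = ['r ', 'Is', '>', ' < ', 'g ', '?']
# pick the words in the right order and strip their surrounding whitespace
sentence_words = [l1[i].strip() for i in (1, 0, 2, 4)]
# join with single spaces and attach the question mark without a space
" ".join(sentence_words) + l1[5]

'Is r > g?'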

# Saving as text file
We saw in Assignment 0 that we could output tabular data as a CSV file. This file is essentially a text file with a specific structure that allows a computer to identify rows and columns. In the example below we will learn how to save a string directly as a text file. Note how we make a line break using `\n`. This is a string escape sequence, read more [here](https://docs.python.org/3/reference/lexical_analysis.html#literals).

In [61]:
my_str = 'This is important...'
my_str2 = 'Written in Python!'
escape_seq = '\n'

with open('my_file.txt', 'w') as f:
    f.write(my_str+escape_seq+my_str2)

The code below opens the text file and prints the string.

In [62]:
with open('my_file.txt', 'r') as f:    
    print(f.read())

This is important...
Written in Python!
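When a file holds several lines, it is often handy to read them back as a list. The sketch below writes three made-up lines and reads them again with `splitlines`. It works in a temporary folder so that no existing file is overwritten; the file name `notes.txt` is hypothetical.

```python
import os
import tempfile

lines = ['first line', 'second line', 'third line']

# Work in a temporary folder so no existing file is overwritten
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(lines))   # one '\n' between each pair of lines

    with open(path, 'r', encoding='utf-8') as f:
        lines_read = f.read().splitlines()  # split the text at line breaks

print(lines_read)
```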


> **Ex. 2.1.5:** Create a .txt file called `to_do_list.txt` with a to-do list by looping over the list `to_do = ['1. Hit the gym', '2. Pay bills', '3. Meet George', '4. Buy eggs', '5. Read a book']` and writing each element on a separate line.

In [1]:
to_do = ['1. Hit the gym', '2. Pay bills', '3. Meet George', '4. Buy eggs', '5. Read a book']

escape_seq = '\n'

# open the file once in write mode, so rerunning the cell does not append duplicates
with open('to_do_list.txt', 'w') as f:
    for item in to_do:
        f.write(item + escape_seq)

## Dictionaries

Dictionaries (or simply `dict`) are a central building block of Python. Python dicts are constructed from pairs of keys and values, making them extremely versatile for data storage. Like lists, they can contain deeply nested structures, e.g. a dict of dicts of lists.

Try running the code below, where the keys are strings (names from recent Danish prime ministers) and values are also strings (e.g. political affiliation):

In [22]:
my_dict1 = {'Anders': "Venstre",
            'Helle': "Socialdemokratiet",
            'Lars': "Venstre",
            'Mette': "Socialdemokratiet"}

print(my_dict1['Mette'])
print(my_dict1)

Socialdemokratiet
{'Anders': 'Venstre', 'Helle': 'Socialdemokratiet', 'Lars': 'Venstre', 'Mette': 'Socialdemokratiet'}
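As a small illustration of nesting, the sketch below builds a hypothetical dict that contains a dict of lists, and shows how to reach into it. The content of `course` is made up for the example.

```python
# Hypothetical nested structure: dict -> dict -> list
course = {
    'title': 'Session 2',
    'topics': {
        'python': ['strings', 'dictionaries'],
        'web': ['requests', 'APIs'],
    },
}

print(course['topics']['web'][0])        # chain keys and indices to go deeper
print(course.get('teacher', 'unknown'))  # .get returns a default if the key is missing
print('topics' in course)                # `in` checks the keys

course['topics']['web'].append('JSON')   # nested containers can be modified in place
print(course['topics']['web'])
```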


Dictionaries can also be constructed from two associated lists. These are tied together with the `zip` function. Try the following code:

In [4]:
keys = ['a', 'b', 'c']
values = list(range(2,5))

key_value_pairs = list(zip(keys, values))

my_dict2 = dict(key_value_pairs)
my_dict2

{'a': 2, 'b': 3, 'c': 4}
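The intermediate list is not strictly needed: the result of `zip` can be passed straight to `dict`. A dict comprehension is another common way to build a dictionary from an existing one. A brief sketch, with illustrative variable names:

```python
keys = ['a', 'b', 'c']
values = [2, 3, 4]

direct = dict(zip(keys, values))                   # zip can be passed straight to dict
squared = {k: v ** 2 for k, v in direct.items()}   # dict comprehension over key-value pairs

print(direct)
print(squared)
```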

> **Ex. 2.1.6**: Create an empty dictionary `words` using the `dict()` function. Then add each of the words in `['animal', 'coffee', 'python', 'unit', 'knowledge', 'tread', 'arise']` as a key, with the value being a boolean indicator for whether the word begins with a vowel. The results should look like `{'bacon': False, 'asynchronous': True ...}`
>
>> _Hint:_ You might want to first construct a function that assesses whether a given word begins with a vowel or not.

In [30]:
W = ['animal', 'coffee', 'python', 'unit', 'knowledge', 'tread', 'arise']

words = dict()

for word in W:
    # the comparison gives a boolean (True/False), not the string 'True'
    words[word] = word[0] in 'aeiou'

print(words)

{'animal': True, 'coffee': False, 'python': False, 'unit': True, 'knowledge': False, 'tread': False, 'arise': True}


> **Ex. 2.1.7:** Loop through the dictionary `words`. In each iteration you should print a proper sentence stating if the current word begins with a vowel or not. 

> _Hint:_ You can loop through both keys and values simultaneously with the `.items()` method. [This](https://www.tutorialspoint.com/python/python_dictionary.htm) might help you.

In [49]:
for key, value in words.items():
    if value:
        print(f'The word "{key}" begins with a vowel.')
    else:
        print(f'The word "{key}" does not begin with a vowel.')

The word "animal" begins with a vowel.
The word "coffee" does not begin with a vowel.
The word "python" does not begin with a vowel.
The word "unit" begins with a vowel.
The word "knowledge" does not begin with a vowel.
The word "tread" does not begin with a vowel.
The word "arise" begins with a vowel.


## Storing Python containers

You might wonder whether there is a file format for easy storage of Python containers.

Yes: the immensely popular JSON file format can store lists and dictionaries. The advantage is that JSON uses almost the same syntax as Python lists and dictionaries. Because JSON is plain text, the only thing we need to add in Python is quotation marks around it, see example:

- Python dict: `{"a":1,"b":1}`
- JSON: `'{"a":1,"b":1}'`

The popularity of JSON comes from the fact that it can hold lists and dictionaries of any depth, built from a few fundamental data types: numbers (float and int), strings, booleans and null (`None` in Python). It does not work well with other data types, but in essence it can hold any form of structured data, e.g. text data, spatial data (GeoJSON) etc.

The code example below uses the `json` module to save our dictionary. We use a trick by first converting the dictionary to a JSON-formatted string. This can be done with the function `dumps` in the module `json`:

In [52]:
import json
with open('my_file.JSON', 'w') as f:
    my_json_str = json.dumps(my_dict2) # convert dictionary to string with JSON formatting
    f.write(my_json_str) # write the string to file

with open('my_file.JSON', 'r') as f:
    print(f.read()) # read the string from file

{"a": 2, "b": 3, "c": 4}
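The opposite direction works with `json.loads`, which turns a JSON-formatted string back into Python objects. The sketch below does a round trip on a made-up record, and also shows how Python's `True` and `None` are written in JSON.

```python
import json

# Made-up data that mixes the data types JSON can hold
record = {'name': 'example', 'tags': ['a', 'b'], 'score': 1.5,
          'active': True, 'parent': None}

as_text = json.dumps(record)      # Python object -> JSON-formatted string
print(type(as_text).__name__)
print(as_text)

restored = json.loads(as_text)    # JSON-formatted string -> Python object
print(restored == record)
```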


<br>

# Python and the web

The internet is a massive source for collecting data. Watch the video below to get an overview of the most fundamental protocols and how we work with them.

In [55]:
from IPython.display import YouTubeVideo
YouTubeVideo('vfn6zLmIyUY', width=640, height=360)

## Application Programming Interface (API)

APIs are protocols that allow us to request information and/or services from the provider of the API. In this course, we are mainly interested in APIs that provide data as a response to our requests. Watch the video below to get a sense of what APIs exist and how they work.

In [56]:
YouTubeVideo('abQl_BD-rQo', width=640, height=360)

#### Building the queries
We will now move on to understanding how we can interact with a web API in Python. First we will see how to build a query, which is simply a web address. By requesting a specific web address, we tell the web server what information we want.

In the example below we build a URL that allows us to check out which repositories Andreas BN has publicly available.

In [57]:
server_url = 'https://api.github.com'
endpoint_path = '/users/abjer/repos'
url = server_url + endpoint_path
print(url)

https://api.github.com/users/abjer/repos
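Many APIs also accept query parameters, which are appended to the URL after a `?` as `key=value` pairs separated by `&`. The sketch below builds such a URL with `urlencode` from the standard library. The parameters `per_page` and `sort` are assumptions used for illustration; check the API documentation for the parameters a given endpoint actually supports.

```python
from urllib.parse import urlencode

server_url = 'https://api.github.com'
endpoint_path = '/users/abjer/repos'

# Assumed query parameters, used here only for illustration
params = {'per_page': 5, 'sort': 'updated'}

query_string = urlencode(params)   # 'key=value' pairs joined by '&'
url_with_query = server_url + endpoint_path + '?' + query_string
print(url_with_query)
```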


#### Sending the query
Python has a smart module, named `requests`, that allows us to interact with the web. When we request a URL we get a response in return. Among other things, the response lets us inspect the text returned by the server, e.g. HTML code or JSON. In the example below we query the URL for the GitHub API we made above. 

In [69]:
import requests # import the module requests

response = requests.get(url) # submit query with `get` and save response as object
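A request can fail, for example if the address is wrong or the server is busy, so it can be worth checking the outcome before using the content. The sketch below wraps the request in a small helper that raises an error for unsuccessful status codes. The name `fetch_json` and its default timeout are hypothetical choices, not part of `requests`.

```python
import requests

def fetch_json(url, params=None, timeout=10):
    """Request `url` and return the decoded JSON payload.

    `fetch_json` is a hypothetical helper name, not part of `requests`.
    """
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()   # raise an HTTPError for 4xx/5xx status codes
    return response.json()        # decode the JSON body into Python objects

# Example call (needs an internet connection):
# repos = fetch_json('https://api.github.com/users/abjer/repos')
```

The `timeout` argument makes the call give up after a number of seconds instead of waiting indefinitely.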

When we examine the response, we can see that the returned text is pretty long, so we limit the initial output to the first 1,000 characters. 

In [70]:
print(len(response.text),'\n') # print length of the response text
print(response.text[:1000],'\n') # print first 1,000 characters of the response text

56401 

[{"id":111244798,"node_id":"MDEwOlJlcG9zaXRvcnkxMTEyNDQ3OTg=","name":"abjer.github.io","full_name":"abjer/abjer.github.io","private":false,"owner":{"login":"abjer","id":6363844,"node_id":"MDQ6VXNlcjYzNjM4NDQ=","avatar_url":"https://avatars3.githubusercontent.com/u/6363844?v=4","gravatar_id":"","url":"https://api.github.com/users/abjer","html_url":"https://github.com/abjer","followers_url":"https://api.github.com/users/abjer/followers","following_url":"https://api.github.com/users/abjer/following{/other_user}","gists_url":"https://api.github.com/users/abjer/gists{/gist_id}","starred_url":"https://api.github.com/users/abjer/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/abjer/subscriptions","organizations_url":"https://api.github.com/users/abjer/orgs","repos_url":"https://api.github.com/users/abjer/repos","events_url":"https://api.github.com/users/abjer/events{/privacy}","received_events_url":"https://api.github.com/users/abjer/received_events","type":"

We notice how the structure resembles lists and dictionaries. Therefore, we try to convert it by assuming that it is structured as JSON. Voila! It now makes a lot of sense!

In [71]:
response_json = response.json() # convert response to a list of dicts
response_json[0]

{'id': 111244798,
 'node_id': 'MDEwOlJlcG9zaXRvcnkxMTEyNDQ3OTg=',
 'name': 'abjer.github.io',
 'full_name': 'abjer/abjer.github.io',
 'private': False,
 'owner': {'login': 'abjer',
  'id': 6363844,
  'node_id': 'MDQ6VXNlcjYzNjM4NDQ=',
  'avatar_url': 'https://avatars3.githubusercontent.com/u/6363844?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/abjer',
  'html_url': 'https://github.com/abjer',
  'followers_url': 'https://api.github.com/users/abjer/followers',
  'following_url': 'https://api.github.com/users/abjer/following{/other_user}',
  'gists_url': 'https://api.github.com/users/abjer/gists{/gist_id}',
  'starred_url': 'https://api.github.com/users/abjer/starred{/owner}{/repo}',
  'subscriptions_url': 'https://api.github.com/users/abjer/subscriptions',
  'organizations_url': 'https://api.github.com/users/abjer/orgs',
  'repos_url': 'https://api.github.com/users/abjer/repos',
  'events_url': 'https://api.github.com/users/abjer/events{/privacy}',
  'received_events
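Once the response is a list of dicts, the ordinary list and dict tools apply. The sketch below uses a small hand-made sample with the same shape as the response (it is not real API output, and the second repository name is made up) to pull out one field per repository.

```python
# Hand-made sample with the same shape as the response (not real API output)
sample_repos = [
    {'name': 'abjer.github.io', 'private': False, 'owner': {'login': 'abjer'}},
    {'name': 'example-repo', 'private': False, 'owner': {'login': 'abjer'}},
]

names = [repo['name'] for repo in sample_repos]             # one field per repo
owners = {repo['owner']['login'] for repo in sample_repos}  # nested field, unique values

print(len(sample_repos))
print(names)
print(owners)
```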

## Introducing the punk API
The [punk API](https://punkapi.com/) serves information about _beers_. It is a well made and well documented API which makes it great for learning about APIs. 

> **Ex. 2.2.1:** Read the documentation on the Punk API available [here](https://punkapi.com/documentation/v2). What is the server url (i.e. root endpoint) of the Punk API? Does it require authentication? Then use the Punk API to make a request for beers brewed before December, 2008 with an ABV of at least 8.

In [80]:
url_beer = 'https://api.punkapi.com/v2/beers?brewed_before=12-2008&abv_gt=7'
beers = requests.get(url_beer)
beers.json() # display the response as a list of dicts

[{'id': 23,
  'name': 'Storm',
  'tagline': 'Islay Whisky Aged IPA.',
  'first_brewed': '12/2007',
  'description': 'Dark and powerful Islay magic infuses this tropical sensation of an IPA. Using the original Punk IPA as a base, we boosted the ABV to 8% giving it some extra backbone to stand up to the peated smoke imported directly from Islay.',
  'image_url': 'https://images.punkapi.com/v2/23.png',
  'abv': 8,
  'ibu': 60,
  'target_fg': 1010,
  'target_og': 1082,
  'ebc': 12,
  'srm': 6,
  'ph': 4.4,
  'attenuation_level': 86,
  'volume': {'value': 20, 'unit': 'litres'},
  'boil_volume': {'value': 25, 'unit': 'litres'},
  'method': {'mash_temp': [{'temp': {'value': 65, 'unit': 'celsius'},
     'duration': 75}],
   'fermentation': {'temp': {'value': 19, 'unit': 'celsius'}},
   'twist': 'Oak chips soaked in Islay whisky 50g'},
  'ingredients': {'malt': [{'name': 'Extra Pale',
     'amount': {'value': 5.8, 'unit': 'kilograms'}}],
   'hops': [{'name': 'Ahtanum',
     'amount': {'value': 

> **Ex. 2.2.2:** What object type is the API's JSON response? What about the individual items in the container? Convert the response object to a suitable format and answer the following questions:
>> 1) How many beers are in the JSON object?
>>
>> 2) Print the names of the beers in the JSON object using lower case characters.
>>
>> 3) Select the beer called Paradox Islay from the JSON object.
>>
>> 4) Which hop ingredients does the Paradox Islay contain?


In [125]:
beers_json = beers.json()

#1) There are 8 beers

len(beers_json)

#2) Names are given below:

for item in beers_json:
    print(item['name'].lower())
    
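#3) Select the beer by its name rather than by its position in the list

paradox = next(beer for beer in beers_json if beer['name'] == 'Paradox Islay')

#4) 

paradox['ingredients']['hops']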

storm
zephyr
paradox islay
coffee imperial stout
original dogma (née speedball)
riptide
chaos theory
ab:03


[{'name': 'Columbus',
  'amount': {'value': 75, 'unit': 'grams'},
  'add': 'start',
  'attribute': 'bitter'},
 {'name': 'Columbus',
  'amount': {'value': 25, 'unit': 'grams'},
  'add': 'middle',
  'attribute': 'flavour'},
 {'name': 'Saaz',
  'amount': {'value': 25, 'unit': 'grams'},
  'add': 'end',
  'attribute': 'flavour'},
 {'name': 'First Gold',
  'amount': {'value': 25, 'unit': 'grams'},
  'add': 'end',
  'attribute': 'flavour'}]

> **Ex. 2.2.3:** Save the beers as a JSON file on your machine.

> _Hint:_ you might want to take a look at the [json](https://docs.python.org/3/library/json.html) module.


In [132]:
# json.dump encodes the list of dicts and writes it to the open file in one step
with open('beers.json', 'w') as f:
    json.dump(beers_json, f)

<br>

## The API for Statistics Denmark 

Statistics Denmark (DST) provides API access to their aggregate data. For developers they supply a [console](https://api.statbank.dk/console) for testing. In this exercise we will code up a simple script which can collect data from the DST API. 

> **Ex 2.3.1:** Use the API console to construct a GET request which retrieves the table FOLK1A split by quarter. The return should be in JSON format. We want all available dates.
>
>Then write a function `construct_link()` which takes as inputs: a table ID (e.g. `'FOLK1A'`) and a list of strings like `['var1=*', 'var2=somevalue']`. The function should return the proper URL for getting a dataset with the specified variables (e.g. in this case all levels of var1, but only where var2=somevalue).

> _Hint:_ The time variable is called 'tid'. To select all available values, set the value-id to '*'. Spend a little time with the console to get a sense of how the URLs are constructed.


In [1]:
url_1 = 'https://api.statbank.dk/v1/data/'
tableID = 'FOLK1A'
url_2 = '/JSONSTAT?valuePresentation=Default&timeOrder=Ascending&'
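filt = ['TID=*','K%C3%98N=*']

def construct_link(tableID,filt):
    # join all filters with '&'; a leading '&' on an element is dropped first
    query = '&'.join(f.lstrip('&') for f in filt)
    return url_1 + tableID + url_2 + query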

construct_link(tableID,filt)

'https://api.statbank.dk/v1/data/FOLK1A/JSONSTAT?valuePresentation=Default&timeOrder=Ascending&TID=*&K%C3%98N=*'
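The odd-looking `K%C3%98N` in the URL is the variable name `KØN` with its non-ASCII letter percent-encoded, i.e. written as the UTF-8 bytes of `Ø` in hexadecimal. A brief sketch using the standard library shows the conversion in both directions:

```python
from urllib.parse import quote, unquote

variable = 'KØN'
encoded = quote(variable)   # non-ASCII characters become %XX codes (UTF-8 bytes)
print(encoded)
print(unquote(encoded))     # decoding gives the original text back
```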

When executing the request in the console you should get a JSON file as output. Next, let's write some code to load these JSON files directly into Python. 


> **Ex. 2.3.2:** Use the `requests` module (get it with `pip install requests`) and `construct_link()` to request birth data from the "FOD" table. Get all available years (variable "Tid"), but only female births (BARNKON=P). Unpack the JSON payload and store the result. Wrap the whole thing in a function which takes a URL as input and returns the corresponding output.

> _Hint:_ The `requests.Response` object has a `.json()` method. 

In [5]:
import requests

In [6]:
FOD_table = 'FOD'
filt_girls = ['TID=*','&BARNKON=P']
link_girls = construct_link(FOD_table,filt_girls)

def unload_data(url):
    resp = requests.get(url)  # send the GET request
    data_dicts = resp.json()  # unpack the JSON payload
    return data_dicts


> **Ex. 2.3.3:** Extract the number of girls born each year. Store the results as a list.

In [7]:
born_girls = unload_data(link_girls)
girls = born_girls['dataset']['value']
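The `value` list carries no year labels of its own. Judging from the layout of the response, the years sit under `dimension`, `Tid`, `category`, `index`, where each year maps to a position in `value`. The sketch below pairs the two on a small hand-made sample; the numbers are made up, and it assumes the request is filtered so that time is the only dimension with more than one category.

```python
# Hand-made sample in the same layout as the DST response; the numbers are made up
sample = {'dataset': {
    'dimension': {'Tid': {'category': {'index': {'1973': 0, '1974': 1, '1975': 2}}}},
    'value': [100, 110, 105],
}}

year_index = sample['dataset']['dimension']['Tid']['category']['index']
values = sample['dataset']['value']

# Look up each year's position and pick the matching value
births_by_year = {year: values[pos] for year, pos in year_index.items()}
print(births_by_year)
```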

> **Ex. 2.3.4:** Repeat 2.3.2 and 2.3.3 but this time only get boy births (BARNKON=D). Store the numbers in a new list and use the `plot_births` function (supplied below) to plot the data. If you don't already have matplotlib installed, run `pip install matplotlib`.

In [8]:
# Just run this once, do not change it.
import matplotlib.pyplot as plt 

def plot_births(boys, girls):
    """ Plots lineplot of the number of births split by gender.
    
    Args: 
        boys: a list of boy births by year
        girls: a list of girl births by year
    """
    if not len(boys) == len(girls):
        raise ValueError('There must be the same number of observations for boys and girls')
    
    labels = [f'{year}' for year in range(1973,2018)]
    
    plt.plot(range(len(boys)), boys, color = 'blue', label = 'boys')
    plt.plot(range(len(boys)), girls, color = 'red', label = 'girls')
    plt.xticks([i for i in range(len(boys)) if i%4 == 0],
               [l for i,l in zip(range(len(boys)),labels) if i%4 == 0],
               rotation = 'vertical')
    plt.legend()
    plt.show()

In [9]:
filt_boys = ['TID=*','&BARNKON=D']
link_boys = construct_link(FOD_table,filt_boys)
born_boys = unload_data(link_boys)
boys = born_boys['dataset']['value']

The final question in this module is optional and only for those curious to learn more.

>**(Bonus question) Ex. 2.3.5:** Go to [https://kristianuruplarsen.github.io/PyDST/](https://kristianuruplarsen.github.io/PyDST/), follow the installation instructions and import PyDST. Try to replicate the birth figure from 2.3.4 using PyDST. Use [the documentation](https://kristianuruplarsen.github.io/PyDST/connection) to learn how the package works.

In [17]:
import PyDST
PyDST.get_data('FOD', variables = {'BARNKON': 'P', 'Tid': '*'}).json()
PyDST.get_data('FOD', variables = {'BARNKON': 'D', 'Tid': '*'}).json()

{'dataset': {'dimension': {'BARNKON': {'label': 'sex of child',
    'category': {'index': {'D': 0}, 'label': {'D': 'Boys'}}},
   'ContentsCode': {'label': 'Indhold',
    'category': {'index': {'FOD': 0},
     'label': {'FOD': 'Live births'},
     'unit': {'FOD': {'base': 'number', 'decimals': 0}}}},
   'Tid': {'label': 'time',
    'category': {'index': {'1973': 0,
      '1974': 1,
      '1975': 2,
      '1976': 3,
      '1977': 4,
      '1978': 5,
      '1979': 6,
      '1980': 7,
      '1981': 8,
      '1982': 9,
      '1983': 10,
      '1984': 11,
      '1985': 12,
      '1986': 13,
      '1987': 14,
      '1988': 15,
      '1989': 16,
      '1990': 17,
      '1991': 18,
      '1992': 19,
      '1993': 20,
      '1994': 21,
      '1995': 22,
      '1996': 23,
      '1997': 24,
      '1998': 25,
      '1999': 26,
      '2000': 27,
      '2001': 28,
      '2002': 29,
      '2003': 30,
      '2004': 31,
      '2005': 32,
      '2006': 33,
      '2007': 34,
      '2008': 35,
      '2009'