# Dealing with HTTP requests

## The `requests` module

This module is the most popular way to fetch data from a website or to talk to the REST endpoints of a web service. Its main limitation is that it **cannot handle asynchronous requests** out of the box. If speed is a concern and you would like to fetch data from many sources at the same time, have a look at the [HTTPX module](https://www.python-httpx.org). It offers an interface almost identical to that of the Requests module, with `async` options.

Installation is a simple pip install away:

```bash
$ pip install requests
```

The simplest way to fetch the content of a given endpoint looks like this:

In [None]:
import getpass
import requests

username = 'pulsargranular'
resp = requests.get(f'https://api.github.com/users/{username}', auth=(username, getpass.getpass()))

print(resp.status_code)              # returns 200 if ok
print(resp.headers['content-type'])  # 'application/json; charset=utf8'
print(resp.encoding)                 # 'utf-8'
print(resp.text)                     # '{"type":"User"...'
print(resp.json())                   # decode the string as json

In reality, you might need to pass the correct headers and maybe disable SSL certificate verification, for example because the server uses a self-signed certificate.

```python
import json
import requests

payload = {'some': 'data'}   # example payload

resp = requests.post(
    url = 'https://my.endpoint.com',
    data = json.dumps(payload),
    headers={
        'Content-Type': 'application/json',
        'Accept': 'application/json',
    },
    auth=('username', 'password'),
    verify=False,   # do not verify SSL certificates
    timeout=10,     # timeout in seconds
)
```

The **timeout** can also be a tuple, e.g. `(3, 8)`. The request then waits up to 3 seconds to establish a connection (connect timeout) and up to 8 seconds for the server to send data (read timeout).
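
As a minimal sketch of how the two timeout values might be handled separately, the helper below catches the connect and read timeouts that `requests` raises. The function name `fetch_with_timeout` and its default values are assumptions made for illustration.

```python
import requests

def fetch_with_timeout(url, timeout=(3, 8)):
    """Return the response, or None if one of the two timeouts was exceeded.

    This sketch expects `timeout` to be a (connect, read) tuple:
    timeout[0] limits the time to establish the connection,
    timeout[1] limits the time to wait for data from the server.
    """
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.ConnectTimeout:
        print(f"Could not connect to {url} within {timeout[0]} seconds")
    except requests.exceptions.ReadTimeout:
        print(f"{url} did not answer within {timeout[1]} seconds")
    return None
```

A call such as `fetch_with_timeout('https://my.endpoint.com')` then either returns the response object or `None`, so the caller can decide whether to retry.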

### GET, POST, PUT, DELETE requests

As you might have guessed, the requests module offers not only `get` requests, but any kind of HTTP request. The most popular ones come with their own function, including the typical CRUD (create, read, update, delete) requests:

```python
requests.post(url, data, **kwargs)      # CREATE data. Posted data is usually a string containing JSON
requests.get(url, params, **kwargs)     # READ data
requests.put(url, data, **kwargs)       # UPDATE data
requests.delete(url, **kwargs)          # DELETE data
requests.head(url, **kwargs)            # like GET, but only the headers are returned
requests.patch(url, data, **kwargs)     # partially UPDATE data
```
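
To see what the `params` argument of a GET request does, the following sketch prepares a request without sending it and prints the resulting URL. It uses `requests.Request(...).prepare()` purely for inspection; the endpoint is the one used in the exercises further down.

```python
import requests

# Build (but do not send) a GET request to inspect the final URL.
request = requests.Request(
    method='GET',
    url='http://n-vermeul.ethz.ch/sap_info',
    params={'kostl': 6005},   # becomes the query string ?kostl=6005
)
prepared = request.prepare()

print(prepared.method)
print(prepared.url)
```

Passing a dictionary as `params` saves you from assembling and escaping the query string by hand.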

Typically, programmers json-encode their data structure themselves and pass the content as `data`:

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
response = requests.post(
    url = url,
    data = json.dumps(payload),
    headers = headers
)
```

You can also pass the payload with the `json` parameter; the header `Content-Type: application/json` is then set automatically:

```python
response = requests.post(url = url, json = payload)
```
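
One way to check that both variants lead to the same request is to prepare them without sending anything. This sketch relies on `requests.Request(...).prepare()` for inspection only; the URL is the placeholder used above.

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# Variant 1: encode the payload yourself and set the header explicitly
manual = requests.Request(
    'POST', url,
    data=json.dumps(payload),
    headers={'Content-Type': 'application/json'},
).prepare()

# Variant 2: let requests encode the payload and set the header
automatic = requests.Request('POST', url, json=payload).prepare()

print(manual.headers['Content-Type'])
print(automatic.headers['Content-Type'])
print(json.loads(manual.body) == json.loads(automatic.body))
```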




The **response object** offers a number of useful methods and status values:

```python
response.ok                 # True if the HTTP status code is below 400
response.headers            # all header information as a dict
response.content            # raw binary content (bytes)
response.text               # the content decoded into a string
response.apparent_encoding  # encoding guessed from the content, e.g. utf-8
response.json()             # decode JSON string into Python data structure
```
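
The relation between `content`, `text` and `json()` can be reproduced with the standard library alone. The bytes below are a made-up response body, not the actual output of any endpoint.

```python
import json

# hypothetical raw response body, as it would arrive over the wire
content = b'[{"kostl": "6005", "beschr": "Z\xc3\xbcrich"}]'

text = content.decode('utf-8')   # roughly what response.text holds
data = json.loads(text)          # roughly what response.json() returns

print(type(content).__name__)    # bytes
print(type(text).__name__)       # str
print(type(data).__name__)       # list
print(data[0]['beschr'])
```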

## Exercise 1: get info, decode json

1. Get information about a given Kostenstelle (cost centre). The endpoint is: `http://n-vermeul.ethz.ch/sap_info?kostl=6005` (or any Kostenstelle you are interested in)
2. Make sure the request succeeded by checking that `response.ok` is `True`
3. Compare `content`, `text` and `json()`

## Exercise 2: handle errors

1. Provoke an exception, e.g. provide an invalid endpoint
2. write a try-except block that handles the error
3. hint: the error is part of the `requests` module: `requests.ConnectionError`
4. Provoke other exceptions (e.g. `something://` instead of `http://`)
5. try to figure out where these exceptions are raised
6. handle these exceptions, using additional `except` blocks

## Exercise 3: extract and re-arrange data

You might be only interested in the following attributes: 

```python
attributes = ['kostl','fonds', 'beschr']
```

The structure should remain a list of dictionaries, e.g.

```python
[{'kostl': '6005',
  'fonds': '0-21408-18',
  'beschr': 'PHRT-Driver-Project PRECISE'},
 {'kostl': '6005', 'fonds': '2-67400-09', 'beschr': 'SysX_SyBIT-Rinn'},
 ...
]
```

0. start with the code below
1. decode the content of the response into a Python structure, using `response.json()` and assign it to a variable `all_costcentres`
2. loop over all costcentres, create a new dictionary
3. inside that loop, create another loop over the attributes
4. build your dictionary for the three attributes.
5. hint: the `.get('key', '')` method avoids KeyErrors that might occur: `costcentre["this_key_does_not_exist"]`
6. append the dictionary to the `costcentre_infos` array, using the `.append()` method

**Bonus**

7. Try to use a (nested) list comprehension instead of a nested `for` loop

In [None]:
import requests
import json
response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

costcentre_infos = []
attributes = ['kostl','fonds', 'beschr']
if response.ok:
    pass  # your code goes here

## Exercise 4: filtering

1. Read about the [filter functions in Python](https://gitlab.ethz.ch/vermeul/python-best-practices/-/blob/master/07-Built-in_Functions.md#filtering-the-filter-function)
2. create a filter function which selects certain records, e.g. fonds `.startswith('2-70')`
3. apply the filter to the outer loop (`for costcentre in all_costcentres`)

**Bonus:**

4. apply the filter to the outer list comprehension

## Exercise 5: sorting

1. Read about the [sort functions in Python](https://gitlab.ethz.ch/vermeul/python-best-practices/-/blob/master/07-Built-in_Functions.md#sorting-the-sorted-function)
2. start with the code below
3. create a `def my_sort(costcentre)` function which uses the regex `extract_middle_number` from that code
4. hint: `match.groupdict()['middle_number']` will access the capture group item
5. hint: the `my_sort` should return 0 if no match, or return int(middle_number) otherwise

**Bonus**

6. try to achieve the same thing using the nested list comprehensions

In [None]:
import requests
import json
import re

response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

extract_middle_number = re.compile(r'''
    ^\d+                       # start with at least one digit
    \-                         # followed by a dash
    (?P<middle_number>\d+)     # followed by numbers, capture these
    \-                         # followed by a dash
    \d+                        # followed by the last digits
    $                          # end of string
    ''', 
    re.X
)

def my_filter(costcentre):
    if costcentre.get('fonds', '').startswith('2-70'):
        return costcentre
    
def my_sort(costcentre):
    """This function first extracts the middle number, using a regular expression.
    If we have a match, return the integer representation of that number to enforce number comparison.
    If we do not have a match, return 0
    """
    ### your code here
    

selected_and_sorted_costcentres = []
interesting_attributes = ['kostl','fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    selected_and_sorted_costcentres = [
        {
            attr: costcentre.get(attr, '') for attr in interesting_attributes
        }
        for costcentre in sorted(
            filter(my_filter, all_costcentres),
            key = my_sort
        )
    ]
    
selected_and_sorted_costcentres

## Solution to Exercise 1

In [None]:
import requests
import json
response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

if response.ok:
    binary = response.content
    text   = response.text
    data_structure = response.json()

In [None]:
binary

In [None]:
text

In [None]:
data_structure

In [None]:
response.json()

## Solution to Exercise 2

**Solution to 2.1 - 2.3**

In [None]:
import requests
import json

try: 
    response = requests.get('http://n-vermeul.notreallythere.ch/sap_info?kostl=6005')
except requests.ConnectionError as exc:
    print(f"This url does not exist: {exc}")
else:
    # only reached if no exception was raised, so `response` is defined
    if response.ok:
        binary = response.content
        text   = response.text
        data_structure = response.json()

**Solution to 2.4 - 2.6**

In [None]:
import requests
import json

try: 
    response = requests.get('not_a_valid_schema://n-vermeul.notreallythere.ch/sap_info?kostl=6005')
    
except requests.ConnectionError as exc:
    print(f"This url does not exist: {exc}")
    
except requests.exceptions.InvalidSchema as exc:
    print(f"This endpoint does not even have a valid schema: {exc}")
    
else:
    # only reached if no exception was raised, so `response` is defined
    if response.ok:
        binary = response.content
        text   = response.text
        data_structure = response.json()
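
The exceptions raised by `requests` derive from `requests.exceptions.RequestException`, so a final `except` block for that class can act as a safety net. The helper name `safe_get` is an assumption for this sketch; the unsupported scheme in the example call is rejected by `requests` itself.

```python
import requests

def safe_get(url, timeout=5):
    """Return the response, or None if the request failed."""
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.InvalidSchema as exc:
        print(f"Unsupported URL scheme: {exc}")
    except requests.exceptions.ConnectionError as exc:
        print(f"Could not connect: {exc}")
    except requests.exceptions.RequestException as exc:
        # base class of the requests exceptions, so it must come last
        print(f"Request failed: {exc}")
    return None

response = safe_get('something://n-vermeul.ethz.ch/sap_info?kostl=6005')
if response is not None and response.ok:
    data_structure = response.json()
```

Returning `None` on failure keeps the follow-up code from touching a `response` variable that was never assigned.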

## Solution to Exercise 3

**Solution 3.1 - 3.6**

In [None]:
import requests
import json
response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

costcentre_infos = []
attributes = ['kostl','fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    
    for costcentre in all_costcentres:
        costcentre_info = {}
        for attribute in attributes:
            costcentre_info[attribute] = costcentre.get(attribute,'')
        costcentre_infos.append(costcentre_info)
    
costcentre_infos

**Bonus:** using a nested _list comprehension_ instead of a nested for-loop

In [None]:
import requests
import json
response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

interesting_attributes = ['kostl','fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    costcentre_infos = [
        {
            attr: costcentre.get(attr, '')             # inner dict comprehension
            for attr in interesting_attributes         # creates a dictionary (inside { } )
        }
        for costcentre in all_costcentres              # outer list comprehension creates a list (inside [ ] )
    ]
    
costcentre_infos
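
If the endpoint is not reachable, the same comprehension can be tried on inline data. The two records below reuse the example values shown in the exercise; the extra key `waers` is made up to show that unwanted attributes are dropped.

```python
# sample records standing in for response.json(); 'waers' is a hypothetical extra key
all_costcentres = [
    {'kostl': '6005', 'fonds': '0-21408-18', 'beschr': 'PHRT-Driver-Project PRECISE', 'waers': 'CHF'},
    {'kostl': '6005', 'fonds': '2-67400-09', 'waers': 'CHF'},   # 'beschr' is missing here
]
interesting_attributes = ['kostl', 'fonds', 'beschr']

costcentre_infos = [
    {attr: costcentre.get(attr, '') for attr in interesting_attributes}
    for costcentre in all_costcentres
]
print(costcentre_infos)
```

The second record shows the effect of `.get(attr, '')`: the missing `beschr` becomes an empty string instead of raising a `KeyError`.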

## Solution to Exercise 4

**Solution to 4.1 - 4.3**

In [None]:
import requests
import json
response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

def my_filter(costcentre):
    if costcentre.get('fonds', '').startswith('2-70'):
        return costcentre

costcentre_infos = []
attributes = ['kostl','fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    
    for costcentre in filter(my_filter, all_costcentres):
        costcentre_info = {}
        for attribute in attributes:
            costcentre_info[attribute] = costcentre.get(attribute,'')
        costcentre_infos.append(costcentre_info)
    
costcentre_infos

**Bonus** same thing, using list comprehension

In [None]:
import requests
import json
response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

def my_filter(costcentre):
    if costcentre.get('fonds', '').startswith('2-70'):
        return costcentre

interesting_attributes = ['kostl','fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    selected_costcentres = [
        {
            attr: costcentre.get(attr, '') for attr in interesting_attributes
        }
        for costcentre in filter(my_filter, all_costcentres)
    ]
    
selected_costcentres
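
`filter()` only looks at the truthiness of the return value, so the filter function may just as well return a boolean. As a sketch of an alternative, the same selection can be written as an `if` clause inside the comprehension. The sample records and the helper name `is_selected` are assumptions for illustration.

```python
def is_selected(costcentre):
    """Return True for records whose fonds starts with '2-70'."""
    return costcentre.get('fonds', '').startswith('2-70')

# hypothetical sample records standing in for response.json()
all_costcentres = [
    {'kostl': '6005', 'fonds': '2-70123-11', 'beschr': 'A'},
    {'kostl': '6005', 'fonds': '0-21408-18', 'beschr': 'B'},
    {'kostl': '6005', 'beschr': 'no fonds at all'},
]
interesting_attributes = ['kostl', 'fonds', 'beschr']

# variant 1: filter() around the iterable
with_filter = [
    {attr: cc.get(attr, '') for attr in interesting_attributes}
    for cc in filter(is_selected, all_costcentres)
]

# variant 2: if clause at the end of the comprehension
with_if_clause = [
    {attr: cc.get(attr, '') for attr in interesting_attributes}
    for cc in all_costcentres
    if is_selected(cc)
]
print(with_filter == with_if_clause)
```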

## Solution to Exercise 5

**Solutions to 5.1 - 5.5**

In [None]:
import requests
import json
import re

response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

extract_middle_number = re.compile(r'''
    ^\d+                       # start with at least one digit
    \-                         # followed by a dash
    (?P<middle_number>\d+)     # followed by numbers, capture these
    \-                         # followed by a dash
    \d+                        # followed by the last digits
    $                          # end of string
    ''', 
    re.X
)

def my_filter(costcentre):
    if costcentre.get('fonds', '').startswith('2-70'):
        return costcentre
    
def my_sort(costcentre):
    """This function first extracts the middle number, using a regular expression.
    If we have a match, return the integer representation of that number to enforce number comparison.
    If we do not have a match, return 0
    """
    fonds = costcentre.get('fonds','')
    match = extract_middle_number.search(fonds)
    if match:
        return int(match.groupdict()['middle_number'])  # return int() to enforce number comparison
    else:
        return 0
    

selected_and_sorted_costcentres = []
interesting_attributes = ['kostl','fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    
    for costcentre in sorted(
        filter(my_filter, all_costcentres),
        key = my_sort
    ):
        costcentre_info = {}
        for attribute in interesting_attributes:
            costcentre_info[attribute] = costcentre.get(attribute,'')
        selected_and_sorted_costcentres.append(costcentre_info)
    
selected_and_sorted_costcentres

**Bonus:** the same as above, using list comprehension

In [None]:
import requests
import json
import re

response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

extract_middle_number = re.compile(r'''
    ^\d+                       # start with at least one digit
    \-                         # followed by a dash
    (?P<middle_number>\d+)     # followed by numbers, capture these
    \-                         # followed by a dash
    \d+                        # followed by the last digits
    $                          # end of string
    ''', 
    re.X
)

def my_filter(costcentre):
    if costcentre.get('fonds', '').startswith('2-70'):
        return costcentre
    
def my_sort(costcentre):
    """This function first extracts the middle number, using a regular expression.
    If we have a match, return the integer representation of that number to enforce number comparison.
    If we do not have a match, return 0
    """
    fonds = costcentre.get('fonds','')
    match = extract_middle_number.search(fonds)
    if match:
        return int(match.groupdict()['middle_number'])  # return int() to enforce number comparison
    else:
        return 0
    

selected_and_sorted_costcentres = []
interesting_attributes = ['kostl','fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    selected_and_sorted_costcentres = [
        {
            attr: costcentre.get(attr, '') for attr in interesting_attributes
        }
        for costcentre in sorted(
            filter(my_filter, all_costcentres),
            key = my_sort
        )
    ]
    
selected_and_sorted_costcentres
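
Why `my_sort` converts the captured group to `int` becomes visible when the middle numbers differ in length. The fonds values below are invented for this sketch, and the regular expression is a compact form of the one used above.

```python
import re

extract_middle_number = re.compile(r'^\d+-(?P<middle_number>\d+)-\d+$')

def middle_number(fonds):
    """Return the middle number of a fonds string as int, or 0 if it does not match."""
    match = extract_middle_number.search(fonds)
    return int(match.group('middle_number')) if match else 0

# hypothetical fonds values with middle numbers of different length
fonds_list = ['2-7010-01', '2-702-01', '2-70005-01', 'unknown']

as_strings = sorted(fonds_list)                      # character-by-character comparison
as_numbers = sorted(fonds_list, key=middle_number)   # numeric comparison of the middle part

print(as_strings)   # ['2-70005-01', '2-7010-01', '2-702-01', 'unknown']
print(as_numbers)   # ['unknown', '2-702-01', '2-7010-01', '2-70005-01']
```

Without the conversion, `70005` sorts before `702` because strings are compared one character at a time.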