# Dealing with HTTP requests

## The `requests` module

The `requests` module is the most popular way to fetch data from a website or to talk to the REST endpoints of a web service. Its main limitation is that it **cannot handle asynchronous requests** out of the box. If speed is a concern and you would like to fetch data from many sources at the same time, have a look at [HTTPX](https://www.python-httpx.org). It offers an interface almost identical to that of Requests, with additional `async` support.

Installation is a simple pip install away:

```bash
$ pip install requests
```

The simplest way to fetch the content of a given endpoint looks like this:

```python
import requests

r = requests.get('https://api.github.com/user', auth=('username', 'password'))
print(r.status_code)
# 200 if ok
print(r.headers['content-type'])
# 'application/json; charset=utf8'
r.encoding
# 'utf-8'
r.text
# '{"type":"User"...'
r.json()  # decode the string as JSON
# {'private_gists': 419, 'total_private_repos': 77, ...}
```


In practice, you often need to pass the correct headers and, if the server uses a self-signed certificate, disable SSL certificate verification.

```python
import json
import requests

payload = {'some': 'data'}  # placeholder payload

resp = requests.post(
    url='https://my.endpoint.com',
    data=json.dumps(payload),
    headers={
        'Content-Type': 'application/json',
        'Accept': 'application/json',
    },
    auth=('username', 'password'),
    verify=False,   # do not verify SSL certificates
    timeout=10,     # timeout in seconds
)
```

The **timeout** can also be a tuple, e.g. `(3, 8)`. The request will then wait up to 3 seconds to establish a connection and, once connected, up to 8 seconds for the server to send data.
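When a timeout expires, Requests raises an exception instead of returning a response, so such calls are usually wrapped in a `try`/`except`. The following is a minimal sketch of that pattern; the helper name `fetch_or_none` is an assumption made for illustration and is not part of Requests.

```python
import requests


def fetch_or_none(url, timeout=(3, 8)):
    """Return the response, or None if the request failed or timed out.

    `fetch_or_none` is a hypothetical helper, not part of the requests API.
    """
    try:
        # timeout is (connect timeout, read timeout) in seconds
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        print(f'Timed out while contacting {url}')
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class of all exceptions raised by requests
        print(f'Request to {url} failed: {exc}')
    return None
```

Catching `Timeout` before the more general `RequestException` lets you react to slow servers differently from other failures such as refused connections or malformed URLs.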

### GET, POST, PUT, DELETE requests

As you might have guessed, the requests module offers not only `get` but every kind of HTTP request. The most common ones come with their own function, including the typical CRUD (create, read, update, delete) requests:

```python
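requests.post(url, data=data, **kwargs)      # CREATE data. Posted data is usually a string containing JSON
requests.get(url, params=params, **kwargs)   # READ data
requests.put(url, data=data, **kwargs)       # UPDATE data (replace the resource)
requests.delete(url, **kwargs)               # DELETE data
requests.head(url, **kwargs)                 # like GET, but returns the headers only
requests.patch(url, data=data, **kwargs)     # UPDATE data partially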
```
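Query parameters for a `get` request can be passed as a dictionary through `params`, and Requests encodes them into the URL. The sketch below prepares such a request without sending it, so the resulting URL can be inspected offline. The endpoint and the `kostl` parameter are borrowed from Exercise 1.

```python
import requests

# Build the request without sending it, to inspect the final URL.
request = requests.Request(
    'GET',
    'http://n-vermeul.ethz.ch/sap_info',
    params={'kostl': 6005},
)
prepared = request.prepare()
print(prepared.url)
```

In everyday code the same dictionary is passed directly, as in `requests.get(url, params={'kostl': 6005})`.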

Typically, programmers JSON-encode their data structure themselves and pass the resulting string as `data`:

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
response = requests.post(
    url = url,
    data = json.dumps(payload),
    headers = headers
)
```

You can also pass the payload with the `json` parameter; the header `Content-Type: application/json` is then set automatically:

```python
response = requests.post(url = url, json = payload)
```
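Whether the two variants really produce the same request can be checked offline by preparing both without sending them. This is a small sketch; `url` and `payload` repeat the placeholder values used above.

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# Variant 1: encode the payload ourselves and set the header by hand.
manual = requests.Request(
    'POST',
    url,
    data=json.dumps(payload),
    headers={'Content-Type': 'application/json'},
).prepare()

# Variant 2: let requests encode the payload and set the header.
automatic = requests.Request('POST', url, json=payload).prepare()

print(automatic.headers['Content-Type'])
print(json.loads(manual.body) == json.loads(automatic.body))
```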




The **response object** offers a number of useful attributes and methods:

```python
response.ok       # True if the HTTP status code is below 400
response.headers  # all header information as a case-insensitive dict
response.content  # raw binary content (bytes)
response.text     # the content decoded to a string
response.apparent_encoding  # encoding guessed from the content, e.g. utf-8
response.json()   # decode JSON string into Python data structure
```
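The difference between `content`, `text` and `json()` is easiest to see side by side. The sketch below builds a response object by hand so that it runs without network access. Setting the private `_content` attribute is an assumption that works for this demonstration only, and the JSON body is made up; in real code the response comes back from `requests.get()` and friends.

```python
import requests

# Hand-built response for demonstration purposes only.
response = requests.models.Response()
response.status_code = 200
response.encoding = 'utf-8'
response._content = '{"greeting": "Grüezi"}'.encode('utf-8')  # made-up body

print(response.ok)        # status code is below 400
print(response.content)   # the undecoded bytes
print(response.text)      # the bytes decoded to str using response.encoding
print(response.json())    # the JSON document parsed into a dict
```

Note how the non-ASCII character appears as escaped bytes in `content` but as a readable character in `text`.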

## Exercise 1

1. Get information about a given Kostenstelle (cost centre). The endpoint is: `http://n-vermeul.ethz.ch/sap_info?kostl=6005` (or any Kostenstelle you are interested in)
2. Make sure the request succeeded by checking `response.ok`
3. Compare `response.content`, `response.text` and `response.json()`

## Exercise 2

1. Use a **list comprehension** to retrieve only these attributes: `kostl`, `fonds`, `beschr`
2. Create a **list of dictionaries**

## Exercise 3

1. Read about the [filter functions in Python](https://gitlab.ethz.ch/vermeul/python-best-practices/-/blob/master/07-Built-in_Functions.md)
2. Create a filter function which selects certain records, e.g. those whose `fonds` value satisfies `startswith('2-70')`
3. Apply the filter to the already created list comprehension

## Exercise 4

1. Read about the [sort functions in Python](https://gitlab.ethz.ch/vermeul/python-best-practices/-/blob/master/07-Built-in_Functions.md)
2. Create a key function for `sorted` which extracts the middle number in the `fonds` attribute, e.g. `70110` in `2-70110-13`
3. Use a regular expression compiled with the `re.X` (extended syntax) flag
4. Return 0 if there is no match, return `int(middle_number)` if there is a match
5. Apply `sorted` with this key function to the already created list comprehension

## Solution to Exercise 1

```python
import requests

response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

if response.ok:
    binary = response.content          # raw bytes
    text = response.text               # decoded string
    data_structure = response.json()   # parsed JSON
```

```python
binary
```

```python
text
```

```python
data_structure
```

```python
response.json()
```

## Solution to Exercise 2

```python
import requests

response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

costcentre_infos = []
interesting_attributes = ['kostl', 'fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    # keep only the interesting attributes of every cost centre
    costcentre_infos = [
        {
            attr: costcentre.get(attr, '') for attr in interesting_attributes
        }
        for costcentre in all_costcentres
    ]

costcentre_infos
```

## Solution to Exercise 3

```python
import requests

response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

def my_filter(costcentre):
    """Return True for cost centres whose fonds starts with '2-70'."""
    return costcentre.get('fonds', '').startswith('2-70')

selected_costcentres = []
interesting_attributes = ['kostl', 'fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    selected_costcentres = [
        {
            attr: costcentre.get(attr, '') for attr in interesting_attributes
        }
        for costcentre in filter(my_filter, all_costcentres)
    ]

selected_costcentres
```

## Solution to Exercise 4

```python
import re

import requests

response = requests.get('http://n-vermeul.ethz.ch/sap_info?kostl=6005')

extract_middle_number = re.compile(r'''
    ^\d+                       # start with one or more digits
    -                          # followed by a dash
    (?P<middle_number>\d+)     # followed by digits, capture these
    -                          # followed by a dash
    \d+                        # followed by the last digits
    $                          # end of string
    ''',
    re.X
)

def my_filter(costcentre):
    """Return True for cost centres whose fonds starts with '2-70'."""
    return costcentre.get('fonds', '').startswith('2-70')

def my_sort(costcentre):
    """This function first extracts the middle number, using a regular expression.
    If we have a match, return the integer representation of that number to enforce number comparison.
    If we do not have a match, return 0
    """
    fonds = costcentre.get('fonds', '')
    match = extract_middle_number.search(fonds)
    if match:
        return int(match.group('middle_number'))  # return int() to enforce number comparison
    return 0


selected_and_sorted_costcentres = []
interesting_attributes = ['kostl', 'fonds', 'beschr']
if response.ok:
    all_costcentres = response.json()
    selected_and_sorted_costcentres = [
        {
            attr: costcentre.get(attr, '') for attr in interesting_attributes
        }
        for costcentre in sorted(
            filter(my_filter, all_costcentres),
            key=my_sort
        )
    ]

selected_and_sorted_costcentres
```
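If the endpoint is not reachable, the filter and sort steps can still be tried out offline. The sketch below condenses them and runs them on a few made-up records. The records only mimic the shape assumed in the solutions above (`kostl`, `fonds`, `beschr` plus an extra field), so the real service may return different fields and values. The names `sample_costcentres`, `middle_number_pattern` and `middle_number` are chosen for this illustration.

```python
import re

# Made-up sample records, not real data from the endpoint.
sample_costcentres = [
    {'kostl': '6005', 'fonds': '2-70110-13', 'beschr': 'sample A', 'extra': 1},
    {'kostl': '6006', 'fonds': '2-70020-07', 'beschr': 'sample B', 'extra': 2},
    {'kostl': '6007', 'fonds': '1-10000-00', 'beschr': 'sample C', 'extra': 3},
    {'kostl': '6008', 'fonds': '', 'beschr': 'sample D', 'extra': 4},
]

middle_number_pattern = re.compile(r'^\d+-(?P<middle_number>\d+)-\d+$')


def middle_number(costcentre):
    """Sort key: the middle number of `fonds` as int, or 0 without a match."""
    match = middle_number_pattern.search(costcentre.get('fonds', ''))
    return int(match.group('middle_number')) if match else 0


# Filter first, then sort numerically by the middle number.
selected = sorted(
    (c for c in sample_costcentres if c.get('fonds', '').startswith('2-70')),
    key=middle_number,
)

# Keep only the interesting attributes.
result = [
    {attr: c.get(attr, '') for attr in ('kostl', 'fonds', 'beschr')}
    for c in selected
]
print(result)
```

Sorting by `int(...)` rather than by the string matters as soon as the middle numbers differ in length, because strings compare character by character.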