# Lunch Time Python
## Lunch 1: Requests
*Scientific Software Center, Heidelberg University*  
*October 2021*  
*Visit on [GitHub](https://github.com/ssciwr/lunch-time-python)*  

Welcome to Lunch Time Python! This is the notebook for [session 1](https://ssciwr.github.io/lunch-time-python/lunchtime1/) - the [requests](https://docs.python-requests.org/en/latest/) library.

The requests library provides an elegant and simple way to send HTTP requests. Connect to the server of your choice and download websites, stream data, or upload content. Requests is [one of the most downloaded Python packages](https://pypi.org/project/requests/), with about 14 million downloads per week and half a million repositories depending on it as of October 2021.

# Requests: HTTP for humans

Carry out HTTP/1.1 requests using Python! An HTTP request is made by a client to a server. For example, when you open a web page in your browser, your device sends a GET request to the web server hosting the page.

The start line of an HTTP request contains three elements: the HTTP method, the request target, and the HTTP version.

For example, when you open the page [ssc.iwr.uni-heidelberg.de](https://ssc.iwr.uni-heidelberg.de/), this is the message that is sent from the client to the server:

GET https://ssc.iwr.uni-heidelberg.de/ HTTP/1.1

The above request contains the request method, GET, the URI of the target, https://ssc.iwr.uni-heidelberg.de/, and the protocol version, HTTP/1.1.
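The three elements of the start line can be inspected without sending anything, by preparing a request and looking at its parts. This is a minimal sketch: on the wire, HTTP/1.1 clients usually put only the path into the start line and send the host name in a separate `Host` header, which is what the snippet reconstructs. The variable names are chosen for illustration.

```python
from urllib.parse import urlsplit

import requests as rq

target_uri = 'https://ssc.iwr.uni-heidelberg.de/'

# Build the request object without sending it over the network
prepared = rq.Request('GET', target_uri).prepare()

# Method, request target (path) and protocol version form the start line
start_line = '{} {} HTTP/1.1'.format(prepared.method, prepared.path_url)
# The host name travels in the Host header
host_header = 'Host: {}'.format(urlsplit(prepared.url).netloc)

print(start_line)
print(host_header)
```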

**These are the [main methods](https://www.tutorialspoint.com/http/http_methods.htm) for HTTP/1.1:**
1. GET  
The GET method is used to retrieve information from the given server using a given URI. Requests using GET should only retrieve data and should have no other effect on the data.

1. HEAD  
Same as GET, but transfers the status line and header section only.

1. POST  
A POST request is used to send data to the server, for example, customer information, file upload, etc. using HTML forms.

1. PUT  
Replaces all current representations of the target resource with the uploaded content.

1. DELETE  
Removes all current representations of the target resource given by a URI.

1. CONNECT  
Establishes a tunnel to the server identified by a given URI.

1. OPTIONS  
Describes the communication options for the target resource.

1. TRACE  
Performs a message loop-back test along the path to the target resource.
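Most of the methods listed above have a helper function of the same name in requests. The sketch below looks these helpers up and, for a method without a dedicated helper (TRACE here), builds a request through the generic `Request` class. Nothing is sent; the request is only prepared.

```python
import requests as rq

# Methods that have a helper function of the same name in requests
with_helper = ['GET', 'HEAD', 'POST', 'PUT', 'DELETE', 'OPTIONS']
for method in with_helper:
    helper = getattr(rq, method.lower())
    print(method, '-> rq.' + helper.__name__)

# Any other method can be passed as a string to the generic interface;
# here the request is prepared but not sent
prepared = rq.Request('TRACE', 'https://ssc.iwr.uni-heidelberg.de/').prepare()
print(prepared.method, prepared.url)
```

To actually send such a request, `rq.request('TRACE', url)` takes the method name as its first argument.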

*Let's start requesting!  
To install requests on your local machine, simply use* `python -m pip install requests`.

In [4]:
import requests as rq
import json # to pretty-print JSON responses

We will start with the above example -  
GET https://ssc.iwr.uni-heidelberg.de/ HTTP/1.1

In [5]:
targetURI = 'https://ssc.iwr.uni-heidelberg.de/'
r = rq.get(url = targetURI)

This did something! Let's check the object that we obtained.

In [6]:
r.status_code

200

Several status codes are worth knowing. You are probably familiar with 404 Not Found. In general, status codes starting with 2 stand for successful requests, codes starting with 3 for redirections, codes starting with 4 for client-side errors, and codes starting with 5 for server-side errors.
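The status code classes can be checked in code. The following sketch uses a hand-built `Response` object in place of a real server reply, so it runs offline; `status_class` is a hypothetical helper written for this example, while `ok`, `raise_for_status()` and `rq.codes` are part of requests.

```python
import requests as rq


def status_class(code):
    """Map a status code to the name of its class (first digit)."""
    classes = {1: 'informational', 2: 'success', 3: 'redirection',
               4: 'client error', 5: 'server error'}
    return classes.get(code // 100, 'unknown')


# A stand-in for a server reply, so that no network access is needed
fake = rq.Response()
fake.status_code = rq.codes.not_found  # 404

print(status_class(fake.status_code))
print(fake.ok)  # False for 4xx and 5xx codes

try:
    # Turns 4xx and 5xx status codes into an exception
    fake.raise_for_status()
except rq.HTTPError as err:
    print('Request failed:', err)
```

Calling `r.raise_for_status()` directly after a request is a compact way to stop early instead of processing an error page.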

In [7]:
targetURI = 'https://en.wikipedia.org/wiki/Monty_Python'
r = rq.get(url = targetURI)

In [8]:
r.text

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Monty Python - Wikipedia</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"f40525e7-a1f3-49cd-9f72-f68f2e27365b","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Monty_Python","wgTitle":"Monty Python","wgCurRevisionId":1051636274,"wgRevisionId":1051636274,"wgArticleId":18942,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages using the EasyTimeline extension","CS1 German-language sources (de)","CS1 maint: unfit URL","Cite iucn maint","Articles with short description","Short descripti

## The HTTP response
The response that you receive from the server contains the status line (as per `r.status_code`), the HTTP headers and a body. 

### The response header

In [9]:
r.headers

{'Date': 'Thu, 28 Oct 2021 06:42:16 GMT', 'Server': 'mw1354.eqiad.wmnet', 'X-Content-Type-Options': 'nosniff', 'P3p': 'CP="See https://en.wikipedia.org/wiki/Special:CentralAutoLogin/P3P for more info."', 'Content-Language': 'en', 'Vary': 'Accept-Encoding,Cookie,Authorization', 'Last-Modified': 'Thu, 28 Oct 2021 06:33:02 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Encoding': 'gzip', 'Age': '9515', 'X-Cache': 'cp3064 miss, cp3058 hit/12', 'X-Cache-Status': 'hit-front', 'Server-Timing': 'cache;desc="hit-front", host;desc="cp3058"', 'Strict-Transport-Security': 'max-age=106384710; includeSubDomains; preload', 'Report-To': '{ "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }', 'NEL': '{ "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}', 'Permissions-Policy': 'interest-cohort=()', 'Set-

In [10]:
r.headers['content-type'] # the dictionary is case-insensitive!

'text/html; charset=UTF-8'

In [11]:
r.encoding # the character encoding used to decode r.text (not the compression)

'UTF-8'
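The case-insensitive header lookup comes from the dictionary class that requests uses for `r.headers`. A small offline sketch, using one header value copied from the output above:

```python
from requests.structures import CaseInsensitiveDict

# The same kind of mapping that requests uses for r.headers
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=UTF-8'})

print(headers['content-type'])
print(headers['CONTENT-TYPE'])

# Converting to a plain dict keeps the original spelling of the key,
# so the lookup becomes case-sensitive again
plain = dict(headers)
print('content-type' in plain)
```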

The headers fall into several groups: response headers (like the server), general headers (i.e. information about the connection), and representation headers (e.g. the content type).
You can also see which cookies were sent back and how much time elapsed while the request was processed.

In [12]:
r.cookies # the cookies that the server sent back

<RequestsCookieJar[Cookie(version=0, name='GeoIP', value='DE:BW:Karlsruhe:49.01:8.42:v4', port=None, port_specified=False, domain='.wikipedia.org', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=True, expires=None, discard=True, comment=None, comment_url=None, rest={}, rfc2109=False), Cookie(version=0, name='WMF-Last-Access-Global', value='28-Oct-2021', port=None, port_specified=False, domain='.wikipedia.org', domain_specified=True, domain_initial_dot=True, path='/', path_specified=True, secure=True, expires=1638144000, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False), Cookie(version=0, name='WMF-Last-Access', value='28-Oct-2021', port=None, port_specified=False, domain='en.wikipedia.org', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=True, expires=1638144000, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)]>

In [13]:
r.elapsed # time between sending the request and receiving the response

datetime.timedelta(microseconds=129458)

### The response body
Not all responses come with a body (the payload). If, for example, you PUT data on a server, the response does not necessarily contain a body. You can look at the response's body using `r.text` (the body decoded to a string, using the encoding in `r.encoding`) or `r.content` (the raw bytes, which also works for non-text content such as images).

In [14]:
r.text

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Monty Python - Wikipedia</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"f40525e7-a1f3-49cd-9f72-f68f2e27365b","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Monty_Python","wgTitle":"Monty Python","wgCurRevisionId":1051636274,"wgRevisionId":1051636274,"wgArticleId":18942,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages using the EasyTimeline extension","CS1 German-language sources (de)","CS1 maint: unfit URL","Cite iucn maint","Articles with short description","Short descripti

In [15]:
r.content

b'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Monty Python - Wikipedia</title>\n<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"f40525e7-a1f3-49cd-9f72-f68f2e27365b","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Monty_Python","wgTitle":"Monty Python","wgCurRevisionId":1051636274,"wgRevisionId":1051636274,"wgArticleId":18942,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages using the EasyTimeline extension","CS1 German-language sources (de)","CS1 maint: unfit URL","Cite iucn maint","Articles with short description","Short descript
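The relation between the two attributes is, roughly, that `r.text` is `r.content` decoded with `r.encoding`. The sketch below mimics this with plain Python bytes and strings; the variables only play the role of the response attributes, and the sample text is made up for illustration.

```python
# Plays the role of r.content: the raw bytes of the body
body_bytes = 'Grüße aus Heidelberg'.encode('utf-8')
# Plays the role of r.encoding
encoding = 'UTF-8'

# Plays the role of r.text: the bytes decoded to a string
body_text = body_bytes.decode(encoding)
print(type(body_bytes).__name__, len(body_bytes))
print(type(body_text).__name__, len(body_text))

# Decoding with the wrong encoding garbles non-ASCII characters
garbled = body_bytes.decode('latin-1')
print(garbled)
```

If a page shows garbled characters in `r.text`, setting `r.encoding` to the correct value before accessing `r.text` is usually the fix.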

### Side note
This doesn't look too pretty. You can use BeautifulSoup (`pip install beautifulsoup4`) to improve its appearance, but that library could fill a whole lunch time of its own.

In [16]:
from bs4 import BeautifulSoup

In [17]:
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Monty Python - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"f40525e7-a1f3-49cd-9f72-f68f2e27365b","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Monty_Python","wgTitle":"Monty Python","wgCurRevisionId":1051636274,"wgRevisionId":1051636274,"wgArticleId":18942,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages using the EasyTimeline extension","CS1 German-language sources (de)","CS1 maint: unfit URL","Cite iucn maint","Articles with short description","Sho

In [18]:
print(soup.text)





Monty Python - Wikipedia
document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"f40525e7-a1f3-49cd-9f72-f68f2e27365b","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Monty_Python","wgTitle":"Monty Python","wgCurRevisionId":1051636274,"wgRevisionId":1051636274,"wgArticleId":18942,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages using the EasyTimeline extension","CS1 German-language sources (de)","CS1 maint: unfit URL","Cite iucn maint","Articles with short description","Short description matches Wikidata","Use British English from June 2017","Use dmy dates from July 2017",
"Official website different i

### Back to requests
Requests also has a built-in JSON decoder.

In [19]:
r = rq.get('https://api.github.com/events')
r.json()

[{'id': '18637582915',
  'type': 'PullRequestEvent',
  'actor': {'id': 55978474,
   'login': 'ft9dipesh',
   'display_login': 'ft9dipesh',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/ft9dipesh',
   'avatar_url': 'https://avatars.githubusercontent.com/u/55978474?'},
  'repo': {'id': 343661995,
   'name': 'NeatPlus/client',
   'url': 'https://api.github.com/repos/NeatPlus/client'},
  'payload': {'action': 'closed',
   'number': 192,
   'pull_request': {'url': 'https://api.github.com/repos/NeatPlus/client/pulls/192',
    'id': 767096643,
    'node_id': 'PR_kwDOFHvdq84tuPdD',
    'html_url': 'https://github.com/NeatPlus/client/pull/192',
    'diff_url': 'https://github.com/NeatPlus/client/pull/192.diff',
    'patch_url': 'https://github.com/NeatPlus/client/pull/192.patch',
    'issue_url': 'https://api.github.com/repos/NeatPlus/client/issues/192',
    'number': 192,
    'state': 'closed',
    'locked': False,
    'title': 'Bump lint-staged from 11.2.3 to 11.2.6',
    'use
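Calling `r.json()` is comparable to parsing the response text with the standard `json` module. The following offline sketch uses a shortened, made-up body in the shape of the output above, and shows that a body that is not JSON leads to a `ValueError`:

```python
import json

# Made-up, shortened body in the shape of the events list above
sample_body = ('[{"id": "1", "type": "PullRequestEvent", '
               '"repo": {"name": "example/repo"}}]')

events = json.loads(sample_body)
# The result is a list of dictionaries that can be indexed as usual
print(events[0]['type'], events[0]['repo']['name'])

try:
    json.loads('<html>not JSON</html>')
except ValueError as err:
    # json.JSONDecodeError is a subclass of ValueError
    print('Could not decode JSON:', err)
```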

# GET request with parameters
Now let's use requests to get something useful (besides crawling the web and downloading pages!). Let's find out the geographic position of Heidelberg University using [Google's geocoding API](https://developers.google.com/maps/documentation/geocoding/overview?_gl=1*oagjnc*_ga*MTk0NjcwNTg2Ni4xNjM1MTUzNjc5*_ga_NRWSTWS78N*MTYzNTE1MzY3OC4xLjAuMTYzNTE1MzY3OC4w). For this, you can create a trial account on Google's website to obtain an API key.

In [20]:
# api-endpoint
URI = 'https://maps.googleapis.com/maps/api/geocode/json'
# API key
key = 'XXXXXXXXXXXXXXXXXXX'

The better practice is to store the key securely outside of the notebook (and to add the configuration file to `.gitignore`).

In [21]:
import yaml

with open("config.yml", 'r') as ymlfile:
    cfg = yaml.safe_load(ymlfile)
key = cfg['google_api']['secret_code']
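Another common way to keep a key out of the notebook is an environment variable. This is only a sketch of that alternative: the variable name `GOOGLE_API_KEY` and the helper `read_api_key` are assumptions made for this example, not something the API prescribes.

```python
import os


def read_api_key(variable='GOOGLE_API_KEY'):
    """Read an API key from an environment variable (hypothetical name)."""
    key = os.environ.get(variable)
    if not key:
        raise RuntimeError(
            'Please set the environment variable {}'.format(variable))
    return key
```

With the variable set in the shell that starts the notebook, `key = read_api_key()` would replace the lookup in the configuration file.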

In [22]:
# location to geocode
location = 'university of heidelberg'
country = 'germany'
# defining a params dict for the parameters to be sent to the API
parameters = {'key':key, 'address':location, 'country':country}
# sending get request and saving the response as response object
r = rq.get(url = URI, params = parameters)

In [23]:
r.status_code

200
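The `params` dictionary ends up in the query string of the URL. To see how, the request can be prepared without sending it; the sketch below uses a placeholder key instead of a real one.

```python
from urllib.parse import parse_qs, urlsplit

import requests as rq

URI = 'https://maps.googleapis.com/maps/api/geocode/json'
# Placeholder key, for illustration only
parameters = {'key': 'YOUR_API_KEY',
              'address': 'university of heidelberg',
              'country': 'germany'}

# Prepare the request without sending it
prepared = rq.Request('GET', URI, params=parameters).prepare()

# The parameters are URL-encoded and appended after a question mark
print(prepared.url)
# Decoding the query string gives the original values back
print(parse_qs(urlsplit(prepared.url).query))
```

After a real request, the same URL is available as `r.url`.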

In [24]:
# extracting data in json format
data = r.json()

In [25]:
print(data)

{'results': [{'address_components': [{'long_name': '1', 'short_name': '1', 'types': ['street_number']}, {'long_name': 'Grabengasse', 'short_name': 'Grabengasse', 'types': ['route']}, {'long_name': 'Heidelberg', 'short_name': 'Heidelberg', 'types': ['locality', 'political']}, {'long_name': 'Heidelberg', 'short_name': 'Heidelberg', 'types': ['administrative_area_level_3', 'political']}, {'long_name': 'Karlsruhe', 'short_name': 'KA', 'types': ['administrative_area_level_2', 'political']}, {'long_name': 'Baden-Württemberg', 'short_name': 'BW', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'Germany', 'short_name': 'DE', 'types': ['country', 'political']}, {'long_name': '69117', 'short_name': '69117', 'types': ['postal_code']}], 'formatted_address': 'Grabengasse 1, 69117 Heidelberg, Germany', 'geometry': {'location': {'lat': 49.4190991, 'lng': 8.6702507}, 'location_type': 'ROOFTOP', 'viewport': {'northeast': {'lat': 49.42044808029149, 'lng': 8.671599680291502}, 'south

In [26]:
# print this a little prettier
print(json.dumps(data, indent=4, sort_keys=True))

{
    "results": [
        {
            "address_components": [
                {
                    "long_name": "1",
                    "short_name": "1",
                    "types": [
                        "street_number"
                    ]
                },
                {
                    "long_name": "Grabengasse",
                    "short_name": "Grabengasse",
                    "types": [
                        "route"
                    ]
                },
                {
                    "long_name": "Heidelberg",
                    "short_name": "Heidelberg",
                    "types": [
                        "locality",
                        "political"
                    ]
                },
                {
                    "long_name": "Heidelberg",
                    "short_name": "Heidelberg",
                    "types": [
                        "administrative_area_level_3",
                        "political"
                 

In [27]:
address_out = data['results'][0]['formatted_address']
# printing the output
print('Address is {}.'.format(address_out))

Address is Grabengasse 1, 69117 Heidelberg, Germany.


In [28]:
latitude = data['results'][0]['geometry']['location']['lat']
longitude = data['results'][0]['geometry']['location']['lng']
# printing the output
print('Latitude is {} and longitude {}.'.format(latitude, longitude))

Latitude is 49.4190991 and longitude 8.6702507.
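Indexing with `data['results'][0]` fails when the API finds nothing. A slightly more defensive sketch is shown below. It assumes that the response carries a top-level `status` field, as described in Google's documentation; `extract_location` is a hypothetical helper, and the sample dictionary is a shortened copy of the output above.

```python
def extract_location(data):
    """Return (address, latitude, longitude) of the first result, or None."""
    # 'status' is assumed to be 'OK' for a successful lookup
    if data.get('status') != 'OK' or not data.get('results'):
        return None
    first = data['results'][0]
    position = first['geometry']['location']
    return first['formatted_address'], position['lat'], position['lng']


# Shortened copy of the response shown above
sample = {
    'status': 'OK',
    'results': [{
        'formatted_address': 'Grabengasse 1, 69117 Heidelberg, Germany',
        'geometry': {'location': {'lat': 49.4190991, 'lng': 8.6702507}},
    }],
}

print(extract_location(sample))
print(extract_location({'status': 'ZERO_RESULTS', 'results': []}))
```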


# Making a POST request
Again, we need an account for this example. This time, we are using the service [pastebin](https://pastebin.com/). You can send text to this service, and it will be publicly visible. It serves as storage for textual data.

In [29]:
# defining the api-endpoint 
api_endpoint = 'https://pastebin.com/api/api_post.php'
# API key
key = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXX'

In [30]:
key = cfg['pastebin_api']['secret_code']

In [32]:
# the API option
option = 'paste'
# name/title of your paste
api_paste_name = 'lunch time python'
# syntax highlighting
api_format = 'python'
# this makes a paste public, unlisted or private, public = 0, unlisted = 1, private = 2
private = 0
# the text you want to paste, for example, a code snippet in python
text = '''
print("Hello, lunch time!")
x = 'my lunch'
y = 'your lunch'
print('{} {}'.format(x, y))
'''
# data dictionary, to be sent to api
data = {'api_dev_key':key,
        'api_option':option,
        'api_paste_code':text,
        'api_paste_name':api_paste_name,
        'api_paste_format':api_format,
        'api_paste_private':private}
  
# sending post request and saving response as response object
r = rq.post(url = api_endpoint, data = data)

In [33]:
r.status_code

200

In [34]:
# extracting response text 
pastebin_url = r.text
print('The pastebin URL is {}'.format(pastebin_url))

The pastebin URL is https://pastebin.com/sR9jhLsY
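In a POST request, the `data` dictionary is form-encoded and placed in the request body rather than in the URL. The sketch below prepares such a request without sending it, using a placeholder key and a shortened form.

```python
from urllib.parse import parse_qs

import requests as rq

# Placeholder key and shortened form, for illustration only
form = {'api_dev_key': 'YOUR_KEY',
        'api_option': 'paste',
        'api_paste_code': 'print("Hello, lunch time!")'}

# Prepare the request without sending it
prepared = rq.Request('POST', 'https://pastebin.com/api/api_post.php',
                      data=form).prepare()

# The URL stays untouched, the form ends up URL-encoded in the body
print(prepared.url)
print(prepared.headers['Content-Type'])
print(prepared.body)
```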


# Making a PUT request
A PUT request is similar to a POST request, but it is *idempotent*: sending the same PUT request several times has the same effect as sending it once, because the target resource is replaced each time. Repeating a POST request, in contrast, can create the resource multiple times. In the pastebin example above, a POST request generates a new paste, while a PUT request would replace or alter an existing paste. For the differences between HTTP methods, see [here](https://www.w3schools.com/tags/ref_httpmethods.asp).
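The difference can be illustrated with a toy model that involves no HTTP at all. In this sketch a dictionary stands in for the server's storage, and the functions `put` and `post` are hypothetical stand-ins for how a server might handle the two methods.

```python
# A dictionary stands in for the server's storage
pastes = {}


def put(name, text):
    """Store text under a name chosen by the client, replacing old content."""
    pastes[name] = text
    return name


def post(text):
    """Store text under a new name chosen by the server."""
    name = 'paste-{}'.format(len(pastes) + 1)
    pastes[name] = text
    return name


# Repeating the same PUT leaves a single entry
put('menu', 'pasta')
put('menu', 'pasta')
count_after_put = len(pastes)

# Repeating the same POST creates a new entry every time
post('pasta')
post('pasta')
count_after_post = len(pastes)

print(count_after_put, count_after_post)
```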

For the PUT example, we will use [httpbin](https://httpbin.org/). This is an open service that allows you to test API calls and authentication methods.

In [35]:
# the api-endpoint
api_endpoint = 'https://httpbin.org/put'
# the data to send - passed via `data`, it is sent as a form field in the
# request body (it does not set an 'Accept' request header)
data_type = 'application/json'
# storing in a dictionary
data = {'accept':data_type}
# Making a PUT request
r = rq.put(url = api_endpoint, data = data)

In [36]:
# check status code for response received
print(r)
print('*************************')
print(r.status_code)
print('*************************')
# print the content of the response
print(r.content)
print('*************************')
# parse the response body as JSON and print it
print(r.json())
print('*************************')
# print this a little prettier
print(json.dumps(r.json(), indent=4, sort_keys=True))

<Response [200]>
*************************
200
*************************
b'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "accept": "application/json"\n  }, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate", \n    "Content-Length": "25", \n    "Content-Type": "application/x-www-form-urlencoded", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.22.0", \n    "X-Amzn-Trace-Id": "Root=1-617a6be4-7028fba84c180c351c9f973c"\n  }, \n  "json": null, \n  "origin": "46.223.162.23", \n  "url": "https://httpbin.org/put"\n}\n'
*************************
{'args': {}, 'data': '', 'files': {}, 'form': {'accept': 'application/json'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '25', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0', 'X-Amzn-Trace-Id': 'Root=1-617a6be4-7028fba84c180c351c9f973c'}, 'json': None, 'origin': '46.223.162.2
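In the output above, `accept` shows up under `form`, because everything passed via `data` travels in the request body. If the intention is to tell the server which format the client accepts, that goes into the `headers` argument instead, and a JSON payload can be passed with `json`. The sketch below prepares such a request without sending it; the payload is made up for illustration.

```python
import json

import requests as rq

prepared = rq.Request(
    'PUT',
    'https://httpbin.org/put',
    headers={'Accept': 'application/json'},  # what we would like to receive
    json={'lunch': 'pasta'},                 # made-up payload to send
).prepare()

print(prepared.headers['Accept'])
# Passing json=... sets the matching content type automatically
print(prepared.headers['Content-Type'])
print(json.loads(prepared.body))
```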

# Advanced topics
There is so much more you can do with requests - for example:
- [sessions](https://docs.python-requests.org/en/latest/user/advanced/#session-objects), which allow you to reuse the connection to the server (through connection pooling, leading to faster requests); 
- [SSL certificate verification](https://docs.python-requests.org/en/latest/user/advanced/#ssl-cert-verification), which checks the server's certificate so that you know you are talking to the intended host;
- [streaming](https://docs.python-requests.org/en/latest/user/advanced/#streaming-requests); 
- and [much more](https://docs.python-requests.org/en/latest/user/advanced/)!
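As a starting point for the first and third item, here is a minimal sketch of a session and of a streamed download. The function names, the `User-Agent` value, the timeout and the chunk size are choices made for this example only.

```python
import requests as rq


def fetch_status_codes(urls, timeout=10):
    """Request several URLs over one session and return their status codes."""
    with rq.Session() as session:
        # Headers set on the session are sent with every request
        session.headers.update({'User-Agent': 'lunch-time-python-example'})
        return [session.get(url, timeout=timeout).status_code for url in urls]


def download(url, path, timeout=10, chunk_size=8192):
    """Stream a response body to a file without loading it into memory."""
    with rq.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(path, 'wb') as outfile:
            for chunk in response.iter_content(chunk_size=chunk_size):
                outfile.write(chunk)
    return path
```

For example, `fetch_status_codes(['https://ssc.iwr.uni-heidelberg.de/', 'https://en.wikipedia.org/wiki/Monty_Python'])` would request both pages used in this notebook through the same session.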