<div style="position: relative;">
<img src="https://user-images.githubusercontent.com/7065401/98728503-5ab82f80-2378-11eb-9c79-adeb308fc647.png"></img>

<h1 style="color: white; position: absolute; top:27%; left:10%;">
    Introduction to HTTP using Python
</h1>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:56%; left:10%;">
    David Mertz, Ph.D.
</h3>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:63%; left:10%;">
    Data Scientist
</h3>
</div>

# Requests

The library called `requests` is widely used in Python as an HTTP client, but it is not part of the Python standard library.  The standard library module `urllib.request` can perform the same actions, but the API of `requests` is cleaner, more robust, and more modern.

As an organizational or project matter, the `requests` project remains separate from Python itself to allow a different and more rapid schedule for improvements to the library.  However, Python distributions such as Anaconda always include `requests`, and it is easy to install with `pip` or `conda`.

We look briefly at the levels of abstraction available in different modules, but the bulk of this section discusses the interfaces of `requests`.

For backwards compatibility, the standard library provides interfaces that remain compatible as far back as Python 1.x, so even **very** old code can run on modern Python versions with minimal changes.

## Module: `http`

In the standard library, at a level slightly higher than `telnetlib` is a collection of interfaces in `http`.  This is still very basic, and the documentation warns that you will rarely wish to use this module directly.  However, in an abstraction beyond `telnetlib`, it supports both HTTP and HTTPS, and also has an interface that shows some HTTP specificity.

In [1]:
import http.client  # a bare `import http` does not load the `client` submodule
host = 'popbox.kdm.local'
port = 2502

A small server is running locally that produces a new CSV file each time it is called.

In [4]:
conn = http.client.HTTPConnection(host, port=port)
conn.request("GET", "/data")
resp = conn.getresponse()
print(resp.status, resp.reason, end='\n\n')
print(resp.read().decode())

200 OK

GREETING,NUMBER,VALUE
God dag,678,63.25352978291776
Nǐ hǎo,88,16.25814579556293
Cześć,338,99.25874597513292
Ahlan,730,1.3803433119996544
Hei,32,59.23039588230479
Anyoung haseyo,206,23.040065196890914
Halo,286,47.42533996871001
Habari,750,51.99738661469423
Salve,974,11.956373483746086
Olá,200,71.71176779208518
Hi,825,33.15209826369285
Helo,419,9.590077105133354
Ciao,499,34.678828533429495
Yassas,999,98.35161322448815
Hei,486,47.773038874504



As long as your Python executable was compiled with SSL/TLS support, the module `http` can support encrypted connections.  This is done by instantiating a different class than the one used for plain HTTP connections.

In [5]:
conn = http.client.HTTPSConnection('docs.python.org', port=443)
conn.request("GET", "/3/library/http.client.html")
resp = conn.getresponse()
data = resp.read().decode()

print(resp.status, resp.reason)
print(data[:500])
print('\n<!-- LOTS MORE -->\n')
print(data[-300:])

200 OK

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta charset="utf-8" />
    <title>http.client — HTTP protocol client &#8212; Python 3.9.5 documentation</title>
    <link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
    <script src="../_static/jquery.js">

<!-- LOTS MORE -->

 <br />

    Last updated on Jun 09, 2021.
    <a href="https://docs.python.org/3/bugs.html">Found a bug</a>?
    <br />

    Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 2.4.4.
    </div>

    <script type="text/javascript" src="../_static/switchers.js"></script>
  </body>
</html>


At this level, the module does not handle contingencies automatically.  For example, if your page is redirected, you would need to write code to handle that yourself (perhaps in your own higher-level module or library).

In [6]:
conn = http.client.HTTPConnection(host, port)
conn.request("GET", "/redirect")
resp = conn.getresponse()
print(resp.status, resp.reason)
print(resp.headers, end='')
print(resp.read().decode())

301 MOVED PERMANENTLY
Content-Type: text/html; charset=utf-8
Content-Length: 244
Location: http://kdm.training
Server: Werkzeug/2.0.0 Python/3.8.10
Date: Thu, 10 Jun 2021 00:56:58 GMT

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href="http://kdm.training">http://kdm.training</a>. If not click the link.


It is not too difficult to use the provided header field `Location` along with the status code to try a new URL for the resource.  However, in this case, even doing that would not be the end of your journey.  Multiple redirects are not uncommon among real-world web pages.

In [7]:
conn = http.client.HTTPConnection('kdm.training')
conn.request("GET", "/")
resp = conn.getresponse()
print(resp.status, resp.reason)
print(resp.headers, end='')
print(resp.read().decode())

302 Found
Location: http://www.gnosis.cx/kdm/
Date: Thu, 10 Jun 2021 00:58:47 GMT
Content-Type: text/html; charset=UTF-8
Server: ghs
Content-Length: 222
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.gnosis.cx/kdm/">here</A>.
</BODY></HTML>
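Following redirects by hand with `http.client` might look something like the sketch below.  Everything named here is an assumption for illustration: `DemoHandler` is a throwaway local server standing in for the course server, and `get_following_redirects()` is a hypothetical helper, not part of any library.  The loop reads the status code and the `Location` header, resolves relative targets, and gives up after a fixed number of hops.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: /a redirects to /b, /b to /final."""
    hops = {'/a': '/b', '/b': '/final'}

    def do_GET(self):
        if self.path in self.hops:
            self.send_response(302)
            self.send_header('Location', self.hops[self.path])
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'made it\n'
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

def get_following_redirects(url, max_hops=5):
    """GET url, following Location headers for at most max_hops redirects."""
    for _ in range(max_hops + 1):
        parts = urlsplit(url)
        # HTTPS needs a different connection class than plain HTTP
        conn_class = (http.client.HTTPSConnection if parts.scheme == 'https'
                      else http.client.HTTPConnection)
        conn = conn_class(parts.netloc)
        try:
            path = parts.path or '/'
            if parts.query:
                path += '?' + parts.query
            conn.request('GET', path)
            resp = conn.getresponse()
            body = resp.read()
            location = resp.getheader('Location')
        finally:
            conn.close()
        if resp.status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return resp.status, url, body
    raise RuntimeError(f'more than {max_hops} redirects')

server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

status, final_url, body = get_following_redirects(base + '/a')
print(status, final_url, body.decode(), end='')

server.shutdown()
server.server_close()
```

The hop limit matters: without it, two URLs that redirect to each other would keep the loop running forever.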



## Module `urllib.request`

Moving up in abstraction, the standard library contains a fairly high-level interface as `urllib.request`.  If `requests` is not available for you, this is a reasonable level of abstraction for communicating with HTTP servers as a client.  Moreover, a function called `urlopen()` has been a feature of Python since the 1.x days, with a backward-compatible interface.  The namespace where the function lives has changed, but its basic functionality has not.

In [9]:
from urllib.request import urlopen, Request

resp = urlopen('http://popbox.kdm.local:2502/data')
print(resp.headers, end='')
csv = resp.read()
print(csv.decode())

Content-Type: text/csv; charset=utf-8
Content-Length: 519
X-INE-Course: HTTP using Python
Server: Werkzeug/2.0.0 Python/3.8.10
Date: Thu, 10 Jun 2021 01:25:10 GMT

GREETING,NUMBER,VALUE
Hej,15,18.072358757516692
Hej,749,78.5879823181439
Yā,587,52.40747930679911
Asalaam alaikum,375,43.50397292753433
God dag,461,20.555109155721286
Tjena,408,22.659251102871657
Zdravstvuyte,634,55.465223711850456
Salut,331,18.361883493688435
Selam,334,1.5244974003879475
Halo,902,1.3263046788196342
Hej,2,24.313384029364617
Merhaba,777,66.97691374555943
Hej,30,62.43334079549031
Nǐn hǎo,66,52.3790590960114
Hola,378,91.02782106855184
Nǐn hǎo,636,11.889578738986806
Yassas,424,54.708859936012374



With the CSV we now have in a local variable, let's perform a small calculation; i.e. the sum of the third column.

In [10]:
sum(float(line.split(',')[2]) for line in csv.decode().splitlines()[1:])

676.19302026331

The running server has a path that performs this same sum on an uploaded CSV file.  This lets us illustrate an HTTP method other than GET.  Either PUT or POST would make sense here; we choose the latter.  When this request is made, the request line indicates the POST method, and the request headers are followed by a body carrying the content that the server will operate on.

This path on the server returns an answer with the content type `text/plain`, but others like `application/json` are also plausible for more structured return data.

In [11]:
req = Request(url='http://popbox.kdm.local:2502/add', data=csv, method='POST')
with urlopen(req) as resp:
    print(resp.headers, end='')
    print(resp.read().decode())

Content-Type: text/plain; charset=utf-8
Content-Length: 15
Server: Werkzeug/2.0.0 Python/3.8.10
Date: Thu, 10 Jun 2021 01:28:32 GMT

676.19302026331
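A `Request` object can be inspected before anything is sent, which makes the pieces of a POST easier to see.  The sketch below only builds requests; the short CSV payload is made up, and the explicit `Content-Type` header is an addition for illustration (whether the course server looks at it is not stated here).  When a body is supplied without that header, `urllib` labels it `application/x-www-form-urlencoded` by default.

```python
from urllib.request import Request

# Made-up payload in the same shape as the CSV above
csv_bytes = b'GREETING,NUMBER,VALUE\nHej,15,18.5\nHola,378,91.0\n'

# Nothing is transmitted here; the request is only constructed
req = Request(
    url='http://popbox.kdm.local:2502/add',
    data=csv_bytes,
    headers={'Content-Type': 'text/csv; charset=utf-8'},
    method='POST',
)
print(req.get_method())                  # verb used in the request line
print(req.get_header('Content-type'))    # urllib stores names as 'Content-type'
print(len(req.data), 'bytes in the body')

# With a body and no explicit method, urllib chooses POST on its own
implicit = Request('http://popbox.kdm.local:2502/add', data=csv_bytes)
print(implicit.get_method())
```

Note the capitalization in `get_header('Content-type')`: `urllib` normalizes header names when storing them, so looking up `'Content-Type'` on the `Request` object finds nothing.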


Let's also try to access that URL that gave us redirect messages earlier, from the low-level `http` module.

In [12]:
try:
    resp = urlopen('http://popbox.kdm.local:2502/redirect')
except Exception as err:
    print(err)

HTTP Error 403: Forbidden


Let us look at the full and ugly traceback to try to understand where `urlopen()` went wrong.

In [13]:
urlopen('http://popbox.kdm.local:2502/redirect')

HTTPError: HTTP Error 403: Forbidden
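Rather than catching a bare `Exception`, code using `urllib` can distinguish an error status sent by the server from a failure to reach the server at all.  The sketch below assumes a throwaway local server (`ForbiddenHandler`, hypothetical) that answers every GET with 403; it does not reproduce whatever caused the 403 above.  An opener built with `ProxyHandler({})` is used so that any proxy settings are ignored for the local demo; it raises the same exceptions as `urlopen()`.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError, URLError
from urllib.request import ProxyHandler, build_opener

class ForbiddenHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that refuses every GET with a 403."""
    def do_GET(self):
        body = b'no entry\n'
        self.send_response(403)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(('127.0.0.1', 0), ForbiddenHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/redirect'

opener = build_opener(ProxyHandler({}))  # talk to the local server directly
result = None
try:
    with opener.open(url) as resp:
        result = (resp.status, resp.read())
except HTTPError as err:
    # The server answered, but with an error status.  The exception
    # doubles as a response, so headers and body remain readable.
    result = (err.code, err.read())
    print(err.code, err.reason)
    print(err.headers['Content-Type'])
except URLError as err:
    # No HTTP response at all (DNS failure, refused connection, ...)
    print('could not reach server:', err.reason)
finally:
    server.shutdown()
    server.server_close()
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first.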

## Library: `requests`

The library `requests` handles many details for us in a higher-level, friendlier, and more robust way than does `http.client`.

## Robust redirects

For example, let's try the redirected URL that failed to resolve properly before.  The `requests.get()` call manages to sort through all the redirections among reverse proxy servers, switching from HTTP to HTTPS over the redirects, and so on.

In [14]:
import requests
respA = requests.get('http://popbox.kdm.local:2502/redirect')
print(respA.status_code, respA.reason)
for k, v in respA.headers.items():
    print(f'{k}: {v}')

200 OK
Date: Thu, 10 Jun 2021 01:36:12 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Last-Modified: Tue, 23 Oct 2018 16:02:22 GMT
Vary: Accept-Encoding
CF-Cache-Status: DYNAMIC
cf-request-id: 0a95291aa9000010a5340e1000000001
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v2?s=GvHpm%2B9lXH7KmuhcMhMT0Fs%2B%2F4EOvj%2FmvezMTHwQQHPMgneiuNqiHC90aeCbHrpJvT%2BDf13%2BhT8mOYYnYccoO15cZXPxSh8H%2FUM5vlJ2OR2LVOvLSeYgxZJ7Fw%3D%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"report_to":"cf-nel","max_age":604800}
Server: cloudflare
CF-RAY: 65cede0a989010a5-ORD
Content-Encoding: gzip
alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400


In [15]:
print(respA.content.decode()[:463])

<!DOCTYPE html>
<html lang="en">
<head>
<title>KDM Training</title>
<link rel="icon" type="image/png" href="https://gnosis.cx/kdm/favicon.png">
</head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="description" content="Description of KDM Training services and principals">
<meta name="author" content="David Mertz">

<link href="vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet">


## Response attributes and methods

Let's connect to a resource that starts out as HTTPS.

In [16]:
url = 'https://docs.python.org/3/library/urllib.request.html'
respB = requests.get(url)
print(respB.status_code, respB.reason)
for k, v in respB.headers.items():
    print(f'{k}: {v}')

200 OK
Connection: keep-alive
Content-Length: 26739
Server: nginx
Content-Type: text/html
Last-Modified: Wed, 09 Jun 2021 18:45:42 GMT
ETag: "60c10c56-2d950"
X-Clacks-Overhead: GNU Terry Pratchett
Strict-Transport-Security: max-age=315360000; includeSubDomains; preload
Content-Encoding: gzip
Via: 1.1 varnish, 1.1 varnish
Accept-Ranges: bytes
Date: Thu, 10 Jun 2021 01:59:18 GMT
Age: 25980
X-Served-By: cache-lga21982-LGA, cache-mdw17347-MDW
X-Cache: MISS, HIT
X-Cache-Hits: 0, 1
X-Timer: S1623290358.011437,VS0,VE1
Vary: Accept-Encoding


In [17]:
print(respB.text[:500])
print('\n<!-- LOTS MORE -->\n')
print(respB.text[-300:])


<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta charset="utf-8" />
    <title>urllib.request â Extensible library for opening URLs &#8212; Python 3.9.5 documentation</title>
    <link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
    <script src="..

<!-- LOTS MORE -->

 <br />

    Last updated on Jun 09, 2021.
    <a href="https://docs.python.org/3/bugs.html">Found a bug</a>?
    <br />

    Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 2.4.4.
    </div>

    <script type="text/javascript" src="../_static/switchers.js"></script>
  </body>
</html>


The response object is quite rich.  The last cell used the `.text` attribute for the body decoded as text.  The undecoded body also remains available as bytes in `.content`.  If the body cannot be cleanly decoded as text (for example, because it is binary data), reading `.text` still does not fail; `requests` decodes with replacement characters, so the result is simply not meaningful text.

In [18]:
respB.content[:210].decode()

'\n<!DOCTYPE html>\n\n<html xmlns="http://www.w3.org/1999/xhtml">\n  <head>\n    <meta charset="utf-8" />\n    <title>urllib.request — Extensible library for opening URLs &#8212; Python 3.9.5 documentation</title>\n '
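The title line printed from `.text` earlier shows `â` where the manually decoded `.content` shows a dash.  A likely explanation, offered here as an assumption rather than something verified against that response: the headers list `Content-Type: text/html` with no charset, and in that situation `requests` falls back to ISO-8859-1 for `.text`, while the page itself is UTF-8.  The effect can be reproduced with plain bytes and no network at all.

```python
# '\u2014' is the dash that appears in the page title
raw = 'urllib.request \u2014 Extensible library'.encode('utf-8')

as_utf8 = raw.decode('utf-8')          # what the page author intended
as_latin1 = raw.decode('iso-8859-1')   # each UTF-8 byte read as one character

print(repr(as_utf8))
print(repr(as_latin1))

# Bytes that are not valid text still decode when errors='replace' is
# used; each undecodable byte becomes U+FFFD instead of raising.
binary = b'\xff\xfe'
replaced = binary.decode('utf-8', errors='replace')
print(repr(replaced))
```

With a real response, assigning `respB.encoding = 'utf-8'` before reading `respB.text` overrides the guessed encoding.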

A nice capability is that the URL remains an attribute of the response object.  Where that is identical to the URL provided, this doesn't matter so much; but in other cases it may reflect the resolution of multiple redirects.

In [19]:
respB.url

'https://docs.python.org/3/library/urllib.request.html'

In [20]:
respA.url

'https://www.gnosis.cx/kdm/'

## Cookies and sessions

A variety of other mechanisms for HTTP interactions are common.  For example, "cookies" are a mechanism by which a server can ask a client to store some state about the relationship between the ends.  This information is communicated in a header called `Set-Cookie` from the server, and may be provided by the client in a header called `Cookie`.  Let's look at an example:

In [21]:
respC = requests.get('https://ine.com')
for cookie, val in respC.cookies.items():
    print(f'{cookie:>20}: {val}')

       _landing_page: %2F
      _orig_referrer: 
                  _s: 5b0a4414-1d56-4082-8a55-b19940b23a2e
          _shopify_s: 5b0a4414-1d56-4082-8a55-b19940b23a2e
          _shopify_y: b04ad885-ff54-4ba1-8500-ee246573f67a
                  _y: b04ad885-ff54-4ba1-8500-ee246573f67a
       cart_currency: USD
 secure_customer_sig: 


Sometimes cookies are human-interpretable values, such as "USD" in the INE example.  Very often they are UUIDs or other random-like values that simply identify the client as the same one across interactions.  Web browsers often maintain cookies for particular sites, which allows a continuity of the relationship, even if the web browser or entire computer is shut down.

Within a standardized use of HTTP headers, cookies consist of mandatory values and optional modifiers, separated by semicolons.  Multiple `Set-Cookie` headers can be used, but if they are, `requests` concatenates them into one field, separated by commas.  Many servers also provide this comma-separated format in their responses.  Let's look at what INE sent in raw form.

In [22]:
respC.headers['Set-Cookie']

'secure_customer_sig=; path=/; expires=Fri, 10 Jun 2022 02:06:49 GMT; secure; HttpOnly, cart_currency=USD; path=/; expires=Thu, 24 Jun 2021 02:06:49 GMT, _orig_referrer=; Expires=Thu, 24-Jun-21 02:06:49 GMT; Domain=ine.com; Path=/; HttpOnly; SameSite=Lax, _landing_page=%2F; Expires=Thu, 24-Jun-21 02:06:49 GMT; Domain=ine.com; Path=/; HttpOnly; SameSite=Lax, _y=b04ad885-ff54-4ba1-8500-ee246573f67a; Expires=Fri, 10-Jun-22 02:06:49 GMT; Domain=ine.com; Path=/; SameSite=Lax, _s=5b0a4414-1d56-4082-8a55-b19940b23a2e; Expires=Thu, 10-Jun-21 02:36:49 GMT; Domain=ine.com; Path=/; SameSite=Lax, _shopify_y=b04ad885-ff54-4ba1-8500-ee246573f67a; Expires=Fri, 10-Jun-22 02:06:49 GMT; Domain=ine.com; Path=/; SameSite=Lax, _shopify_s=5b0a4414-1d56-4082-8a55-b19940b23a2e; Expires=Thu, 10-Jun-21 02:36:49 GMT; Domain=ine.com; Path=/; SameSite=Lax'

This format is a bit cumbersome to work with manually, since commas can both separate cookies and occur within some date formats for the `expires` modifier.  Happily, `requests` parses this for you, as we see above.

In [23]:
for cookie in respC.cookies:
    print(f'{cookie.name}: {cookie.value}')
    print(f'  expires: {cookie.expires}, expired {cookie.is_expired()}')

_landing_page: %2F
  expires: 1624500409, expired False
_orig_referrer: 
  expires: 1624500409, expired False
_s: 5b0a4414-1d56-4082-8a55-b19940b23a2e
  expires: 1623292609, expired False
_shopify_s: 5b0a4414-1d56-4082-8a55-b19940b23a2e
  expires: 1623292609, expired False
_shopify_y: b04ad885-ff54-4ba1-8500-ee246573f67a
  expires: 1654826809, expired False
_y: b04ad885-ff54-4ba1-8500-ee246573f67a
  expires: 1654826809, expired False
cart_currency: USD
  expires: 1624500409, expired False
secure_customer_sig: 
  expires: 1654826809, expired False
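The comma ambiguity in the raw `Set-Cookie` string can be seen directly with the standard library.  In the sketch below, the header is shortened from the one INE sent, and the regular expression is an illustrative heuristic (split only at commas followed by something that looks like `name=`), not the mechanism `requests` uses.

```python
import re
from http.cookies import SimpleCookie

# Shortened from the combined header shown earlier
combined = ('cart_currency=USD; path=/; expires=Thu, 24 Jun 2021 02:06:49 GMT, '
            '_y=b04ad885-ff54-4ba1-8500-ee246573f67a; Domain=ine.com; Path=/')

# A naive split also cuts the expiry date in two
naive = combined.split(', ')
print(len(naive), 'pieces from a naive split')

# Heuristic: split only at commas that are followed by "name="
pieces = re.split(r',\s*(?=[^;,=\s]+=)', combined)
print(len(pieces), 'pieces from the heuristic split')

for piece in pieces:
    jar = SimpleCookie(piece)          # parse one cookie with its modifiers
    for name, morsel in jar.items():
        print(f'{name}: {morsel.value}')
        print(f"  expires: {morsel['expires'] or None}")
```

A heuristic like this can still be fooled by unusual cookie values, which is a good reason to rely on the parsed `respC.cookies` rather than on the joined header.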


Let's say we'd like to change the currency used to BTC (not actually supported by INE).  Unfortunately, just indexing to the relevant key gives us the string rather than the richer data structure carrying modifiers.  For quick access to values, this is better, but it makes modification a little bit more roundabout.

In [24]:
respC.cookies['cart_currency']

'USD'

In [25]:
currency = [c for c in respC.cookies if c.name == 'cart_currency'][0]
currency

Cookie(version=0, name='cart_currency', value='USD', port=None, port_specified=False, domain='ine.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=1624500409, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)

In [26]:
currency.value = 'BTC'
currency

Cookie(version=0, name='cart_currency', value='BTC', port=None, port_specified=False, domain='ine.com', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=False, expires=1624500409, discard=False, comment=None, comment_url=None, rest={}, rfc2109=False)

Now we might make a new request to INE—perhaps to a different page within the site—sending back the (partially modified) cookies we got from the server.

In [27]:
respD = requests.get('https://ine.com/pages/plans', cookies=respC.cookies)
print(respD, respD.raw.version)  # Funny HTTP/1.1 representation
for k, v in respD.headers.items():
    print(f'{k}: {v}')

<Response [200]> 11
Date: Thu, 10 Jun 2021 02:16:41 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Sorting-Hat-PodId: 59
X-Sorting-Hat-ShopId: 17217507
X-Storefront-Renderer-Rendered: 1
Set-Cookie: secure_customer_sig=; path=/; expires=Fri, 10 Jun 2022 02:16:41 GMT; secure; HttpOnly, cart_currency=USD; path=/; expires=Thu, 24 Jun 2021 02:16:41 GMT, _y=b04ad885-ff54-4ba1-8500-ee246573f67a; Expires=Fri, 10-Jun-22 02:16:41 GMT; Domain=ine.com; Path=/; SameSite=Lax, _s=5b0a4414-1d56-4082-8a55-b19940b23a2e; Expires=Thu, 10-Jun-21 02:46:41 GMT; Domain=ine.com; Path=/; SameSite=Lax, _shopify_y=b04ad885-ff54-4ba1-8500-ee246573f67a; Expires=Fri, 10-Jun-22 02:16:41 GMT; Domain=ine.com; Path=/; SameSite=Lax, _shopify_s=5b0a4414-1d56-4082-8a55-b19940b23a2e; Expires=Thu, 10-Jun-21 02:46:41 GMT; Domain=ine.com; Path=/; SameSite=Lax
Link: <https://cdn.shopify.com>; rel=preconnect, <https://cdn.shopify.com>; rel=preconnect; crossorigin
ETag: cacheable:37

Often when a client interacts with a server repeatedly, the cookies are carried around in each request.  It is certainly feasible to pass a `cookies=` parameter to each new request, but a concept called *sessions* makes this somewhat easier and more automatic.

In [28]:
session = requests.Session()
# Use the session rather than the requests module
respE = session.get('https://ine.com/')
respE

<Response [200]>

The session maintains cookies across requests. For example:

In [29]:
session.cookies['_s']

'2b8ba258-8dcd-409c-a8e0-6d16a14d5bd1'

In [30]:
respF = session.get('https://ine.com/pages/plans')
respF

<Response [200]>

In [31]:
respF.cookies['_s']

'2b8ba258-8dcd-409c-a8e0-6d16a14d5bd1'
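Where only the standard library is available, a cookie-aware opener plays a role similar to a session.  The sketch below is an approximation under stated assumptions: `CookieEchoHandler` is a hypothetical local server that hands out a cookie on the first visit and echoes back whatever `Cookie` header it receives afterwards, and the cookie value is invented.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener

class CookieEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in: sets a cookie, then echoes the Cookie header."""
    def do_GET(self):
        received = self.headers.get('Cookie', '')
        body = received.encode()
        self.send_response(200)
        if not received:
            self.send_header('Set-Cookie', '_s=demo-session-id; Path=/')
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(('127.0.0.1', 0), CookieEchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f'http://127.0.0.1:{server.server_port}'

jar = CookieJar()
# ProxyHandler({}) ignores proxy settings so the local server is reached
opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(jar))

with opener.open(base + '/') as first:
    first_body = first.read().decode()     # no cookie sent yet, so empty
with opener.open(base + '/pages/plans') as second:
    second_body = second.read().decode()   # the jar supplied the cookie

print(repr(first_body), repr(second_body))
print([(c.name, c.value) for c in jar])

server.shutdown()
server.server_close()
```

As with `requests.Session`, the calling code never touches the `Cookie` header itself; the jar attached to the opener stores and replays it.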

## Forms

A common way of interacting with web pages is by filling in forms.  This can be done within a web browser, of course, but it can also be done programmatically in `requests`.  

Notice that in `requests` using an alternate HTTP method is reflected in the function name rather than as an argument (`requests.post()` is also available, for example).

In [32]:
from IPython.display import HTML  # public location of the HTML display class

info = {'name': 'David', 'color': 'Stygian Blue', 'bday': 'September 12'}
resp = requests.put('http://popbox.kdm.local:2502/form', data=info)
HTML(resp.text)

In [33]:
info = {'name': 'Marjorie', 'color': 'Blood Red', 'bday': 'June 31'}
resp = requests.put('http://popbox.kdm.local:2502/form', data=info)
HTML(resp.text)
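When `data=` is given a dictionary, `requests` form-encodes it and sends it with the content type `application/x-www-form-urlencoded`.  The body that results can be previewed with the standard library alone; this sketch reuses the first `info` dictionary from above and sends nothing.

```python
from urllib.parse import parse_qs, urlencode

info = {'name': 'David', 'color': 'Stygian Blue', 'bday': 'September 12'}

body = urlencode(info)      # spaces become '+', pairs are joined with '&'
print(body)

recovered = parse_qs(body)  # roughly what a server-side parser recovers
print(recovered)
```

Each recovered value is a list because a form may legitimately repeat a field name.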

## Streaming

In an earlier lesson, we streamed slowly arriving HTTP data using the raw telnet interface.  Using `requests` makes this easier; the essential element is just to indicate `stream=True` in the request.

The example below also presents a good manner of arranging `requests` code, using a context manager to handle the life of the connection. This could be used in all the other examples; the current example would work fine in the prior `resp = requests.get(...)` style also.

In [34]:
from time import time

start = time()
with requests.get('http://popbox.kdm.local:2502/stream', stream=True) as respS:
    for line in respS.iter_lines(chunk_size=1):
        if line:
            print(f'{time()-start:04.1f}s: {line.decode()}')

00.1s: God dag
02.0s: Ahlan
05.0s: Witaj
06.6s: Hej
08.3s: Halløj
10.1s: Hej
12.6s: Hujambo
13.9s: Yā
16.8s: Hi
17.2s: Oi


## Authentication

The `requests` library handles Basic Authentication and Digest Authentication itself, and OAuth 1 and OAuth 2 (including OpenID Connect) through the companion `requests-oauthlib` package.  The architecture is designed to allow additional methods, and third-party plugins exist for some, including Kerberos and NTLM.  Let's demonstrate one login.  First we attempt to get a protected resource without authentication.

In [35]:
resp = requests.get('http://popbox.kdm.local:2502/secure')
resp

<Response [401]>

We try again, but with incorrect credentials.

In [36]:
from requests.auth import HTTPBasicAuth
resp = requests.get('http://popbox.kdm.local:2502/secure', 
                    auth=HTTPBasicAuth('David', 'badpass'))
resp

<Response [401]>

In [37]:
resp = requests.get('http://popbox.kdm.local:2502/secure', 
                    auth=HTTPBasicAuth('David', '4bYaDZCFsTY4'))
resp, resp.text

(<Response [200]>, 'Hello, David!')
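Under the hood, Basic Authentication amounts to one request header: the word `Basic` followed by the Base64 encoding of `username:password`.  The helper below, `basic_auth_header()`, is a hypothetical name used only for this sketch; for ASCII credentials like these, the choice of character encoding makes no difference.

```python
import base64

def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic Authentication."""
    token = base64.b64encode(f'{username}:{password}'.encode('utf-8'))
    return 'Basic ' + token.decode('ascii')

header = basic_auth_header('David', 'badpass')
print('Authorization:', header)

# Base64 is an encoding, not encryption: anyone who sees the header
# can recover the credentials.
scheme, token = header.split(' ', 1)
decoded = base64.b64decode(token).decode('utf-8')
print(decoded)
```

Because the credentials are trivially recoverable, Basic Authentication is normally used over HTTPS.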

## Wrapping up

This section first looked at some lower-level Python standard library modules for client-side operations with HTTP. Then we turned to the third-party `requests` module which has rich and friendly capabilities.  Given the option, `requests` will be your *go to* tool for client-side HTTP.  There may be situations where you can use only the standard library.

As was mentioned in earlier lessons, some details of programmatic HTTP services are addressed in more depth in the course *Secure RESTful APIs using Python*. For example, we have used several `Content-Type` headers, and received several different status codes, but those are discussed at greater length there.

In the last section of this course, we look at writing servers, including those that clients have communicated with in the last several lessons.