[Table of Contents](../../index.ipynb)

# FRC Analytics with Python - Session 12
# Hypertext Transfer Protocol
**Last Updated: 3 May 2020**

Hypertext transfer protocol (HTTP) is the foundation of communications on the World Wide Web. Initial development of HTTP was conducted by [Tim Berners-Lee](https://en.wikipedia.org/wiki/Tim_Berners-Lee) in 1989 while he was a research fellow at the *European Organization for Nuclear Research*, better known as CERN (*Conseil européen pour la recherche nucléaire*).

**First HTTP Server at CERN and Tim Berners-Lee**
![First HTTP Server and Tim Berners-Lee](images/timbl_http.jpg)

You may be wondering why we are studying HTTP in a Python course.
* This isn't just a Python course -- it's a course on FRC analytics that happens to use Python.
* We will use HTTP to retrieve detailed information on FRC competitions from [The Blue Alliance](https://www.thebluealliance.com/) website. We will then use Python tools to analyze this data.
* HTTP can also be used to download FRC competition data directly from FIRST. It's a bit more difficult to get an authorization key from FIRST, so we will use *The Blue Alliance* in this class. You can learn more about retrieving data from FIRST at https://usfirst.collab.net/sf/projects/first_community_developers/ and https://frcevents2.docs.apiary.io/.

Our FRC team's scouting system uses HTTP for several different things:
* Exchanging data between Android tablets and the scouting system server that runs on a Windows laptop. The tablets are used to enter data during FRC matches.
* Downloading match schedules from the *FIRST API* or *The Blue Alliance* (so we don’t have to enter them manually).
* We are considering additional uses of HTTP for our system, such as automatically downloading match scores or rankings, so we can display this information along with the scouting data that we collect ourselves.

## A. What is HTTP?
HTTP is a communications protocol, which is a set of rules that allows two different computers to transmit and receive information.

HTTP operates at the application layer. This means HTTP doesn’t care about the physics of how the information is transmitted (wireless, serial cable, Cat 6 network cable, etc.).

There are many application layer protocols:
* POP3, SMTP, and IMAP are used for email.
* FTP is used to transfer files.
* NTP is used for synchronizing clocks over a network.
* *There are many more.*

HTTP messages are classified as requests or responses.
* A client sends a request to a server.
* Based on the information in the request, the server sends a response back to the client.
* For example, you might use Chrome (an HTTP client) to send an HTTP request to the web server that provides The Blue Alliance web page. The Blue Alliance will send the web page back as an HTTP response.

## B. HTTP Requests
Suppose I want to see if my local library has a copy of one of my favorite science fiction books, *I, Robot*, by Isaac Asimov (copyright 1950). I navigate to their search page at `http://kcls.org/catalog/search` and type `asimov` into the author field and `robot` into the title field. The web browser will send an HTTP request message to the library's web server. The request message could look something like this:
```HTTP
GET /catalog/search?author=asimov&title=robot&mediatype=book HTTP/1.1
Accept: text/html
Accept-Language: en-US
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36
```

### Request Line
The first line of the message is the request line.
* The first word in the request line is the request method.
* In the example above, we are using the GET request method, which is the most common method.
* POST is another commonly used request method.
* There are many other methods (e.g., PUT, DELETE, HEAD, CONNECT) that are not as common.

The next portion of the request line tells the server what we want. This example tells the server to find a page called *search* in a folder named *catalog*. Everything after the "?" is a list of **GET parameters**, also called **query parameters**, in the format:

    ?{variable_name_1}={value_1}&{variable_name_2}={value_2}

Finally, the last part of the request line specifies that the request follows HTTP version 1.1.
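As a minimal sketch of how the query-parameter format works, the cell below builds the part of the request line that comes after `GET` for the library search. It uses `urlencode()` and `parse_qs()` from the standard library's `urllib.parse` module; the variable names are only illustrative.

```python
from urllib.parse import urlencode, parse_qs

# Query parameters from the library search example.
params = {'author': 'asimov', 'title': 'robot', 'mediatype': 'book'}

# urlencode() joins each name and value with "=" and separates pairs with "&".
query_string = urlencode(params)
request_target = '/catalog/search?' + query_string
print(request_target)
# /catalog/search?author=asimov&title=robot&mediatype=book

# parse_qs() reverses the process. Each value comes back in a list
# because the same name is allowed to appear more than once.
print(parse_qs(query_string))
# {'author': ['asimov'], 'title': ['robot'], 'mediatype': ['book']}
```

`urlencode()` also escapes characters such as spaces that are not allowed in a query string, which is one reason to prefer it over gluing strings together by hand.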

### Header Lines
The second through fourth lines are header lines, each consisting of a header name, followed by a colon, followed by a value.
* The *Accept* header tells the server to format the response as HTML text.
* The *Accept-Language* header tells the server that our preferred language for the response is English.
* The *User-Agent* header identifies our browser and operating system.
* HTTP requests can have any number of headers (including zero headers). There are numerous HTTP header fields. [Click here to see a long list.](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields)
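The name-colon-value layout of header lines is easy to take apart in Python. The cell below is only an illustration of the format, with the *User-Agent* value shortened; a real server does this parsing for us.

```python
# The header lines from the example request (User-Agent shortened).
header_lines = [
    'Accept: text/html',
    'Accept-Language: en-US',
    'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
]

parsed_headers = {}
for line in header_lines:
    # Split at the first colon only, because a value may contain colons too.
    name, value = line.split(':', 1)
    parsed_headers[name.strip()] = value.strip()

print(parsed_headers['Accept-Language'])
# en-US
```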

### URL
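Notice that the request line contains only the path to the search page, not the name of the server (e.g., kcls.org). HTTP/1.1 requests normally also carry a *Host* header that names the server (it was left out of the example above to keep it short), but HTTP itself has no idea how to actually get the request to its destination. It depends on lower-level protocols, including the internet protocol (IP), for that.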
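To see which pieces of a URL end up where, we can split one apart with `urlsplit()` from the standard library's `urllib.parse` module. This is a small sketch; the full URL is assembled from the library example above.

```python
from urllib.parse import urlsplit

url = 'http://kcls.org/catalog/search?author=asimov&title=robot&mediatype=book'
parts = urlsplit(url)

print(parts.scheme)   # http  (which protocol to use)
print(parts.netloc)   # kcls.org  (the server to connect to)
print(parts.path)     # /catalog/search  (goes on the request line)
print(parts.query)    # author=asimov&title=robot&mediatype=book  (also on the request line)
```

Only the path and query appear on the request line. The server name is used to decide where the request is sent.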

### Request Content
HTTP requests can include a blank line after the headers, followed by more text. We won't need to include content in our HTTP requests for this class.

## C. HTTP Responses
Upon receiving and processing my HTTP request for information on the book *I, Robot*, the library's web server will send an HTTP response containing the web page with the information. The response might look something like this:

```HTTP
HTTP/1.1 200 OK
Date: Sat, 21 Mar 2020 15:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Tue, 26 Jun 2018 13:18:00 GMT
Content-Length: 428
Content-Type: text/html
Connection: close

<html>
<head><title>Search Results</title></head>
<body>
<h1>Search Results - I Robot</h1>
...
</body>
</html>
```

HTTP responses are similar to HTTP requests.
* **Status Line:** Responses start with a status line that includes the HTTP version and status code. Every status code has a short name that is included at the end of the line. A status code of 200 (name is *OK*) is the standard response code that indicates the HTTP request was successful. A code in the 400 range indicates there was an error with the request (like 404, *Not Found*) and a code in the 500 range means the request was valid, but the server had an error. [Click here for a list of HTTP codes.](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
* **Header Fields:** Similar to requests, the response contains several header fields. In this example, the response provides the date, identifies the type of server, reports when the data was last modified, reports how many bytes of content are being sent, reports that the content is formatted as HTML, and states that the connection between my web browser and the library's server will not be kept open.
* **Content:** This response contains content, which is included after the headers, following a blank line. In this example, the content is HTML text (i.e., a web page). HTTP responses provided by *The Blue Alliance* API or FIRST API will contain text formatted as [JavaScript Object Notation (JSON)](https://www.w3schools.com/js/js_json_intro.asp) or [Extensible Markup Language (XML)](https://www.w3schools.com/xml/default.asp), which will be covered in a later session.
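The three parts of a response can be separated with a few lines of Python. The cell below is a rough sketch that works on a shortened copy of the example response; the helper `status_category()` is a made-up name for this illustration. `HTTPStatus` comes from the standard library's `http` module.

```python
from http import HTTPStatus

# A shortened copy of the example response. HTTP ends each line with "\r\n".
raw_response = (
    'HTTP/1.1 200 OK\r\n'
    'Content-Length: 428\r\n'
    'Content-Type: text/html\r\n'
    '\r\n'
    '<html>...</html>'
)

# The first blank line separates the status line and headers from the content.
head, content = raw_response.split('\r\n\r\n', 1)
status_line, *header_lines = head.split('\r\n')

# The status line holds the HTTP version, the status code, and the short name.
version, code_text, name = status_line.split(' ', 2)
code = int(code_text)
response_headers = dict(line.split(': ', 1) for line in header_lines)


def status_category(code):
    """Hypothetical helper: describe the range a status code falls in."""
    if 200 <= code < 300:
        return 'success'
    if 400 <= code < 500:
        return 'error with the request'
    if 500 <= code < 600:
        return 'server error'
    return 'other'


print(version, code, name)               # HTTP/1.1 200 OK
print(response_headers['Content-Type'])  # text/html
print(status_category(404))              # error with the request
print(HTTPStatus(404).phrase)            # Not Found
```

We will not need to parse responses by hand in this class, because `urllib` does it for us.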

## D. Using HTTP in Python
The [Python Standard Library](https://docs.python.org/3/library/index.html), which is included with every Python installation, contains a package, [`urllib`](https://docs.python.org/3/library/urllib.html), for creating and sending HTTP requests and reading HTTP responses.

### 1. Simple HTTP Request
First, let's submit a simple HTTP request to http://httpbin.org/. httpbin.org is a web site that developers can use to test HTTP requests. Run the code below:

In [1]:
# Very Simple HTTP Request
import urllib.request

resp = urllib.request.urlopen('http://httpbin.org/get')
response_content = resp.read()
resp.close()
print(response_content)

b'{\n  "args": {}, \n  "headers": {\n    "Accept-Encoding": "identity", \n    "Host": "httpbin.org", \n    "User-Agent": "Python-urllib/3.8", \n    "X-Amzn-Trace-Id": "Root=1-5f1cf9ec-19862e849506a54863502232"\n  }, \n  "origin": "67.168.114.52", \n  "url": "http://httpbin.org/get"\n}\n'


#### What just happened?
httpbin.org sent us some JSON text that contains information about the HTTP request that we sent. The JSON text is a bit difficult to read - let's clean it up a bit. Run the next cell.

In [2]:
# Very Simple HTTP Request - Nicer Output
import json
import urllib.request

resp = urllib.request.urlopen('http://httpbin.org/get')
response_content = resp.read()
resp.close()
json.loads(response_content)

{'args': {},
 'headers': {'Accept-Encoding': 'identity',
  'Host': 'httpbin.org',
  'User-Agent': 'Python-urllib/3.8',
  'X-Amzn-Trace-Id': 'Root=1-5f1cfb4f-af70aa5dadede0c7c9509eb6'},
 'origin': '67.168.114.52',
 'url': 'http://httpbin.org/get'}

In [7]:
data = json.loads(response_content)
data['headers']['User-Agent']

'Python-urllib/3.8'

That's better. The call to `json.loads()` just converts the JSON text to a Python dictionary, which prints out more nicely in a Jupyter notebook. Don't worry if you don't fully understand it - JSON will be covered in another session.

The `urlopen()` function accepted a uniform resource locator (URL) as a parameter and sent our request. httpbin.org sent us back the following information:
* The empty braces after `'args':` indicate we did not pass any GET parameters with our request.
* The *headers* section lists all of the headers we included with our request. The *User-Agent* header reveals that the request was generated by Python's `urllib` package and that we are using version 3.8 of Python.
* httpbin.org also reports the IP address of our client computer and the URL to which we sent our HTTP request.
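If you want to experiment with these fields without sending another request, the sketch below works on a hard-coded sample. The bytes are shaped like the httpbin.org response, but the values (including the IP address) are made up for illustration.

```python
import json

# Sample bytes shaped like the httpbin.org response (values are made up).
sample_content = (
    b'{"args": {}, '
    b'"headers": {"Host": "httpbin.org", "User-Agent": "Python-urllib/3.8"}, '
    b'"origin": "203.0.113.7", '
    b'"url": "http://httpbin.org/get"}'
)

# read() returns bytes, which is why the first output started with b'...'.
# decode() turns the bytes into ordinary text.
sample_text = sample_content.decode('utf-8')

# json.loads() accepts either the text or the original bytes.
sample_data = json.loads(sample_text)

print(sample_data['args'])                    # {}
print(sample_data['headers']['User-Agent'])   # Python-urllib/3.8
print(sample_data['origin'])                  # 203.0.113.7
```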

Note that `urlopen()` returned a response object, which we saved to the variable `resp`.

We used the `read()` method on the [response object](https://docs.python.org/3/library/http.client.html#http.client.HTTPResponse) to get the content of the response. The response object has several other methods, including `geturl()`, `info()`, and `getcode()`.

Finally, we closed the response object with `resp.close()`, converted the response text to a Python object using the `json` package, and displayed the results. Calling the `close()` method helps Python to free up resources that are devoted to the response object.

OK, I know I said that HTTP doesn't know how to deliver a request to the server, yet here we are, passing the full URL to the `urlopen()` function. The `urllib` package takes care of the details for us: it uses the server name in the URL to open a network connection (this is where IP comes in), and then sends the HTTP request over that connection.

### 2. Using `with` Syntax With urllib.request
Programmers don't like having to remember to close things. We're like small children, racing into the front yard after our Mom said we can play outside, leaving the front door wide open. Or leaving the refrigerator open. Or the car door. Or cabinets.

Programmers don't have to remember to close things when they use the Python `with` statement. Here is the call to `urlopen()` from the preceding code cell, rewritten so the call to `resp.close()` is unnecessary.

In [None]:
# Using the 'with' syntax to send an HTTP request.
with urllib.request.urlopen('http://httpbin.org/get') as resp:
    response_content = resp.read()
json.loads(response_content)

The `with` statement makes sure that the response object, stored in the `resp` variable, is closed when we are finished with it. Pay careful attention to the indentation. The `with` statement assigns the result of the `urlopen()` call to the variable `resp`. All code that accesses the response object should be indented below the `with` statement.
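We can watch the automatic closing happen without using the network. In the sketch below, an `io.BytesIO` object stands in for the response object (that substitution is an assumption made so the cell runs offline); like a response, it has `read()` and `close()` methods and a `closed` attribute.

```python
import io

# io.BytesIO stands in for the HTTP response object so this runs offline.
with io.BytesIO(b'{"args": {}}') as fake_resp:
    fake_content = fake_resp.read()
    print(fake_resp.closed)   # False: still open inside the with block

# Back at the original indentation level, the with block has ended.
print(fake_resp.closed)       # True: closed automatically
print(fake_content)           # b'{"args": {}}'
```

The data we read inside the block is still available afterward, which is why `json.loads(response_content)` can sit outside the `with` block.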

Most online examples for the `urllib` package will use the `with` statement syntax.

### 3. HTTP Request with Parameters
Now let's try adding some GET parameters. Write the code to submit an HTTP request and read the response just like in section D.1, but add two GET parameters:  
team = 1318 and city = issaquah.

Review the *Request Line* section above if you don't remember how to do this. You do not need to repeat the imports as long as you ran the cell above. (Run the above cell again if you get an import error.)

In [None]:
# Send a GET request with parameters in this code block.



If you did it right, your response should look something like this:
```JSON
{'args': {'city': 'issaquah', 'team': '1318'},
 'headers': {'Accept-Encoding': 'identity',
  'Host': 'httpbin.org',
  'User-Agent': 'Python-urllib/3.7',
  ...
 ```

### 4. Adding Headers
Now let's add a header. Let's send an authorization key to the server. An authorization key is very similar to a password. But unlike standard user passwords, which are intended to be memorizable and manually typed into a password field, authorization keys are generally very long hexadecimal or random character strings.

The standard header for sending an authorization key is *Authorization*. Run the code cell below:

In [None]:
# Adding an authorization header to an HTTP request.
req = urllib.request.Request('http://httpbin.org/get')
req.add_header('Authorization', 'my_insecure_auth_key')

with urllib.request.urlopen(req) as resp:
    response_content = resp.read()
json.loads(response_content)

OK, so we can't just pass a URL into `urlopen()` when we want to add or modify headers. We have to create a `urllib.request.Request` object, add the headers to that object, and then we can pass the `Request` object to `urlopen()` instead of the URL string.
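A `Request` object can be inspected before it is sent, which is a handy way to check our work. The sketch below builds the same request but never calls `urlopen()`, so nothing goes over the network. The variable name `preview_req` is just for illustration.

```python
import urllib.request

# Build the request but do not send it.
preview_req = urllib.request.Request('http://httpbin.org/get')
preview_req.add_header('Authorization', 'my_insecure_auth_key')

print(preview_req.get_method())                 # GET
print(preview_req.host)                         # httpbin.org
print(preview_req.get_header('Authorization'))  # my_insecure_auth_key
print(preview_req.header_items())
# [('Authorization', 'my_insecure_auth_key')]
```

Headers that `urlopen()` adds on its own, such as *User-Agent*, do not show up here because they are added when the request is sent.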

We can add multiple headers at one time by passing the headers as a dictionary. Run the code cell below:

In [None]:
# Adding or modifying multiple HTTP headers.
headers = {'Authorization': 'my_insecure_auth_key', 'User-Agent': 'Pyclass-FRC Notebook'}
req = urllib.request.Request('http://httpbin.org/get', headers=headers)

with urllib.request.urlopen(req) as resp:
    response_content = resp.read()
json.loads(response_content)

Instead of using the `add_header()` method, we passed a dictionary of headers to the `headers` argument of the `Request` object constructor. This is a useful technique for adding several headers at one time, but we could have just called `.add_header()` twice, once for the *Authorization* header and once for the *User-Agent* header.

Also note that `urlopen()` had already been including a *User-Agent* header by default. We were able to change the value of this header by adding it to the headers dictionary.

Now you try. Send an HTTP request that includes an *Authorization* header, an *Accept* header telling the server we want the result formatted as JSON (set value to *application/json*), and overwrite the *User-Agent* header with a value of your choice.

In [None]:
# Send an HTTP request with multiple HTTP Headers here.



If you did it right, your response should look something like this:
```JSON
{'args': {},
 'headers': {'Accept': 'application/json',
  'Accept-Encoding': 'identity',
  'Authorization': 'my_insecure_auth_key',
  'Host': 'httpbin.org',
  'User-Agent': 'Pyclass_multi_header_exercise',
  'X-Amzn-Trace-Id': 'Root=1-5e766683-01bd4cc86717ddf71c29f27b'},
 'origin': '192.168.13.18',
 'url': 'http://httpbin.org/get'}
```

In [None]:
# Adding or modifying multiple HTTP headers.
headers = {'Authorization': 'my_insecure_auth_key', 'User-Agent': 'Pyclass_multi_header_exercise',
          'Accept': 'application/json'}
req = urllib.request.Request('http://httpbin.org/get', headers=headers)

with urllib.request.urlopen(req) as resp:
    response_content = resp.read()
json.loads(response_content)

## E. Exercises

1. Send an HTTP request and use the `getcode()` method to get the response status code. What attribute or method of the response object will get the name of the code (e.g., "OK" for code 200)?

In [None]:
# Using getcode()



2. Send an HTTP request with three query parameters of your choice. Use the `info()` method on your response.

In [None]:
# Practice sending an HTTP request with multiple query parameters



3. Send an HTTP request with the additional headers *Accept-Charset* and *Accept-Language*. Appropriate values for these headers are *utf-8* and *en-US*.

In [None]:
# Practice adding headers to HTTP requests.



## F. Concept and Terminology Review
You should be able to define the following terms or describe the concept.
* URL
* HTTP Request
* HTTP Response
* HTTP Header
* Query Parameter
* HTTP Response code
* `with` statement
* HTTP Content

## G. Further Study

This session covers the bare minimum. Students are encouraged to review other HTTP references.
* [TutorialsPoint HTTP Tutorial](https://www.tutorialspoint.com/http/http_quick_guide.htm)
* [Wikipedia Article](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol)
* http://httpbin.org/

[Table of Contents](../../index.ipynb)