[Table of Contents](../index.ipynb)

# Analyzing Blue Alliance Data with Python - Part I
[The Blue Alliance](http://www.thebluealliance.com) is an extensive online repository of FIRST Robotics Competition (FRC) data. The Blue Alliance (TBA) contains detailed information on every single FRC match. [Click here for an example.](https://www.thebluealliance.com/match/2020wasno_qf2m1)

In addition to providing information on its website, TBA makes all FRC competition data available via Hypertext Transfer Protocol (HTTP). The FRC data is provided as JavaScript Object Notation (JSON) text, which can be read by Python. This allows us to find new ways to analyze and visualize FRC data.

In this notebook, we will walk through retrieving FRC JSON data via HTTP, reading that data into Python, and generating a graph to visualize the data.

## A. Preparations
1. Get a *Blue Alliance* API authorization key. [See instructions here](blue_alliance_api_key.ipynb).
2. Review [session 07 on Hypertext Transfer Protocol](../sessions/s07_http/s07_http.ipynb).

## B. Getting Started - Retrieve a List of Districts
### 1. Getting the Format of the URL Command
Users can retrieve data from *The Blue Alliance* API (TBA-API) by sending a specially formatted HTTP request. The data that is returned by TBA-API depends on the URL that is included in the HTTP request that is sent to TBA-API. [A complete list of URLs that are accepted by TBA-API is available here.](https://www.thebluealliance.com/apidocs/v3)

There is a section on FRC districts about two-thirds of the way down the API documentation page. Here is an extract from that section that lists three of the many URL formats accepted by TBA-API:
![TBA API District Instructions](images/tba_api_district1.png)

Let's try the last command in the list: `/districts/{year}`. 

First, we need the root portion of the URL for TBA-API. It can be found on the [TBA-API documentation page](https://www.thebluealliance.com/apidocs/v3).

In [None]:
# Root portion of TBA-API URL
root_url = 'https://www.thebluealliance.com/api/v3'

### 2. TBA Authorization Key and Header
Next we need our TBA authorization key. For this project, save the authorization key in a Python file named *auth.py*. The key should be assigned as a string value to a variable called `key`:
```Python
key = 'uytfuiguytaiufygsoigyiyidfsiufyiaduyfaiudfy'  # This is not a real key.
```

When the key is saved in this manner, we can access it by importing the *auth* module.

In [None]:
# Getting the authorization key from an external file
import auth
key = auth.key
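If you would rather not keep an *auth.py* file around, a common alternative is to read the key from an environment variable. The sketch below is one way to support both: it tries the *auth* module first and falls back to an environment variable. The function name `load_tba_key` and the variable name `TBA_AUTH_KEY` are hypothetical choices for this example, not something TBA requires.

```python
import importlib
import os


def load_tba_key(module_name='auth', env_var='TBA_AUTH_KEY'):
    """Return a TBA authorization key.

    Tries `module_name`.key first (the auth.py convention used in this
    notebook), then falls back to the environment variable `env_var`.
    Both default names are hypothetical choices for this sketch.
    """
    try:
        return importlib.import_module(module_name).key
    except ModuleNotFoundError:
        key = os.environ.get(env_var)
        if not key:
            raise RuntimeError(
                f'No {module_name}.py module found and {env_var} is not set')
        return key
```

With *auth.py* in place, `key = load_tba_key()` gives the same result as the cell above.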

Now we can start assembling the HTTP request. Per the instructions on the [TBA-API overview page](https://www.thebluealliance.com/apidocs), the authorization key must be passed to TBA-API in a custom header called *X-TBA-Auth-Key*.

In [None]:
# Assembling the authorization header
hdrs = {'X-TBA-Auth-Key': key}
hdrs

### 3. Building the URL
Finally we are ready to assemble the full URL. TBA-API documentation uses curly braces to denote user-provided parameters. The URL command `/districts/{year}` means we should replace `{year}` with a four-digit year corresponding to an FRC season, such as 2019 or 2020. We'll build the full URL by joining the root portion with the command portion, replacing the `{year}` parameter with 2020:

In [None]:
# Building the Full URL
year = 2020
url = root_url + "/districts/" + str(year)
url
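Because TBA's documentation marks parameters with curly braces, the documented path can be filled in directly with Python's `str.format()`. The sketch below wraps that in a small helper; `build_url` is a hypothetical name for this notebook, not part of TBA-API.

```python
# Root portion of the TBA-API URL (same value as the cell above)
root_url = 'https://www.thebluealliance.com/api/v3'


def build_url(template, **params):
    """Fill {name} placeholders in a documented TBA-API path template.

    build_url is a hypothetical helper for this notebook, not part of TBA-API.
    """
    return root_url + template.format(**params)


url = build_url('/districts/{year}', year=2020)
print(url)  # https://www.thebluealliance.com/api/v3/districts/2020
```

Copying the template straight from the documentation makes it harder to mistype a path segment than building the string piece by piece.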

### 4. Assembling and Sending the HTTP Request
Now it's time to assemble and send the HTTP request.

In [None]:
# Assembling the HTTP request.

# Importing a module for sending HTTP requests
import urllib.request

# Adding a user agent header
hdrs['User-Agent'] = 'HTTP Training Project'

req = urllib.request.Request(url, headers=hdrs)
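Creating a `Request` object does not send anything, so it can be inspected first. The sketch below rebuilds the same request with a placeholder key and prints a few of its attributes. Note that `urllib` stores header names with `str.capitalize()`, so the key header is looked up as `'X-tba-auth-key'`. HTTP header names are case-insensitive, so this does not matter to the server.

```python
import urllib.request

url = 'https://www.thebluealliance.com/api/v3/districts/2020'
hdrs = {'X-TBA-Auth-Key': 'not-a-real-key',  # placeholder, not a real key
        'User-Agent': 'HTTP Training Project'}
req = urllib.request.Request(url, headers=hdrs)

# Nothing has been sent yet; these attributes just describe the request
print(req.get_method())             # GET
print(req.type, req.host)           # https www.thebluealliance.com
print(req.selector)                 # /api/v3/districts/2020
print(req.get_header('X-tba-auth-key'))  # not-a-real-key
```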

Finally we are ready to send the request.

In [None]:
import json

with urllib.request.urlopen(req) as resp:
    resp_text = resp.read()
    districts = json.loads(resp_text)
districts[:3]
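If the key is wrong or the URL is mistyped, `urlopen()` raises `urllib.error.HTTPError` instead of returning a response. The sketch below bundles the steps above into one function and turns that error into a clearer message. The name `tba_get` and its `opener` parameter are hypothetical; `opener` only exists so the function can be tested without a network connection. It also uses `json.load()`, which reads and parses the response in one step.

```python
import json
import urllib.error
import urllib.request

ROOT_URL = 'https://www.thebluealliance.com/api/v3'


def tba_get(path, key, opener=urllib.request.urlopen):
    """Send a GET request to TBA-API and return the decoded JSON.

    tba_get and its `opener` parameter are hypothetical; `opener` lets the
    function be tested without a network connection.
    """
    req = urllib.request.Request(
        ROOT_URL + path,
        headers={'X-TBA-Auth-Key': key, 'User-Agent': 'HTTP Training Project'})
    try:
        with opener(req) as resp:
            # json.load() reads the response body and parses it in one step
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx status codes, e.g. when the key is rejected
        raise RuntimeError(f'TBA-API returned HTTP {err.code} for {path}') from err
```

Usage would be `districts = tba_get('/districts/2020', key)`.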

If data on three FRC districts was displayed, then everything worked as planned. The `urlopen()` function sent the HTTP request, and TBA-API responded with information on all 2020 FRC districts, formatted as JSON text. We used the `json.loads()` function from the Standard Library's *json* module to convert the JSON text into a Python list of dictionary objects, with one dictionary for each FRC district. Then we displayed the first three districts from the list. It's easy to see how many districts were returned by using Python's `len()` function:

In [None]:
# Get number of districts returned from TBA-API
len(districts)
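Because `districts` is an ordinary list, a plain `for` loop can summarize it. The sketch below uses the first three districts copied from the 2020 response printed in section D of this notebook, so it runs without a key.

```python
# First three districts from the 2020 TBA-API response (see section D)
districts = [
    {'abbreviation': 'chs', 'display_name': 'FIRST Chesapeake', 'key': '2020chs', 'year': 2020},
    {'abbreviation': 'fim', 'display_name': 'FIRST In Michigan', 'key': '2020fim', 'year': 2020},
    {'abbreviation': 'fma', 'display_name': 'FIRST Mid-Atlantic', 'key': '2020fma', 'year': 2020},
]

# One line per district: abbreviation (right-aligned) then the full name
summary = [f"{d['abbreviation']:>4}  {d['display_name']}" for d in districts]
for line in summary:
    print(line)
```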

## C. Structure of TBA-API URLs
Now we'll take an in-depth look at the HTTP request we just sent. First, let's dissect the URL to which we submitted our HTTP request:
```
https://www.thebluealliance.com/api/v3/districts/2020
```

The URL can be broken down into the following sections:

#### The Scheme - `https:`
The first portion of the URL, `https:`, is called the *scheme*. The abbreviation *https* stands for Hypertext Transfer Protocol Secure (HTTPS), which is simply HTTP with an added layer of encryption. Fortunately we can submit an HTTPS request the same way we would submit an HTTP request. The `https:` portion of the URL signals to Python (or a browser) that the request should be encrypted before it is sent and the response decrypted when it is received. From our perspective, all of the encryption and decryption work is done automatically. Other popular schemes include plain old `http:`, `mailto:`, and `ftp:` (File Transfer Protocol).

#### The Slashy Thingies -  `//`
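For our purposes, the two slashes just separate the scheme from the rest of the URL. Technically, they signal that the next section is the *authority*, which is usually just the domain name (sometimes followed by a port number).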

#### The Domain Name - `www.thebluealliance.com`
*Thebluealliance.com* is the domain name that the owners of TBA registered in order to set up their website. The *www* portion is a subdomain: it might signify a specific server, or the TBA developers might have added it just for looks.

#### The Path - `/api/v3/districts/2020`
Here is where things get interesting. This part of the URL is the *path*. In the old days (like the 90s), the path portion of a URL often represented the file structure on a server. A URL such as `http://nineties.com/collections/tamagotchi/index.html` told the web server that it should find a file called *index.html* in the *tamagotchi* subdirectory of the *collections* directory. That's still true today for many websites, including the website that is hosting the notebook you are reading right now (look at the URL). 

For security reasons, many popular websites prefer that their users NOT be able to see their server's directory structure. On those sites, the parts of the path that look like directories and subdirectories have nothing to do with the server's file structure. Instead, they are just parameters that the server uses to generate the HTTP response.

My guess is that `api/v3` tells the web server to forward the HTTP request to a program that accesses a database and formulates the HTTP response. The `districts` and `2020` portions of the URL represent parameters. The TBA developers could have set up the server to use traditional query parameters, so that the URL would have looked like `https://apiv3.thebluealliance.com?cmd=districts&year=2020`, ..., but they didn't. There is a commonly accepted set of best practices for designing URL syntaxes for APIs, and that query-parameter URL would violate those best practices. We won't discuss that now because API design is beyond the scope of this project (but if you must know more, search for *RESTful APIs*).
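For comparison, the Standard Library's `urllib.parse` module can build and read the query-parameter style of URL described above. This sketch only manipulates strings; the `apiv3.thebluealliance.com` address is the hypothetical design from the paragraph above, not a real TBA endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Build the query string for the hypothetical design (TBA does NOT use it)
query = urlencode({'cmd': 'districts', 'year': 2020})
print(query)  # cmd=districts&year=2020

alt_url = 'https://apiv3.thebluealliance.com?' + query

# parse_qs returns a list for every name, since a name may repeat
params = parse_qs(urlsplit(alt_url).query)
print(params)  # {'cmd': ['districts'], 'year': ['2020']}
```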

The bottom line is that for TBA-API, all items that follow `api/v3/` are arguments that are passed to the TBA-API server, much like you would pass arguments to a Python function. Slashes are used to separate different arguments.
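The same breakdown can be done in code with `urllib.parse.urlsplit()`, which separates the scheme, the domain (called `netloc`), and the path. Splitting the rest of the path on slashes then produces the TBA-API arguments. The `/api/v3/` prefix is hard-coded here as an assumption based on the root URL used above.

```python
from urllib.parse import urlsplit

url = 'https://www.thebluealliance.com/api/v3/districts/2020'
parts = urlsplit(url)
print(parts.scheme)  # https
print(parts.netloc)  # www.thebluealliance.com
print(parts.path)    # /api/v3/districts/2020

# Everything after /api/v3/ acts like a list of arguments separated by slashes
prefix = '/api/v3/'
args = parts.path[len(prefix):].split('/')
print(args)  # ['districts', '2020']
```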

One final note - I am proud to say that I never owned a Tamagotchi. Not that there's anything wrong with that.
![Tamagotchi](images/tamagotchi.jpg)

## D. Dissecting the HTTP Response
We retrieved the content of the HTTP response by calling the `read()` method on the response object. TBA-API sent us the following text:

```python
b'[\n {\n  "abbreviation": "chs", \n  "display_name": "FIRST Chesapeake", \n  "key": "2020chs", \n  "year": 2020\n }, \n {\n  "abbreviation": "fim", \n  "display_name": "FIRST In Michigan", \n  "key": "2020fim", \n  "year": 2020\n }, \n {\n  "abbreviation": "fma", \n  "display_name": "FIRST Mid-Atlantic", \n  "key": "2020fma", \n  "year": 2020\n }, \n {\n  "abbreviation": "fnc", \n  "display_name": "FIRST North Carolina", \n  "key": "2020fnc", \n  "year": 2020\n }, \n {\n  "abbreviation": "in", \n  "display_name": "Indiana FIRST", \n  "key": "2020in", \n  "year": 2020\n }, \n {\n  "abbreviation": "isr", \n  "display_name": "FIRST Israel", \n  "key": "2020isr", \n  "year": 2020\n }, \n {\n  "abbreviation": "ne", \n  "display_name": "New England", \n  "key": "2020ne", \n  "year": 2020\n }, \n {\n  "abbreviation": "ont", \n  "display_name": "Ontario", \n  "key": "2020ont", \n  "year": 2020\n }, \n {\n  "abbreviation": "pch", \n  "display_name": "Peachtree", \n  "key": "2020pch", \n  "year": 2020\n }, \n {\n  "abbreviation": "pnw", \n  "display_name": "Pacific Northwest", \n  "key": "2020pnw", \n  "year": 2020\n }, \n {\n  "abbreviation": "tx", \n  "display_name": "FIRST In Texas", \n  "key": "2020tx", \n  "year": 2020\n }\n]'
```

It looks right: it clearly contains information on FRC districts, but it's not easy to read, is it? Also, since it's a bytes string (note the `b` prefix) rather than a Python data structure, extracting useful information is tedious. Suppose I want the key value for the Pacific Northwest district. I could pull it out by slicing, but first I would need to figure out that the key occupies the slice `[1037:1044]` within the string.

In [None]:
# Run this cell to see the key value for the PNW district.
resp_text[1037:1044]
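`read()` returns `bytes`, not `str`. The sketch below uses a shortened copy of the first entry in the response shown above to show how to decode the bytes into text, and how finding a value by searching the text works but is fiddly. The UTF-8 encoding is an assumption; it is the standard encoding for JSON.

```python
# A shortened copy of the first entry in the TBA-API response shown above
raw = b'[\n {\n  "abbreviation": "chs", \n  "display_name": "FIRST Chesapeake", \n  "key": "2020chs", \n  "year": 2020\n }\n]'
print(type(raw))   # <class 'bytes'>

text = raw.decode('utf-8')   # bytes -> str
print(type(text))  # <class 'str'>

# Locating a value by searching the text works, but it is fragile and tedious
marker = '"key": "'
start = text.find(marker) + len(marker)
end = text.find('"', start)
print(text[start:end])  # 2020chs
```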

#### Converting from JSON
Fortunately, the Python Standard Library includes a module for reading [JSON](https://docs.python.org/3/library/json.html). We can convert the response text into a useful Python object with the line `districts = json.loads(resp_text)`. Run the next cell to see the results.

In [None]:
districts
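`json.loads()` accepts either `bytes` or `str` (bytes support was added in Python 3.6), which is why `resp_text` did not need to be decoded first. The companion function `json.dumps()` goes the other way, and its `indent` argument produces readable output. The sample below is the Pacific Northwest entry from the response above.

```python
import json

raw = b'[{"abbreviation": "pnw", "display_name": "Pacific Northwest", "key": "2020pnw", "year": 2020}]'

# json.loads() accepts bytes as well as str
from_bytes = json.loads(raw)
from_str = json.loads(raw.decode('utf-8'))
print(from_bytes == from_str)  # True

# json.dumps() converts Python objects back to JSON text; indent=2 is easier to read
print(json.dumps(from_bytes, indent=2))
```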

But what is this exactly? Let's ask Python!

In [None]:
# Run this cell to see the data type.
type(districts)

The `json.loads()` function gave us a list, with each element of the list representing the data for one district. We can see that the object is a list because the first and last characters are square brackets. Run the cell below to see information for one of the districts.

In [None]:
# Run this cell
print("Object type for first element in list:", type(districts[0]))
districts[0]

Each element in the list is a Python dictionary. It's now easy to get the TBA key for the PNW district.

In [None]:
# Run this cell to see the TBA key for the 10th district in the list
districts[9]["key"]

Or we can use a list comprehension if we don't know where the PNW district is located in the list.

In [None]:
# Run this cell to see a method for searching a list
[x['key'] for x in districts if x['display_name'] == 'Pacific Northwest'][0]
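The `[0]` at the end of that list comprehension raises an `IndexError` if no district matches. The sketch below shows two alternatives: `next()` with a default value, which stops at the first match and returns `None` when there is none, and a dictionary built once for repeated lookups. The sample data is copied from the 2020 response above.

```python
districts = [
    {'abbreviation': 'ont', 'display_name': 'Ontario', 'key': '2020ont', 'year': 2020},
    {'abbreviation': 'pnw', 'display_name': 'Pacific Northwest', 'key': '2020pnw', 'year': 2020},
]

# next() returns the first match, or the default (None) if nothing matches
pnw_key = next((d['key'] for d in districts if d['display_name'] == 'Pacific Northwest'), None)
print(pnw_key)  # 2020pnw

# For repeated lookups, build a dictionary keyed by display name once
key_by_name = {d['display_name']: d['key'] for d in districts}
print(key_by_name.get('Ontario'))  # 2020ont
```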

In this instance, our JSON text transformed into a Python list of Python dictionaries. But that won't always be the case. JSON could also consist of just a dictionary, or just a list, or a dictionary where every value is a list, or a dictionary that contains other dictionaries. In short, importing JSON text into Python can result in a structure with dictionaries and/or lists that are nested to any depth.
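When a response has an unfamiliar shape, it can help to walk it recursively and list the type found at each position. `outline` below is a hypothetical helper written for this notebook, and the sample structure is made up purely to show the nesting.

```python
def outline(obj, path='root'):
    """Yield (path, type name) for every value in nested lists/dicts.

    outline is a hypothetical helper for exploring unfamiliar JSON.
    """
    yield path, type(obj).__name__
    if isinstance(obj, dict):
        for name, value in obj.items():
            yield from outline(value, f'{path}[{name!r}]')
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from outline(value, f'{path}[{index}]')


# Made-up nested structure, for illustration only
sample = {'name': 'example', 'items': [1, {'nested': True}]}
rows = list(outline(sample))
for path, kind in rows:
    print(path, kind)
```

Each printed path is also valid Python for reaching that value, e.g. `sample['items'][1]['nested']`.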

## E. Converting to a Pandas Dataframe
The Python list is a big improvement over plain JSON text, but we can do better. We can convert the Python list to a [Pandas dataframe](https://pandas.pydata.org/pandas-docs/stable/).

In [None]:
# Run this cell to create a dataframe
import pandas as pd
dist_df = pd.DataFrame(districts)
dist_df
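The dataframe makes the earlier Pacific Northwest lookup shorter. The sketch below uses two rows copied from the 2020 response and shows two standard pandas approaches: a boolean filter with `.loc`, and `set_index()` so rows can be looked up by abbreviation.

```python
import pandas as pd

# Two rows copied from the 2020 TBA-API response
districts = [
    {'abbreviation': 'chs', 'display_name': 'FIRST Chesapeake', 'key': '2020chs', 'year': 2020},
    {'abbreviation': 'pnw', 'display_name': 'Pacific Northwest', 'key': '2020pnw', 'year': 2020},
]
dist_df = pd.DataFrame(districts)

# Boolean filter: rows where display_name matches, then the 'key' column
pnw_key = dist_df.loc[dist_df['display_name'] == 'Pacific Northwest', 'key'].iloc[0]
print(pnw_key)  # 2020pnw

# Index by abbreviation so rows can be looked up by label
by_abbrev = dist_df.set_index('abbreviation')
print(by_abbrev.loc['pnw', 'display_name'])  # Pacific Northwest
```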

That's a lot better.

## F. Blue Alliance Keys
Pay special attention to the *key* column in the dataframe printout. TBA assigns unique codes, called keys, to every FRC district, competition, and team. Many TBA-API commands require the user to submit a key as a path argument in order to retrieve more information. Also note that TBA-API often includes the season (i.e., year) in the key.
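In the 2020 data above, every district key is the year followed by the district abbreviation (for example, `2020pnw`). The sketch below assumes that pattern holds. It is true of every district key in this notebook, but nothing here confirms it for other kinds of TBA keys. `split_district_key` is a hypothetical helper.

```python
def split_district_key(key):
    """Split a district key like '2020pnw' into (2020, 'pnw').

    Hypothetical helper; assumes the key starts with a four-digit year,
    which is true of every district key in the 2020 response above.
    """
    year, abbreviation = key[:4], key[4:]
    if not year.isdigit():
        raise ValueError(f'Expected a four-digit year prefix: {key!r}')
    return int(year), abbreviation


print(split_district_key('2020pnw'))  # (2020, 'pnw')

# Going the other way: rebuild the key from a district's year and abbreviation
district = {'abbreviation': 'ne', 'year': 2020}
rebuilt = f"{district['year']}{district['abbreviation']}"
print(rebuilt)  # 2020ne
```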

#### The Next Step
Go to [blue_alliance_part2](blue_alliance_part2.ipynb) to continue with this project guide. But first, run the last cell to save the list of districts to a pickle file. We'll need it for the next notebook.

In [None]:
# Run this cell before closing the notebook.
import pickle
with open('districts.pickle', 'wb') as p_file:
    pickle.dump(districts, p_file)
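Reading the file back uses `pickle.load()` with the file opened in binary read mode (`'rb'`). The sketch below does the full round trip in a temporary directory so it does not overwrite the real *districts.pickle*. As the Python documentation warns, only unpickle files you trust, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

districts = [{'abbreviation': 'pnw', 'display_name': 'Pacific Northwest',
              'key': '2020pnw', 'year': 2020}]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'districts.pickle')
    # Same pattern as the cell above: binary write mode ('wb')
    with open(path, 'wb') as p_file:
        pickle.dump(districts, p_file)
    # Reading it back requires binary read mode ('rb')
    with open(path, 'rb') as p_file:
        loaded = pickle.load(p_file)

print(loaded == districts)  # True
```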
