[Table of Contents](../index.ipynb)

# Project 1: Analyzing Blue Alliance Data with Python
# Notebook 3: Creating Dataframes from Nested Data

## I. Preparations
### A. Other Notebooks in Pyclass FRC
1. Get a *Blue Alliance* API authorization key. [See instructions here](../../procedures/pc03_tba_api_key/pc03_tba_api_key.ipynb).
2. Review [session 09 on Hypertext Transfer Protocol](../../sessions/s09_http/s09_http.ipynb)
3. Review [Analyzing Blue Alliance Data with Python - Notebook 1](pj01_nb01_tba_http.ipynb)
4. Review [Analyzing Blue Alliance Data with Python - Notebook 2](pj01_nb02_tba_flat_data.ipynb)

### B. Official Python Documentation
This notebook brings together many different Python techniques. These links to the official Python documentation may be helpful when working through this notebook.
1. [List comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions)
2. [Python dictionaries, including dictionary comprehensions](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)
3. [Built-in `filter()` function](https://docs.python.org/3/library/functions.html#filter).
4. [Dictionary's `items()` method](https://docs.python.org/3/library/stdtypes.html?highlight=items#dict.items)
5. [Built-in `isinstance()` function](https://docs.python.org/3/library/functions.html#isinstance)
6. [Lambda Expressions](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions)
7. [PEP 448 on Dictionary Unpacking](https://www.python.org/dev/peps/pep-0448/)
8. [Built-in `enumerate()` function](https://docs.python.org/3/library/functions.html#enumerate)
9. [Python f strings](https://realpython.com/python-f-strings/#f-strings-a-new-and-improved-way-to-format-strings-in-python)

### C. Package Imports

In [2]:
import datetime
import json
import pickle

import pandas as pd

### D. Load JSON Data

In [3]:
with open('matches.json', 'r') as j_file:
    matches = json.load(j_file)

with open('districts.json', 'rb') as p_file:
    districts = json.load(p_file)

## II. Overview
In this notebook we will create a flat (well, ... almost flat) dataframe from highly nested JSON data that contains detailed FRC match scores.
* Our dataframe will have two rows per match, one for each alliance.
* There will be an *alliance* column that specifies whether the row is for the blue or red alliance.
* The dataframe will have three columns that identify the teams that compete in the match, one column for each operator station.

In prior notebooks, we determined that the matches JSON data consists of a list of dictionaries, with one dictionary for every match in the FRC competition. Review the data for the first two matches below:

In [4]:
matches[0:2]

[{'actual_time': 1583104716,
  'alliances': {'blue': {'dq_team_keys': [],
    'score': 136,
    'surrogate_team_keys': [],
    'team_keys': ['frc2930', 'frc2976', 'frc4918']},
   'red': {'dq_team_keys': [],
    'score': 148,
    'surrogate_team_keys': [],
    'team_keys': ['frc4911', 'frc2910', 'frc4173']}},
  'comp_level': 'f',
  'event_key': '2020wasno',
  'key': '2020wasno_f1m1',
  'match_number': 1,
  'post_result_time': 1583105046,
  'predicted_time': 1583104822,
  'score_breakdown': {'blue': {'adjustPoints': 0,
    'autoCellPoints': 40,
    'autoCellsBottom': 0,
    'autoCellsInner': 2,
    'autoCellsOuter': 7,
    'autoInitLinePoints': 15,
    'autoPoints': 55,
    'controlPanelPoints': 0,
    'endgamePoints': 55,
    'endgameRobot1': 'Hang',
    'endgameRobot2': 'Hang',
    'endgameRobot3': 'Park',
    'endgameRungIsLevel': 'NotLevel',
    'foulCount': 0,
    'foulPoints': 0,
    'initLineRobot1': 'Exited',
    'initLineRobot2': 'Exited',
    'initLineRobot3': 'Exited',
    'rp

## III. Lambda Functions and the Built-in `filter()` Function
Many of the top-level keys in each dictionary contain atomic data. Each of these keys can be converted into a dataframe column with a simple list comprehension.

In [5]:
top_level_keys = [{'actual_time': mtch['actual_time'], 'comp_level': mtch['comp_level'],
                  'event_key': mtch['event_key']} for mtch in matches]
top_level_keys[:3]

[{'actual_time': 1583104716, 'comp_level': 'f', 'event_key': '2020wasno'},
 {'actual_time': 1583106972, 'comp_level': 'f', 'event_key': '2020wasno'},
 {'actual_time': 1583096632, 'comp_level': 'qf', 'event_key': '2020wasno'}]

In [6]:
pd.DataFrame(top_level_keys).head(3)

| | actual_time | comp_level | event_key |
|---|---|---|---|
| 0 | 1583104716 | f | 2020wasno |
| 1 | 1583106972 | f | 2020wasno |
| 2 | 1583096632 | qf | 2020wasno |


There are 13 top-level dictionary keys, and with this approach each key needs to be typed twice. I got tired of typing the keys after typing only three of them. There are many drawbacks to this approach.
* It's tiring.
* It's prone to errors. There are 26 opportunities to type the name of the key incorrectly.
* The code will break if TBA-API changes one of the keys.

Fortunately, there is a better way. Python has a built-in function called `filter()` that can be used to extract only the dictionary keys that we want. The [`filter()` documentation is here](https://docs.python.org/3/library/functions.html#filter).

The `filter()` function takes two arguments. The first argument is a function that itself takes one argument and returns either `True` or `False`. The second argument is an iterable object like a Python list. `filter()` passes every element of the iterable to the function in the first argument and returns only those elements for which the function returns `True`. That probably doesn't make much sense yet, but an example should clarify the situation.
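As a quick warm-up before working with the match data, here is a minimal sketch of `filter()` on toy data. The `is_even()` function and the `numbers` list are made up for illustration and are not part of the TBA data.

```python
# Toy example: is_even() and numbers are hypothetical names used only here.
def is_even(number):
    """Return True for even integers, False otherwise."""
    return number % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]

# filter() returns a lazy iterator, so we wrap it in list() to see the results.
evens = list(filter(is_even, numbers))
print(evens)  # [2, 4, 6]
```

Only the elements for which `is_even()` returned `True` survive the filter.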

We will first consider the second argument to `filter()`. We will pass a top-level dictionary to `filter()` as the second argument, but we first must convert the dictionary to a list of sorts using the dictionary's [`items()` method](https://docs.python.org/3/library/stdtypes.html?highlight=items#dict.items).

In [7]:
top_level_match_dictionary = matches[0]
list(top_level_match_dictionary.items())[:5]

[('actual_time', 1583104716),
 ('alliances',
  {'blue': {'dq_team_keys': [],
    'score': 136,
    'surrogate_team_keys': [],
    'team_keys': ['frc2930', 'frc2976', 'frc4918']},
   'red': {'dq_team_keys': [],
    'score': 148,
    'surrogate_team_keys': [],
    'team_keys': ['frc4911', 'frc2910', 'frc4173']}}),
 ('comp_level', 'f'),
 ('event_key', '2020wasno'),
 ('key', '2020wasno_f1m1')]

The `items()` method gives us the dictionary's key-value pairs as an iterable of tuple objects (converted to a list above so we could slice it), with the dictionary key as the first element of each tuple and the dictionary value as the second element.

Next, we will take advantage of the fact that all non-atomic values in the top-level matches dictionaries are lists or are themselves dictionaries. We will write a custom function that takes a single key-value tuple, as returned by `items()` above, and returns `True` if the second element of the tuple is anything other than a list or dictionary. Otherwise the function will return `False`.

In [8]:
def is_not_dict_or_list(tpl):
    return not isinstance(tpl[1], (dict, list))

The [built-in `isinstance()` function](https://docs.python.org/3/library/functions.html#isinstance) returns `True` if the second element of the tuple is a list or dictionary. The `not` keyword reverses the result, so `is_not_dict_or_list()` will return `True` if the second tuple element is anything *other than* a list or dictionary.
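The sketch below shows how `isinstance()` behaves when its second argument is a tuple of types: it returns `True` if the value matches any type in the tuple. The `samples` list is made-up data that only imitates the shape of the key-value tuples returned by `items()`.

```python
# Hypothetical key-value tuples, shaped like the output of dict.items().
samples = [
    ('score', 136),
    ('team_keys', ['frc2930']),
    ('alliances', {'blue': {}}),
    ('comp_level', 'f'),
]

# For each key, record whether the value is a dictionary or a list.
results = {key: isinstance(value, (dict, list)) for key, value in samples}

for key, is_nested in results.items():
    print(key, is_nested)
```

The integer and string values produce `False`, while the list and dictionary values produce `True`.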

Now let's put this all together:

In [9]:
dict(filter(is_not_dict_or_list, top_level_match_dictionary.items()))

{'actual_time': 1583104716,
 'comp_level': 'f',
 'event_key': '2020wasno',
 'key': '2020wasno_f1m1',
 'match_number': 1,
 'post_result_time': 1583105046,
 'predicted_time': 1583104822,
 'set_number': 1,
 'time': 1583103960,
 'winning_alliance': 'red'}

Voila. We have filtered the top-level dictionary such that it contains only atomic values. And we did it without too much code. But we can do better.

We can define the custom filtering function within the `filter()` function call using a [lambda expression](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions).

In [10]:
dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)),
                                       top_level_match_dictionary.items()))

{'actual_time': 1583104716,
 'comp_level': 'f',
 'event_key': '2020wasno',
 'key': '2020wasno_f1m1',
 'match_number': 1,
 'post_result_time': 1583105046,
 'predicted_time': 1583104822,
 'set_number': 1,
 'time': 1583103960,
 'winning_alliance': 'red'}

`lambda` is a Python keyword that allows us to create short anonymous functions. Consider the lambda expression we just used:
```Python
lambda tpl: not isinstance(tpl[1], (dict, list))
```

The `lambda` keyword tells Python to create a function from the expression that follows. The items between the `lambda` keyword and the colon are the arguments to the function. Our lambda function takes only one argument, but lambda expressions can take multiple arguments separated by commas. The value of the expression after the colon is returned by the lambda function.

Lambda functions have a few advantages:
* They improve code maintainability because everything needed for the statement to work is included in the statement itself. We don't have to worry that the `is_not_dict_or_list()` function will get deleted or otherwise messed up.
* They improve readability in the sense that no cross referencing to a separate function is needed to understand what the statement does.
* The programmer does not have to think up a name for a function that is only going to get used in one place.

Lambda functions can only consist of a single expression. For some readers of this code, lambda functions can degrade readability in the sense that they result in a lot of functionality being packed into just one line. But for readers who are proficient with the lambda syntax, this disadvantage is minor.

Lambda functions are not unique to Python. Many other languages, such as Java and C++, also allow similar anonymous functions, although they don't use the `lambda` keyword.
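To make the comparison with ordinary functions concrete, here is a small sketch. The names `add_def`, `add_lambda`, and `teams` are invented for this illustration. Assigning a lambda to a name is done here only to compare it with `def`; in practice a lambda is usually passed directly as an argument, as in the `sorted()` call.

```python
# A regular function and an equivalent two-argument lambda expression.
def add_def(a, b):
    return a + b

add_lambda = lambda a, b: a + b  # named only for side-by-side comparison

print(add_def(2, 3), add_lambda(2, 3))

# Typical use: pass the lambda directly as an argument.
# Here we sort team keys by their numeric part instead of alphabetically.
teams = ['frc4911', 'frc949', 'frc2910']
sorted_teams = sorted(teams, key=lambda team: int(team[3:]))
print(sorted_teams)  # ['frc949', 'frc2910', 'frc4911']
```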

We passed the filter-lambda statement to the built-in `dict()` function to convert the output of `filter()` to a dictionary object.
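The sketch below, which uses a made-up `match` dictionary rather than real TBA data, shows why `dict()` is needed: `filter()` yields key-value tuples, and `dict()` turns those tuples back into a dictionary. It also shows a dictionary comprehension that should produce the same result; this is an alternative offered for comparison, not the approach used in this notebook.

```python
# Hypothetical match record with two atomic values and two nested values.
match = {
    'key': '2020wasno_f1m1',
    'match_number': 1,
    'alliances': {'blue': {}, 'red': {}},
    'videos': [],
}

# filter() yields (key, value) tuples; dict() rebuilds a dictionary from them.
flat_from_filter = dict(
    filter(lambda tpl: not isinstance(tpl[1], (dict, list)), match.items())
)

# Alternative: a dictionary comprehension with an if clause.
flat_from_comprehension = {
    key: value
    for key, value in match.items()
    if not isinstance(value, (dict, list))
}

print(flat_from_filter)
print(flat_from_filter == flat_from_comprehension)
```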

Next we'll place the dict-filter-lambda statement in a list comprehension.

In [11]:
match_data_list = [dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)), mtch.items()))
                   for mtch in matches]
match_data_list

[{'actual_time': 1583104716,
  'comp_level': 'f',
  'event_key': '2020wasno',
  'key': '2020wasno_f1m1',
  'match_number': 1,
  'post_result_time': 1583105046,
  'predicted_time': 1583104822,
  'set_number': 1,
  'time': 1583103960,
  'winning_alliance': 'red'},
 {'actual_time': 1583106972,
  'comp_level': 'f',
  'event_key': '2020wasno',
  'key': '2020wasno_f1m2',
  'match_number': 2,
  'post_result_time': 1583107173,
  'predicted_time': 1583107040,
  'set_number': 1,
  'time': 1583104380,
  'winning_alliance': 'red'},
 {'actual_time': 1583096632,
  'comp_level': 'qf',
  'event_key': '2020wasno',
  'key': '2020wasno_qf1m1',
  'match_number': 1,
  'post_result_time': 1583096863,
  'predicted_time': 1583096645,
  'set_number': 1,
  'time': 1583096400,
  'winning_alliance': 'red'},
 {'actual_time': 1583098723,
  'comp_level': 'qf',
  'event_key': '2020wasno',
  'key': '2020wasno_qf1m2',
  'match_number': 2,
  'post_result_time': 1583098931,
  'predicted_time': 1583098747,
  'set_number':

And now we'll create a dataframe from the list.

In [12]:
pd.DataFrame(match_data_list).head()

| | actual_time | comp_level | event_key | key | match_number | post_result_time | predicted_time | set_number | time | winning_alliance |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1583104716 | f | 2020wasno | 2020wasno_f1m1 | 1 | 1583105046 | 1583104822 | 1 | 1583103960 | red |
| 1 | 1583106972 | f | 2020wasno | 2020wasno_f1m2 | 2 | 1583107173 | 1583107040 | 1 | 1583104380 | red |
| 2 | 1583096632 | qf | 2020wasno | 2020wasno_qf1m1 | 1 | 1583096863 | 1583096645 | 1 | 1583096400 | red |
| 3 | 1583098723 | qf | 2020wasno | 2020wasno_qf1m2 | 2 | 1583098931 | 1583098747 | 1 | 1583098080 | red |
| 4 | 1583097215 | qf | 2020wasno | 2020wasno_qf2m1 | 1 | 1583097388 | 1583097192 | 2 | 1583096820 | blue |


Very cool.

## IV. Dictionary Unpacking
In the [previous notebook](pj01_nb02_tba_flat_data.ipynb#row_approach) we used a list comprehension to extract the sub-dictionary with the key *score_breakdown* into its own list.

In [13]:
blue_rows = [mtch['score_breakdown']['blue'] for mtch in matches]
blue_rows[:5]

[{'adjustPoints': 0,
  'autoCellPoints': 40,
  'autoCellsBottom': 0,
  'autoCellsInner': 2,
  'autoCellsOuter': 7,
  'autoInitLinePoints': 15,
  'autoPoints': 55,
  'controlPanelPoints': 0,
  'endgamePoints': 55,
  'endgameRobot1': 'Hang',
  'endgameRobot2': 'Hang',
  'endgameRobot3': 'Park',
  'endgameRungIsLevel': 'NotLevel',
  'foulCount': 0,
  'foulPoints': 0,
  'initLineRobot1': 'Exited',
  'initLineRobot2': 'Exited',
  'initLineRobot3': 'Exited',
  'rp': 0,
  'shieldEnergizedRankingPoint': False,
  'shieldOperationalRankingPoint': False,
  'stage1Activated': True,
  'stage2Activated': False,
  'stage3Activated': False,
  'stage3TargetColor': 'Unknown',
  'tba_numRobotsHanging': 2,
  'tba_shieldEnergizedRankingPointFromFoul': False,
  'techFoulCount': 0,
  'teleopCellPoints': 26,
  'teleopCellsBottom': 0,
  'teleopCellsInner': 0,
  'teleopCellsOuter': 13,
  'teleopPoints': 81,
  'totalPoints': 136},
 {'adjustPoints': 0,
  'autoCellPoints': 38,
  'autoCellsBottom': 0,
  'autoCellsI

We would like to combine the score columns from this list comprehension with the match data columns that we created with the `filter()` function. Just sticking both code snippets into a list comprehension does not work - the following code will result in a syntax error:
```Python
blue_rows = [dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)), mtch.items())),
             mtch['score_breakdown']['blue']
             for mtch in matches]
```
The problem is that the expression in a list comprehension before the `for` keyword is supposed to return a single list element. But the expression in the snippet above is returning *two* dictionaries, and the list comprehension can't figure out what to do with that.
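If you want to confirm the syntax error without breaking a notebook cell, one option is to hand the source text to the built-in `compile()` function and catch the exception. This is only a sketch: the `source` string is a simplified stand-in for the snippet above, with two literal dictionaries placed before the `for` keyword.

```python
# Simplified stand-in for the failing comprehension: two expressions before 'for'.
source = "[{'a': 1}, {'b': 2} for n in range(2)]"

try:
    # compile() parses the text without running it.
    compile(source, '<example>', 'eval')
    raised = False
except SyntaxError as err:
    raised = True
    print('SyntaxError:', err.msg)

print(raised)
```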

If you are thinking that the list comprehension would work if we enclose both the `mtch['score...` and `dict(filter(lambda...` dictionary expressions in parentheses to create a tuple, you would be correct.

In [14]:
tuple_list = [(dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)), mtch.items())),
              mtch['score_breakdown']['blue'])
              for mtch in matches]
tuple_list[:5]

[({'actual_time': 1583104716,
   'comp_level': 'f',
   'event_key': '2020wasno',
   'key': '2020wasno_f1m1',
   'match_number': 1,
   'post_result_time': 1583105046,
   'predicted_time': 1583104822,
   'set_number': 1,
   'time': 1583103960,
   'winning_alliance': 'red'},
  {'adjustPoints': 0,
   'autoCellPoints': 40,
   'autoCellsBottom': 0,
   'autoCellsInner': 2,
   'autoCellsOuter': 7,
   'autoInitLinePoints': 15,
   'autoPoints': 55,
   'controlPanelPoints': 0,
   'endgamePoints': 55,
   'endgameRobot1': 'Hang',
   'endgameRobot2': 'Hang',
   'endgameRobot3': 'Park',
   'endgameRungIsLevel': 'NotLevel',
   'foulCount': 0,
   'foulPoints': 0,
   'initLineRobot1': 'Exited',
   'initLineRobot2': 'Exited',
   'initLineRobot3': 'Exited',
   'rp': 0,
   'shieldEnergizedRankingPoint': False,
   'shieldOperationalRankingPoint': False,
   'stage1Activated': True,
   'stage2Activated': False,
   'stage3Activated': False,
   'stage3TargetColor': 'Unknown',
   'tba_numRobotsHanging': 2,
   't

Careful inspection of the output above shows that we successfully created a list of tuples, and that each tuple in the list contains two dictionaries. The problem is that the Pandas `DataFrame` constructor will not handle this input in the manner we would like:

In [15]:
pd.DataFrame(tuple_list).head(3)

| | 0 | 1 |
|---|---|---|
| 0 | {'actual_time': 1583104716, 'comp_level': 'f',... | {'adjustPoints': 0, 'autoCellPoints': 40, 'aut... |
| 1 | {'actual_time': 1583106972, 'comp_level': 'f',... | {'adjustPoints': 0, 'autoCellPoints': 38, 'aut... |
| 2 | {'actual_time': 1583096632, 'comp_level': 'qf'... | {'adjustPoints': 0, 'autoCellPoints': 20, 'aut... |


Pandas created only two columns, one for each dictionary in the tuples. We need a mechanism to extract the key-value pairs from both dictionaries and combine them into a single dictionary. Fortunately, Python provides a simple syntax for just this operation. Double-asterisk operator to the rescue. Run the code cell below to see how the double-asterisk operator combines two dictionaries into a single dictionary.

In [16]:
# Dictionary unpacking with the double-asterisk operator
{**{'key1': 1, 'key2': 2}, **{'key3': 3, 'key4': 4}}

{'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4}

The technique used here is called dictionary unpacking. You are probably familiar with using two asterisks for exponentiation, e.g., `3**3 == 27`. But when placed in front of a dictionary, two asterisks instruct Python to extract the key-value pairs from the dictionary. Dictionary unpacking with the double-asterisk operator can be used in function calls, literal dictionaries (like in the preceding example), and in list or dictionary comprehensions. [Python Enhancement Proposal (PEP) 448](https://www.python.org/dev/peps/pep-0448/) explains the possible uses of dictionary unpacking. Dictionary unpacking in function calls has been around for many years, but using dictionary unpacking in literal dictionaries and comprehensions has only been allowed since Python version 3.5.
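Two details are worth a short sketch: unpacking a dictionary into keyword arguments of a function call, and what happens when both dictionaries contain the same key. The `describe()` function and the dictionaries below are hypothetical and exist only for this illustration.

```python
# Hypothetical function used to demonstrate unpacking in a function call.
def describe(team, station):
    return f'{team} at station {station}'

kwargs = {'team': 'frc2930', 'station': 1}
description = describe(**kwargs)  # same as describe(team='frc2930', station=1)
print(description)

# When keys collide in a dictionary literal, the value unpacked last wins.
defaults = {'alliance': 'blue', 'score': 0}
overrides = {'score': 136}
merged = {**defaults, **overrides}
print(merged)  # {'alliance': 'blue', 'score': 136}
```

The collision rule matters when combining dictionaries from different parts of the JSON data: if two sources share a key name, only the value from the dictionary unpacked last is kept.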

Instead of enclosing the two dictionary expressions in parentheses to create a tuple, we should enclose them in curly braces and precede each one with a double-asterisk operator. This will extract the key-value pairs of both dictionaries into a single dictionary.

In [17]:
dnary_list = [{**dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)), mtch.items())),
               **mtch['score_breakdown']['blue']}
              for mtch in matches]
dnary_list[:5]

[{'actual_time': 1583104716,
  'comp_level': 'f',
  'event_key': '2020wasno',
  'key': '2020wasno_f1m1',
  'match_number': 1,
  'post_result_time': 1583105046,
  'predicted_time': 1583104822,
  'set_number': 1,
  'time': 1583103960,
  'winning_alliance': 'red',
  'adjustPoints': 0,
  'autoCellPoints': 40,
  'autoCellsBottom': 0,
  'autoCellsInner': 2,
  'autoCellsOuter': 7,
  'autoInitLinePoints': 15,
  'autoPoints': 55,
  'controlPanelPoints': 0,
  'endgamePoints': 55,
  'endgameRobot1': 'Hang',
  'endgameRobot2': 'Hang',
  'endgameRobot3': 'Park',
  'endgameRungIsLevel': 'NotLevel',
  'foulCount': 0,
  'foulPoints': 0,
  'initLineRobot1': 'Exited',
  'initLineRobot2': 'Exited',
  'initLineRobot3': 'Exited',
  'rp': 0,
  'shieldEnergizedRankingPoint': False,
  'shieldOperationalRankingPoint': False,
  'stage1Activated': True,
  'stage2Activated': False,
  'stage3Activated': False,
  'stage3TargetColor': 'Unknown',
  'tba_numRobotsHanging': 2,
  'tba_shieldEnergizedRankingPointFromFoul

This list converts to a dataframe in the manner we desire. One drawback of this method is that we are only extracting blue alliance data, due to using `['blue']` as one of our dictionary keys in the comprehension statement. We'll address that later.

In [18]:
pd.DataFrame(dnary_list).head(3)

| | actual_time | comp_level | event_key | key | match_number | post_result_time | predicted_time | set_number | time | winning_alliance | ... | stage3TargetColor | tba_numRobotsHanging | tba_shieldEnergizedRankingPointFromFoul | techFoulCount | teleopCellPoints | teleopCellsBottom | teleopCellsInner | teleopCellsOuter | teleopPoints | totalPoints |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1583104716 | f | 2020wasno | 2020wasno_f1m1 | 1 | 1583105046 | 1583104822 | 1 | 1583103960 | red | ... | Unknown | 2 | False | 0 | 26 | 0 | 0 | 13 | 81 | 136 |
| 1 | 1583106972 | f | 2020wasno | 2020wasno_f1m2 | 2 | 1583107173 | 1583107040 | 1 | 1583104380 | red | ... | Unknown | 2 | False | 3 | 34 | 0 | 0 | 17 | 84 | 137 |
| 2 | 1583096632 | qf | 2020wasno | 2020wasno_qf1m1 | 1 | 1583096863 | 1583096645 | 1 | 1583096400 | red | ... | Unknown | 2 | False | 0 | 19 | 0 | 1 | 8 | 84 | 122 |


## V. Nested Comprehensions
Next we would like to add columns for the teams in each match. The teams are included in a list that is a few layers down in the JSON data structure. Make sure you understand why the following code works by reviewing the matches JSON data structure that's included earlier in this notebook.

In [19]:
matches[0]['alliances']['blue']['team_keys']

['frc2930', 'frc2976', 'frc4918']

Assuming the teams are listed in order of their station number, we could add three columns to the dataframe, one for each operator station, and insert the applicable team key into each column.

In [20]:
tm_list = [{'station_1': mtch['alliances']['blue']['team_keys'][0],
            'station_2': mtch['alliances']['blue']['team_keys'][1],
            'station_3': mtch['alliances']['blue']['team_keys'][2],
            **dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)), mtch.items())),
            **mtch['score_breakdown']['blue']}
           for mtch in matches]
tm_list[:5]

[{'station_1': 'frc2930',
  'station_2': 'frc2976',
  'station_3': 'frc4918',
  'actual_time': 1583104716,
  'comp_level': 'f',
  'event_key': '2020wasno',
  'key': '2020wasno_f1m1',
  'match_number': 1,
  'post_result_time': 1583105046,
  'predicted_time': 1583104822,
  'set_number': 1,
  'time': 1583103960,
  'winning_alliance': 'red',
  'adjustPoints': 0,
  'autoCellPoints': 40,
  'autoCellsBottom': 0,
  'autoCellsInner': 2,
  'autoCellsOuter': 7,
  'autoInitLinePoints': 15,
  'autoPoints': 55,
  'controlPanelPoints': 0,
  'endgamePoints': 55,
  'endgameRobot1': 'Hang',
  'endgameRobot2': 'Hang',
  'endgameRobot3': 'Park',
  'endgameRungIsLevel': 'NotLevel',
  'foulCount': 0,
  'foulPoints': 0,
  'initLineRobot1': 'Exited',
  'initLineRobot2': 'Exited',
  'initLineRobot3': 'Exited',
  'rp': 0,
  'shieldEnergizedRankingPoint': False,
  'shieldOperationalRankingPoint': False,
  'stage1Activated': True,
  'stage2Activated': False,
  'stage3Activated': False,
  'stage3TargetColor': 'Unk

In [21]:
pd.DataFrame(tm_list).head(3)

| | station_1 | station_2 | station_3 | actual_time | comp_level | event_key | key | match_number | post_result_time | predicted_time | ... | stage3TargetColor | tba_numRobotsHanging | tba_shieldEnergizedRankingPointFromFoul | techFoulCount | teleopCellPoints | teleopCellsBottom | teleopCellsInner | teleopCellsOuter | teleopPoints | totalPoints |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | frc2930 | frc2976 | frc4918 | 1583104716 | f | 2020wasno | 2020wasno_f1m1 | 1 | 1583105046 | 1583104822 | ... | Unknown | 2 | False | 0 | 26 | 0 | 0 | 13 | 81 | 136 |
| 1 | frc2930 | frc2976 | frc4918 | 1583106972 | f | 2020wasno | 2020wasno_f1m2 | 2 | 1583107173 | 1583107040 | ... | Unknown | 2 | False | 3 | 34 | 0 | 0 | 17 | 84 | 137 |
| 2 | frc4512 | frc949 | frc4131 | 1583096632 | qf | 2020wasno | 2020wasno_qf1m1 | 1 | 1583096863 | 1583096645 | ... | Unknown | 2 | False | 0 | 19 | 0 | 1 | 8 | 84 | 122 |


This works, but we can do better. This technique requires us to repeat what is almost the same expression three times, and good programmers don't like to repeat themselves. We are going to convert the three expressions that create the station columns into a single nested dictionary comprehension.

A nested comprehension is a list or dictionary comprehension nested inside another list or dictionary comprehension. Run the following cell to see the output from a dictionary comprehension for the teams.

In [22]:
{team: team for team in matches[0]['alliances']['blue']['team_keys']}

{'frc2930': 'frc2930', 'frc2976': 'frc2976', 'frc4918': 'frc4918'}

OK, that's a start. But the keys need to be 'station_1' through 'station_3', not the team keys. The good news is that Python has a [built-in function, `enumerate()`](https://docs.python.org/3/library/functions.html#enumerate), that will help. Run the code cell below to see what the `enumerate()` function does.

In [23]:
[tpl for tpl in enumerate(matches[0]['alliances']['blue']['team_keys'])]

[(0, 'frc2930'), (1, 'frc2976'), (2, 'frc4918')]

When we pass a list to `enumerate()`, the function returns two-element tuples. The second element of the tuple is the element from the list, and the first element of the tuple is the index (position) of that element within the list, starting from zero. The `enumerate()` function frequently comes in handy because programmers often have to keep track of where they are in a list when they iterate through it.
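Two optional conveniences are sketched below: tuple unpacking in the `for` clause, which avoids indexing into each tuple, and the `start` parameter of `enumerate()`, which changes the first count. Neither is required by this notebook, which indexes the tuples and adds one by hand. The `team_keys` list is copied from the output above.

```python
team_keys = ['frc2930', 'frc2976', 'frc4918']

# start=1 makes the counter begin at 1 instead of 0.
pairs = list(enumerate(team_keys, start=1))
print(pairs)  # [(1, 'frc2930'), (2, 'frc2976'), (3, 'frc4918')]

# Tuple unpacking gives each element of the pair its own name.
by_station = {number: team for number, team in enumerate(team_keys, start=1)}
print(by_station)
```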

Let's add the `enumerate()` function to our comprehension.

In [24]:
{team_tpl[0]: team_tpl[1]
 for team_tpl in enumerate(matches[0]['alliances']['blue']['team_keys'])}

{0: 'frc2930', 1: 'frc2976', 2: 'frc4918'}

That's much better, but we want the column names to start with 'station_' and have numbers 1 through 3, not 0 through 2. (If computer programmers had designed FRC, we probably would have station 0 through station 2.) But that's easy to fix. We'll use an f string, which is short for formatted string literal. Python f strings were added in Python version 3.6, so they were still fairly new when this notebook was drafted in April 2020. [You can find a good introduction on the *Real Python* website](https://realpython.com/python-f-strings/#f-strings-a-new-and-improved-way-to-format-strings-in-python).

In [25]:
{f'station_{team_tpl[0] + 1}': team_tpl[1]
 for team_tpl in enumerate(matches[0]['alliances']['blue']['team_keys'])}

{'station_1': 'frc2930', 'station_2': 'frc2976', 'station_3': 'frc4918'}

Python f strings will evaluate expressions placed within curly brackets and insert them into the string.
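A few more f string expressions, as a sketch with made-up variable names (`index`, `team`): arithmetic, a method call, and an optional format specification after a colon.

```python
index = 0
team = 'frc2930'

# Arithmetic inside the braces is evaluated before insertion.
label = f'station_{index + 1}'
print(label)  # station_1

# Method calls and other expressions work too.
summary = f'{team.upper()} scored {55 + 81} points'
print(summary)  # FRC2930 scored 136 points

# A format specification after the colon controls how the value is displayed.
average = f'{136 / 3:.2f}'
print(average)  # 45.33
```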

Now let's add the nested team comprehension to the outer list comprehension. We need to use the double-asterisk operator in front of the nested dictionary comprehension to unpack it.

In [26]:
tm_list2 = [{**{f'station_{tm_tpl[0] + 1}': tm_tpl[1] for tm_tpl
                in enumerate(mtch['alliances']['blue']['team_keys'])},
             'dq': mtch['alliances']['blue']['dq_team_keys'],
             **dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)), mtch.items())),
             **mtch['score_breakdown']['blue']}
            for mtch in matches]
tm_list2[:5]

[{'station_1': 'frc2930',
  'station_2': 'frc2976',
  'station_3': 'frc4918',
  'dq': [],
  'actual_time': 1583104716,
  'comp_level': 'f',
  'event_key': '2020wasno',
  'key': '2020wasno_f1m1',
  'match_number': 1,
  'post_result_time': 1583105046,
  'predicted_time': 1583104822,
  'set_number': 1,
  'time': 1583103960,
  'winning_alliance': 'red',
  'adjustPoints': 0,
  'autoCellPoints': 40,
  'autoCellsBottom': 0,
  'autoCellsInner': 2,
  'autoCellsOuter': 7,
  'autoInitLinePoints': 15,
  'autoPoints': 55,
  'controlPanelPoints': 0,
  'endgamePoints': 55,
  'endgameRobot1': 'Hang',
  'endgameRobot2': 'Hang',
  'endgameRobot3': 'Park',
  'endgameRungIsLevel': 'NotLevel',
  'foulCount': 0,
  'foulPoints': 0,
  'initLineRobot1': 'Exited',
  'initLineRobot2': 'Exited',
  'initLineRobot3': 'Exited',
  'rp': 0,
  'shieldEnergizedRankingPoint': False,
  'shieldOperationalRankingPoint': False,
  'stage1Activated': True,
  'stage2Activated': False,
  'stage3Activated': False,
  'stage3Target

In [28]:
pd.DataFrame(tm_list2).head(3)

| | station_1 | station_2 | station_3 | dq | actual_time | comp_level | event_key | key | match_number | post_result_time | ... | stage3TargetColor | tba_numRobotsHanging | tba_shieldEnergizedRankingPointFromFoul | techFoulCount | teleopCellPoints | teleopCellsBottom | teleopCellsInner | teleopCellsOuter | teleopPoints | totalPoints |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | frc2930 | frc2976 | frc4918 | [] | 1583104716 | f | 2020wasno | 2020wasno_f1m1 | 1 | 1583105046 | ... | Unknown | 2 | False | 0 | 26 | 0 | 0 | 13 | 81 | 136 |
| 1 | frc2930 | frc2976 | frc4918 | [] | 1583106972 | f | 2020wasno | 2020wasno_f1m2 | 2 | 1583107173 | ... | Unknown | 2 | False | 3 | 34 | 0 | 0 | 17 | 84 | 137 |
| 2 | frc4512 | frc949 | frc4131 | [] | 1583096632 | qf | 2020wasno | 2020wasno_qf1m1 | 1 | 1583096863 | ... | Unknown | 2 | False | 0 | 19 | 0 | 1 | 8 | 84 | 122 |


It works!
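The same nested-comprehension pattern can be easier to follow on a tiny data set. The sketch below uses a made-up `toy_matches` list with invented keys (`key`, `teams`); it is not the real TBA structure, only a reduced imitation of it.

```python
# Hypothetical, simplified match records.
toy_matches = [
    {'key': 'm1', 'teams': ['frc1', 'frc2', 'frc3']},
    {'key': 'm2', 'teams': ['frc4', 'frc5', 'frc6']},
]

# Outer list comprehension: one dictionary per match.
# Inner dictionary comprehension: one station_N entry per team, unpacked with **.
rows = [
    {
        'key': mtch['key'],
        **{f'station_{idx + 1}': team for idx, team in enumerate(mtch['teams'])},
    }
    for mtch in toy_matches
]

print(rows[0])
```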

## VI. Including *if* Statements in Comprehensions
Take a look at this line in the matches JSON:
```json
'alliances': {'blue': {'dq_team_keys': [], ...
```
If a team were disqualified during a match, the team key would be included in the list following *dq_team_keys*. If more than one team was disqualified, there would be more than one team in this list. We need to make a decision on what we want to do with this information. There are several different choices we could make.
1. We could ignore it, and just not include information on disqualified teams in the dataframe. There is nothing wrong with this choice. Teams are rarely disqualified and every dataframe does not need to contain all available data.
2. We could add a single column to the dataframe with the dq information as a string, with no extra processing. For matches where no team is disqualified (most matches), the column will contain the value '[]'. If a single team is disqualified, the column will contain '[frcXXXX]'. This is an easy option and the user of the dataframe can easily see if a team is disqualified, but they will have to write special code to interpret the contents of this column.
3. We could add an extra column for every team in the list. If no teams are disqualified, then there is no disqualified column. If there is a match where two teams are disqualified, then there are two columns for disqualified teams. The problem with this approach is that the dataframe will have different numbers of columns for different events, and the user of the dataframe will have to test for the existence of columns. This is not a good approach.

The author prefers a modified version of option #2 that looks like this:

In [31]:
[{'teams_dq': mtch['alliances']['blue']['dq_team_keys']}
 if mtch['alliances']['blue']['dq_team_keys'] else None
 for mtch in matches][:5]

[None, None, None, None, None]

This method uses a conditional expression (a one-line `if`/`else`) to check whether the *dq_team_keys* list contains any data. If the list is empty, the expression yields the Python value `None`. Otherwise, it yields the list of disqualified teams. It takes advantage of a couple of Python-isms.
* When evaluated as a Boolean, an empty list converts to `False`. The value `mtch['alliances']['blue']['dq_team_keys']` that occurs after the `if` keyword will evaluate to `True` if one or more teams were disqualified, and `False` otherwise.
* Consequently, if no teams were disqualified, the `None` value in the `else` clause will be returned. Otherwise the result will contain the list that identifies the disqualified teams.
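The truthiness rule and the conditional expression can be checked in isolation. In this sketch, `teams_or_none()` is a hypothetical helper written only to illustrate the idea; the notebook itself writes the expression inline.

```python
# Hypothetical helper: return the list if it has any elements, else None.
def teams_or_none(team_keys):
    return team_keys if team_keys else None

print(bool([]))                    # False: an empty list is falsy
print(bool(['frc2930']))           # True: a non-empty list is truthy
print(teams_or_none([]))           # None
print(teams_or_none(['frc2930']))  # ['frc2930']

# The 'or' operator gives the same result for lists in a shorter form.
shorter = [] or None
print(shorter)  # None
```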

We can use the exact same technique for including surrogate teams. Also, did you notice that we had to type the string 'blue' twice in the cell above? And it occurred three times in the cell before that? Let's replace it with a variable that contains the string 'blue'. Putting all of that together looks like this:

In [38]:
alli = 'blue'

mtch_lst = [{**{f'station_{tm_tpl[0] + 1}': tm_tpl[1] for tm_tpl
                in enumerate(mtch['alliances'][alli]['team_keys'])},
             
             'teams_dq': mtch['alliances'][alli]['dq_team_keys']
                 if mtch['alliances'][alli]['dq_team_keys'] else None,
             
             'teams_surrogate': mtch['alliances'][alli]['surrogate_team_keys']
                 if mtch['alliances'][alli]['surrogate_team_keys'] else None,
             
             **dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)), mtch.items())),
             
             **mtch['score_breakdown'][alli]}
            for mtch in matches]

pd.DataFrame(mtch_lst).head(3)

| | station_1 | station_2 | station_3 | teams_dq | teams_surrogate | actual_time | comp_level | event_key | key | match_number | ... | stage3TargetColor | tba_numRobotsHanging | tba_shieldEnergizedRankingPointFromFoul | techFoulCount | teleopCellPoints | teleopCellsBottom | teleopCellsInner | teleopCellsOuter | teleopPoints | totalPoints |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | frc2930 | frc2976 | frc4918 | | | 1583104716 | f | 2020wasno | 2020wasno_f1m1 | 1 | ... | Unknown | 2 | False | 0 | 26 | 0 | 0 | 13 | 81 | 136 |
| 1 | frc2930 | frc2976 | frc4918 | | | 1583106972 | f | 2020wasno | 2020wasno_f1m2 | 2 | ... | Unknown | 2 | False | 3 | 34 | 0 | 0 | 17 | 84 | 137 |
| 2 | frc4512 | frc949 | frc4131 | | | 1583096632 | qf | 2020wasno | 2020wasno_qf1m1 | 1 | ... | Unknown | 2 | False | 0 | 19 | 0 | 1 | 8 | 84 | 122 |


## VII. Final Touches
There are two pieces of data that are still missing from this dataframe: the alliance score and the video links. We're going to ignore the video links, but the alliance score is easy to add.

In [37]:
alli = 'blue'

mtch_lst2 = [{# Alliance Information
             'alliance_score': mtch['alliances'][alli]['score'], 
    
             # Extract Team Numbers
             **{f'station_{tm_tpl[0] + 1}': tm_tpl[1] for tm_tpl
                 in enumerate(mtch['alliances'][alli]['team_keys'])},
             
             # Disqualified and Surrogate Teams
             'teams_dq': mtch['alliances'][alli]['dq_team_keys']
                 if mtch['alliances'][alli]['dq_team_keys'] else None,

             'teams_surrogate': mtch['alliances'][alli]['surrogate_team_keys']
                 if mtch['alliances'][alli]['surrogate_team_keys'] else None,

             # Match Description (top-level values that are not dicts or lists)
             **dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)), mtch.items())),

             # Detailed Match Scores
             **mtch['score_breakdown'][alli]}
             for mtch in matches]

pd.DataFrame(mtch_lst2).head(3)

Unnamed: 0,alliance_score,station_1,station_2,station_3,teams_dq,teams_surrogate,actual_time,comp_level,event_key,key,...,stage3TargetColor,tba_numRobotsHanging,tba_shieldEnergizedRankingPointFromFoul,techFoulCount,teleopCellPoints,teleopCellsBottom,teleopCellsInner,teleopCellsOuter,teleopPoints,totalPoints
0,136,frc2930,frc2976,frc4918,,,1583104716,f,2020wasno,2020wasno_f1m1,...,Unknown,2,False,0,26,0,0,13,81,136
1,137,frc2930,frc2976,frc4918,,,1583106972,f,2020wasno,2020wasno_f1m2,...,Unknown,2,False,3,34,0,0,17,84,137
2,122,frc4512,frc949,frc4131,,,1583096632,qf,2020wasno,2020wasno_qf1m1,...,Unknown,2,False,0,19,0,1,8,84,122


There is still one glaring problem: this dataframe only contains detailed scores for the blue alliance. To fix that, we'll run the statement twice, creating one dataframe for blue and one for red, and then we'll combine the dataframes.

In [39]:
alliances = ['red', 'blue']
frames = []
for alli in alliances:
    frames.append(pd.DataFrame(
        [{# Alliance Information
            'alliance': alli,
            'alliance_score': mtch['alliances'][alli]['score'], 

            # Extract Team Numbers
            **{f'station_{tm_tpl[0] + 1}': tm_tpl[1] for tm_tpl
                    in enumerate(mtch['alliances'][alli]['team_keys'])},

            # Disqualified and Surrogate Teams
            'teams_dq': mtch['alliances'][alli]['dq_team_keys']
                    if mtch['alliances'][alli]['dq_team_keys'] else None,

            'teams_surrogate': mtch['alliances'][alli]['surrogate_team_keys']
                    if mtch['alliances'][alli]['surrogate_team_keys'] else None,

            # Match Description (top-level values that are not dicts or lists)
            **dict(filter(lambda tpl: not isinstance(tpl[1], (dict, list)),
                    mtch.items())),

            # Detailed Match Scores
            **mtch['score_breakdown'][alli]}
         for mtch in matches]))
match_scores = pd.concat(frames)
match_scores.head()

Unnamed: 0,alliance,alliance_score,station_1,station_2,station_3,teams_dq,teams_surrogate,actual_time,comp_level,event_key,...,stage3TargetColor,tba_numRobotsHanging,tba_shieldEnergizedRankingPointFromFoul,techFoulCount,teleopCellPoints,teleopCellsBottom,teleopCellsInner,teleopCellsOuter,teleopPoints,totalPoints
0,red,148,frc4911,frc2910,frc4173,,,1583104716,f,2020wasno,...,Unknown,1,False,0,56,0,4,22,101,148
1,red,163,frc4911,frc2910,frc4173,,,1583106972,f,2020wasno,...,Unknown,1,False,0,43,0,3,17,83,163
2,red,193,frc4911,frc2910,frc4173,,,1583096632,qf,2020wasno,...,Unknown,2,False,0,78,0,8,27,143,193
3,red,140,frc4911,frc2910,frc4173,,,1583098723,qf,2020wasno,...,Unknown,1,False,0,38,0,2,16,83,140
4,red,89,frc4089,frc4915,frc4309,,,1583097215,qf,2020wasno,...,Unknown,0,False,0,42,0,2,18,52,89


We made several changes in this most recent iteration of the list comprehension:
1. We added a column to identify the alliance: `'alliance': alli,`
2. We placed the list comprehension in a `for` loop so it will be run twice, once for the red alliance and once for the blue.
3. We put the two resulting dataframes in a list.
4. We called the Pandas `concat` function to combine the two dataframes into a single dataframe.
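
To see what `concat` does to the row labels, here is a minimal sketch with two tiny stand-in dataframes. The names `red`, `blue`, `combined`, and `renumbered` are made up for this illustration, and only two columns are included. The sketch shows that each dataframe keeps its own index, so the index labels repeat in the combined dataframe.

```python
import pandas as pd

# Hypothetical miniature frames standing in for the red and blue dataframes.
red = pd.DataFrame({'alliance': ['red', 'red'],
                    'alliance_score': [148, 163]})
blue = pd.DataFrame({'alliance': ['blue', 'blue'],
                     'alliance_score': [136, 137]})

# Stack the rows of both frames, as in the cell above.
combined = pd.concat([red, blue])

# Each frame keeps its own 0-based index, so labels repeat after concat.
print(combined.index.tolist())     # [0, 1, 0, 1]

# One option (not used in this notebook) is to ask concat for a fresh index.
renumbered = pd.concat([red, blue], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```

The repeated labels are one reason the integer indices in `match_scores` carry no useful meaning yet.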

## VIII. The Next Step
This dataframe still needs some work. The time columns are unreadable, the order of the rows makes little sense, and the integer indices (the numbers at the far left of the dataframe) are arbitrary. We will use Pandas tools to make those fixes [in the next notebook](pj01_nb04_tba_pandas.ipynb).
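
As a small preview of why the time columns look odd, the sketch below converts three values copied from the `actual_time` column shown earlier. It assumes the values are seconds since the Unix epoch in UTC, and it is only one possible approach, not necessarily the one the next notebook takes. The names `times` and `readable` are made up for this example.

```python
import pandas as pd

# Values copied from the actual_time column in the output above.
times = pd.Series([1583104716, 1583106972, 1583096632])

# Assumption: each value counts seconds since the Unix epoch (UTC).
readable = pd.to_datetime(times, unit='s')

print(readable.iloc[0])   # 2020-03-01 23:18:36
```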

[Table of Contents](../../index.ipynb)