_Main topics covered during today's session:_

Previous NB:

1. **Intro to Nested Data**

This NB:
1. **Nested Data Example -- NBA data**

## Nested Data Structures: NBA Player Analysis

_Version 1.2_

This notebook exercises the concepts of nested data structures in Python.

The intent is to demonstrate for the student a methodology for discovering and manipulating these data structures.

The focus of this notebook is solely on understanding the nested data structures, so that they can be manipulated in some manner. 

### We will be using some data from the National Basketball Association's (NBA) statistics API for this exercise. 

The data consists of matchup statistics: records of how an offensive player performed while being guarded by a particular defensive player during the games of an NBA season. The metrics include the number of games the two played against each other, the total number of minutes that the defensive player guarded the offensive player, points scored, assists, field goal percentage, and the like. The statistics are compiled over all of the games that the two players' teams played against each other that season.

Analysts compare these matchup statistics with how the offensive player performed over the full season to assess the defensive player's effectiveness. For example, suppose that the offensive player averaged 20 points per 48-minute game for the season, but scored only 10 points in 48 minutes while matched up against a particular defender. That defender would be considered very good, because he held the offensive player to half of his usual scoring rate.
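
The comparison in this example can be worked through in a few lines. The numbers are the illustrative ones from the paragraph above, not real statistics, and the variable names are hypothetical.

```python
# Illustrative numbers from the example above, not real NBA statistics.
season_pts_per_48 = 20   # offensive player's season scoring rate, per 48 minutes
matchup_pts_per_48 = 10  # the same player's rate while guarded by this defender

# Fraction of the usual scoring rate that the defender allowed.
allowed_ratio = matchup_pts_per_48 / season_pts_per_48
print(f"Defender allowed {allowed_ratio:.0%} of the season scoring rate")
```

A lower ratio suggests a more effective defender, at least by this one measure.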

## Objective

The raw data is in a JSON file, which comes to us as a dictionary with embedded dictionaries and lists.

We are simply working to understand the nested data structure of this JSON input file. 

We assume that this would be the first part of a larger data analysis, in which a client would seek information about NBA players from this dataset.

## The data file

#### nba_json.txt
    This is the raw data, scraped from the NBA API. We will put this into a variable, nested, which will be a list of dictionaries.

Don't worry about the packages or syntax used here to read the JSON file; you will learn about these later in the course.
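
For the curious, here is a minimal sketch of what the loading cells do to each line, assuming every line of the file holds one dictionary written as a Python literal. The string `sample_line` is a made-up, heavily shortened stand-in for a real line of the file.

```python
import ast

# Hypothetical, shortened stand-in for one line of nba_json.txt.
sample_line = "{'resource': 'leagueseasonmatchups', 'parameters': {'Season': '2019-20'}}"

# literal_eval turns the text of a Python literal into the object it describes.
parsed = ast.literal_eval(sample_line)

print(type(sample_line))  # the raw line is a string
print(type(parsed))       # the parsed result is a dictionary
print(parsed['parameters']['Season'])
```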

In [None]:
# import the required package for bringing in the json file
import ast

In [None]:
# reading the data from the json file
with open('nba_json.txt') as f:
    data = f.readlines()
    
nested = [] # this is going to be our nested data structure
# convert each line (a string) into a dictionary
for line in data:
    result = ast.literal_eval(line)
#     print(result)
    nested.append(result)

## We want to understand the elements of the nested data structure, and how to address each of them.

What is the overall data type of the variable nested?

In [None]:
print(type(nested))

#### The variable nested is a list, so let's understand what that means.

We know that a list will have one or more elements, and we use square brackets to address each element in the list. The list indices start at 0 and go up to n-1, where n is the number of elements in the list.
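
As a quick refresher on that indexing rule, here is a small sketch. The list `sample` is a hypothetical stand-in for `nested`, not the real data.

```python
# Hypothetical stand-in for `nested`: a short list of small dictionaries.
sample = [{'resource': 'a'}, {'resource': 'b'}, {'resource': 'c'}]

n = len(sample)        # number of elements in the list
first = sample[0]      # indices start at 0
last = sample[n - 1]   # and end at n - 1
print(n, first, last)

# Index n is one past the end, so it raises an IndexError.
try:
    sample[n]
except IndexError as err:
    print("IndexError:", err)
```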

The code below loops through this list and prints the data type of each element in the list.

In [None]:
for i in nested:  #loop over the elements of the list
    print(type(i))

#### We see that the list has 7 elements, and that each element of the list is a dictionary. 

#### So we have a list of dictionaries.

#### Let's look at the dictionary that is the first element in the list. Pay attention to the notation that we are using.

In [None]:
# display the first dictionary in the list
# Using list notation to address this element
print("nested_first_dict=",nested[0])

Let's take this out to Python Tutor to visualize.

[Python Tutor](https://pythontutor.com/visualize.html#mode=display)


#### Note that this first dict in 'nested' is for the player Al-Farouq Aminu, with the player_id 202329.

How would we get the second player in 'nested'? 

In [None]:
# display the second dictionary in the list
# Using list notation to address this element
print(nested[1])

#### We can see that this is a dictionary (the squiggly brackets at the very beginning and end) and that it has other nested data elements within the dictionary.

Let's look at the dictionary to see if we can visualize its structure.

We will do this in two steps.

First, we want to know the keys of the dictionary. This will tell us the "top level" data elements.

Then, we will output an element of the list (the dictionary), put it into a markdown cell, and see if we can use line breaks and indentation to visualize the dictionary and its nested elements.
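
This kind of exploration can also be scripted. The sketch below is one possible approach, not part of the course material: a small recursive helper that reports the type found at each level, indenting as it descends. The function `describe` and the shortened `sample` record are hypothetical illustrations.

```python
def describe(obj, depth=0, label="", max_items=1):
    """Return one indented line per level describing a nested structure."""
    prefix = "    " * depth + (f"{label!r}: " if label else "")
    if isinstance(obj, dict):
        lines = [f"{prefix}dict with keys {list(obj)}"]
        for key, value in obj.items():
            lines += describe(value, depth + 1, key, max_items)
    elif isinstance(obj, list):
        lines = [f"{prefix}list of {len(obj)} element(s)"]
        # Only descend into the first few elements to keep the output short.
        for item in obj[:max_items]:
            lines += describe(item, depth + 1, "", max_items)
    else:
        lines = [f"{prefix}{type(obj).__name__}"]
    return lines


# Hypothetical, shortened record shaped like one element of `nested`.
sample = {
    'resource': 'leagueseasonmatchups',
    'parameters': {'Season': '2019-20'},
    'resultSets': [{'name': 'SeasonMatchups',
                    'headers': ['SEASON_ID', 'OFF_PLAYER_ID'],
                    'rowSet': [['22019', 1629060]]}],
}

structure = describe(sample)
print("\n".join(structure))
```

Each extra level of indentation in the output corresponds to one more index or key in the addressing syntax.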

In [None]:
# First, what are the keys of the dictionary?
# Remember that each element of "nested" is a dictionary
# So nested[i] defines each dictionary in the list of dicts
print(nested[0].keys())


#### So the dictionary has 3 keys.

Now let's output the first of the list elements from nested, a dictionary.


In [None]:
# Again, note the list notation to address the dictionary
print(nested[0])

In [None]:
# another way of outputting the variable

nested[0]

#### Note the differences in how Python formats the two outputs, using print() versus just calling the variable to output. 

1.  With print(), the output is one continuous run of text that simply wraps wherever the line happens to end.

2.  With the variable called directly, though, we can see that it formats the output to show the structure of the nested elements, starting a new row for each new element at that level. 

3.  One note on displaying variables in these two manners. The second manner only outputs if it is the last line of the cell. If you want to display a variable in the middle of the cell, with code after it, you must use print().
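
If you want a structured layout in the middle of a cell, or outside of a notebook, the standard library's `pprint` module produces something similar. A minimal sketch, using a hypothetical, shortened record in place of `nested[0]`:

```python
from pprint import pformat

# Hypothetical, shortened record shaped like one element of `nested`.
sample = {
    'resource': 'leagueseasonmatchups',
    'parameters': {'LeagueID': '00', 'Season': '2019-20', 'SeasonType': 'Regular Season'},
    'resultSets': [{'name': 'SeasonMatchups', 'headers': ['SEASON_ID', 'OFF_PLAYER_ID']}],
}

flat = str(sample)  # the text that print(sample) shows: one long line
# sort_dicts=False (Python 3.8+) keeps the keys in their original order.
pretty = pformat(sample, width=60, sort_dicts=False)

print(flat)
print()
print(pretty)
```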

#### Now let's visualize the dictionary.

We know the three top-level keys, so let's set those first, by putting them onto their own lines. Note that we have truncated the 'resultSets' values, for brevity.

##### resource 
    This defines the NBA API endpoint that we called to bring back this data. Simple key-value pair.

##### parameters 
    Defines the parameters/variables that we passed to the API endpoint, to tell it which data to select and send back to us. We can see that this is a key, with a dictionary as the value (squiggly braces).

##### resultSets 
    This is the player matchup data that we will be putting into our Pandas data frame. We can see that this is a key, with a list as the value (square brackets). We can also see that there appear to be dictionaries and lists within this list, and we will examine this later.


{'resource': 'leagueseasonmatchups', 

'parameters': {'LeagueID': '00', 'Season': '2019-20', 'SeasonType': 'Regular Season', 'PORound': 0, 'PerMode': 'Totals', 'DefTeamID': None, 'OffTeamID': None, 'OffPlayerID': None, 'DefPlayerID': 203500}, 

'resultSets': [{'name': 'SeasonMatchups', 'headers': ['SEASON_ID', 'OFF_PLAYER_ID', 'OFF_PLAYER_NAME', 'DEF_PLAYER_ID', 'DEF_PLAYER_NAME', 'GP', 'MATCHUP_MIN', 'PARTIAL_POSS', 'PLAYER_PTS', 'TEAM_PTS', 'MATCHUP_AST', 'MATCHUP_TOV', 'MATCHUP_BLK', 'MATCHUP_FGM', 'MATCHUP_FGA', 'MATCHUP_FG_PCT', 'MATCHUP_FG3M', 'MATCHUP_FG3A', 'MATCHUP_FG3_PCT', 'HELP_BLK', 'HELP_FGM', 'HELP_FGA', 'HELP_FG_PERC', 'MATCHUP_FTM', 'MATCHUP_FTA', 'SFL', 'MATCHUP_TIME_SEC'], 'rowSet': [['22019', 1629060, 'Rui Hachimura', 202329, 'Al-Farouq Aminu', 1, '6:45', 31.9, 4, 47, 1, 0, 0, 2, 5, 0.4, 0, 0, 0.0, 0, 0, 0, 0, 0, 0, 0, 404.7], ...]}]}

#### Now let's loop over the keys of this dictionary, to display the value of each.

This is just a standard Python loop over a dict.


In [None]:
# loop over the dictionary, display the key and value pair
for k,v in nested[0].items():
    print("key:", k)
    print("value:",v,"\n")


In [None]:
# Let's output individual keys
# Notice the hierarchy of how we are addressing the dict, which is an element of the list
print(nested[0]['resource'])
print("\n")
print(nested[0]['parameters'])
print("\n")
print(nested[0]['resultSets'])

#### We can see that the key 'parameters' has a dictionary as its value. Let's loop over the keys of this dictionary and output the values.

This is practice for us to get our syntax correct for our nesting.

In [None]:
# Note the next level down of syntax
print('key-value pairs in a loop')
for k,v in nested[0]['parameters'].items():
    print("key:", k)
    print("value:",v,"\n")

In [None]:
# Output a single value from a selected key within the dict
# Note that we are THREE levels down to address an individual key
print('single value of a key')
print(nested[0]['parameters']['Season'])

#### Now let's look at the resultSets key, with a list as its value.

1. Output the list that is the value for this key.

2. Loop over the list, to see the individual elements of it.

In [None]:
# Item #1
# output the value of the resultSets key
# again, note the levels to get down to it
print(nested[0]['resultSets'])

In [None]:
# Second way of displaying item #1
nested[0]['resultSets']

In [None]:
# Item #2
# Loop over the list that makes up this value
for i in nested[0]['resultSets']:
    print(i)
    print('\n')
    print("next i")

#### This is very interesting, in that we can see that, although this is a list, there is only a single element in the list. 

#### That element is a dictionary, as evidenced by the squiggly brackets framing the data. 

While we went through a looping syntax to display this dict, let's address it directly.

Since this dict is the first element of the list (that makes up the value in the key-value pair), note the syntax to get down to it. This syntax builds on what we had above.
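
To make the levels explicit, the sketch below descends one step at a time and reports the type found at each step. The variable `nested_sample` is a hypothetical, shortened stand-in for `nested`.

```python
# Hypothetical, shortened stand-in for `nested`.
nested_sample = [{
    'resource': 'leagueseasonmatchups',
    'resultSets': [{'name': 'SeasonMatchups',
                    'headers': ['SEASON_ID', 'OFF_PLAYER_ID'],
                    'rowSet': [['22019', 1629060]]}],
}]

level_1 = nested_sample[0]        # list index -> dict
level_2 = level_1['resultSets']   # dict key   -> list
level_3 = level_2[0]              # list index -> dict

for name, value in [('level_1', level_1), ('level_2', level_2), ('level_3', level_3)]:
    print(name, type(value).__name__)

# The chained form reaches the same object in a single expression.
assert level_3 is nested_sample[0]['resultSets'][0]
```

Reading a chained expression from left to right is the same as taking these steps one after another.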

In [None]:
# address the list directly, notice the square bracket and index syntax, 
# which is standard for addressing the elements of a list.
# this is the first item in the list
nested[0]['resultSets'][0]

#### So what are the keys that make up this dictionary? Use the same syntax as above to list them.


In [None]:
# Note the next level down of syntax
# remember that we are still only looking at the first element of the list
print('key-value pairs in a loop')
for k,v in nested[0]['resultSets'][0].items():
    print("key:", k)
    print("value:",v,"\n")

#### These key-value pairs are getting down to the data that we would be seeking for our analysis.

The value for the "headers" key is a list of the names of the statistics that we are analyzing. Were we to put this data into a row and column format (such as Excel or pandas), these would be the column headers.
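
One consequence, sketched below under the assumption that every data row lists its values in the same order as the headers: the position of a name in the headers list is also the position of that statistic in a row. The two lists hold only the first six columns of the excerpt shown earlier in this notebook.

```python
# First six columns of the excerpt shown earlier, not the full record.
headers = ['SEASON_ID', 'OFF_PLAYER_ID', 'OFF_PLAYER_NAME',
           'DEF_PLAYER_ID', 'DEF_PLAYER_NAME', 'GP']
row = ['22019', 1629060, 'Rui Hachimura', 202329, 'Al-Farouq Aminu', 1]

# Find the column position by name, then use it to index into the row.
col = headers.index('OFF_PLAYER_NAME')
off_player = row[col]
print(col, off_player)
```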

Let's directly address this list.

In [None]:
nested[0]['resultSets'][0]['headers'][0:5]

#### Let's look at the "rowSet" key, and its values.

The square brackets at the very start and end of the value tell us that this is a list. Additionally, we can see that each of the individual elements is also a list, so this is a list of lists.

How do we address the top level list? Same as we just did for the "headers" key-value pair. Note that the 'S' in 'rowSet' is capitalized, so we don't run into an error that could throw us off.
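
Here is a small sketch of the error that the wrong capitalization would cause, and of a quick check that the value really is a list of lists. The dictionary `result` is a hypothetical, shortened stand-in for `nested[0]['resultSets'][0]`.

```python
# Hypothetical, shortened stand-in for nested[0]['resultSets'][0].
result = {'headers': ['SEASON_ID', 'OFF_PLAYER_ID'],
          'rowSet': [['22019', 1629060]]}

try:
    result['rowset']   # lowercase 's' does not match the stored key
except KeyError as err:
    message = f"KeyError: {err}"
    print(message)

rows = result['rowSet']  # the correctly capitalized key works
is_list_of_lists = all(isinstance(row, list) for row in rows)
print(rows, is_list_of_lists)
```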

In [None]:
print(nested[0]['resultSets'][0]['rowSet'])
nested[0]['resultSets'][0]['rowSet']

#### OK, so this gives us the list of lists, now how do we address each of the list elements?

Just address them using the standard index notation. 

For example, the first item in any list is just list[0]. So the first item in the rowSet list is:

In [None]:
print(nested[0]['resultSets'][0]['rowSet'][0][0:6])

#### Finally, we might want to look at the individual data points in this list, which is the lowest level in our nested data structure. 



In [None]:
# we can do this in a loop
for data_element in nested[0]['resultSets'][0]['rowSet'][0]:
    print(data_element)
    
# or we can address each data element directly
print("\ndirect address of data elements")
print(nested[0]['resultSets'][0]['rowSet'][0][0])
print(nested[0]['resultSets'][0]['rowSet'][0][1])

### To review what we have done here, let's look at the syntax of the various levels of the 'nested' data structure.

In [None]:
# At the top level, nested is a list of dicts.
for i in nested:  #loop over the elements of the list
    print(type(i))
    
# to address each dict directly, we must use list index notation
# not using print() here, to cut down on the output
nested[0]
nested[1]  # etc etc etc

# each of these elements themselves are dicts, with 3 keys
print(nested[0].keys())
print(nested[1].keys())

# each key-value pair is its own data structure
# loop over the dictionary, display the key and value pair
# commenting out the print() statements to cut down on the output
for k,v in nested[0].items():
#     print("key:", k)
#     print("value:",v,"\n")
    continue

# Now we can address the 3 keys directly
# Notice the hierarchy of how we are addressing the dict, which is an element of the list
print(nested[0]['resource'])
print("\n")
print(nested[0]['parameters'])
print("\n")
print(nested[0]['resultSets'])

# We see that the 'resultSets' key has the values we need
# so output the value of the resultSets key
# again, note the levels to get down to it
print(nested[0]['resultSets'])

# the dict inside the 'resultSets' list has 3 keys, and we are interested in two of them
# the 'headers' key contains the column headers that we want for our data analysis
nested[0]['resultSets'][0]['headers']

# the 'rowSet' key has the full set of player data
nested[0]['resultSets'][0]['rowSet']

# and the value of the 'rowSet' key is a list of lists, with each
# of the component lists containing a row of statistical data
# so we use the list index notation to address each element list
nested[0]['resultSets'][0]['rowSet'][0]

# At the lowest level, each of these individual data elements makes up 
# the list of statistics for the player matchup.
# So we again use the list index notation to address each individual statistic.
nested[0]['resultSets'][0]['rowSet'][0][0]


### Finally, to tie this together with something from earlier this week, we could use the zip function to connect the column headers with their statistical values in either a list or a dictionary.

This is simply another application of what we are learning.

In [None]:
headers = nested[0]['resultSets'][0]['headers']
stats = nested[0]['resultSets'][0]['rowSet'][0]

zip_list = list(zip(headers, stats))
print('Create a list of tuples:')
print(zip_list)

print('\n')

# build the dictionary directly from the zipped pairs; no defaultdict is needed
zip_dict = {header: stat_value for header, stat_value in zip(headers, stats)}
print('Create a dictionary of the headers and stats values:')
print(zip_dict)

#### How about a single list of dictionaries for all of the statistics for this player, Al-Farouq Aminu?

In [None]:
# define the final list
player_list = []
# define the headers, again
headers = nested[0]['resultSets'][0]['headers']


for stats_entry in nested[0]['resultSets'][0]['rowSet']:
#     the next two lines do the same thing, just illustrating what assigning the variable does
    player_dict = {header:stat_value for header, stat_value in zip(headers, stats_entry)}
#     player_dict = {header:stat_value for header, stat_value in zip(nested[0]['resultSets'][0]['headers'], stats_entry)}
    player_list.append(player_dict)

In [None]:
print("Here are the individual dictionaries within the list:\n")
for entry in player_list:  # avoid naming the loop variable 'dict', which would shadow the built-in
    print(entry)
    print('\n')
print('Here is the full list of dictionaries:\n')         
print("player_list=",player_list)

Finally, let's go again to Python Tutor to visualize the list of dictionaries. Use the second printed output above to copy and paste into Python Tutor. 

Notice that the printout begins with the text "player_list=", so that Python Tutor treats the pasted output as an assignment of this data structure to a variable.

[Python Tutor](https://pythontutor.com/visualize.html#mode=display)