# Read JSON Format Data

JSON stands for JavaScript Object Notation. Despite the name, it is a language-independent text format, and its structure closely resembles Python dictionaries and lists.
JSON was inspired by the subset of the JavaScript programming language that deals with object literal syntax.

### Creating JSON Data

In [1]:
# Python comes with a built-in package called json for encoding and decoding JSON data.
import json

In [2]:
# Create Data
# JSON supports primitive types, like strings and numbers, as well as nested lists and objects.

data = {
    "pets": [
    {
        "name": 'Goku',
        "type": 'dog',
        "age": 4,
        "fav_food" : 'bone',
        "hobbies" : ['whining', 'licking butts']   
    },
    {
        "name": 'Mika',
        "type": 'dog',
        "age": 5,
        "fav_food" : 'bacon',
        "hobbies" : ['relaxing', 'ripping toys']
    },
    {
        "name": 'Brunson',
        "type": 'cat',
        "age": 5,
        "fav_food" : 'undecided',
        "hobbies" : 'being outside',
    },
    {
        "name": 'Tiny',
        "type": 'dog',
        "age": 4,
        "fav_food" : 'steak',
        "hobbies" : ['barking', 'sitting outside']
    }

] }

In [3]:
type(data)

dict

## Serialization & deserialization
The process of encoding JSON is usually called `serialization`. This term refers to the transformation of data into a series of bytes (hence serial) to be stored or transmitted across a network. Naturally, `deserialization` is the reciprocal process of decoding data that has been stored or delivered in the JSON standard.

## Serialization

![image.png](attachment:image.png)

## Difference between `dump()` and `dumps()`
- The json library exposes the `dump()` function for writing data to a file object.
- There is also a `dumps()` function (pronounced "dump-s") for writing to a Python string.

In [4]:
# Serialization: write the data to a JSON file

with open('petdata.json', 'w') as write_file:
    json.dump(data, write_file)

In [5]:
# Or write the data to a Python string object instead.
# This is useful when you want to work with the JSON text in memory.

pet_string = json.dumps(data)
pet_string

'{"pets": [{"name": "Goku", "type": "dog", "age": 4, "fav_food": "bone", "hobbies": ["whining", "licking butts"]}, {"name": "Mika", "type": "dog", "age": 5, "fav_food": "bacon", "hobbies": ["relaxing", "ripping toys"]}, {"name": "Brunson", "type": "cat", "age": 5, "fav_food": "undecided", "hobbies": "being outside"}, {"name": "Tiny", "type": "dog", "age": 4, "fav_food": "steak", "hobbies": ["barking", "sitting outside"]}]}'

In [6]:
type(pet_string)

str

In [7]:
# Let's make it more readable by indenting each nesting level

pet_string = json.dumps(data, indent=4)
print(pet_string)


{
    "pets": [
        {
            "name": "Goku",
            "type": "dog",
            "age": 4,
            "fav_food": "bone",
            "hobbies": [
                "whining",
                "licking butts"
            ]
        },
        {
            "name": "Mika",
            "type": "dog",
            "age": 5,
            "fav_food": "bacon",
            "hobbies": [
                "relaxing",
                "ripping toys"
            ]
        },
        {
            "name": "Brunson",
            "type": "cat",
            "age": 5,
            "fav_food": "undecided",
            "hobbies": "being outside"
        },
        {
            "name": "Tiny",
            "type": "dog",
            "age": 4,
            "fav_food": "steak",
            "hobbies": [
                "barking",
                "sitting outside"
            ]
        }
    ]
}


## Load in JSON Data
### Deserialize JSON
`Deserialization` is the reciprocal process: decoding data that has been stored or delivered in the JSON standard back into Python objects.
- Technically, this conversion isn't a perfect inverse of the serialization table. If you encode an object now and decode it again later, you may not get exactly the same object back.
![image.png](attachment:image.png)
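
The imperfect round trip described above can be seen in a small sketch. The `original` dictionary here is a hypothetical record made up for illustration, not part of the pet data.

```python
import json

# Hypothetical record (an assumption for illustration only)
original = {"scores": (1, 2, 3), 1: "integer key", "owner": None}

encoded = json.dumps(original)   # tuple -> JSON array, int key -> string key, None -> null
decoded = json.loads(encoded)    # JSON array -> list, null -> None

print(encoded)
print(decoded)
print(decoded == original)       # the round trip does not restore the original object
```

The tuple comes back as a list and the integer key comes back as the string `"1"`, so the decoded dictionary is not equal to the one that was encoded.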

In [8]:
# Read the data back from the petdata.json file
with open("petdata.json", 'r') as read_file:
    dataJ = json.load(read_file)
    
dataJ

{'pets': [{'name': 'Goku',
   'type': 'dog',
   'age': 4,
   'fav_food': 'bone',
   'hobbies': ['whining', 'licking butts']},
  {'name': 'Mika',
   'type': 'dog',
   'age': 5,
   'fav_food': 'bacon',
   'hobbies': ['relaxing', 'ripping toys']},
  {'name': 'Brunson',
   'type': 'cat',
   'age': 5,
   'fav_food': 'undecided',
   'hobbies': 'being outside'},
  {'name': 'Tiny',
   'type': 'dog',
   'age': 4,
   'fav_food': 'steak',
   'hobbies': ['barking', 'sitting outside']}]}

In [9]:
type(dataJ)

dict

In [10]:
#or load from in-memory string

data = json.loads(pet_string)
data

{'pets': [{'name': 'Goku',
   'type': 'dog',
   'age': 4,
   'fav_food': 'bone',
   'hobbies': ['whining', 'licking butts']},
  {'name': 'Mika',
   'type': 'dog',
   'age': 5,
   'fav_food': 'bacon',
   'hobbies': ['relaxing', 'ripping toys']},
  {'name': 'Brunson',
   'type': 'cat',
   'age': 5,
   'fav_food': 'undecided',
   'hobbies': 'being outside'},
  {'name': 'Tiny',
   'type': 'dog',
   'age': 4,
   'fav_food': 'steak',
   'hobbies': ['barking', 'sitting outside']}]}

In [11]:
type(data)

dict

## NBA JSON Data

- The games dataset was collected by Sports Reference LLC. It contains around 32K nested documents representing NBA games played between 1985 and 2013. Each document represents a game between two teams with at least 11 players each.


Data source: https://data.mendeley.com/datasets/ct8f9skv97/1

In [12]:
import pandas as pd
import json

In [30]:

filepath = 'nbagames.json'

In [31]:
# The file holds individual JSON objects, one per line, with no commas between them,
# so calling json.load() on the whole file would fail.
# Parse each line separately and collect the results in a list
# so that values can be extracted by key.

data = []

# Each line of the file is a complete JSON document
with open(filepath) as file:
    for line in file:
        data.append(json.loads(line))
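
The line-by-line loading above can also be wrapped in a small helper that tolerates blank lines and reports which line failed to parse. This is a minimal sketch, not part of the original notebook: `read_json_lines` is a hypothetical helper name, and the two-game `sample` is made-up data standing in for `nbagames.json`.

```python
import io
import json

def read_json_lines(handle):
    """Parse one JSON document per line, skipping blank lines."""
    records = []
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # ignore empty lines, e.g. a trailing newline at the end of the file
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
    return records

# Hypothetical sample with a blank line in the middle (assumption for illustration)
sample = io.StringIO(
    '{"teams": [{"name": "A"}, {"name": "B"}]}\n'
    '\n'
    '{"teams": [{"name": "C"}, {"name": "D"}]}\n'
)

games = read_json_lines(sample)
print(len(games))
```

With the real file, the same helper could be called as `with open(filepath) as file: data = read_json_lines(file)`.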

In [32]:
#verify that this is a list type
type(data)

list

In [33]:
# check number of items in data list
# this is the total number of games in dataset
# Each small dictionary is a game
len(data)

31686

#### Verify values within the data structure

In [1]:
# each element of the list is one basketball game
data[0]


In [None]:
# The first element of the data list is a dictionary
type(data[0])

In [None]:
# Each element in the list is a game between two teams on a given date;
# remember that a single date can have several games at different times
data[0].keys()

In [None]:
#Second Row in the file
data[1].keys()

In [None]:
#teams within each basketball game
data[0]['teams']

In [None]:
type(data[0]['teams'])

In [None]:
type(data[0]['teams'][0])

In [None]:
data[0]['teams'][0].keys()

In [None]:
data[0]['teams'][0]['name']

In [None]:
#Second team
data[0]['teams'][1]['name']

In [None]:
#get a team's city abbreviation
data[0]['teams'][0]['abbreviation']

In [None]:
# check whether the first team played at home
data[0]['teams'][0]['home']

In [None]:
#get a team's  score
data[0]['teams'][0]['score']

In [None]:
# check whether the first team won
data[0]['teams'][0]['won']

In [None]:
# check whether the second team won
data[0]['teams'][1]['won']

In [None]:
data[0]['teams'][0]['players']

In [None]:
data[0]['teams'][0]['players'][0].keys()

In [None]:
#get a player's name
data[0]['teams'][0]['players'][0]['player']

In [None]:
data[0]['teams'][0]['city']

In [None]:
# City of the second team
data[0]['teams'][1]['city']

In [None]:
type(data[0]['date'])

In [None]:
data[0]['date'].keys()

In [None]:
#get the date of a game
data[0]['date']['$date']

In [None]:
#initialize variables for empty lists to hold data

datels = [] #date of game
abrvls = [] #city abbreviation
cityls = [] #name of city
homels = []   #T/F if home game
namels = [] #team name
playersls = [] #list of players that played in the game
scorels = []   #final score for game
wonls = []  #0/1 if won

In [None]:
#fill lists with data

for game in data:  #data[index]
    
    # add the date twice, once for each of the two teams in the game
    datels.append(game['date']['$date'])
    datels.append(game['date']['$date'])
    
    for team in game['teams']: #data[index]['teams'][index]
        abrvls.append(team['abbreviation']) 
        cityls.append(team['city'])
        homels.append(team['home'])
        namels.append(team['name'])
        scorels.append(team['score'])
        wonls.append(team['won'])
        
        members = []  # hold the list of players to add to playersls
        
        for player in team['players']:  #data[index]['teams'][index]['players'][index]
            members.append(player['player'])
        
        playersls.append(members)

In [None]:
# verify the number of entries in the date list
len(datels)

In [None]:
#see first 20 values within date list
datels[:20]

In [None]:
#check that players is a list of lists
#first 5 items in players list
playersls[:5]

In [None]:
# check that every list has the same length

print(len(datels)) #date of game
print(len(abrvls)) #city abbreviation
print(len(cityls)) #name of city
print(len(homels))  #T/F if home game
print(len(namels)) #team name
print(len(playersls)) #list of players that played in the game
print(len(scorels))  #final score for game
print(len(wonls))  #0/1 if won

#### Make gathered information into dataframe

In [None]:
# zip the lists together into one list of row tuples,
# in the order I want the columns to appear
NBAlist = list(zip(datels, namels, abrvls, cityls, homels, scorels, wonls, playersls))

#make list of column names
names = ['date', 'team_name', 'abbrv', 'city', 'home_game', 'score', 'won_game', 'players']

In [None]:
#make the dataframe
df = pd.DataFrame(NBAlist, columns=names)

df.head()
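
As an alternative to keeping eight parallel lists in step, each team can be flattened into one row dictionary, so the date stays attached to its row automatically. This is a minimal sketch under stated assumptions: `flatten_games` is a hypothetical helper, and the one-game `games` list is made-up data that only mirrors the key names used in the cells above.

```python
# Hypothetical miniature of the games list (values are invented for illustration)
games = [
    {
        "date": {"$date": "sample-date-1"},
        "teams": [
            {"name": "Team A", "abbreviation": "AAA", "city": "City A",
             "home": False, "score": 90, "won": 0,
             "players": [{"player": "Player 1"}, {"player": "Player 2"}]},
            {"name": "Team B", "abbreviation": "BBB", "city": "City B",
             "home": True, "score": 95, "won": 1,
             "players": [{"player": "Player 3"}, {"player": "Player 4"}]},
        ],
    },
]

def flatten_games(games):
    """Return one flat dictionary per team per game."""
    rows = []
    for game in games:
        for team in game["teams"]:
            rows.append({
                "date": game["date"]["$date"],
                "team_name": team["name"],
                "abbrv": team["abbreviation"],
                "city": team["city"],
                "home_game": team["home"],
                "score": team["score"],
                "won_game": team["won"],
                # keep only the player names, as the loop above does
                "players": [p["player"] for p in team["players"]],
            })
    return rows

rows = flatten_games(games)
print(len(rows))
print(rows[0]["team_name"], rows[1]["players"])
```

Passing `rows` to `pd.DataFrame(rows)` would build a dataframe whose column names come from the dictionary keys, in the same order as the `names` list above.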