# Watch Me Code 1: Pandas Basics

This walkthrough uses `orders.json` to compare the different ways of reading nested JSON into pandas.

- Inspect Orders JSON
- `read_json()`
- `json_normalize()`
- `json_normalize()` with `record_path`
- `json_normalize()` with `record_path` and `meta`



In [1]:
import pandas as pd

In [8]:
fruits = pd.Series(['Apple', 'Banana', 'Cherry', 'Orange', 'Pear'], name="Fruit")
fruits

0     Apple
1    Banana
2    Cherry
3    Orange
4      Pear
Name: Fruit, dtype: object

In [3]:
!curl  https://raw.githubusercontent.com/mafudge/datasets/master/json-samples/orders.json -o orders.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   672  100   672    0     0   3741      0 --:--:-- --:--:-- --:--:--  3754


```
[
    {
        "Customer" : { "FirstName" : "Abby", "LastName" : "Kuss"}, 
        "Items" : [
            { "Name" : "T-Shirt", "Price" : 10.0, "Quantity" : 3},
            { "Name" : "Jacket", "Price" : 20.0, "Quantity" : 1}
        ]
    },
    {
        "Customer" : { "FirstName" : "Bette", "LastName" : "Alott"}, 
        "Items" : [
            { "Name" : "Shoes", "Price" : 25.0, "Quantity" : 1}, 
            { "Name" : "Jacket", "Price" : 20.0, "Quantity" : 1}
        ]
    },
    {
        "Customer" : { "FirstName" : "Chris", "LastName" : "Peanugget"}, 
        "Items" : [
            { "Name" : "T-Shirt", "Price" : 10.0, "Quantity" : 1}
        ]
    }
]
```

In [5]:
# read_json only reads the top level: the nested Customer dict and Items list
# are left as-is inside single cells
df = pd.read_json("orders.json")
df

|   | Customer | Items |
|---|---|---|
| 0 | {'FirstName': 'Abby', 'LastName': 'Kuss'} | [{'Name': 'T-Shirt', 'Price': 10.0, 'Quantity'... |
| 1 | {'FirstName': 'Bette', 'LastName': 'Alott'} | [{'Name': 'Shoes', 'Price': 25.0, 'Quantity': ... |
| 2 | {'FirstName': 'Chris', 'LastName': 'Peanugget'} | [{'Name': 'T-Shirt', 'Price': 10.0, 'Quantity'... |

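To see why these nested values are awkward to work with, it can help to check what `read_json()` actually stores in each cell. The sketch below inlines two of the orders so that it runs without the downloaded file; the names `raw` and `orders` are only for illustration.

```python
import io
import pandas as pd

# Two of the orders from orders.json, inlined so the sketch runs on its own.
raw = """[
  {"Customer": {"FirstName": "Abby", "LastName": "Kuss"},
   "Items": [{"Name": "T-Shirt", "Price": 10.0, "Quantity": 3},
             {"Name": "Jacket", "Price": 20.0, "Quantity": 1}]},
  {"Customer": {"FirstName": "Chris", "LastName": "Peanugget"},
   "Items": [{"Name": "T-Shirt", "Price": 10.0, "Quantity": 1}]}
]"""

orders = pd.read_json(io.StringIO(raw))

# Look at the Python type held in each cell of the two columns.
customer_types = orders["Customer"].map(type).unique().tolist()
item_types = orders["Items"].map(type).unique().tolist()
print(customer_types, item_types)
```

Each cell holds a whole Python object rather than a scalar, so ordinary column operations such as filtering on a last name or summing prices are not available until the data is flattened.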

In [8]:
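# json_normalize flattens nested dicts into dotted column names
# (Customer.FirstName) but leaves lists such as Items unexpanded
# json_normalize takes parsed JSON (a dict or list of dicts), not a file path,
# so load the file with the json module first
import json
with open("orders.json", "r") as f: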
    data = json.load(f)
df = pd.json_normalize(data)
df


|   | Items | Customer.FirstName | Customer.LastName |
|---|---|---|---|
| 0 | [{'Name': 'T-Shirt', 'Price': 10.0, 'Quantity'... | Abby | Kuss |
| 1 | [{'Name': 'Shoes', 'Price': 25.0, 'Quantity': ... | Bette | Alott |
| 2 | [{'Name': 'T-Shirt', 'Price': 10.0, 'Quantity'... | Chris | Peanugget |

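The dotted column names come from the `sep` argument of `json_normalize()`, and the depth of flattening can be limited with `max_level`. The following is a minimal sketch of both, using one order inlined from `orders.json`; the variable names `flat` and `unflattened` are illustrative only.

```python
import pandas as pd

# One order from orders.json, inlined so the sketch runs on its own.
data = [
    {
        "Customer": {"FirstName": "Abby", "LastName": "Kuss"},
        "Items": [
            {"Name": "T-Shirt", "Price": 10.0, "Quantity": 3},
            {"Name": "Jacket", "Price": 20.0, "Quantity": 1},
        ],
    }
]

# sep controls the text placed between the parent and child key names.
flat = pd.json_normalize(data, sep="_")
print(flat.columns.tolist())

# max_level=0 turns flattening off, so Customer stays a dict in one column.
unflattened = pd.json_normalize(data, max_level=0)
print(unflattened.columns.tolist())
```

An underscore separator can be convenient when the columns will later be accessed as attributes or written to a database, where dots in names tend to get in the way.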

In [9]:
# setting `record_path` to the Items key produces one row per item
# the drawback is that the parent Customer data is dropped
df = pd.json_normalize(data, record_path="Items")
df

|   | Name | Price | Quantity |
|---|---|---|---|
| 0 | T-Shirt | 10.0 | 3 |
| 1 | Jacket | 20.0 | 1 |
| 2 | Shoes | 25.0 | 1 |
| 3 | Jacket | 20.0 | 1 |
| 4 | T-Shirt | 10.0 | 1 |


In [11]:
# final solution: one row per item, with the `meta` argument giving the
# paths to the parent Customer fields that should be repeated on each row

df = pd.json_normalize(data, record_path="Items", 
                       meta=[["Customer","FirstName"],["Customer","LastName"]])
df

|   | Name | Price | Quantity | Customer.FirstName | Customer.LastName |
|---|---|---|---|---|---|
| 0 | T-Shirt | 10.0 | 3 | Abby | Kuss |
| 1 | Jacket | 20.0 | 1 | Abby | Kuss |
| 2 | Shoes | 25.0 | 1 | Bette | Alott |
| 3 | Jacket | 20.0 | 1 | Bette | Alott |
| 4 | T-Shirt | 10.0 | 1 | Chris | Peanugget |

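As a cross-check on what `record_path` and `meta` are doing, the same table can be built by hand in three steps. This is an alternative sketch, not the approach used in the cells above: it relies on `DataFrame.explode()`, and the names `orders`, `exploded`, `items` and `flat` are illustrative. The order data is inlined from `orders.json` so the block runs on its own.

```python
import pandas as pd

# The three orders from orders.json, inlined so the sketch runs on its own.
data = [
    {"Customer": {"FirstName": "Abby", "LastName": "Kuss"},
     "Items": [{"Name": "T-Shirt", "Price": 10.0, "Quantity": 3},
               {"Name": "Jacket", "Price": 20.0, "Quantity": 1}]},
    {"Customer": {"FirstName": "Bette", "LastName": "Alott"},
     "Items": [{"Name": "Shoes", "Price": 25.0, "Quantity": 1},
               {"Name": "Jacket", "Price": 20.0, "Quantity": 1}]},
    {"Customer": {"FirstName": "Chris", "LastName": "Peanugget"},
     "Items": [{"Name": "T-Shirt", "Price": 10.0, "Quantity": 1}]},
]

# Step 1: flatten the Customer dict; Items is still a list in each row.
orders = pd.json_normalize(data)

# Step 2: give every item its own row, repeating the customer columns.
exploded = orders.explode("Items", ignore_index=True)

# Step 3: expand each item dict into columns and re-attach the customer columns.
items = pd.json_normalize(exploded["Items"].tolist())
flat = pd.concat([items, exploded.drop(columns="Items")], axis=1)
print(flat)
```

The single `json_normalize()` call with `record_path` and `meta` is shorter; the step-by-step version is mainly useful for seeing where each column comes from.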

In [12]:
x = [ 
  {"a" : {"b": 1}, "c" : [10,11]},
  {"a" : {"b": 2}, "c" : [21,22,23]}
]

In [14]:
pd.json_normalize(x)

|   | c | a.b |
|---|---|---|
| 0 | [10, 11] | 1 |
| 1 | [21, 22, 23] | 2 |


In [15]:
pd.json_normalize(x, record_path="c")

|   | 0 |
|---|---|
| 0 | 10 |
| 1 | 11 |
| 2 | 21 |
| 3 | 22 |
| 4 | 23 |

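The small `x` example loses the parent value `a.b` in the same way the orders lost their customer. A minimal sketch of carrying it along with `meta`, mirroring the orders solution, might look like this:

```python
import pandas as pd

x = [
    {"a": {"b": 1}, "c": [10, 11]},
    {"a": {"b": 2}, "c": [21, 22, 23]},
]

# meta takes the path to a.b as a list of keys, as with the Customer fields.
df = pd.json_normalize(x, record_path="c", meta=[["a", "b"]])
print(df)
```

Because the records under `c` are plain numbers rather than dicts, their column is still named `0`, and the value of `a.b` is repeated once for each element of the list it came from.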