# Flattening JSON

## Overview
This notebook contains basic examples of how to flatten JSON objects into pandas DataFrames with `pd.json_normalize`.

**author**: Oscar Javier Bastidas Jossa
**email**: oscarjavier.jb@gmail.com

```python
# importing libraries
import pandas as pd
```

## 1. When the JSON is a simple dict

```python
a_dict = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
}
df = pd.json_normalize(a_dict)
df
```

|   | school | location | ranking |
|---|---|---|---|
| 0 | ABC primary school | London | 2 |
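As a quick sanity check, here is a small sketch that relies only on public pandas calls. When the dict has no nesting, `json_normalize` treats it as a single record. The result should therefore match wrapping the dict in a list and passing that list to the `DataFrame` constructor.

```python
import pandas as pd

a_dict = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
}

# json_normalize treats a single dict as one record -> a one-row DataFrame.
flat = pd.json_normalize(a_dict)

# For a flat dict (no nesting) this is the same as wrapping it in a list.
manual = pd.DataFrame([a_dict])

print(flat.equals(manual))  # True
```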


## 2. When the data is a list of dicts

```python
json_list = [
    {'class': 'Year 1', 'student number': 20, 'room': 'Yellow'},
    {'class': 'Year 2', 'student number': 25, 'room': 'Blue'},
]
pd.json_normalize(json_list)
```

|   | class | student number | room |
|---|---|---|---|
| 0 | Year 1 | 20 | Yellow |
| 1 | Year 2 | 25 | Blue |
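JSON data usually arrives as text, for example from a file or an HTTP response, rather than as Python dicts. The sketch below is a minimal illustration that assumes such a raw string (the `raw` value is hypothetical). It parses the text with the standard `json` module first and only then calls `json_normalize`.

```python
import json
import pandas as pd

# Hypothetical raw JSON text, e.g. read from a file or an HTTP response body.
raw = (
    '[{"class": "Year 1", "student number": 20, "room": "Yellow"},'
    ' {"class": "Year 2", "student number": 25, "room": "Blue"}]'
)

records = json.loads(raw)        # parse the text into a list of dicts
df = pd.json_normalize(records)  # one row per dict

print(df.shape)  # (2, 3)
```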


### 2.1 When some keys are missing (NaN values)

```python
json_list = [
    {'class': 'Year 1', 'student number': 20, 'room': 'Yellow'},
    {'class': 'Year 2', 'room': 'Blue'},  # no 'student number'
]
pd.json_normalize(json_list)
```

|   | class | student number | room |
|---|---|---|---|
| 0 | Year 1 | 20.0 | Yellow |
| 1 | Year 2 | NaN | Blue |
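In the table above, `20` has turned into `20.0`. NaN is a float, so pandas upcasts any integer column that contains a gap to `float64`. One way to keep whole numbers, sketched below, is to convert the column to pandas' nullable `Int64` dtype, which represents the gap as `<NA>`. Treat this as an option to consider, not a required step.

```python
import pandas as pd

json_list = [
    {'class': 'Year 1', 'student number': 20, 'room': 'Yellow'},
    {'class': 'Year 2', 'room': 'Blue'},
]
df = pd.json_normalize(json_list)

# The missing value forces the column to float64 (20 -> 20.0).
before_dtype = df['student number'].dtype
print(before_dtype)  # float64

# The nullable integer dtype keeps integers and marks the gap as <NA>.
df['student number'] = df['student number'].astype('Int64')
print(df['student number'].tolist())  # [20, <NA>]
```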


## 3. Flattening JSON with multiple levels of nesting

### 3.1 When the data is a dict

```python
json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {
        'president': 'John Kasich',
        'contacts': {
            'email': {
                'admission': 'admission@abc.com',
                'general': 'info@abc.com',
            },
            'tel': '123456789',
        },
    },
}
pd.json_normalize(json_obj)
```

|   | school | location | ranking | info.president | info.contacts.email.admission | info.contacts.email.general | info.contacts.tel |
|---|---|---|---|---|---|---|---|
| 0 | ABC primary school | London | 2 | John Kasich | admission@abc.com | info@abc.com | 123456789 |
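By default, every nested dict is flattened all the way down. Recent pandas versions accept a `max_level` argument that limits the flattening depth. The sketch below uses `max_level=1` on the same object. `info` is expanded one level, and the deeper `contacts` dict stays as a single dict-valued column.

```python
import pandas as pd

json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {
        'president': 'John Kasich',
        'contacts': {
            'email': {'admission': 'admission@abc.com', 'general': 'info@abc.com'},
            'tel': '123456789',
        },
    },
}

# Flatten only one level below the top-level keys.
df = pd.json_normalize(json_obj, max_level=1)
print(list(df.columns))
# ['school', 'location', 'ranking', 'info.president', 'info.contacts']
```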


### 3.2 When the data is a list of dicts


```python
json_list = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {
            'teachers': {
                'math': 'Rick Scott',
                'physics': 'Elon Mask',
            }
        },
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {
            'teachers': {
                'math': 'Alan Turing',
                'physics': 'Albert Einstein',
            }
        },
    },
]
pd.json_normalize(json_list)
```

|   | class | student count | room | info.teachers.math | info.teachers.physics |
|---|---|---|---|---|---|
| 0 | Year 1 | 20 | Yellow | Rick Scott | Elon Mask |
| 1 | Year 2 | 25 | Blue | Alan Turing | Albert Einstein |
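To see what `json_normalize` adds over the plain constructor, compare the two on a shortened copy of the list above (`student count` and `room` are dropped for brevity). `pd.DataFrame` stores each nested dict as one Python object per cell. `json_normalize` spreads every leaf into its own dotted column.

```python
import pandas as pd

json_list = [
    {'class': 'Year 1', 'info': {'teachers': {'math': 'Rick Scott', 'physics': 'Elon Mask'}}},
    {'class': 'Year 2', 'info': {'teachers': {'math': 'Alan Turing', 'physics': 'Albert Einstein'}}},
]

# The plain constructor keeps the nested dict as a single column.
plain = pd.DataFrame(json_list)
print(list(plain.columns))  # ['class', 'info']

# json_normalize expands each leaf into its own column.
flat = pd.json_normalize(json_list)
print(list(flat.columns))   # ['class', 'info.teachers.math', 'info.teachers.physics']
```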


## 4. Flattening JSON with a nested list

### 4.1 When the data is a dict

```python
json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {
        'president': 'John Kasich',
        'contacts': {
            'email': {
                'admission': 'admission@abc.com',
                'general': 'info@abc.com',
            },
            'tel': '123456789',
        },
    },
    'students': [
        {'name': 'Tom'},
        {'name': 'James'},
        {'name': 'Jacqueline'},
    ],
}
```

```python
pd.json_normalize(json_obj)
```

|   | school | location | ranking | students | info.president | info.contacts.email.admission | info.contacts.email.general | info.contacts.tel |
|---|---|---|---|---|---|---|---|---|
| 0 | ABC primary school | London | 2 | [{'name': 'Tom'}, {'name': 'James'}, {'name': ... | John Kasich | admission@abc.com | info@abc.com | 123456789 |


The nested list is kept as-is in a single `students` column, while the other values are flattened. To flatten the nested list as well, set the `record_path` argument to `['students']`:

```python
pd.json_normalize(json_obj, record_path=['students'])
```

|   | name |
|---|---|
| 0 | Tom |
| 1 | James |
| 2 | Jacqueline |
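Here is one way to think about `record_path`. When it points at a single list of flat records, the result should match building a DataFrame directly from that list. The sketch below checks this on a trimmed copy of `json_obj` that keeps only `school` and `students`.

```python
import pandas as pd

json_obj = {
    'school': 'ABC primary school',
    'students': [{'name': 'Tom'}, {'name': 'James'}, {'name': 'Jacqueline'}],
}

# One row per element of the list found at record_path.
via_record_path = pd.json_normalize(json_obj, record_path=['students'])

# Building the DataFrame directly from that list gives the same table.
manual = pd.DataFrame(json_obj['students'])

print(via_record_path.equals(manual))  # True
```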


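The result has one row per student, but it no longer includes `school` or `tel`. To bring them back, use the `meta` argument to list the metadata fields you want in the result. A nested field is given as a list of keys, such as `['info', 'contacts', 'tel']`.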

```python
pd.json_normalize(
    json_obj,
    record_path=['students'],
    meta=['school', ['info', 'contacts', 'tel']],
)
```

|   | name | school | info.contacts.tel |
|---|---|---|---|
| 0 | Tom | ABC primary school | 123456789 |
| 1 | James | ABC primary school | 123456789 |
| 2 | Jacqueline | ABC primary school | 123456789 |
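Conceptually, `meta` looks up each metadata path on the parent object and repeats that value on every record extracted from the parent. The sketch below rebuilds the same table by hand to show this broadcasting. It is an illustration of the idea, not the library's implementation, and it uses a trimmed copy of `json_obj`.

```python
import pandas as pd

json_obj = {
    'school': 'ABC primary school',
    'info': {'contacts': {'tel': '123456789'}},
    'students': [{'name': 'Tom'}, {'name': 'James'}, {'name': 'Jacqueline'}],
}

result = pd.json_normalize(
    json_obj,
    record_path=['students'],
    meta=['school', ['info', 'contacts', 'tel']],
)

# By hand: one row per student, with the parent's values repeated on each row.
manual = pd.DataFrame(json_obj['students'])
manual['school'] = json_obj['school']
manual['info.contacts.tel'] = json_obj['info']['contacts']['tel']

print(result.to_dict('list') == manual.to_dict('list'))  # True
```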


### 4.2 When the data is a list of dicts

```python
json_list = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {
            'teachers': {
                'math': 'Rick Scott',
                'physics': 'Elon Mask',
            }
        },
        'students': [
            {
                'name': 'Tom',
                'sex': 'M',
                'grades': {'math': 66, 'physics': 77},
            },
            {
                'name': 'James',
                'sex': 'M',
                'grades': {'math': 80, 'physics': 78},
            },
        ],
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {
            'teachers': {
                'math': 'Alan Turing',
                'physics': 'Albert Einstein',
            }
        },
        'students': [
            {'name': 'Tony', 'sex': 'M'},
            {'name': 'Jacqueline', 'sex': 'F'},
        ],
    },
]
pd.json_normalize(json_list)
```

|   | class | student count | room | students | info.teachers.math | info.teachers.physics |
|---|---|---|---|---|---|---|
| 0 | Year 1 | 20 | Yellow | [{'name': 'Tom', 'sex': 'M', 'grades': {'math'... | Rick Scott | Elon Mask |
| 1 | Year 2 | 25 | Blue | [{'name': 'Tony', 'sex': 'M'}, {'name': 'Jacqu... | Alan Turing | Albert Einstein |


Again, each nested list is kept as-is in a single `students` column, while the other values are flattened. To flatten the nested lists, set `record_path` to `['students']`. Notice that not every student has `grades`, so the missing `grades.math` and `grades.physics` values are shown as NaN.

```python
pd.json_normalize(json_list, record_path=['students'])
```

|   | name | sex | grades.math | grades.physics |
|---|---|---|---|---|
| 0 | Tom | M | 66.0 | 77.0 |
| 1 | James | M | 80.0 | 78.0 |
| 2 | Tony | M | NaN | NaN |
| 3 | Jacqueline | F | NaN | NaN |


To include other metadata, use the `meta` argument:

```python
pd.json_normalize(
    json_list,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math'], ['info', 'teachers', 'physics']],
)
```

|   | name | sex | grades.math | grades.physics | class | room | info.teachers.math | info.teachers.physics |
|---|---|---|---|---|---|---|---|---|
| 0 | Tom | M | 66.0 | 77.0 | Year 1 | Yellow | Rick Scott | Elon Mask |
| 1 | James | M | 80.0 | 78.0 | Year 1 | Yellow | Rick Scott | Elon Mask |
| 2 | Tony | M | NaN | NaN | Year 2 | Blue | Alan Turing | Albert Einstein |
| 3 | Jacqueline | F | NaN | NaN | Year 2 | Blue | Alan Turing | Albert Einstein |


## 5. The errors argument


In the data below, `Year 2` has no math teacher. With the default `errors='raise'`, requesting `['info', 'teachers', 'math']` in `meta` therefore fails:

```python
data = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {
            'teachers': {
                'math': 'Rick Scott',
                'physics': 'Elon Mask',
            }
        },
        'students': [
            {'name': 'Tom', 'sex': 'M'},
            {'name': 'James', 'sex': 'M'},
        ],
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {
            'teachers': {
                # no math teacher
                'physics': 'Albert Einstein',
            }
        },
        'students': [
            {'name': 'Tony', 'sex': 'M'},
            {'name': 'Jacqueline', 'sex': 'F'},
        ],
    },
]

pd.json_normalize(
    data,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']],
)
```

This call raises a `KeyError`, because the `math` key is missing from the second object.

To work around this, set `errors='ignore'`. The missing metadata values are then filled with NaN.

```python
pd.json_normalize(
    data,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']],
    errors='ignore',
)
```

|   | name | sex | class | room | info.teachers.math |
|---|---|---|---|---|---|
| 0 | Tom | M | Year 1 | Yellow | Rick Scott |
| 1 | James | M | Year 1 | Yellow | Rick Scott |
| 2 | Tony | M | Year 2 | Blue | NaN |
| 3 | Jacqueline | F | Year 2 | Blue | NaN |
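The small, self-contained sketch below exercises both settings on a trimmed version of `data`. It catches the `KeyError` from the default `errors='raise'` and then shows that `errors='ignore'` yields NaN for the missing meta value. The exact wording of the error message varies between pandas versions, so only the exception type is checked.

```python
import pandas as pd

data = [
    {'class': 'Year 1', 'info': {'teachers': {'math': 'Rick Scott'}},
     'students': [{'name': 'Tom'}]},
    {'class': 'Year 2', 'info': {'teachers': {'physics': 'Albert Einstein'}},
     'students': [{'name': 'Tony'}]},
]
meta = ['class', ['info', 'teachers', 'math']]

# Default errors='raise': the missing 'math' key raises KeyError.
try:
    pd.json_normalize(data, record_path=['students'], meta=meta)
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# errors='ignore': the missing meta value becomes NaN instead.
df = pd.json_normalize(data, record_path=['students'], meta=meta, errors='ignore')
print(df['info.teachers.math'].isna().tolist())  # [False, True]
```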


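## 6. Custom separator using the sep argument

By default, nested keys are joined with `.` in the column names. The `sep` argument sets a different separator, and it applies to both the flattened record fields and the nested `meta` paths.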

```python
json_list = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {
            'teachers': {
                'math': 'Rick Scott',
                'physics': 'Elon Mask',
            }
        },
        'students': [
            {
                'name': 'Tom',
                'sex': 'M',
                'grades': {'math': 66, 'physics': 77},
            },
            {
                'name': 'James',
                'sex': 'M',
                'grades': {'math': 80, 'physics': 78},
            },
        ],
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {
            'teachers': {
                'math': 'Alan Turing',
                'physics': 'Albert Einstein',
            }
        },
        'students': [
            {'name': 'Tony', 'sex': 'M'},
            {'name': 'Jacqueline', 'sex': 'F'},
        ],
    },
]

pd.json_normalize(
    json_list,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']],
    errors='ignore',
    sep='_',
)
```

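|   | name | sex | grades_math | grades_physics | class | room | info_teachers_math |
|---|---|---|---|---|---|---|---|
| 0 | Tom | M | 66.0 | 77.0 | Year 1 | Yellow | Rick Scott |
| 1 | James | M | 80.0 | 78.0 | Year 1 | Yellow | Rick Scott |
| 2 | Tony | M | NaN | NaN | Year 2 | Blue | Alan Turing |
| 3 | Jacqueline | F | NaN | NaN | Year 2 | Blue | Alan Turing |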
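`sep='_'` has one practical side effect: the generated names become valid Python identifiers, so pandas attribute access such as `df.grades_math` works. Dotted names can only be reached with `df['grades.math']`. The sketch below demonstrates this on a small, hypothetical list of student records.

```python
import pandas as pd

records = [
    {'name': 'Tom', 'grades': {'math': 66, 'physics': 77}},
    {'name': 'James', 'grades': {'math': 80, 'physics': 78}},
]

dotted = pd.json_normalize(records)               # default sep='.'
underscored = pd.json_normalize(records, sep='_')

print(list(dotted.columns))       # ['name', 'grades.math', 'grades.physics']
print(list(underscored.columns))  # ['name', 'grades_math', 'grades_physics']

# '_'-joined names are valid identifiers, so attribute access works.
print(underscored.grades_math.tolist())  # [66, 80]
```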


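## 7. Adding a prefix to meta and record columns

`record_prefix` is prepended to the names of the columns that come from the records, and `meta_prefix` to the names of the metadata columns.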


```python
pd.json_normalize(
    json_list,
    record_path=['students'],
    meta=['class'],
    meta_prefix='meta_',
    record_prefix='student_',
)
```

|   | student_name | student_sex | student_grades.math | student_grades.physics | meta_class |
|---|---|---|---|---|---|
| 0 | Tom | M | 66.0 | 77.0 | Year 1 |
| 1 | James | M | 80.0 | 78.0 | Year 1 |
| 2 | Tony | M | NaN | NaN | Year 2 |
| 3 | Jacqueline | F | NaN | NaN | Year 2 |
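Prefixes are more than cosmetic. If a meta field and a record field end up with the same column name, `json_normalize` refuses to overwrite one with the other and raises a `ValueError`. The sketch below reproduces this with hypothetical data in which both the class and each student have a `name` key, then resolves the clash with `meta_prefix`.

```python
import pandas as pd

# Hypothetical data: the parent and each record both use the key 'name'.
classes = [
    {'name': 'Year 1', 'students': [{'name': 'Tom'}, {'name': 'James'}]},
]

# Without a prefix, the meta column 'name' collides with the record column 'name'.
try:
    pd.json_normalize(classes, record_path=['students'], meta=['name'])
    collided = False
except ValueError:
    collided = True
print(collided)  # True

# meta_prefix gives the metadata column a distinct name.
df = pd.json_normalize(
    classes, record_path=['students'], meta=['name'], meta_prefix='class_'
)
print(list(df.columns))  # ['name', 'class_name']
```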
