---

<center><h1>📍 📍 JSON FILES📍 📍</h1></center>


---

---

***In this notebook, we will see how to read `JSON` files, which challenges come up while reading them, and how to tackle those challenges.***

---


***Install pandas with the command:*** 

If you are using anaconda with python3: ***`!pip install pandas`***

If you are using jupyter with python3: ***`!pip3 install pandas`***

---

#### `TABLE OF CONTENTS`
 - ***Reading the JSON file.***
 - ***Challenges with reading JSON files.***
     - Reading JSON files written as records.
 - ***JSON Library***
     - Reading Nested JSON
     - Filter JSON
     - Writing JSON files
 - ***Create a dataframe from a python dictionary***
 - ***Create a dataframe from an array of dictionaries.***

---

A JSON file is a file that stores simple data structures and objects in JavaScript Object Notation (JSON) format. It is primarily used for transmitting data between a web application and a server. 
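
As a small illustration of the format, the standard-library `json` module turns JSON text into plain Python objects: a JSON object becomes a `dict` and a JSON array becomes a `list`. The sample text below is made up for this sketch and is not taken from the `datasets` folder.

```python
import json

# A made-up JSON document: one object holding a string, a number and an array.
json_text = '{"name": "Andew", "age": 12, "subjects": ["maths", "science"]}'

# json.loads parses JSON text into Python objects.
student = json.loads(json_text)

print(type(student))          # JSON object -> dict
print(student["subjects"])    # JSON array  -> list
```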

---

#### `READING JSON FILES`


---

In [1]:
# import the pandas library
import pandas as pd

In [2]:
# read the json file
data = pd.read_json('datasets/simple.json')

In [3]:
# print the top rows of the dataframe
data.head()

Unnamed: 0,name,age,grade
0,Andew,12,A
1,Bhuvan,18,B
2,Clinton,11,A
3,Drake,12,C
4,Eisha,13,B


#### ***`Challenges with reading JSON files.`***

     - Reading JSON files written as records.

---

#### Some JSON files are written as records, i.e. each JSON object is written on its own line.

***For example:***

{ "name" : "Lakshay", "roll_no" : "100" } # line 1

{ "name" : "Sanad"  , "roll_no" : "101" } # line 2

.

.

.

{ "name" : "Aravind", "roll_no" : "200" } # line 101

---

In [5]:
# read the data without lines=True (this call raises an error)
data_with_records = pd.read_json('datasets/simple_records.json')

---

***If we try to read this type of file directly, we get an error. To resolve this error, pass the parameter `lines=True`.***
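
The effect of `lines=True` can be sketched without any file on disk. The two records below are made up for illustration, and `StringIO` stands in for a real file such as `datasets/simple_records.json`.

```python
from io import StringIO

import pandas as pd

# Two made-up records, one JSON object per line (the "JSON Lines" layout).
records_text = (
    '{"name": "Andew", "age": 12, "grade": "A"}\n'
    '{"name": "Bhuvan", "age": 18, "grade": "B"}\n'
)

# Without lines=True pandas expects one single JSON document and fails.
try:
    pd.read_json(StringIO(records_text))
    failed_without_lines = False
except ValueError:
    failed_without_lines = True

# With lines=True every line is parsed as its own record.
records_df = pd.read_json(StringIO(records_text), lines=True)

print(failed_without_lines)
print(records_df)
```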

---

In [6]:
# read json files with records 
data_with_records = pd.read_json('datasets/simple_records.json',lines=True)
data_with_records.head()

Unnamed: 0,name,age,grade
0,Andew,12,A
1,Bhuvan,18,B
2,Clinton,11,A
3,Drake,12,C
4,Eisha,13,B



---
---

#### `JSON Module of Standard Library`

Many JSON files are nested, so they cannot be loaded directly into a flat dataframe. We first need to clean and filter the JSON data before converting it into a dataframe.
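
One possible way to "clean" a nested record is to lift the inner keys to the top level. The helper name `flatten_record` and the two sample records are hypothetical, chosen only to mirror the shape of the data used later in this notebook.

```python
# Two made-up nested records shaped like the ones in this notebook.
nested_records = [
    {"student_roll_no": 101, "details": {"name": "Andew", "age": 12, "grade": "A"}},
    {"student_roll_no": 102, "details": {"name": "Bhuvan", "age": 18, "grade": "B"}},
]


def flatten_record(record):
    """Return a flat copy of one record, lifting the keys of `details` to the top level."""
    flat = {"student_roll_no": record["student_roll_no"]}
    flat.update(record["details"])
    return flat


flat_records = [flatten_record(record) for record in nested_records]
print(flat_records)
```

A flat list of dictionaries like `flat_records` can be passed to `pd.DataFrame(...)`. pandas also provides `pd.json_normalize`, which serves a similar purpose.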

---

#### `READING A JSON FILE`

In [11]:
# pd.read_csv cannot parse the nested JSON file (it is not CSV) and raises a ParserError
pd.read_csv('datasets/nested.json')

ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2


In [12]:
# importing the json module of standard library
import json

In [13]:
# open and load the data in json file
with open('datasets/nested.json') as f :
    my_json_data = json.load(f)

print(my_json_data)

[{'student_roll_no': 101, 'details': {'name': 'Andew', 'age': 12, 'grade': 'A'}}, {'student_roll_no': 102, 'details': {'name': 'Bhuvan', 'age': 18, 'grade': 'B'}}, {'student_roll_no': 103, 'details': {'name': 'Clinton', 'age': 11, 'grade': 'A'}}, {'student_roll_no': 104, 'details': {'name': 'Drake', 'age': 12, 'grade': 'C'}}, {'student_roll_no': 105, 'details': {'name': 'Eisha', 'age': 13, 'grade': 'B'}}, {'student_roll_no': 106, 'details': {'name': 'Farhan', 'age': 22, 'grade': 'C'}}, {'student_roll_no': 107, 'details': {'name': 'Garima', 'age': 11, 'grade': 'A'}}, {'student_roll_no': 108, 'details': {'name': 'Himanshu', 'age': 19, 'grade': 'A'}}, {'student_roll_no': 109, 'details': {'name': 'Ishaan', 'age': 10, 'grade': 'D'}}, {'student_roll_no': 110, 'details': {'name': 'Jason', 'age': 9, 'grade': 'B'}}]


***Pretty Print: https://docs.python.org/3/library/pprint.html***

- To view the data in a structured way.

---

In [14]:
# use pprint (pretty print) to print the data in a structured format
from pprint import pprint
pprint(my_json_data)

[{'details': {'age': 12, 'grade': 'A', 'name': 'Andew'},
  'student_roll_no': 101},
 {'details': {'age': 18, 'grade': 'B', 'name': 'Bhuvan'},
  'student_roll_no': 102},
 {'details': {'age': 11, 'grade': 'A', 'name': 'Clinton'},
  'student_roll_no': 103},
 {'details': {'age': 12, 'grade': 'C', 'name': 'Drake'},
  'student_roll_no': 104},
 {'details': {'age': 13, 'grade': 'B', 'name': 'Eisha'},
  'student_roll_no': 105},
 {'details': {'age': 22, 'grade': 'C', 'name': 'Farhan'},
  'student_roll_no': 106},
 {'details': {'age': 11, 'grade': 'A', 'name': 'Garima'},
  'student_roll_no': 107},
 {'details': {'age': 19, 'grade': 'A', 'name': 'Himanshu'},
  'student_roll_no': 108},
 {'details': {'age': 10, 'grade': 'D', 'name': 'Ishaan'},
  'student_roll_no': 109},
 {'details': {'age': 9, 'grade': 'B', 'name': 'Jason'}, 'student_roll_no': 110}]
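
As an alternative view, `json.dumps` with the `indent` argument also produces structured output. This is a minimal sketch on one made-up record. Unlike `pprint`, which sorts dictionary keys by default, `json.dumps` keeps the keys in their original order unless `sort_keys=True` is passed.

```python
import json

# One made-up nested record, shaped like the entries printed above.
record = {"student_roll_no": 101, "details": {"name": "Andew", "age": 12, "grade": "A"}}

# indent=2 puts each key on its own line; key order is kept as in the dict.
pretty_text = json.dumps(record, indent=2)
print(pretty_text)
```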


---
---
#### `PROBLEM`

***Create a new JSON file that contains the age and name of the people whose age is greater than 15.***

---

In [15]:
# We saw that the data in the file is a JSON list,
# so we can index into it; start with the first element

data_0 = my_json_data[0]
data_0

{'student_roll_no': 101, 'details': {'name': 'Andew', 'age': 12, 'grade': 'A'}}

In [16]:
data_0['details']

{'name': 'Andew', 'age': 12, 'grade': 'A'}

In [17]:
data_0['details']['age']

12

In [18]:
# iterate through the json data
for data in my_json_data:
    print(data['details']['age'])

12
18
11
12
13
22
11
19
10
9


In [19]:
# create a new empty list to store the filtered data
filtered_data = []

# iterate through the json data
for data in my_json_data:

    # check for the condition
    if data['details']['age'] > 15:
        # if the condition is satisfied, store the age and name in a new dictionary
        filtered_variable = {}
        filtered_variable['age'] = data['details']['age']
        filtered_variable['name'] = data['details']['name']
        filtered_data.append(filtered_variable)

In [20]:
# check the filtered data
filtered_data

[{'age': 18, 'name': 'Bhuvan'},
 {'age': 22, 'name': 'Farhan'},
 {'age': 19, 'name': 'Himanshu'}]

In [21]:
filtered_data[0].values()

dict_values([18, 'Bhuvan'])
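
The same filter can also be written as a list comprehension. This is a sketch on a made-up three-record sample (`sample_data` and `filtered_sample` are names introduced here), not a replacement for the loop above.

```python
# Made-up sample with the same shape as my_json_data.
sample_data = [
    {"student_roll_no": 101, "details": {"name": "Andew", "age": 12, "grade": "A"}},
    {"student_roll_no": 102, "details": {"name": "Bhuvan", "age": 18, "grade": "B"}},
    {"student_roll_no": 106, "details": {"name": "Farhan", "age": 22, "grade": "C"}},
]

# Keep only age and name for the records whose age is greater than 15.
filtered_sample = [
    {"age": record["details"]["age"], "name": record["details"]["name"]}
    for record in sample_data
    if record["details"]["age"] > 15
]

print(filtered_sample)
```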

#### `WRITING A JSON FILE`

---

In [22]:
# put the filtered data into the new json file
with open('datasets/filtered.json','w') as f:
    
    json.dump(filtered_data, f, indent=4)

---
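
To check that a written file holds what was intended, it can be read back with `json.load` and compared with the original list. The sketch below uses a made-up two-record list and a temporary directory (both assumptions for illustration) so that it does not touch the `datasets` folder.

```python
import json
import os
import tempfile

# Made-up filtered records, shaped like filtered_data above.
filtered_sample = [{"age": 18, "name": "Bhuvan"}, {"age": 22, "name": "Farhan"}]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "filtered_sample.json")

    with open(path, "w") as f:
        json.dump(filtered_sample, f, indent=4)

    # Read the file back to confirm nothing was lost.
    with open(path) as f:
        loaded = json.load(f)

print(loaded == filtered_sample)
```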