# Module 3 - Work with JSON data (30 minutes)

JSON (JavaScript Object Notation) is a lightweight, text-based format for storing and exchanging data. It is language-independent and widely supported, and it maps naturally onto Python lists and dictionaries.
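
To make the mapping between JSON and Python concrete, here is a minimal sketch using the standard `json` module. The record and its field names are made up for illustration and are not taken from the workshop data.

```python
import json

# A hypothetical record; the field names and values are illustrative only.
record = {
    "Car": "Example GT",
    "MPG": 18.0,
    "Cylinders": 8,
    "Tags": ["coupe", "v8"],
    "Sold": False,
    "Notes": None,
}

text = json.dumps(record)      # Python dict -> JSON string
restored = json.loads(text)    # JSON string -> Python dict

print(text)                    # False becomes false, None becomes null
print(restored == record)
```

Note that Python's `False` and `None` are written as JSON's `false` and `null`, and are converted back when the text is loaded.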

* Please feel free to ask questions at any time!

In [None]:
!pip install pymongo

In [None]:
from pymongo import MongoClient
import pandas as pd

# Read and Process a MongoDB Collection

1. Connect to the MongoDB server
2. Select the database
3. Query the 'cars' collection
4. Read the query results into a list

In [None]:
# connect to the MongoDB server

client = MongoClient('localhost', port=27017)

In [None]:
# select the 'test' database

db = client.test

* Let's use the 'cars' collection since we already created it.

In [None]:
# query 'cars' data

cars = db.cars
q = cars.find()

In [None]:
# create a list of dictionary elements (one per document)

ls = list(q)

In [None]:
# verify

print('number of dict elements:', len(ls))

# Save Data to JSON

1. Import the JSON library
2. Create a function to save data to JSON
3. Save to JSON
4. Create a function to read JSON data
5. Read JSON
6. Verify that all is well by reading from JSON

In [None]:
# import the json library

import json

In [None]:
# create a utility function to save to JSON

def dump_json(path, data):
    # write 'data' to the file at 'path' as JSON
    with open(path, 'w') as f:
        json.dump(data, f)

* The function opens a new JSON file and dumps (saves) the dictionary elements into the new file.
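
One caveat worth knowing: `json.dump` only handles basic Python types. If a collection was created with MongoDB's default `_id` values (an assumption; we may have set integer ids ourselves), each `_id` is an `ObjectId`, and saving would raise a `TypeError`. The sketch below shows one possible workaround, the `default=str` argument. It uses a hypothetical `FakeObjectId` class as a stand-in so that it runs without pymongo or a database.

```python
import io
import json


class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId so the sketch runs without pymongo."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value


docs = [{"_id": FakeObjectId("abc123"), "Car": "Example GT"}]

# Without a fallback, json.dump raises TypeError on the unknown type.
try:
    json.dump(docs, io.StringIO())
    failed = False
except TypeError:
    failed = True

# default=str is called for any object json cannot serialize on its own.
buffer = io.StringIO()
json.dump(docs, buffer, default=str)
saved_text = buffer.getvalue()

print(failed)
print(saved_text)
```

The trade-off is that the id comes back as a plain string when the file is read, not as an `ObjectId`.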

In [None]:
# save to JSON

json_file = 'data/cars.json'  # path to the JSON file
dump_json(json_file, ls)

In [None]:
# create utility function to read JSON

def read_json(path):
    # load and return the JSON content of the file at 'path'
    with open(path) as f:
        return json.load(f)

* This function reads a JSON file from the given path.
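
Reading a file back does not always return exactly what was saved. As a small sketch (the file name and data are made up), tuples come back as lists and non-string dictionary keys come back as strings:

```python
import json
import os
import tempfile

original = [{"point": (1, 2), 7: "seven"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.json")  # hypothetical file name
    with open(path, "w") as fh:
        json.dump(original, fh)
    with open(path) as fh:
        restored = json.load(fh)

print(restored)  # the tuple is now a list and the key 7 is now '7'
```

This matters mostly when verifying saved data with `==` against the in-memory original.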

In [None]:
# read the JSON file

cars_data_from_json = read_json(json_file)

### Let's display some data

The JSON file holds a list of dictionaries. We access each dictionary by its position in the list, starting at index 0, and each value within a dictionary by its key (the feature name). In this case, we take the first 5 dictionaries and display the **Car** and **Origin** values from each.
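
As a quick illustration of index and key access, here is a sketch on a tiny list shaped like the 'cars' data. The values are made up, and `cars_sample` is a hypothetical name.

```python
# Hypothetical records shaped like the 'cars' data; the values are made up.
cars_sample = [
    {"Car": "Example A", "Origin": "US"},
    {"Car": "Example B", "Origin": "Europe"},
    {"Car": "Example C", "Origin": "Japan"},
]

first = cars_sample[0]                 # first dictionary in the list
first_car = first["Car"]               # value stored under the 'Car' key
origins = [row["Origin"] for row in cars_sample[:2]]  # slice, then key lookup
missing = first.get("MPG")             # .get returns None instead of raising KeyError

print(first_car, origins, missing)
```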

In [None]:
# display some records from the JSON file

for row in cars_data_from_json[:5]:
    print(row['Car'], row['Origin'])

Of course, there are other ways to read and write JSON, but we like this technique because it gives us complete control over the data.

# Save CSV Data into JSON

1. Read CSV file into Pandas DataFrame
2. Convert DataFrame to list of dictionary elements
3. Import the JSON library (<font color=red>done earlier</font>)
4. Create a function to save data to JSON (<font color=red>done earlier</font>)
5. Save to JSON
6. Create a function to read JSON data (<font color=red>done earlier</font>)
7. Read JSON
8. Verify that all is well by reading from JSON
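
For comparison, the same CSV-to-JSON idea can be sketched with the standard library alone. This swaps Pandas for `csv.DictReader`, so it is not the approach used in this module, and the CSV text below is made-up sample data, not the contents of the workshop file.

```python
import csv
import io
import json

# Hypothetical CSV content standing in for a real file.
csv_text = (
    "Country,Item Type,Total Profit\n"
    "Exampleland,Snacks,1234.5\n"
    "Sampleton,Fruits,99.0\n"
)

# DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# csv reads every value as a string, so numeric columns are converted by hand.
for row in rows:
    row["Total Profit"] = float(row["Total Profit"])

json_text = json.dumps(rows)
restored = json.loads(json_text)
print(restored)
```

Pandas infers column types for us, which is one reason we use it here; with `csv` that conversion is manual.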

In [None]:
# read CSV into Pandas

f = 'data/sales.csv'
df = pd.read_csv(f)

* This is the first time that we've worked with this CSV file!

In [None]:
# convert data into a list of dictionary elements

data = df.to_dict('records')

In [None]:
# save to JSON

json_file = 'data/sales.json'  # path to the JSON file
dump_json(json_file, data)

In [None]:
# read the JSON file

sales_data_from_json = read_json(json_file)

In [None]:
# get features

br = '\n'
features = list(sales_data_from_json[0])
print('Features:', br)
print(features, br)

# display some records from the JSON file

for row in sales_data_from_json[:5]:
    total_profit = '${:,.2f}'.format(row['Total Profit'])
    print(row['Country'], row['Item Type'], total_profit)

* We get the feature columns since we haven't worked with this dataset before.
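
Two small idioms do the work in the cell above: `list()` on a dictionary returns its keys, and the `'${:,.2f}'` format adds thousands separators and two decimal places. A minimal sketch on a made-up record (`sales_sample` is a hypothetical name):

```python
# A hypothetical record shaped like the 'sales' data; the values are made up.
sales_sample = [
    {"Country": "Exampleland", "Item Type": "Snacks", "Total Profit": 1234567.891},
]

features = list(sales_sample[0])   # iterating a dict yields its keys, in insertion order
formatted = "${:,.2f}".format(sales_sample[0]["Total Profit"])

print(features)    # ['Country', 'Item Type', 'Total Profit']
print(formatted)   # $1,234,567.89
```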

# Save 'sales' data to MongoDB

In [None]:
# we already have 'sales' data in memory thanks to the magic of Jupyter!

sales = db.sales  # establish collection name

sales.drop()  # a good idea, especially when learning ...

# add 'sales' data to 'sales' collection

for i, row in enumerate(sales_data_from_json):
    row['_id'] = i
    sales.insert_one(row)  # create a new document for every record
    
# verify collection was created

q = sales.find().limit(3)
[(row['Country'], row['Item Type']) for row in q]

In [None]:
# count number of documents in collection

cnt = sales.count_documents({})
cnt

Now, we can work with the 'sales' collection for added practice. 
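
The loading loop above inserts one document at a time. As a hedged alternative, the sketch below wraps the same drop, add `_id`, and insert steps in a helper that sends all documents in one `insert_many` call. The helper name `load_collection` is our own, and `FakeCollection` is a hypothetical in-memory stand-in used only so the example runs without a MongoDB server.

```python
def load_collection(collection, records):
    """Replace a collection's contents, assigning sequential integer _id values."""
    # Build new dicts so the caller's records are left unchanged.
    docs = [{**row, "_id": i} for i, row in enumerate(records)]
    collection.drop()
    if docs:  # pymongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)


class FakeCollection:
    """Hypothetical in-memory stand-in, used only so the sketch runs without a server."""

    def __init__(self):
        self.docs = []

    def drop(self):
        self.docs = []

    def insert_many(self, docs):
        self.docs.extend(docs)


fake = FakeCollection()
inserted = load_collection(fake, [{"Country": "Exampleland"}, {"Country": "Sampleton"}])
print(inserted, fake.docs)
```

With a live connection, the call would presumably look like `load_collection(db.sales, sales_data_from_json)`.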

# Read and Peruse an Existing JSON File

* We created a 'movies' JSON file for added practice.

In [None]:
# read the 'movies' JSON file

json_file = 'data/movies.json'  # path to the JSON file

movies_data_from_json = read_json(json_file)

In [None]:
# get size

len(movies_data_from_json)

In [None]:
# get features

features = list(movies_data_from_json[0])
print('Features:', br)
print(features)

In [None]:
# peruse JSON

for row in movies_data_from_json[:5]:
    print(row['movie_id'], row['title'], row['genres'])

# Save 'movies' data to MongoDB

In [None]:
# we already have 'movies' data in memory

movies = db.movies  # establish collection name

movies.drop()  # a good idea, especially when learning ...

# add 'movies' data to 'movies' collection

for i, row in enumerate(movies_data_from_json):
    row['_id'] = i
    movies.insert_one(row)  # create a new document for every record
    
# verify collection was created

n = 3
q = movies.find().limit(n)

# features: 'movie_id', 'title', 'genres'

[(row['title'], row['genres']) for row in q]

In [None]:
# count number of documents in collection

cnt = movies.count_documents({})
cnt

# Module 3 Exercise

* Create a small list of at least 3 dictionary elements
* Save the list to JSON
* Read the JSON
* Create a query to verify

# Our solution

* Create a small list of dictionary elements

In [None]:
# of course, your example will differ

data = [
    {'_id':0, 'name': 'Catcher in the Rye', 'author':'J. D. Salinger'},
    {'_id':1, 'name': 'Moby Dick', 'author':'Herman Melville'},
    {'_id':2, 'name': 'War and Peace', 'author':'Leo Tolstoy'},
    {'_id':3, 'name': 'Candide', 'author':'Voltaire'}
    ]

In [None]:
# save to JSON

json_file = 'data/books.json'
dump_json(json_file, data)

In [None]:
# Read the JSON

books_data_from_json = read_json(json_file)

In [None]:
# get features

books_data_from_json[0].keys()

In [None]:
# verify

# display some records from the JSON file

for row in books_data_from_json:
    print(row['name'], 'by', row['author'])

# What did we learn?

1. We converted a MongoDB collection into JSON
2. We converted a CSV file into JSON
3. We read and explored an existing JSON file
4. We verified our work at each step
5. We worked through an exercise to sharpen our skills

## Questions?

# <font color=red>5 minute break</font>