* Different approaches to process JSON data
* Getting started with json module
* Converting JSON string to dict
* Converting dict to JSON string
* Read JSON Array from file
* Read JSON Documents from file
* Write list of dicts to file
* Write dicts to file (one per line)
* Convert CSV to JSON file (real-world example)
* Exercise and Solution

In [None]:
# Different approaches to processing JSON data
# json module (common in backends for web or mobile applications)
# pandas
# pyspark

In [None]:
# Getting started with json module
import json

In [None]:
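# json.load: read a JSON document from a file object into Python objects
# json.loads: parse a JSON string into Python objects
# json.dump: write Python objects as JSON to a file object
# json.dumps: serialize Python objects to a JSON string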
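The names differ only by the trailing `s` ("string"): `load`/`dump` work with file objects, while `loads`/`dumps` work with strings. A minimal sketch, using `io.StringIO` as a stand-in for a real file and a small hypothetical document:

```python
import io
import json

# A small JSON document held in a Python string (hypothetical sample)
doc = '{"id": 1, "tags": ["a", "b"]}'

# loads: parse a str into Python objects
from_str = json.loads(doc)

# load: parse from a file-like object; io.StringIO stands in for an open file
from_file = json.load(io.StringIO(doc))

# dumps: serialize Python objects to a str
as_str = json.dumps(from_str)

# dump: serialize Python objects into a file-like object
buffer = io.StringIO()
json.dump(from_file, buffer)

print(from_str == from_file)        # True
print(as_str == buffer.getvalue())  # True
```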

In [None]:
# Converting JSON string to dict
user_str = '{"user_id": 1, "first_name": "Durga", "last_name": "Gadiraju", "salary": 1500.00}'

In [None]:
type(user_str)

In [None]:
user_str

In [None]:
json.loads(user_str)

In [None]:
type(json.loads(user_str))
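`json.loads` maps each JSON type to a Python type: object to `dict`, array to `list`, string to `str`, integer to `int`, number with a fraction to `float`, `true`/`false` to `True`/`False`, and `null` to `None`. The sketch below uses a hypothetical string that also contains the types `user_str` does not cover:

```python
import json

# Hypothetical sample covering booleans, null and arrays as well as numbers
sample = '{"salary": 1500.00, "user_id": 1, "active": true, "manager_id": null, "roles": ["dev", "ops"]}'
parsed = json.loads(sample)

# Show the Python type chosen for each JSON value
for key, value in parsed.items():
    print(key, type(value).__name__, repr(value))
# salary float 1500.0
# user_id int 1
# active bool True
# manager_id NoneType None
# roles list ['dev', 'ops']
```

Note that `1500.00` comes back as the float `1500.0`; JSON does not preserve trailing zeros.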

In [None]:
# Converting dict to JSON string
sd = {
    'id': 1,
    'name': 'Durga Gadiraju',
    'scores': [92, 89, 91]
}

In [None]:
type(sd)

In [None]:
json.dumps(sd)
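`json.dumps` always emits double-quoted keys and strings, even though the dict was written with single quotes. Its optional keyword arguments control layout: `indent` pretty-prints, `sort_keys` orders keys, and `separators` can strip the default spaces. A short sketch:

```python
import json

sd = {'id': 1, 'name': 'Durga Gadiraju', 'scores': [92, 89, 91]}

default = json.dumps(sd)                         # one line, spaces after ':' and ','
pretty = json.dumps(sd, indent=2, sort_keys=True)  # multi-line, keys sorted
compact = json.dumps(sd, separators=(',', ':'))  # smallest output

print(default)  # {"id": 1, "name": "Durga Gadiraju", "scores": [92, 89, 91]}
print(compact)  # {"id":1,"name":"Durga Gadiraju","scores":[92,89,91]}
print(pretty)
```

All three strings parse back to the same dict; only whitespace and key order differ.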

In [None]:
# Read JSON Array from file
help(json.load)

In [None]:
with open('data/sales/part-00000.json') as fp:
    data = json.load(fp)

In [None]:
type(data)

In [None]:
data

In [None]:
len(data)

In [None]:
[sale['sale_amount'] for sale in data]

In [None]:
sum(sale['sale_amount'] for sale in data)
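`json.load` parses the entire file as a single JSON document, which works here because the file holds one JSON array. The self-contained sketch below repeats the steps with hypothetical records written to a temporary file, since `data/sales/part-00000.json` is not reproduced here:

```python
import json
import os
import tempfile

# Hypothetical sample records standing in for data/sales/part-00000.json
records = [
    {"sale_id": 1, "sale_amount": 500.0},
    {"sale_id": 2, "sale_amount": 250.0},
    {"sale_id": 3, "sale_amount": 750.0},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sales.json')

    # Write the list as one JSON array
    with open(path, 'w') as fp:
        json.dump(records, fp)

    # json.load reads the whole file back as a single document (a list)
    with open(path) as fp:
        data = json.load(fp)

total = sum(sale['sale_amount'] for sale in data)
print(type(data).__name__, len(data), total)  # list 3 1500.0
```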

In [None]:
# Reading JSON Documents from file (one document per line)
path = 'data/retail_db_json/departments/part-r-00000-3db7cfae-3ad2-4fc7-88ff-afe0ec709f49'
with open(path) as fp:
    data = fp.read().splitlines()

In [None]:
data

In [None]:
[json.loads(rec) for rec in data]
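A file with one JSON document per line (often called JSON Lines or NDJSON) is not itself a valid JSON document, so `json.load` on the whole file fails with `json.JSONDecodeError` ("Extra data"). Parsing line by line avoids that. A minimal sketch with hypothetical records, using `io.StringIO` in place of the departments file:

```python
import io
import json

# Hypothetical two-line sample in the one-document-per-line layout
jsonl_text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Treating the whole text as one document fails after the first object
error_msg = None
try:
    json.load(io.StringIO(jsonl_text))
except json.JSONDecodeError as err:
    error_msg = err.msg
print(error_msg)  # Extra data

# Parse each non-blank line on its own; iterating a file object yields lines
records = [json.loads(line) for line in io.StringIO(jsonl_text) if line.strip()]
print(records)  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

Iterating the file object directly, as here, also avoids reading the whole file into memory first.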

In [None]:
with open('data/sales/user.json') as fp:
    data = json.load(fp)

In [None]:
type(data)

In [None]:
data

In [None]:
data.keys()

In [None]:
data.values()

In [None]:
data['phoneNumbers'][0]['number']
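Nested JSON becomes nested dicts and lists, so access chains key lookups and list indexes. When a key might be absent, `dict.get` with a default avoids a `KeyError`. A sketch on a hypothetical document shaped like a user record (the actual contents of `user.json` are not shown here):

```python
import json

# Hypothetical nested user record
user_json = '''
{
  "firstName": "Jane",
  "phoneNumbers": [
    {"type": "home", "number": "555-0100"},
    {"type": "office", "number": "555-0101"}
  ]
}
'''
user = json.loads(user_json)

# dict key -> list index -> dict key
first_number = user['phoneNumbers'][0]['number']

# Build a lookup from every phone entry; .get falls back to [] if the key is missing
numbers_by_type = {p['type']: p['number'] for p in user.get('phoneNumbers', [])}

# A key that does not exist returns the default instead of raising
email = user.get('email', 'n/a')

print(first_number)     # 555-0100
print(numbers_by_type)  # {'home': '555-0100', 'office': '555-0101'}
print(email)            # n/a
```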

In [None]:
# Write list of dicts to file
sales = [
    {"sale_id": 1, "sale_rep_id": 101, "sale_amount": 500.00, "commission_pct": 5},
    {"sale_id": 2, "sale_rep_id": 102, "sale_amount": 250.00, "commission_pct": 3},
    {"sale_id": 3, "sale_rep_id": 103, "sale_amount": 750.00, "commission_pct": 8},
    {"sale_id": 4, "sale_rep_id": 104, "sale_amount": 1000.00, "commission_pct": None},
    {"sale_id": 5, "sale_rep_id": 101, "sale_amount": 300.00, "commission_pct": -1}
]

In [None]:
type(sales)

In [None]:
help(json.dump)

In [None]:
with open('data/sales/dummy1.json', 'w') as fp:
    json.dump(sales, fp)
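`json.dump` writes the whole list as a single JSON array, converting `None` to `null`. Reading the file back and comparing is a quick round-trip check. A self-contained sketch using a temporary file and a subset of the records above:

```python
import json
import os
import tempfile

sales = [
    {"sale_id": 1, "sale_rep_id": 101, "sale_amount": 500.00, "commission_pct": 5},
    {"sale_id": 4, "sale_rep_id": 104, "sale_amount": 1000.00, "commission_pct": None},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sales.json')

    # One JSON array for the whole list
    with open(path, 'w') as fp:
        json.dump(sales, fp)

    with open(path) as fp:
        raw = fp.read()

print(raw.startswith('['), 'null' in raw)  # True True
print(json.loads(raw) == sales)            # True
```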

In [None]:
# Write dicts to file (one per line)
sales = [
    {"sale_id": 1, "sale_rep_id": 101, "sale_amount": 500.00, "commission_pct": 5},
    {"sale_id": 2, "sale_rep_id": 102, "sale_amount": 250.00, "commission_pct": 3},
    {"sale_id": 3, "sale_rep_id": 103, "sale_amount": 750.00, "commission_pct": 8},
    {"sale_id": 4, "sale_rep_id": 104, "sale_amount": 1000.00, "commission_pct": None},
    {"sale_id": 5, "sale_rep_id": 101, "sale_amount": 300.00, "commission_pct": -1}
]

In [None]:
sales_strs = [json.dumps(sale) for sale in sales]

In [None]:
sales_strs

In [None]:
with open('data/sales/dummy2.json', 'w') as fp:
    for sale in sales_strs:
        fp.write(sale)
        fp.write('\n')
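The same one-record-per-line file can be written without building `sales_strs` first: `print(..., file=fp)` appends the newline automatically. A sketch with hypothetical records and a temporary file, reading the lines back to confirm each one is a complete JSON document:

```python
import json
import os
import tempfile

# Hypothetical sample records
sales = [
    {"sale_id": 1, "sale_amount": 500.0},
    {"sale_id": 2, "sale_amount": 250.0},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sales.jsonl')

    # print adds '\n', so each serialized record lands on its own line
    with open(path, 'w') as fp:
        for sale in sales:
            print(json.dumps(sale), file=fp)

    with open(path) as fp:
        lines = fp.read().splitlines()

print(lines)  # ['{"sale_id": 1, "sale_amount": 500.0}', '{"sale_id": 2, "sale_amount": 250.0}']
print([json.loads(line) for line in lines] == sales)  # True
```

`json.dumps` never emits raw newlines in its default (non-indented) output, which is what keeps every record on a single line.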

In [None]:
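# Convert CSV to JSON File (real-world example)
import csv

# csv.DictReader uses the first row as column names (unless fieldnames is passed)
# and returns every value as a string
with open('data/sales/part-00000') as fp:
    data = csv.DictReader(fp.read().splitlines())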

In [None]:
sales_strs = [json.dumps(sale) for sale in data]

In [None]:
sales_strs

In [None]:
with open('data/sales/dummy3.json', 'w') as fp:
    for sale in sales_strs:
        fp.write(sale)
        fp.write('\n')
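Because `csv.DictReader` yields only strings, the JSON produced above stores numbers as quoted text (for example `"sale_id": "1"`). The sketch below passes explicit `fieldnames` (for a file without a header row) and converts each column before serializing. It assumes a hypothetical headerless sample in which an empty field means a missing commission and `commission_pct` holds integers; the real file's layout may differ.

```python
import csv
import io
import json

# Hypothetical headerless CSV sample; the second row has an empty commission
csv_text = '1,101,500.00,5\n4,104,1000.00,\n'
fieldnames = ['sale_id', 'sale_rep_id', 'sale_amount', 'commission_pct']

def convert(row):
    """Turn one DictReader row (all strings) into properly typed values."""
    pct = row['commission_pct']
    return {
        'sale_id': int(row['sale_id']),
        'sale_rep_id': int(row['sale_rep_id']),
        'sale_amount': float(row['sale_amount']),
        # Assumption: empty string means "no commission", stored as None (null)
        'commission_pct': int(pct) if pct else None,
    }

reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
json_lines = [json.dumps(convert(row)) for row in reader]
print('\n'.join(json_lines))
# {"sale_id": 1, "sale_rep_id": 101, "sale_amount": 500.0, "commission_pct": 5}
# {"sale_id": 4, "sale_rep_id": 104, "sale_amount": 1000.0, "commission_pct": null}
```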

* Exercise 1: Read the data from the string below, which contains an array of students, and compute the total score for each student.
```python
student_scores = '[{"sid": 1, "scores": [91, 88, 90]}, {"sid": 2, "scores": [75, 79, 65]}, {"sid": 3, "scores": [82, 88, 78]}, {"sid": 4, "scores": [95, 69, 72]}, {"sid": 5, "scores": [88, 81, 85]}]'
```
  * Create a list of dicts with `sid` and `ts` as keys: `sid` comes from `student_scores`, and `ts` is the sum of all the values in `scores`.
  * Sort the new list by `ts` (total score) in descending order using `sorted`.

* Exercise 2: Read the data from `data/sales/part-00000` and write it as JSON to `data/sales_json/part-00000`.
  * Keep only valid sales records, where `commission_pct` is not null (`None`) and is greater than zero.
  * Make sure type conversion is taken care of.
  * Use **sale_id, sale_rep_id, sale_amount, commission_pct** as the column names for the respective values read from the file.
  * Make sure the file `data/sales_json/part-00000` contains one valid JSON document per line.
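One possible solution sketch for Exercise 1 (not necessarily the intended one): parse the string with `json.loads`, build the `sid`/`ts` dicts with a list comprehension, then sort with `sorted(..., reverse=True)`.

```python
import json

student_scores = '[{"sid": 1, "scores": [91, 88, 90]}, {"sid": 2, "scores": [75, 79, 65]}, {"sid": 3, "scores": [82, 88, 78]}, {"sid": 4, "scores": [95, 69, 72]}, {"sid": 5, "scores": [88, 81, 85]}]'

# JSON array of objects -> list of dicts
students = json.loads(student_scores)

# One dict per student: the sid plus the total of all scores
totals = [{'sid': s['sid'], 'ts': sum(s['scores'])} for s in students]

# Highest total score first
ranked = sorted(totals, key=lambda rec: rec['ts'], reverse=True)
print(ranked)
# [{'sid': 1, 'ts': 269}, {'sid': 5, 'ts': 254}, {'sid': 3, 'ts': 248}, {'sid': 4, 'ts': 236}, {'sid': 2, 'ts': 219}]
```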