## About this Notebook

This notebook will take us through:

* a refresher on reading a CSV file with the `csv` library
* a comparison with pandas, plus some exploratory analysis
* writing CSV files with the `csv` library
* working with JSON files

You've already seen how to use the pandas library to access data in specific file formats, such as CSV. 

In this notebook, we will investigate how to use Python's built-in `csv` module in a similar way. We'll begin by importing the module and opening a file.

We will start with the [City of Calgary's School Enrolment Data](https://data.calgary.ca/Demographics/School-Enrolment-Data/9qye-mibh). This data is a subset of a dataset provided by the Government of Alberta (https://open.alberta.ca/opendata) and the Alberta Education website (https://education.alberta.ca/).

Contains information licensed under the Open Government Licence – City of Calgary.

In [125]:
import csv

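# newline='' is what the csv documentation recommends when opening files for the module
with open('School_Enrolment_Data.csv', newline='') as my_csv:
    my_csv_reader = csv.reader(my_csv)
    for row in my_csv_reader:
        # each row is a list of strings; join it back into one line for display
        print(','.join(row))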

School Year,School Authority Category,School Authority Name,School Authority Code,School Name,School Code,ECS,Grade 1,Grade 2,Grade 3,Grade 4,Grade 5,Grade 6,Grade 7,Grade 8,Grade 9,Grade 10,Grade 11,Grade 12,Total
2013_2014,Public,Calgary School District No. 19,3030,Midsun School,348,,,,,,,,249,248,287,,,,784
2013_2014,Public,Calgary School District No. 19,3030,The Hamptons School,387,35,46,42,41,34,,,,,,,,,198
2013_2014,Public,Calgary School District No. 19,3030,Hidden Valley School,535,120,123,110,113,,,,,,,,,,466
2013_2014,Public,Calgary School District No. 19,3030,Crossing Park School,536,112,112,113,115,113,114,113,113,114,105,,,,1124
2013_2014,Public,Calgary School District No. 19,3030,Battalion Park School,537,96,105,107,113,99,91,89,,,,,,,700
2013_2014,Public,Calgary School District No. 19,3030,Monterey Park School,538,80,62,91,97,73,80,73,,,,,,,556
2013_2014,Public,Calgary School District No. 19,3030,Douglasdale School,596,57,74,57,71,58,4,2,,,,,,,323
2013_2014,Public,Calgary
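
When columns need to be addressed by name rather than by position, the same module also offers `csv.DictReader`, which maps each row onto the header. A minimal sketch follows, using a reduced-column excerpt of the output above held in memory; the `sample` string and the `rows` name are illustrative and not part of the original notebook.

```python
import csv
import io

# Reduced-column excerpt of the enrolment data, held in memory for illustration.
sample = (
    "School Year,School Name,School Code,Grade 7,Grade 8,Grade 9,Total\n"
    "2013_2014,Midsun School,348,249,248,287,784\n"
    "2013_2014,The Hamptons School,387,,,,198\n"
)

# DictReader uses the first row as field names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    # Every value is a string; empty cells arrive as ''.
    print(row["School Name"], row["Grade 7"] or "(blank)")
```

Note that the values stay strings, so any arithmetic needs an explicit conversion such as `int(row["Total"])`.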

Now use pandas to open the same file. Try exploring the data a little bit. Some suggestions:

* In 2015-16, how many schools had students in Grade 7?
* How many different school authorities are there? (Use 2020-2021 to come up with your answer.)
* In 2017-18, how many ECS students were there?

What is a question you have about this data? Write some code to answer your question.
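
As one possible starting point, the sketch below answers questions of this kind for a single school year. It runs on three rows copied from the output above rather than on the full file, so the numbers apply to the excerpt only. The names `sample`, `enrolment`, and `year` are illustrative, and the code assumes that a blank cell means no students were reported for that grade.

```python
import io
import pandas as pd

# Header and three rows copied from the csv.reader output above.
sample = """School Year,School Authority Category,School Authority Name,School Authority Code,School Name,School Code,ECS,Grade 1,Grade 2,Grade 3,Grade 4,Grade 5,Grade 6,Grade 7,Grade 8,Grade 9,Grade 10,Grade 11,Grade 12,Total
2013_2014,Public,Calgary School District No. 19,3030,Midsun School,348,,,,,,,,249,248,287,,,,784
2013_2014,Public,Calgary School District No. 19,3030,The Hamptons School,387,35,46,42,41,34,,,,,,,,,198
2013_2014,Public,Calgary School District No. 19,3030,Crossing Park School,536,112,112,113,115,113,114,113,113,114,105,,,,1124
"""
enrolment = pd.read_csv(io.StringIO(sample))

year = "2013_2014"
in_year = enrolment[enrolment["School Year"] == year]

# Blank cells are read as NaN, so notna() marks schools with a Grade 7 value.
grade7_schools = int(in_year["Grade 7"].notna().sum())
# sum() skips NaN by default.
ecs_total = int(in_year["ECS"].sum())
# nunique() counts distinct authority names.
authorities = in_year["School Authority Name"].nunique()

print(grade7_schools, ecs_total, authorities)
```

With the full file, `pd.read_csv('School_Enrolment_Data.csv')` would replace the in-memory read, and `year` would be set to the school year in question.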

Python's csv module also has a writer object that can be used to write CSV files. 

In [126]:
# newline='' stops extra blank lines from appearing between rows on some platforms
with open("mycsvfile.csv", 'w', newline='') as my_csv_output:
    my_csv_writer = csv.writer(my_csv_output)
    my_csv_writer.writerow(['Hello', 'World', '!'])
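
To write several rows at once, including a header, `writerows()` accepts any iterable of rows. The sketch below writes to an in-memory buffer so that it leaves no file behind; the values are taken from the enrolment output earlier, and `rows`, `buffer`, and `written` are illustrative names.

```python
import csv
import io

rows = [
    ["School Name", "School Code", "Total"],
    ["Midsun School", 348, 784],
    ["The Hamptons School", 387, 198],
]

# newline='' lets the csv module control line endings itself.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerows(rows)  # non-string values are converted with str()

written = buffer.getvalue()
print(written)
```

Swapping the buffer for `open("mycsvfile.csv", "w", newline="")` would write the same content to disk.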

Why might we use this csv module instead of pandas? Do you see any features in this library which may not be included in pandas?

What kinds of things can you do with pandas that are not possible with this module?

## JSON objects and files

JSON can be used to store certain kinds of Python data structures in a file. To do this natively in Python, we can use the `json` module. To convert a Python object to JSON, we can use `dumps()`, which returns a JSON string, or `dump()`, which writes the JSON to an open file object.

In [127]:
import json

myList = ['1', '2', '3']
myDict = {'Woodbine':57, 'Whitehorn':123}


# open a file for writing, and write both myList and myDict to the file. Then, close the file.


print(json.dumps(myList))
print(json.dumps(myDict))

# now open a new file for writing and use the dump() method to write this object to the file



["1", "2", "3"]
{"Woodbine": 57, "Whitehorn": 123}
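
The difference between the two functions is where the output goes: `dumps()` returns a string, while `dump()` writes to an open file object. One way to approach the exercise above is sketched below. It writes to a temporary directory so that nothing is left on disk, and the file name `my_data.json` is an arbitrary choice.

```python
import json
import tempfile
from pathlib import Path

myList = ['1', '2', '3']
myDict = {'Woodbine': 57, 'Whitehorn': 123}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "my_data.json"  # hypothetical file name

    # A JSON document holds one top-level value, so wrap both objects in a dict.
    with open(path, "w") as out_file:
        json.dump({"myList": myList, "myDict": myDict}, out_file)

    # load() is the file-based counterpart of loads().
    with open(path) as in_file:
        restored = json.load(in_file)

print(restored)
```

Because the `with` statement closes each file automatically, no explicit `close()` call is needed.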


Now let's work with a pre-existing dataset. We will use results from the [Government of Canada's Algorithmic Impact Assessment for the ATIP Online Request Service](https://open.canada.ca/data/dataset/cea9985f-5e0f-425e-9b7e-e1d122272c56/resource/5678a163-bfaa-4006-b655-75c5fe421d58/download/atip-digital-services-aia.json). 

In [128]:
with open("atip-digital-services-aia.json") as my_json:
    %time atip_info = json.load(my_json)
    print(atip_info)

CPU times: user 441 µs, sys: 610 µs, total: 1.05 ms
Wall time: 813 µs
{'version': 'v0.8', 'currentPage': 12, 'data': {'projectDetailsRespondent': 'W Herbert', 'projectDetailsJob': 'Senior Analyst', 'projectDetailsDepartment-NS': '056', 'projectDetailsBranch': 'CIOB', 'projectDetailsTitle': 'ATIP Digital Services', 'projectDetailsPhase': 'item2', 'projectDetailsDescription': 'Simple central website for Canadians to submit ATIP requests', 'businessDrivers1': ['item2', 'item5'], 'riskProfile1': 'item1-3', 'riskProfile2': 'item2-0', 'riskProfile3': 'item2-0', 'riskProfile4': 'item2-0', 'projectAuthority1': 'item1-2', 'aboutSystem1': ['item2', 'item4'], 'aboutAlgorithm1': 'item1-3', 'aboutAlgorithm2': 'item1-3', 'impact1': 'item1-1', 'impact2': 'item1-3', 'impact3': 'item2-0', 'impact5': 'item1-4', 'impact6': 'item1-1', 'impact7': 'item1-1', 'impact8': 'a misdirected request would immediately be redirected to the appropriate GoC institution', 'impact9': 'item1-1', 'impact10': 'does not prev
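
`json.load()` returns ordinary Python dictionaries and lists, so the result can be explored with normal indexing. The sketch below uses a small excerpt of the output above, typed in by hand as `aia_excerpt` (an illustrative name), so that it runs without the downloaded file.

```python
import json

# Hand-typed excerpt of the assessment file shown above.
aia_excerpt = json.loads("""
{
  "version": "v0.8",
  "currentPage": 12,
  "data": {
    "projectDetailsRespondent": "W Herbert",
    "projectDetailsTitle": "ATIP Digital Services",
    "businessDrivers1": ["item2", "item5"],
    "riskProfile1": "item1-3"
  }
}
""")

# Top-level keys are plain dict lookups; nested values chain the lookups.
print(aia_excerpt["version"])
print(aia_excerpt["data"]["projectDetailsTitle"])

# Collect the keys under "data" whose value is a list rather than a string.
list_valued = [key for key, value in aia_excerpt["data"].items()
               if isinstance(value, list)]
print(list_valued)
```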

What are the equivalent methods you would use in pandas to work with JSON objects? Give them a try in the cell below. 

In [129]:
import pandas as pd

%time json_data = pd.read_json("atip-digital-services-aia.json")

json_data

CPU times: user 4.28 ms, sys: 312 µs, total: 4.59 ms
Wall time: 4.51 ms


Unnamed: 0,version,currentPage,data
aboutAlgorithm1,v0.8,12,item1-3
aboutAlgorithm2,v0.8,12,item1-3
aboutDataSource1,v0.8,12,item2-0
aboutDataSource2,v0.8,12,item1-0
aboutDataSource3,v0.8,12,item2-1
...,...,...,...
projectDetailsTitle,v0.8,12,ATIP Digital Services
riskProfile1,v0.8,12,item1-3
riskProfile2,v0.8,12,item2-0
riskProfile3,v0.8,12,item2-0
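
In the table above, `read_json()` turned the keys of the nested `data` object into the index and repeated the scalar `version` and `currentPage` values on every row. If a single row per assessment is preferred, `pd.json_normalize()` is one alternative: it flattens nested dictionaries into dotted column names. A minimal sketch on a hand-typed excerpt of the file follows; `aia_excerpt`, `flat`, and `answers` are illustrative names.

```python
import pandas as pd

# Hand-typed excerpt of the assessment file.
aia_excerpt = {
    "version": "v0.8",
    "currentPage": 12,
    "data": {
        "projectDetailsTitle": "ATIP Digital Services",
        "riskProfile1": "item1-3",
        "aboutAlgorithm1": "item1-3",
    },
}

# One row; nested keys become columns such as "data.riskProfile1".
flat = pd.json_normalize(aia_excerpt)
print(flat.columns.tolist())

# Alternatively, keep only the answers, indexed by question key.
answers = pd.Series(aia_excerpt["data"])
print(answers)
```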


What happens when you use a JSON file that has a different structure? Try downloading some data from the JSON Generator (https://www.json-generator.com/) and working with it in Python.

In [130]:
# with open("generated_data.json") as gen_json:
#     generated_info = json.load(gen_json)
#     print(generated_info)


generated_data = pd.read_json("generated_data.json")
generated_data

Unnamed: 0,_id,index,guid,isActive,balance,picture,age,eyeColor,name,gender,...,phone,address,about,registered,latitude,longitude,tags,friends,greeting,favoriteFruit
0,6179815d6e8e5559873076e3,0,3a509b67-885d-4bce-b01d-e96aefee0bf7,True,"$1,157.12",http://placehold.it/32x32,35,brown,Susanna Reynolds,female,...,+1 (811) 555-3654,"482 Danforth Street, Albany, Connecticut, 3698",Do duis dolor ex exercitation esse velit do ex...,2020-09-18T02:20:58 +06:00,3.847462,-3.795569,"[cupidatat, proident, aliqua, irure, ut, id, e...","[{'id': 0, 'name': 'Edna Herman'}, {'id': 1, '...","Hello, Susanna Reynolds! You have 3 unread mes...",strawberry
1,6179815d89fc077e9ced7558,1,5307801a-f651-436c-9dfe-b11ce838e736,True,"$3,127.70",http://placehold.it/32x32,39,brown,Chen West,male,...,+1 (978) 488-2004,"913 Lorimer Street, Grandview, Hawaii, 8756",Duis enim ex labore esse do. In reprehenderit ...,2016-12-26T08:15:32 +07:00,38.252197,120.615941,"[nostrud, dolor, nostrud, ad, voluptate, exerc...","[{'id': 0, 'name': 'Smith Hall'}, {'id': 1, 'n...","Hello, Chen West! You have 10 unread messages.",strawberry
2,6179815eb522c464e1930ef1,2,014cce4d-906d-42af-bd16-8bb1edcccdd7,True,"$2,215.78",http://placehold.it/32x32,36,brown,Nolan Smith,male,...,+1 (814) 491-3162,"337 Evans Street, Canoochee, Puerto Rico, 2864",Duis eiusmod officia non ex amet. Laboris et d...,2015-03-21T11:58:12 +06:00,22.379612,-147.970211,"[tempor, aliqua, irure, velit, officia, sunt, ...","[{'id': 0, 'name': 'Knight Ward'}, {'id': 1, '...","Hello, Nolan Smith! You have 9 unread messages.",strawberry
3,6179815ec1d2142de641395f,3,577577a5-fa80-465a-b549-23e805f8e193,False,"$3,758.68",http://placehold.it/32x32,33,brown,Morris Sanford,male,...,+1 (960) 427-3102,"442 Liberty Avenue, Logan, Illinois, 8643",Non dolor ullamco consectetur qui magna ipsum ...,2018-05-22T06:07:01 +06:00,79.649244,10.84846,"[magna, Lorem, ut, consequat, in, magna, eu]","[{'id': 0, 'name': 'Stephenson Charles'}, {'id...","Hello, Morris Sanford! You have 2 unread messa...",apple
4,6179815e5af1a10268c58f99,4,fe19407d-2e66-4909-9c6e-2327f343ce20,True,"$2,454.98",http://placehold.it/32x32,34,green,Jan Rivers,female,...,+1 (990) 478-3221,"966 Chase Court, Hartsville/Hartley, Palau, 139",Non aliquip ad magna anim aliquip consequat. C...,2020-10-25T03:39:18 +06:00,21.108885,22.746014,"[nulla, Lorem, anim, aliquip, ex, veniam, magna]","[{'id': 0, 'name': 'Katy Wright'}, {'id': 1, '...","Hello, Jan Rivers! You have 9 unread messages.",strawberry
5,6179815e2aa35c5cc8c6cb59,5,4909fb18-6819-492d-9a09-a124415bea4b,False,"$1,723.20",http://placehold.it/32x32,25,green,Terry Hines,female,...,+1 (844) 579-2960,"549 Lawrence Avenue, Eden, Wyoming, 5578",Ipsum et eiusmod pariatur pariatur ipsum conse...,2018-07-16T03:44:19 +06:00,-55.20682,-15.667406,"[adipisicing, laborum, sit, incididunt, volupt...","[{'id': 0, 'name': 'Adams Jensen'}, {'id': 1, ...","Hello, Terry Hines! You have 1 unread messages.",strawberry


Because each person can have a different number of friends, the `friends` column holds a list of records in each cell rather than a single value. Instead, we want a table in which one can look up a person's friends.

In [131]:
generated_friends = generated_data['friends']  # selecting a column already returns a Series
generated_friends



0    [{'id': 0, 'name': 'Edna Herman'}, {'id': 1, '...
1    [{'id': 0, 'name': 'Smith Hall'}, {'id': 1, 'n...
2    [{'id': 0, 'name': 'Knight Ward'}, {'id': 1, '...
3    [{'id': 0, 'name': 'Stephenson Charles'}, {'id...
4    [{'id': 0, 'name': 'Katy Wright'}, {'id': 1, '...
5    [{'id': 0, 'name': 'Adams Jensen'}, {'id': 1, ...
Name: friends, dtype: object

Is there something that may be an issue in this table?
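
One way to build such a lookup table is `pd.json_normalize()` with `record_path`, which produces one row per friend and carries the person's name along as metadata. The sketch assumes the raw records are available as a list of dictionaries (for example from `json.load()`). The `people` sample is hand-made, and the second friend listed for Susanna Reynolds is a hypothetical placeholder, because the output above truncates that entry.

```python
import pandas as pd

people = [
    {"name": "Susanna Reynolds",
     "friends": [{"id": 0, "name": "Edna Herman"},
                 {"id": 1, "name": "Placeholder Friend"}]},  # hypothetical entry
    {"name": "Chen West",
     "friends": [{"id": 0, "name": "Smith Hall"}]},
]

friends_table = pd.json_normalize(
    people,
    record_path="friends",     # one output row per element of each "friends" list
    meta=["name"],             # repeat the person's name on each of their rows
    record_prefix="friend_",   # avoids a clash between the two "name" fields
)
print(friends_table)
```

Each friend now occupies a row of their own, so the table can be filtered or grouped by the person's `name`.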

The pandas library provides `read_json()` for reading and `to_json()` for writing. Is this any different from reading or writing a CSV file? If there are differences, when do you think they would matter?
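
One difference worth exploring is that JSON can lay out the same table in several ways, which `to_json()` exposes through its `orient` parameter. The sketch below writes a small frame in two orientations and reads one of them back; the values are taken from the enrolment output earlier, and `schools`, `as_records`, and `as_columns` are illustrative names.

```python
import io
import pandas as pd

schools = pd.DataFrame({
    "School Name": ["Midsun School", "The Hamptons School"],
    "Total": [784, 198],
})

# "records" gives a list of row objects; "columns" nests values under column names.
as_records = schools.to_json(orient="records")
as_columns = schools.to_json(orient="columns")
print(as_records)
print(as_columns)

# Reading back needs the matching orient; StringIO wraps the string as a file.
restored = pd.read_json(io.StringIO(as_records), orient="records")
print(restored)
```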

## JSON Scavenger Hunt

Find and print out the `Treasure` in the nested JSON.

In [132]:
df = pd.read_json("expenses.json")
df

Unnamed: 0,WHO,WEEK
0,Joe,"[{'NUMBER': 3, 'EXPENSE': [{'WHAT': 'Beverage'..."
1,Beth,"[{'NUMBER': 3, 'EXPENSE': [{'WHAT': 'Beverage'..."
2,Janet,"[{'NUMBER': 3, 'EXPENSE': [{'WHAT': 'Car', 'AM..."


In [133]:
with open("expenses.json") as my_json:
    nested_json = json.load(my_json)
    print(nested_json)

[{'WHO': 'Joe', 'WEEK': [{'NUMBER': 3, 'EXPENSE': [{'WHAT': 'Beverage', 'AMOUNT': 18.0}, {'WHAT': 'Food', 'AMOUNT': 12.0}, {'WHAT': 'Food', 'AMOUNT': 19.0}, {'WHAT': 'Car', 'AMOUNT': 20.0}]}, {'NUMBER': 4, 'EXPENSE': [{'WHAT': 'Beverage', 'AMOUNT': 19.0}, {'WHAT': 'Beverage', 'AMOUNT': 16.0}, {'WHAT': 'Food', 'AMOUNT': 17.0}, {'WHAT': 'Food', 'AMOUNT': 17.0}, {'WHAT': 'Beverage', 'AMOUNT': 14.0}]}, {'NUMBER': 5, 'EXPENSE': [{'WHAT': 'Beverage', 'AMOUNT': 14.0}, {'WHAT': 'Food', 'AMOUNT': 12.0}]}]}, {'WHO': 'Beth', 'WEEK': [{'NUMBER': 3, 'EXPENSE': [{'WHAT': 'Beverage', 'AMOUNT': 16.0}]}, {'NUMBER': 4, 'EXPENSE': [{'WHAT': 'Treasure', 'AMOUNT': 0.0}, {'WHAT': 'Beverage', 'AMOUNT': 15.0}]}, {'NUMBER': 5, 'EXPENSE': [{'WHAT': 'Food', 'AMOUNT': 12.0}, {'WHAT': 'Beverage', 'AMOUNT': 20.0}]}]}, {'WHO': 'Janet', 'WEEK': [{'NUMBER': 3, 'EXPENSE': [{'WHAT': 'Car', 'AMOUNT': 19.0}, {'WHAT': 'Food', 'AMOUNT': 18.0}, {'WHAT': 'Beverage', 'AMOUNT': 18.0}]}, {'NUMBER': 4, 'EXPENSE': [{'WHAT': 'Car',
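
Nested JSON of this shape (a list of people, each with a list of weeks, each with a list of expenses) can be searched with one loop per level. The sketch below uses only Beth's record, copied from the output above, so that it runs without `expenses.json`; with the real file, the same loops could be applied to `nested_json`. The names `sample` and `found` are illustrative.

```python
# Beth's record, copied from the printed output above.
sample = [
    {"WHO": "Beth", "WEEK": [
        {"NUMBER": 3, "EXPENSE": [{"WHAT": "Beverage", "AMOUNT": 16.0}]},
        {"NUMBER": 4, "EXPENSE": [{"WHAT": "Treasure", "AMOUNT": 0.0},
                                  {"WHAT": "Beverage", "AMOUNT": 15.0}]},
        {"NUMBER": 5, "EXPENSE": [{"WHAT": "Food", "AMOUNT": 12.0},
                                  {"WHAT": "Beverage", "AMOUNT": 20.0}]},
    ]},
]

found = []
for person in sample:                    # level 1: people
    for week in person["WEEK"]:          # level 2: weeks for that person
        for expense in week["EXPENSE"]:  # level 3: expenses in that week
            if expense["WHAT"] == "Treasure":
                found.append((person["WHO"], week["NUMBER"], expense["AMOUNT"]))

print(found)
```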