# JSON and XML - Lab

## Introduction

In this lab, you'll practice navigating JSON and XML data structures.

## Objectives
You will be able to:
* Use Python's `json` module to load and parse JSON documents
* Read and access data stored in JSON and XML
* Compare and contrast JSON and XML as data interchange formats
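The sketch below makes the comparison concrete. It encodes one record both ways and parses each version with the standard library. The record and its field names (`name`, `total`) are made up for illustration. JSON maps directly onto Python types and keeps the number as a number. XML produces an element tree in which every text value comes back as a string.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, written once as JSON and once as XML
json_text = '{"name": "Doe, Jane", "total": 11073}'
xml_text = "<row><name>Doe, Jane</name><total>11073</total></row>"

# JSON maps directly onto Python dicts, lists, strings, and numbers
record_from_json = json.loads(json_text)

# XML yields an element tree; each child's text is always a string
root = ET.fromstring(xml_text)
record_from_xml = {child.tag: child.text for child in root}

print(record_from_json)  # {'name': 'Doe, Jane', 'total': 11073}
print(record_from_xml)   # {'name': 'Doe, Jane', 'total': '11073'}
```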


## XML

```python
import xml.etree.ElementTree as ET
```

```python
# IPython shell shortcut: list the files in the working directory
ls
```

```
2001_Campaign_Contributions.csv*  LICENSE.md*                      README.md
CONTRIBUTING.md*                  nyc_2001_campaign_finance.json*
index.ipynb                       nyc_2001_campaign_finance.xml
```


### Create an XML tree and retrieve the root tag.

```python
tree = ET.parse('nyc_2001_campaign_finance.xml')
root = tree.getroot()
```

### How many direct descendants does the root tag have?

```python
for child in root:
    print(child.tag, child.attrib)
```

```
row {}
```
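Printing each child answers the question by eye. Calling `len()` on an element returns the number of its direct subelements, so it gives the count directly. The sketch below uses a small hypothetical XML string with the same nesting pattern: an outer `row` that wraps the inner `row` records. The root tag name `response` is an assumption and may not match the lab file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the nesting seen above: root > row > row...
sample = """
<response>
  <row>
    <row _id="1"><candname>A</candname></row>
    <row _id="2"><candname>B</candname></row>
  </row>
</response>
""".strip()
root = ET.fromstring(sample)

# len() on an Element counts only its direct children, not deeper descendants
n_direct = len(root)
print(root.tag, n_direct)  # response 1
```

The same `len(root)` call works on the `root` parsed from the lab file.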


### How many different types of tags are there within the entire XML file?

```python
# `count` tallies every grandchild; only the first 9 are printed to keep the output short
count = 0
for child in root:
    print('Child:\n')
    print(child.tag, child.attrib)
    print('Grandchildren:')
    for grandchild in child:
        count += 1
        if count < 10:
            print(grandchild.tag, grandchild.attrib)
    print('\n\n')
print(count)
```

```
Child:

row {}
Grandchildren:
row {'_id': '1', '_uuid': 'E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1', '_position': '1', '_address': 'https://data.cityofnewyork.us/resource/_8dhd-zvi6/1'}
row {'_id': '2', '_uuid': '9D257416-581A-4C42-85CC-B6EAD9DED97F', '_position': '2', '_address': 'https://data.cityofnewyork.us/resource/_8dhd-zvi6/2'}
row {'_id': '3', '_uuid': 'B80D7891-93CF-49E8-86E8-182B618E68F2', '_position': '3', '_address': 'https://data.cityofnewyork.us/resource/_8dhd-zvi6/3'}
row {'_id': '4', '_uuid': 'BB012003-78F5-406D-8A87-7FF8A425EE3F', '_position': '4', '_address': 'https://data.cityofnewyork.us/resource/_8dhd-zvi6/4'}
row {'_id': '5', '_uuid': '945825F9-2F5D-47C2-A16B-75B93E61E1AD', '_position': '5', '_address': 'https://data.cityofnewyork.us/resource/_8dhd-zvi6/5'}
row {'_id': '6', '_uuid': '9546F502-39D6-4340-B37E-60682EB22274', '_position': '6', '_address': 'https://data.cityofnewyork.us/resource/_8dhd-zvi6/6'}
row {'_id': '7', '_uuid': '4B6C74AD-17A0-4B7E-973A-2592D68A687D'
```
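The loop above walks only two levels and counts grandchildren, so it does not tell you how many *distinct* tag types appear. `Element.iter()` visits the element itself and every descendant at any depth, and collecting the tags into a set removes duplicates. The sketch below runs on a hypothetical sample, and its tag names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag names are illustrative only
sample = """
<response>
  <row>
    <row _id="1"><candname>A</candname><totalpay>100</totalpay></row>
    <row _id="2"><candname>B</candname><totalpay>200</totalpay></row>
  </row>
</response>
""".strip()
root = ET.fromstring(sample)

# iter() yields the root itself plus every descendant, depth-first
tag_types = {el.tag for el in root.iter()}
print(len(tag_types), sorted(tag_types))
# 4 ['candname', 'response', 'row', 'totalpay']
```

With the lab's `root`, the same set comprehension applies unchanged.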

### Create a DataFrame listing the number of each type of tag. 
Sort the DataFrame in descending order by tag count. The first entry should show that there are 286 `row` tags in the XML file.  
(Your DataFrame will have a single column, so you could also think of it as a Series.)

```python
import pandas as pd
```

```python
# Your code here
```
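One possible approach is to gather every tag with `root.iter()` and let pandas `value_counts()` tally them. The sketch below uses a hypothetical sample rather than the lab file. On the real tree, the `row` count should come first, as described above.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical sample; tag names are illustrative only
sample = """
<response>
  <row>
    <row _id="1"><candname>A</candname><totalpay>100</totalpay></row>
    <row _id="2"><candname>B</candname><totalpay>200</totalpay></row>
  </row>
</response>
""".strip()
root = ET.fromstring(sample)

# One list entry per element, at any depth
tags = [el.tag for el in root.iter()]

# Tally each tag, then sort explicitly in descending order of count
tag_counts = (
    pd.Series(tags)
    .value_counts()
    .to_frame(name="count")
    .sort_values("count", ascending=False)
)
print(tag_counts)
```

`value_counts()` already sorts in descending order. The explicit `sort_values` just makes the lab's sorting requirement visible.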

## JSON

### Open the same dataset from the JSON file

```python
import json

# A with-block closes the file once it has been read
with open('nyc_2001_campaign_finance.json') as f:
    data = json.load(f)
print(type(data))
```

```
<class 'dict'>
```


### What is the root data type of the JSON file?

```python
type(data)
```

```
dict
```

### Navigate to the 'data' key of your loaded JSON object. What data type is this?

```python
type(data['data'])
```

```
list
```

### Preview the first entry from the value returned by the 'data' key above.

```python
d = data['data']
d[0]
```

```
[1,
 'E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1',
 1,
 1315925633,
 '392904',
 1315925633,
 '392904',
 '{\n  "invalidCells" : {\n    "1519001" : "TOTALPAY",\n    "1518998" : "PRIMARYPAY",\n    "1519000" : "RUNOFFPAY",\n    "1518999" : "GENERALPAY",\n    "1518994" : "OFFICECD",\n    "1518996" : "OFFICEDIST",\n    "1518991" : "ELECTION"\n  }\n}',
 None,
 'CANDID',
 'CANDNAME',
 None,
 'OFFICEBORO',
 None,
 'CANCLASS',
 None,
 None,
 None,
 None]
```

### Preview the entry under meta -> view -> columns (three successively nested keys)

```python
# Keys of the nested 'view' dictionary; 'columns' is one of them
data['meta']['view'].keys()
```

```
dict_keys(['id', 'name', 'attribution', 'averageRating', 'category', 'createdAt', 'description', 'displayType', 'downloadCount', 'hideFromCatalog', 'hideFromDataJson', 'indexUpdatedAt', 'newBackend', 'numberOfComments', 'oid', 'provenance', 'publicationAppendEnabled', 'publicationDate', 'publicationGroup', 'publicationStage', 'rowClass', 'rowsUpdatedAt', 'rowsUpdatedBy', 'tableId', 'totalTimesRated', 'viewCount', 'viewLastModified', 'viewType', 'columns', 'grants', 'metadata', 'owner', 'query', 'rights', 'tableAuthor', 'tags', 'flags'])
```
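The keys listed above confirm that `columns` sits inside `view`. In NYC Open Data (Socrata-style) exports, `meta.view.columns` is assumed to be a list of dicts, each with keys such as `name` and `fieldName`. That layout is an assumption here, so inspect one entry of the real file before relying on it. The sketch below runs on a hypothetical, trimmed-down stand-in for the loaded `data`.

```python
# Hypothetical stand-in for the loaded JSON; only the nesting mirrors the lab file,
# and the "name"/"fieldName" keys in each column entry are assumptions
data = {
    "meta": {
        "view": {
            "columns": [
                {"name": "CANDID", "fieldName": "candid"},
                {"name": "CANDNAME", "fieldName": "candname"},
                {"name": "TOTALPAY", "fieldName": "totalpay"},
            ]
        }
    },
    "data": [],
}

columns = data["meta"]["view"]["columns"]  # dict -> dict -> list
print(type(columns).__name__, len(columns))  # list 3

# Pull a human-readable name out of each column descriptor
col_names = [col["name"] for col in columns]
print(col_names)  # ['CANDID', 'CANDNAME', 'TOTALPAY']
```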

### Create a DataFrame from your JSON data
The previous two questions previewed one entry from the data object within the JSON file, as well as the column details for that data from the meta entry. Both should have 19 entries. Create a DataFrame of the data, and use the information from the meta entry to give your DataFrame appropriate column names.

```python
import pandas as pd

df = pd.DataFrame(data['data'])
df.head()
```


|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|
| 0 | 1 | E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1 | 1 | 1315925633 | 392904 | 1315925633 | 392904 | {\n "invalidCells" : {\n "1519001" : "TOTA... |  | CANDID | CANDNAME |  | OFFICEBORO |  | CANCLASS |  |  |  |  |
| 1 | 2 | 9D257416-581A-4C42-85CC-B6EAD9DED97F | 2 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001.0 | B4 | Aboulafia, Sandy | 5.0 |  | 44.0 | P | 45410.0 | 0.0 | 0.0 | 45410.0 |
| 2 | 3 | B80D7891-93CF-49E8-86E8-182B618E68F2 | 3 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001.0 | 445 | Adams, Jackie R | 5.0 |  | 7.0 | P | 11073.0 | 0.0 | 0.0 | 11073.0 |
| 3 | 4 | BB012003-78F5-406D-8A87-7FF8A425EE3F | 4 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001.0 | HF | Addabbo, Joseph P | 5.0 |  | 32.0 | P | 75350.0 | 73970.0 | 0.0 | 149320.0 |
| 4 | 5 | 945825F9-2F5D-47C2-A16B-75B93E61E1AD | 5 | 1315925633 | 392904 | 1315925633 | 392904 | {\n} | 2001.0 | IR | Alamo-Estrada, Agustin | 5.0 |  | 14.0 | P | 25000.0 | 2400.0 | 0.0 | 27400.0 |
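The DataFrame above still has integer column labels. The sketch below attaches names from the meta entry. It assumes that each column descriptor under `meta.view.columns` has a `name` key and that the descriptors are listed in the same order as the values in each row. The tiny `data` dict is a hypothetical stand-in with two columns instead of 19.

```python
import pandas as pd

# Hypothetical stand-in: 2 columns instead of the lab file's 19
data = {
    "meta": {"view": {"columns": [{"name": "CANDNAME"}, {"name": "TOTALPAY"}]}},
    "data": [
        ["Doe, Jane", 100.0],
        ["Roe, Rick", 250.0],
    ],
}

# Assumes column descriptors are listed in the same order as row values
col_names = [col["name"] for col in data["meta"]["view"]["columns"]]
df = pd.DataFrame(data["data"], columns=col_names)
print(df)
```

Passing `columns=` to the constructor labels the columns in one step, so you do not need to rename them afterwards.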


### What's wrong with the first row of the DataFrame?

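```python
# df[0] selects the first *column* (285 values), not the first row,
# so it cannot serve as the 19 column labels
df.columns = df[0]
```

```
ValueError: Length mismatch: Expected axis has 19 elements, new values have 285 elements
```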

```python
df[0]
```

```
0        1
1        2
2        3
3        4
4        5
5        6
6        7
7        8
8        9
9       10
10      11
11      12
12      13
13      14
14      15
15      16
16      17
17      18
18      19
19      20
20      21
21      22
22      23
23      24
24      25
25      26
26      27
27      28
28      29
29      30
      ... 
255    256
256    257
257    258
258    259
259    260
260    261
261    262
262    263
263    264
264    265
265    266
266    267
267    268
268    269
269    270
270    271
271    272
272    273
273    274
274    275
275    276
276    277
277    278
278    279
279    280
280    281
281    282
282    283
283    284
284    285
Name: 0, Length: 285, dtype: int64
```

#Your answer here
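The preview of `d[0]` points toward an answer. Instead of a contribution record, the first row holds placeholder field names (`'CANDID'`, `'CANDNAME'`, `'OFFICEBORO'`, `'CANCLASS'`) and an `invalidCells` note, so it looks like metadata rather than data. Also note that with integer column labels, `df[0]` selects the first *column*, while `df.iloc[0]` selects the first row. The minimal sketch below drops such a row from a hypothetical two-column frame.

```python
import pandas as pd

# Hypothetical frame whose first row repeats field names, like d[0] in the lab file
df = pd.DataFrame(
    [["CANDNAME", "TOTALPAY"], ["Doe, Jane", 100.0], ["Roe, Rick", 250.0]],
    columns=["CANDNAME", "TOTALPAY"],
)

print(df.iloc[0].tolist())  # ['CANDNAME', 'TOTALPAY']: a row, selected by position

# Drop the metadata row and renumber the index from 0
df = df.iloc[1:].reset_index(drop=True)

# With the header-like string gone, the column can be converted to numbers
df["TOTALPAY"] = pd.to_numeric(df["TOTALPAY"])
print(df)
```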

## Summary

Congratulations! You've started exploring some of the more complicated data structures used on the web, and you practiced data munging and exploration along the way!