# Exploring and Transforming JSON Schemas

## Introduction

In this lesson, you'll formalize how to explore a JSON file whose structure and schema are unknown to you. This often happens in practice when you are handed a file, or stumble upon one, with little documentation.

## Objectives
You will be able to:
* Use the `json` module to load and parse JSON documents
* Load and explore unknown JSON schemas
* Convert JSON to a pandas DataFrame

## Loading the JSON file

Load the data from the file `disease_data.json`.

```python
import json
import pandas as pd

# Read the whole file and parse it into native Python objects (dicts, lists, str, ...)
with open('disease_data.json') as f:
    data = json.load(f)
```
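`json.load` reads from any file-like object, while `json.loads` parses a string. The sketch below runs without the real file by assuming a tiny, hypothetical stand-in for `disease_data.json`: it only mimics the top-level `meta`/`data` shape, and its content is made up.

```python
import io
import json

# Hypothetical miniature document: same top-level shape as disease_data.json
# (a 'meta' dict plus a 'data' list of rows), but with made-up content.
SAMPLE = '{"meta": {"view": {"columns": [{"fieldName": ":sid"}]}}, "data": [[1]]}'

# json.load accepts any file-like object, so an in-memory buffer behaves like open(...)
data_from_file = json.load(io.StringIO(SAMPLE))

# json.loads parses a str directly
data_from_str = json.loads(SAMPLE)

print(type(data_from_file), data_from_file == data_from_str)
# prints: <class 'dict'> True
```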

## Explore the first and second levels of the schema hierarchy

```python
print(type(data))
```

```
<class 'dict'>
```


```python
data.keys()
```

```
dict_keys(['meta', 'data'])
```

```python
type(data['meta'])
```

```
dict
```

```python
data['meta'].keys()
```

```
dict_keys(['view'])
```

```python
data['meta']['view'].keys()
```

```
dict_keys(['id', 'name', 'attribution', 'attributionLink', 'averageRating', 'category', 'createdAt', 'description', 'displayType', 'downloadCount', 'hideFromCatalog', 'hideFromDataJson', 'indexUpdatedAt', 'licenseId', 'newBackend', 'numberOfComments', 'oid', 'provenance', 'publicationAppendEnabled', 'publicationDate', 'publicationGroup', 'publicationStage', 'rowClass', 'rowsUpdatedAt', 'rowsUpdatedBy', 'tableId', 'totalTimesRated', 'viewCount', 'viewLastModified', 'viewType', 'columns', 'grants', 'license', 'metadata', 'owner', 'query', 'rights', 'tableAuthor', 'tags', 'flags'])
```

```python
data['meta']['view']['columns'][0].keys()
```

```
dict_keys(['id', 'name', 'dataTypeName', 'fieldName', 'position', 'renderTypeName', 'format', 'flags'])
```
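Because `data['meta']['view']['columns']` is a list of dicts that share the same keys, pandas can turn it straight into a table, which makes every column's name and type easy to scan at once. A minimal sketch, assuming a hypothetical two-entry stand-in for that list (the `name` and `dataTypeName` values are invented); on the real data you would pass the actual list instead.

```python
import pandas as pd

# Hypothetical stand-in for data['meta']['view']['columns']; the entries and
# their 'name'/'dataTypeName' values are invented for illustration.
columns_meta = [
    {"name": "sid", "dataTypeName": "number", "fieldName": ":sid", "position": 0},
    {"name": "YearStart", "dataTypeName": "text", "fieldName": "yearstart", "position": 1},
]

# One row per column of the dataset; keep only the fields that describe it
columns_table = pd.DataFrame(columns_meta)[["fieldName", "name", "dataTypeName"]]
print(columns_table)
```

The same call on the real list, `pd.DataFrame(data['meta']['view']['columns'])`, gives one row per dataset column; nested keys such as `format` simply become columns holding dicts.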

```python
type(data['data'])
```

```
list
```

```python
type(data['data'][0])
```

```
list
```
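Checking `type()` and `.keys()` one level at a time works, but the same walk can be automated. The helper below is a hypothetical sketch (`outline` is not part of the lesson or any library): it recurses through dicts and, for lists, inspects only the first element, on the assumption that all elements share a shape. The `sample` document is a made-up miniature with the same nesting as the lesson's file.

```python
def outline(obj, path="data", max_depth=2, _depth=0):
    """Return 'path: type' lines for obj and its children, up to max_depth levels.

    Hypothetical helper: dicts are expanded key by key; for a non-empty list,
    only the first element is inspected, assuming the elements share a shape.
    """
    lines = [f"{path}: {type(obj).__name__}"]
    if _depth >= max_depth:
        return lines
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines += outline(value, f"{path}[{key!r}]", max_depth, _depth + 1)
    elif isinstance(obj, list) and obj:
        lines += outline(obj[0], f"{path}[0]", max_depth, _depth + 1)
    return lines


# Made-up miniature document with the same nesting as disease_data.json
sample = {"meta": {"view": {"columns": []}}, "data": [[1, "abc", None]]}
for line in outline(sample):
    print(line)
```

Calling `outline(data)` on the loaded file should reproduce the first and second levels explored above; raising `max_depth` goes deeper, at the cost of much longer output for `data['meta']['view']`.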

```python
data['data'][0]
```

```
[1,
 'FF49C41F-CE8D-46C4-9164-653B1227CF6F',
 1,
 1527194521,
 '959778',
 1527194521,
 '959778',
 None,
 '2016',
 '2016',
 'US',
 'United States',
 'BRFSS',
 'Alcohol',
 'Binge drinking prevalence among adults aged >= 18 years',
 None,
 '%',
 'Crude Prevalence',
 '16.9',
 '16.9',
 '*',
 '50 States + DC: US Median',
 '16',
 '18',
 'Overall',
 'Overall',
 None,
 None,
 None,
 None,
 [None, None, None, None, None],
 None,
 '59',
 'ALC',
 'ALC2_2',
 'CRDPREV',
 'OVERALL',
 'OVR',
 None,
 None,
 None,
 None]
```
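Each record is a plain list, so its values only make sense once they are lined up, by position, with the column metadata. A minimal sketch using a shortened excerpt of the record above: four of its fields, hand-picked and paired with the matching `fieldName` entries (the full lists have 42 entries each).

```python
# Shortened excerpt of the first record: four fields only, paired with the
# matching 'fieldName' entries from the column metadata.
columns_meta = [
    {"fieldName": "yearstart"},
    {"fieldName": "locationabbr"},
    {"fieldName": "topic"},
    {"fieldName": "datavalue"},
]
row = ["2016", "US", "Alcohol", "16.9"]

# zip pairs the i-th column name with the i-th value in the row
record = dict(zip((col["fieldName"] for col in columns_meta), row))
print(record)
# {'yearstart': '2016', 'locationabbr': 'US', 'topic': 'Alcohol', 'datavalue': '16.9'}
```

Note that `zip` stops at the shorter input without complaint; on Python 3.10+ `zip(..., strict=True)` raises a `ValueError` instead, which guards against a mismatch between the number of columns and the length of a row.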

## Convert to a DataFrame

Create a DataFrame from the JSON file. Be sure to retrieve the column names for the DataFrame (search within the `'meta'` key of the master dictionary). The DataFrame should include all 42 columns.

```python
# Column names live in the metadata: one dict per column, in the same order as
# the values in each row. Some fieldName values may start with ':', so strip it.
column_names = [col['fieldName'].strip(':') for col in data['meta']['view']['columns']]
df = pd.DataFrame(data['data'], columns=column_names)
assert df.shape[1] == 42  # every column described in the metadata is present

for column_name in df.columns:
    print(column_name)
```

```
sid
id
position
created_at
created_meta
updated_at
updated_meta
meta
yearstart
yearend
locationabbr
locationdesc
datasource
topic
question
response
datavalueunit
datavaluetype
datavalue
datavaluealt
datavaluefootnotesymbol
datavaluefootnote
lowconfidencelimit
highconfidencelimit
stratificationcategory1
stratification1
stratificationcategory2
stratification2
stratificationcategory3
stratification3
geolocation
responseid
locationid
topicid
questionid
datavaluetypeid
stratificationcategoryid1
stratificationid1
stratificationcategoryid2
stratificationid2
stratificationcategoryid3
stratificationid3
```
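The first record shows that numeric fields such as `yearstart`, `datavalue`, and `lowconfidencelimit` arrive as strings (`'2016'`, `'16.9'`, `'16'`), and that `geolocation` holds a Python list. Lists are unhashable, so operations that hash values, such as `drop_duplicates()`, raise `TypeError: unhashable type: 'list'` on that column. A minimal sketch of tidying both, assuming a hypothetical two-row frame built the same way as above: the first row copies fields from that record, the second row is invented to show a missing value.

```python
import pandas as pd

# Hypothetical rows: the first copies fields from the record above,
# the second is invented to show a missing value.
rows = [
    ["2016", "16.9", "16", [None, None, None, None, None]],
    ["2016", None, None, [None, None, None, None, None]],
]
sample_df = pd.DataFrame(rows, columns=["yearstart", "datavalue", "lowconfidencelimit", "geolocation"])

# Text -> numbers; errors="coerce" turns anything unparseable into NaN
for col in ["yearstart", "datavalue", "lowconfidencelimit"]:
    sample_df[col] = pd.to_numeric(sample_df[col], errors="coerce")

# Drop the list-valued column when it is not needed, so hashing operations work
sample_df = sample_df.drop(columns="geolocation")
print(sample_df.dtypes)
```

On the full DataFrame the same loop applies to whichever numeric columns you need; `drop_duplicates()` and similar calls work once `geolocation` is dropped.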


## Level-Up
### Create a bar graph of the states with the highest asthma rates among adults aged 18+

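One way to approach the Level-Up, sketched under several assumptions that you should check against `df['topic'].unique()` and `df['question'].unique()` first: asthma rows have `topic == 'Asthma'`, the adult indicator's `question` mentions `adults aged >= 18 years` (as the alcohol question in the first record does), and state comparisons use the `'Crude Prevalence'` and `'Overall'` rows. The `'US'` row is excluded because, as in the first record, it holds a national median rather than a state. The function names are hypothetical, and the `toy` frame uses made-up placeholder locations and values purely so the code runs.

```python
import pandas as pd


def top_asthma_states(df, n=10):
    """Return the n locations with the highest adult asthma prevalence.

    Assumed filter values (verify against the real data): topic 'Asthma',
    a question mentioning 'adults aged >= 18 years', crude prevalence, and
    the overall (unstratified) population; the 'US' national row is excluded.
    """
    mask = (
        (df["topic"] == "Asthma")
        & df["question"].str.contains("adults aged >= 18 years", regex=False, na=False)
        & (df["datavaluetype"] == "Crude Prevalence")
        & (df["stratification1"] == "Overall")
        & (df["locationabbr"] != "US")
    )
    subset = df.loc[mask, ["yearstart", "locationabbr", "datavalue"]]
    # Keep a single year (the latest present) so each state appears once
    subset = subset[subset["yearstart"] == subset["yearstart"].max()].copy()
    # datavalue is stored as text in the JSON; convert before ranking
    subset["datavalue"] = pd.to_numeric(subset["datavalue"], errors="coerce")
    subset = subset.dropna(subset=["datavalue"])
    return subset.set_index("locationabbr")["datavalue"].nlargest(n)


def plot_top_states(top):
    """Draw a bar chart of the Series returned by top_asthma_states."""
    import matplotlib.pyplot as plt  # imported here so the filtering runs without matplotlib

    ax = top.plot.bar()
    ax.set_xlabel("State")
    ax.set_ylabel("Crude prevalence (%)")  # assumes the '%' unit seen in the first record
    ax.set_title("Highest asthma prevalence, adults aged 18+")
    plt.tight_layout()
    return ax


# Made-up placeholder rows, only so the sketch runs; not real survey values
toy = pd.DataFrame({
    "yearstart": ["2016"] * 5,
    "locationabbr": ["AA", "BB", "CC", "US", "DD"],
    "topic": ["Asthma"] * 4 + ["Alcohol"],
    "question": ["Asthma prevalence among adults aged >= 18 years"] * 4
    + ["Binge drinking prevalence among adults aged >= 18 years"],
    "datavaluetype": ["Crude Prevalence"] * 5,
    "stratification1": ["Overall"] * 5,
    "datavalue": ["8.0", "10.5", "9.1", "9.0", "20.0"],
})
print(top_asthma_states(toy, n=2))
```

In the notebook, `plot_top_states(top_asthma_states(df))` would draw the chart. If several asthma questions match the filter, narrow it to a single question text so each state contributes one bar.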

## Summary

Well done! In this lab you got some extended practice exploring the structure of JSON files, converting JSON files to pandas DataFrames, and visualizing data!