### Data.gov API

The data.gov catalog is powered by CKAN, a powerful open source data platform that includes a robust API. Please be aware that data.gov and the data.gov CKAN API only contain metadata about datasets. This metadata includes URLs and descriptions of datasets, but it does not include the actual data within each dataset.

The base URL for the Data.gov CKAN API is:

In [0]:
import pandas as pd
import numpy as np
import json
import requests

api_url = 'http://catalog.data.gov/api/3/'
api_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxx'

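CKAN actions are called by sending an HTTP request to an action URL, optionally with a JSON dictionary of parameters. One way to do this is with the command-line client curl: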

In [0]:
%sh
curl https://demo.ckan.org/api/3/action/group_list
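For comparison, a request that posts a JSON dictionary can also be prepared from Python. The sketch below only builds the request object and does not send it. The `package_search` parameters (`q`, `rows`) and their values are illustrative assumptions, not part of the original notebook.

```python
import json
import urllib.request

# Demo site from the curl example; the search parameters are illustrative.
url = 'https://demo.ckan.org/api/3/action/package_search'
params = {'q': 'air quality', 'rows': 5}

request = urllib.request.Request(
    url,
    data=json.dumps(params).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
)

# Supplying `data` makes urllib issue a POST instead of a GET.
print(request.get_method())

# To actually send it (requires network access):
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```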

In [0]:
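import urllib.request

# Request the first page (up to 1000 datasets) of the catalog search results
response = urllib.request.urlopen('http://catalog.data.gov/api/3/action/package_search?rows=1000&start=0')
assert response.status == 200

# Parse the JSON body; CKAN reports the outcome in the top-level 'success' field
response_dict = json.loads(response.read())

assert response_dict['success'] is True
len(response_dict['result']['results'])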
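The `rows` and `start` parameters in the request above page through the search results. Here is a minimal sketch of how successive page URLs could be built. The helper names `package_search_url` and `page_starts` are hypothetical, and the total of 2500 is an illustrative figure rather than the real catalog size; in practice it would be read from `response_dict['result']['count']`.

```python
from urllib.parse import urlencode

API_URL = 'http://catalog.data.gov/api/3/'  # same base URL as above


def package_search_url(base_url, rows, start):
    """Build a package_search URL for one page of results (hypothetical helper)."""
    query = urlencode({'rows': rows, 'start': start})
    return f"{base_url}action/package_search?{query}"


def page_starts(total, rows):
    """Return the `start` offset of every page needed to cover `total` results."""
    return list(range(0, total, rows))


# Illustrative total only; a real run would read result['count'] from the API.
starts = page_starts(2500, 1000)
urls = [package_search_url(API_URL, 1000, s) for s in starts]
print(urls[0])
```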


## Air Quality Data

Example using the Air Quality dataset:
* Main Link: https://catalog.data.gov/dataset/air-quality-ef520
* Data: https://catalog.data.gov/dataset/air-quality-ef520/resource/b6171084-0efa-4c61-b76f-302012be7e05?inner_span=True

In [0]:
import requests

air_quality_url = "https://data.cityofnewyork.us/api/views/c3uy-2p5r/rows.json"
json_data = requests.get(air_quality_url).json()
json_data.keys()

The `json_data` dictionary has two components: `meta` and `data`.
* `meta`: contains the metadata, including the column names
* `data`: contains the actual row-wise data (each row also carries values for the metadata columns)
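To make this layout concrete, here is a hand-written miniature of the same structure. The `sample` dictionary is a made-up stand-in, not a copy of the API response: it keeps only two metadata columns and two data columns, and its values are illustrative.

```python
# Hypothetical miniature of the rows.json layout (not the real response).
sample = {
    'meta': {
        'view': {
            'columns': [
                {'fieldName': ':sid', 'dataTypeName': 'meta_data'},
                {'fieldName': ':id', 'dataTypeName': 'meta_data'},
                {'fieldName': 'geo_entity_name', 'dataTypeName': 'text'},
                {'fieldName': 'data_valuemessage', 'dataTypeName': 'number'},
            ]
        }
    },
    'data': [
        ['row-a', 'id-a', 'Bronx', '2.8'],
        ['row-b', 'id-b', 'Brooklyn', '2.8'],
    ],
}

columns = sample['meta']['view']['columns']
rows = sample['data']

# Every row is a flat list with one value per entry in `columns`, in order.
for row in rows:
    assert len(row) == len(columns)

print(sorted(sample.keys()))
print(len(columns), 'columns,', len(rows), 'rows')
```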

### Metadata part

Let us first have a look at the metadata of the JSON file:

In [0]:
# original json data
json_data

In [0]:
# Inspect the meta data to view the keys
json_data['meta']['view'].keys()

In [0]:
# Extract the meta data information in table format
meta_df = pd.json_normalize(json_data['meta']['view']['columns'])
meta_df

Unnamed: 0,id,name,dataTypeName,fieldName,position,renderTypeName,flags,tableColumnId,width,cachedContents.non_null,cachedContents.average,cachedContents.largest,cachedContents.null,cachedContents.top,cachedContents.smallest,cachedContents.sum
0,-1,sid,meta_data,:sid,0,meta_data,[hidden],,,,,,,,,
1,-1,id,meta_data,:id,0,meta_data,[hidden],,,,,,,,,
2,-1,position,meta_data,:position,0,meta_data,[hidden],,,,,,,,,
3,-1,created_at,meta_data,:created_at,0,meta_data,[hidden],,,,,,,,,
4,-1,created_meta,meta_data,:created_meta,0,meta_data,[hidden],,,,,,,,,
5,-1,updated_at,meta_data,:updated_at,0,meta_data,[hidden],,,,,,,,,
6,-1,updated_meta,meta_data,:updated_meta,0,meta_data,[hidden],,,,,,,,,
7,-1,meta,meta_data,:meta,0,meta_data,[hidden],,,,,,,,,
8,172572046,indicator_data_id,number,indicator_data_id,2,number,,22543549.0,304.0,2769.0,137315.4734561213,154618,0.0,"[{'item': '151739', 'count': 20}, {'item': '15...",130355,380226546.0
9,172572047,indicator_id,number,indicator_id,3,number,,22543550.0,244.0,2769.0,651.9566630552546,667,0.0,"[{'item': '646', 'count': 20}, {'item': '647',...",639,1805268.0


Based on `meta_df`, the values in the `dataTypeName` column fall into two groups: `meta_data`, and actual data types such as `number` and `text`.

From this, we can conclude that the column headers for the main data table are the `fieldName` values of `meta_df` where `dataTypeName != 'meta_data'`. Because each row in `data` also carries the metadata fields, we first take every `fieldName` as a header and drop the metadata columns afterwards.

In [0]:
# Optional filter that would keep only the real data columns:
# column_header = meta_df[meta_df['dataTypeName']!='meta_data']
# Store all column names (metadata columns included) as a Series
cols = meta_df['fieldName']
cols
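The `dataTypeName` filter described above can also be applied to the raw list of column dictionaries, before any DataFrame is built. The sketch below uses a made-up four-column list in place of `json_data['meta']['view']['columns']`. `data_headers` is a hypothetical helper name.

```python
def data_headers(columns):
    """Return the fieldName of every column whose dataTypeName is not 'meta_data'."""
    return [c['fieldName'] for c in columns if c['dataTypeName'] != 'meta_data']


# Made-up stand-in for json_data['meta']['view']['columns']
columns = [
    {'fieldName': ':sid', 'dataTypeName': 'meta_data'},
    {'fieldName': ':id', 'dataTypeName': 'meta_data'},
    {'fieldName': 'indicator_data_id', 'dataTypeName': 'number'},
    {'fieldName': 'indicator_id', 'dataTypeName': 'number'},
]

headers = data_headers(columns)
print(headers)
```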

### Data part

View the `data` component of the json file:

In [0]:
json_data['data']

In [0]:
# Convert the list of rows to a pandas DataFrame
df = pd.json_normalize(json_data, 'data')
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
0,row-pk7c.43bs-gwc3,00000000-0000-0000-4116-48B9A054B355,0,1425754639,,1425754639,,{ },130728,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,1,Bronx,2005,2.8
1,row-4spa~cx9c_3axr,00000000-0000-0000-E19B-62A95D150681,0,1425754639,,1425754639,,{ },130729,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,2,Brooklyn,2005,2.8
2,row-jvza_tbaa~aari,00000000-0000-0000-8B75-66E9EFB794DA,0,1425754639,,1425754639,,{ },130730,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,3,Manhattan,2005,4.7
3,row-5922~r3w9_jizq,00000000-0000-0000-6BD5-1418787D7468,0,1425754639,,1425754639,,{ },130731,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,4,Queens,2005,1.9
4,row-urui~22k2_9zvk,00000000-0000-0000-C0C0-11FB849A8B47,0,1425754639,,1425754639,,{ },130732,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,5,Staten Island,2005,1.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2764,row-4ffr~36ff-cv6c,00000000-0000-0000-59FE-CBDACE9C5981,0,1425754639,,1425754639,,{ },151756,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,410,Rockaways,2005,0.2
2765,row-wkui.sni7~q6ce,00000000-0000-0000-5171-77D873E6500F,0,1425754639,,1425754639,,{ },151757,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,501,Port Richmond,2005,0.3
2766,row-mwku~jt36.ku8i,00000000-0000-0000-7C53-E75F3A9B655D,0,1425754639,,1425754639,,{ },151758,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,502,Stapleton - St. George,2005,0.8
2767,row-hntu-j42q-y65s,00000000-0000-0000-0C9F-DFFCA50AE6EE,0,1425754639,,1425754639,,{ },151759,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,503,Willowbrook,2005,0.8


In [0]:
#Replace column names with the header column derived earlier
df.columns = cols
df

fieldName,:sid,:id,:position,:created_at,:created_meta,:updated_at,:updated_meta,:meta,indicator_data_id,indicator_id,name,measure,geo_type_name,geo_entity_id,geo_entity_name,year_description,data_valuemessage
0,row-pk7c.43bs-gwc3,00000000-0000-0000-4116-48B9A054B355,0,1425754639,,1425754639,,{ },130728,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,1,Bronx,2005,2.8
1,row-4spa~cx9c_3axr,00000000-0000-0000-E19B-62A95D150681,0,1425754639,,1425754639,,{ },130729,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,2,Brooklyn,2005,2.8
2,row-jvza_tbaa~aari,00000000-0000-0000-8B75-66E9EFB794DA,0,1425754639,,1425754639,,{ },130730,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,3,Manhattan,2005,4.7
3,row-5922~r3w9_jizq,00000000-0000-0000-6BD5-1418787D7468,0,1425754639,,1425754639,,{ },130731,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,4,Queens,2005,1.9
4,row-urui~22k2_9zvk,00000000-0000-0000-C0C0-11FB849A8B47,0,1425754639,,1425754639,,{ },130732,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,5,Staten Island,2005,1.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2764,row-4ffr~36ff-cv6c,00000000-0000-0000-59FE-CBDACE9C5981,0,1425754639,,1425754639,,{ },151756,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,410,Rockaways,2005,0.2
2765,row-wkui.sni7~q6ce,00000000-0000-0000-5171-77D873E6500F,0,1425754639,,1425754639,,{ },151757,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,501,Port Richmond,2005,0.3
2766,row-mwku~jt36.ku8i,00000000-0000-0000-7C53-E75F3A9B655D,0,1425754639,,1425754639,,{ },151758,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,502,Stapleton - St. George,2005,0.8
2767,row-hntu-j42q-y65s,00000000-0000-0000-0C9F-DFFCA50AE6EE,0,1425754639,,1425754639,,{ },151759,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,503,Willowbrook,2005,0.8


Drop the `meta_data` columns, which are not part of the original data:

In [0]:
#drop the first few columns which represent meta_data
df = df.drop([':sid',':id',':position',':created_at',':created_meta',':updated_at',':updated_meta',':meta'],axis=1)
df

fieldName,indicator_data_id,indicator_id,name,measure,geo_type_name,geo_entity_id,geo_entity_name,year_description,data_valuemessage
0,130728,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,1,Bronx,2005,2.8
1,130729,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,2,Brooklyn,2005,2.8
2,130730,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,3,Manhattan,2005,4.7
3,130731,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,4,Queens,2005,1.9
4,130732,646,Air Toxics Concentrations- Average Benzene Con...,Average Concentration,Borough,5,Staten Island,2005,1.6
...,...,...,...,...,...,...,...,...,...
2764,151756,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,410,Rockaways,2005,0.2
2765,151757,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,501,Port Richmond,2005,0.3
2766,151758,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,502,Stapleton - St. George,2005,0.8
2767,151759,645,Traffic Density- Annual Vehicle Miles Traveled...,Per 100 km2,UHF42,503,Willowbrook,2005,0.8
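Hard-coding the eight metadata names works for this dataset, but the columns to drop could also be derived from the `dataTypeName` flag. Here is a minimal plain-Python sketch, using a made-up two-row sample rather than the real response. `to_records` is a hypothetical helper name.

```python
def to_records(columns, rows):
    """Turn positional rows into dicts, skipping columns flagged as meta_data."""
    keep = [
        (i, c['fieldName'])
        for i, c in enumerate(columns)
        if c['dataTypeName'] != 'meta_data'
    ]
    return [{name: row[i] for i, name in keep} for row in rows]


# Made-up stand-ins for json_data['meta']['view']['columns'] and json_data['data']
columns = [
    {'fieldName': ':sid', 'dataTypeName': 'meta_data'},
    {'fieldName': 'geo_entity_name', 'dataTypeName': 'text'},
    {'fieldName': 'year_description', 'dataTypeName': 'text'},
]
rows = [
    ['row-a', 'Bronx', '2005'],
    ['row-b', 'Brooklyn', '2005'],
]

records = to_records(columns, rows)
print(records[0])
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame(records)` to get a table without the metadata columns.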
