### Accessing Geospatial Data Using APIs

In [2]:
import pandas as pd
import numpy as np
import json

In [3]:
data = pd.read_json('tv8u-hswn.json')
data.head()

Unnamed: 0,county,fipscode,year,age,malepopulation,femalepopulation,totalpopulation,datatype
0,Boulder,13,1990,0,1597,1630,3227,Estimate
1,Boulder,13,1990,1,1583,1581,3164,Estimate
2,Boulder,13,1990,2,1593,1564,3157,Estimate
3,Boulder,13,1990,3,1631,1530,3161,Estimate
4,Boulder,13,1990,4,1696,1594,3290,Estimate


In [4]:
data.shape

(1000, 8)

In [5]:
data.isnull().sum()

county              0
fipscode            0
year                0
age                 0
malepopulation      0
femalepopulation    0
totalpopulation     0
datatype            0
dtype: int64

In [6]:
data.nunique()

county                1
fipscode              1
year                 11
age                  91
malepopulation      830
femalepopulation    801
totalpopulation     907
datatype              1
dtype: int64

The Python library json converts Python lists and dictionaries into JSON strings, and JSON strings back into lists and dictionaries. Pandas can also convert JSON data (via a Python dictionary) into a Pandas DataFrame.

In this lesson, you will use the json and Pandas libraries to create and convert JSON objects.

In [7]:
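# Create and populate the dictionary
# (named dog_dict rather than dict so the built-in dict type is not shadowed)
dog_dict = {}
dog_dict["name"] = "Chaya"
dog_dict["age"] = 12
dog_dict["city"] = "Boulder"
dog_dict["type"] = "Canine"

dog_dict

{'name': 'Chaya', 'age': 12, 'city': 'Boulder', 'type': 'Canine'}

In [8]:
# Serialize the dictionary to a JSON string
json_example = json.dumps(dog_dict, ensure_ascii=False)

json_example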

'{"name": "Chaya", "age": 12, "city": "Boulder", "type": "Canine"}'

In [9]:
type(json_example)

str

In [10]:
json_sample =  '{ "name":"Chaya", "age":12, "city":"Boulder", "type":"Canine" }'

type(json_sample)

str

In [11]:
# Load JSON into dictionary
data_sample = json.loads(json_sample)
data_sample

{'name': 'Chaya', 'age': 12, 'city': 'Boulder', 'type': 'Canine'}

In [12]:
type(data_sample)

dict

In [13]:
data_sample = json.loads(json_example)
data_sample

{'name': 'Chaya', 'age': 12, 'city': 'Boulder', 'type': 'Canine'}

In [14]:
type(data_sample)

dict

In [15]:
df = pd.DataFrame.from_dict(data_sample, orient='index')
df

Unnamed: 0,0
name,Chaya
age,12
city,Boulder
type,Canine


In [16]:
sample_json = df.to_json(orient='split')

type(sample_json)

str
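
The string returned by to_json(orient='split') keeps the index, columns, and data under separate keys. As a rough illustration, the sketch below parses a hand-written string of that shape with the json library and rebuilds the original dictionary. The string is an assumption about what sample_json looks like for this DataFrame, not captured notebook output.

```python
import json

# Hand-written example of the "split" layout: separate keys for
# columns, index, and data (assumed shape, not captured notebook output)
split_json = (
    '{"columns": [0], '
    '"index": ["name", "age", "city", "type"], '
    '"data": [["Chaya"], [12], ["Boulder"], ["Canine"]]}'
)

parsed = json.loads(split_json)

# Each index label pairs with the first (and only) value in its data row
rebuilt = {label: row[0] for label, row in zip(parsed["index"], parsed["data"])}
print(rebuilt)
```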

### Programmatically Accessing Geospatial Data Using APIs

In [17]:
import requests
import pandas as pd
import folium
from geopandas import GeoDataFrame
from shapely.geometry import Point

In [18]:
# Get URL
water_base_url = "https://data.colorado.gov/resource/j5pc-4t32.json?"
water_full_url = water_base_url + "station_status=Active" + "&county=BOULDER"
water_full_url

'https://data.colorado.gov/resource/j5pc-4t32.json?station_status=Active&county=BOULDER'
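
Concatenating query parameters by hand works for simple values, but values containing spaces or special characters need escaping. One alternative, sketched here with the standard library's urllib.parse.urlencode, builds the same query string from a dictionary. The name params is illustrative and does not appear in the original notebook.

```python
from urllib.parse import urlencode

water_base_url = "https://data.colorado.gov/resource/j5pc-4t32.json?"

# Query parameters as a dictionary (the name `params` is illustrative)
params = {"station_status": "Active", "county": "BOULDER"}

# urlencode joins key=value pairs with "&" and escapes special characters
water_full_url = water_base_url + urlencode(params)
print(water_full_url)
```

requests.get() also accepts a params dictionary and performs the same encoding, so either approach should yield an equivalent request.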

In [19]:
data = requests.get(water_full_url)
type(data)

requests.models.Response

In [20]:
type(data.json())

list

Remember that JSON supports hierarchical data and can be nested. In the JSON response shown below, the location object contains three nested sub-objects:

- latitude
- longitude
- needs_recoding
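
To see what nesting means in practice, the sketch below flattens a nested record into dotted keys using only plain Python. The flatten helper is hypothetical and written for illustration; for real use, pandas provides json_normalize. The record is a trimmed copy of the first station in the response.

```python
def flatten(record, parent_key=""):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested object, carrying the parent key along
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Trimmed copy of one station record (hypothetical helper input)
station = {
    "station_name": "PALMERTON DITCH",
    "location": {
        "latitude": "40.212505",
        "needs_recoding": False,
        "longitude": "-105.251826",
    },
}

flat_station = flatten(station)
print(flat_station["location.latitude"])
```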

Since data.json() is a list, you can print just the first few items of the list to look at your data as a sanity check.

In [21]:
data.json()[:1]

[{'station_name': 'PALMERTON DITCH',
  'div': '1',
  'location': {'latitude': '40.212505',
   'needs_recoding': False,
   'longitude': '-105.251826'},
  'dwr_abbrev': 'PALDITCO',
  'data_source': 'Cooperative Program of CDWR, NCWCD & SVLHWCD',
  'amount': '1.08',
  'station_type': 'Diversion',
  'wd': '5',
  'http_linkage': {'url': 'https://dwr.state.co.us/Tools/Stations/PALDITCO'},
  'date_time': '2020-10-22T10:00:00.000',
  'county': 'BOULDER',
  'variable': 'DISCHRG',
  'stage': '0.12',
  'station_status': 'Active'}]

### Convert JSON to Pandas DataFrame

In [22]:
# json_normalize is available directly as pd.json_normalize (pandas 1.0 and later),
# so the older pandas.io.json import is not needed

In [23]:
result = pd.json_normalize(data.json())

In [24]:
result.head()

Unnamed: 0,station_name,div,dwr_abbrev,data_source,amount,station_type,wd,date_time,county,variable,stage,station_status,location.latitude,location.needs_recoding,location.longitude,http_linkage.url,usgs_station_id,flag
0,PALMERTON DITCH,1,PALDITCO,"Cooperative Program of CDWR, NCWCD & SVLHWCD",1.08,Diversion,5,2020-10-22T10:00:00.000,BOULDER,DISCHRG,0.12,Active,40.212505,False,-105.251826,https://dwr.state.co.us/Tools/Stations/PALDITCO,,
1,"LEFT HAND CREEK AT HOVER ROAD NEAR LONGMONT, CO",1,LEFTHOCO,U.S. Geological Survey,4.18,Stream,5,2020-10-22T10:10:00.000,BOULDER,DISCHRG,,Active,40.134278,False,-105.130819,https://dwr.state.co.us/Tools/Stations/LEFTHOCO,6724970.0,
2,CLOUGH AND TRUE DITCH,1,CLODITCO,"Cooperative Program of CDWR, NCWCD & SVLHWCD",0.0,Diversion,5,2020-10-22T10:00:00.000,BOULDER,DISCHRG,0.0,Active,40.193758,False,-105.21039,https://dwr.state.co.us/Tools/Stations/CLODITCO,,
3,MIDDLE SAINT VRAIN AT PEACEFUL VALLEY,1,MIDSTECO,Co. Division of Water Resources,3.61,Stream,5,2020-10-22T10:00:00.000,BOULDER,DISCHRG,2.28,Active,40.129806,False,-105.517111,https://dwr.state.co.us/Tools/Stations/MIDSTECO,,
4,"MIDDLE BOULDER CREEK AT NEDERLAND, CO.",1,BOCMIDCO,Co. Division of Water Resources,4.3,Stream,6,2020-10-22T10:15:00.000,BOULDER,DISCHRG,0.39,Active,39.961655,False,-105.50444,https://dwr.state.co.us/Tools/Stations/BOCMIDCO,6725500.0,


In [25]:
import folium

In [26]:
result['location.longitude'] = result['location.longitude'].astype(float)
result['location.latitude'] = result['location.latitude'].astype(float)
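
The API returns numeric fields as strings, and some fields (such as stage) are missing for certain stations. As a plain-Python illustration of the conversion, the hypothetical helper below turns such values into floats and maps missing values to NaN. It is a sketch of the idea, not a replacement for astype(float).

```python
import math


def to_float(value):
    """Convert an API string such as '40.212505' to float; missing -> NaN."""
    if value is None or value == "":
        return math.nan
    return float(value)


latitude = to_float("40.212505")
stage = to_float(None)  # e.g. a station with no stage reading

print(latitude, stage)
```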

#### You will use the folium package to visualize the data. One approach is to convert your Pandas DataFrame to a GeoPandas GeoDataFrame for easy mapping.

In [27]:
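# Build point geometries; shapely expects (x, y) order, i.e. (longitude, latitude)
geometry = [Point(xy) for xy in zip(result['location.longitude'], result['location.latitude'])]
# Pass the CRS as an authority string; the {'init': 'epsg:4326'} dict form is deprecated
crs = 'EPSG:4326'
gdf = GeoDataFrame(result, crs=crs, geometry=geometry)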


In [28]:
gdf.head()

Unnamed: 0,station_name,div,dwr_abbrev,data_source,amount,station_type,wd,date_time,county,variable,stage,station_status,location.latitude,location.needs_recoding,location.longitude,http_linkage.url,usgs_station_id,flag,geometry
0,PALMERTON DITCH,1,PALDITCO,"Cooperative Program of CDWR, NCWCD & SVLHWCD",1.08,Diversion,5,2020-10-22T10:00:00.000,BOULDER,DISCHRG,0.12,Active,40.212505,False,-105.251826,https://dwr.state.co.us/Tools/Stations/PALDITCO,,,POINT (-105.25183 40.21251)
1,"LEFT HAND CREEK AT HOVER ROAD NEAR LONGMONT, CO",1,LEFTHOCO,U.S. Geological Survey,4.18,Stream,5,2020-10-22T10:10:00.000,BOULDER,DISCHRG,,Active,40.134278,False,-105.130819,https://dwr.state.co.us/Tools/Stations/LEFTHOCO,6724970.0,,POINT (-105.13082 40.13428)
2,CLOUGH AND TRUE DITCH,1,CLODITCO,"Cooperative Program of CDWR, NCWCD & SVLHWCD",0.0,Diversion,5,2020-10-22T10:00:00.000,BOULDER,DISCHRG,0.0,Active,40.193758,False,-105.21039,https://dwr.state.co.us/Tools/Stations/CLODITCO,,,POINT (-105.21039 40.19376)
3,MIDDLE SAINT VRAIN AT PEACEFUL VALLEY,1,MIDSTECO,Co. Division of Water Resources,3.61,Stream,5,2020-10-22T10:00:00.000,BOULDER,DISCHRG,2.28,Active,40.129806,False,-105.517111,https://dwr.state.co.us/Tools/Stations/MIDSTECO,,,POINT (-105.51711 40.12981)
4,"MIDDLE BOULDER CREEK AT NEDERLAND, CO.",1,BOCMIDCO,Co. Division of Water Resources,4.3,Stream,6,2020-10-22T10:15:00.000,BOULDER,DISCHRG,0.39,Active,39.961655,False,-105.50444,https://dwr.state.co.us/Tools/Stations/BOCMIDCO,6725500.0,,POINT (-105.50444 39.96166)


#### Then, you can plot the data using the folium function GeoJson() and its add_to() method to add the data from the GeoPandas GeoDataFrame to the map object.

In [29]:
m = folium.Map([40.01, -105.27], zoom_start= 10, tiles='cartodbpositron')
folium.GeoJson(gdf).add_to(m)

<folium.features.GeoJson at 0x17a64d5d400>
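
folium.GeoJson() draws GeoJSON features on the map. To make that format concrete, the sketch below builds a minimal GeoJSON FeatureCollection by hand for one station. Note that GeoJSON stores coordinates as [longitude, latitude], the reverse of the [latitude, longitude] order that folium.Map takes. The station_to_feature helper is hypothetical and only illustrates the structure.

```python
import json


def station_to_feature(name, latitude, longitude):
    """Build a GeoJSON Point feature; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": {"station_name": name},
    }


feature_collection = {
    "type": "FeatureCollection",
    "features": [station_to_feature("PALMERTON DITCH", 40.212505, -105.251826)],
}

# Serialize to a GeoJSON string for inspection
print(json.dumps(feature_collection, indent=2))
```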

In [30]:
m

#### Great! You now have an interactive map in your notebook!

You can also cluster the markers and add a popup to each marker to give your viewers more information about each station, such as its name and the amount measured (the variable column shows DISCHRG, so this is a discharge reading).

For the example below, you will work with the Pandas DataFrame you originally created from the JSON, instead of the GeoPandas GeoDataFrame.

In [31]:
# Get the latitude and longitude from result as a list of [lat, lon] pairs
locations = result[['location.latitude', 'location.longitude']]
coords = locations.values.tolist()
# Print only the first few pairs as a sanity check
print(coords[:5])

[[40.212505, -105.251826], [40.134278, -105.130819], [40.193758, -105.21039], [40.129806, -105.517111], [39.961655, -105.50444]]

In [32]:
from folium.plugins import MarkerCluster

m = folium.Map([40.01, -105.27], zoom_start= 10, tiles='cartodbpositron')

marker_cluster = MarkerCluster().add_to(m)

# Add one marker per station; the popup shows the station name and reported amount
for (lat, lon), name, amount in zip(coords, result['station_name'], result['amount']):
    folium.Marker(location=[lat, lon], popup=f'Name: {name} Amount: {amount}').add_to(marker_cluster)

m