# Data, Metadata and APIs

## Part 5: The Google Maps API and Open Data

Now that you've extracted GPS coordinates from JPEG metadata and mapped them using the Google Maps API, you might be wondering what else you can do with the API. The short answer is... a lot.

In this notebook, you'll see how to combine your knowledge of the Google Maps API with your knowledge of data analysis with Pandas.

### Find an Open Data Set that contains Location Data

Here's a data set that tracks the location of all potholes filled by the City of Chicago for the past 7 days. Chicago is [known for its potholes](https://www.wbez.org/shows/curious-city/city-of-big-potholes-is-asphalt-the-best-choice-for-chicagos-streets/8bbd9e7a-b27e-4e00-a868-aa0b826b53b2), so this should be good. 

We will load this _.csv_ file from a URL so that the data is as up-to-date as possible:

In [1]:
# Note: the spike in traffic from Fremd may get us IP-banned by Chicago's Open Data portal.
#       If this happens, your teacher will share a static copy of Potholes_Patched.csv,
#       and you'll need to run the code "potholes_DF = pd.read_csv('Potholes_Patched.csv')"

import pandas as pd

potholes_DF = pd.read_csv("Potholes Patched.csv")

# display the 3 most recent potholes that were filled
potholes_DF[-3:]

Unnamed: 0,ADDRESS,REQUEST DATE,COMPLETION DATE,NUMBER OF POTHOLES FILLED ON BLOCK,LATITUDE,LONGITUDE,LOCATION,Boundaries - ZIP Codes,Community Areas,Zip Codes,Census Tracts,Wards
23017,2040 S MARSHALL BLVD,06/28/2018 08:32:35 AM,07/02/2018 08:40:31 AM,20,,,,,,,,
23018,2814 S SPRINGFIELD AVE,07/02/2018 08:38:05 AM,07/02/2018 08:38:43 AM,20,,,,,,,,
23019,801 E BOWEN AVE,07/02/2018 07:25:56 AM,07/02/2018 07:27:23 AM,32,,,,,,,,


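Check how many records the data set contains. Each row is one block where potholes were filled, so this counts repair records rather than individual potholes: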

In [2]:
print(len(potholes_DF))

23020
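
The row count and the pothole count are different numbers, because each row carries its own `NUMBER OF POTHOLES FILLED ON BLOCK` value. A minimal sketch of the difference, using a tiny made-up frame (`sample_DF` and its addresses are invented for illustration, not real Chicago data):

```python
import pandas as pd

# Made-up sample rows, for illustration only (not real Chicago data)
sample_DF = pd.DataFrame({
    "ADDRESS": ["100 N EXAMPLE ST", "200 S SAMPLE AVE", "300 W DEMO BLVD"],
    "NUMBER OF POTHOLES FILLED ON BLOCK": [20, 20, 32],
})

# Number of repair records (one per row)
rows = len(sample_DF)

# Total number of potholes filled across all records
potholes = int(sample_DF["NUMBER OF POTHOLES FILLED ON BLOCK"].sum())

print(rows, potholes)
```

Running the same `.sum()` on `potholes_DF` should give the total number of potholes filled, assuming the column has no missing values.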


That's a lot of potholes. Now extract the location data, clean out the "nan" values, and store it as a list of tuples:

In [3]:
import numpy as np

lat = list(potholes_DF["LATITUDE"])

lon = list(potholes_DF["LONGITUDE"])

tuple_list = []

for i in range(len(lat)):
    coord = (lat[i],lon[i])
    tuple_list.append(coord)

tuple_list = [x for x in tuple_list if not np.isnan(x[1])]

Let's compare the length of *potholes_DF* to *tuple_list* to see how many "nan" values we cleaned out:

In [4]:
print(len(potholes_DF),len(tuple_list))

23020 14588


Depending on the week, there may be a handful of "nan" values to clean out. If you were lucky, there were none.
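
The extraction and cleaning can also be done with pandas alone. The sketch below drops any row where either coordinate is missing, then converts the remaining rows to tuples. The frame `sample_DF` is made up for illustration; on the real data you would start from `potholes_DF` instead.

```python
import numpy as np
import pandas as pd

# Made-up sample rows, for illustration only
sample_DF = pd.DataFrame({
    "LATITUDE": [41.9, np.nan, 41.7, 41.8],
    "LONGITUDE": [-87.7, -87.6, np.nan, -87.8],
})

# Keep only rows where both coordinates are present
clean = sample_DF[["LATITUDE", "LONGITUDE"]].dropna()

# Convert each remaining row to a plain (lat, lon) tuple
coords = list(clean.itertuples(index=False, name=None))

print(len(sample_DF), len(coords))
```

Unlike a check on longitude only, `dropna()` also removes rows where just the latitude is missing.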

Now let's look at a few of the tuples in the list:

In [5]:
tuple_list[-10:]

[(41.735475832, -87.65759074700001),
 (41.964429079, -87.75854644399999),
 (41.90929414, -87.78015311),
 (41.853323449, -87.67395806200001),
 (41.93832631, -87.68834259200001),
 (41.985650885, -87.71972255),
 (41.700157041, -87.718569641),
 (41.898601561999996, -87.774453568),
 (41.798201115, -87.755403753),
 (41.909428263, -87.788030125)]

### Google Maps API with Markers

Let's put a marker every place we found a pothole.
#### WARNING: Adding more than 500 marker points could potentially crash your kernel!  To combat this, we are creating a list of 500 random entries from the original tuple_list.

In [6]:
import numpy as np

# Draw up to 500 distinct indices from the whole list (no index is repeated)
sample_size = min(500, len(tuple_list))
indices_used = np.random.choice(len(tuple_list), size=sample_size, replace=False)

# Collect the tuple stored at each sampled index
tuple_list_500 = [tuple_list[i] for i in indices_used]
print(tuple_list_500[:10])

[(41.934794000000004, -87.78668), (41.916293074, -87.79461876799999), (41.937552000000004, -87.67326899999999), (41.878915, -87.67490500000001), (41.684441, -87.69794499999999), (42.014132000000004, -87.663066), (41.919458999, -87.760654628), (41.937657, -87.673976), (42.013315000000006, -87.819048), (41.956909, -87.708964)]
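
If the coordinates are still in a DataFrame, pandas can draw the random subset directly. This is a sketch under assumptions: `coords_DF` is a made-up frame, and the fixed `random_state` is only there so the same rows come back on every run.

```python
import pandas as pd

# Made-up coordinates, for illustration only
coords_DF = pd.DataFrame({
    "LATITUDE": [41.0 + i / 1000 for i in range(2000)],
    "LONGITUDE": [-87.0 - i / 1000 for i in range(2000)],
})

# Never ask for more rows than the frame contains
n = min(500, len(coords_DF))

# Sample rows without replacement (the default), reproducibly
sample = coords_DF.sample(n=n, random_state=42)

# Convert the sampled rows to (lat, lon) tuples
sample_tuples = list(sample.itertuples(index=False, name=None))

print(len(sample_tuples))
```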


In [7]:
# Import the gmaps python module and load in your API Key:
import gmaps
gmaps.configure(api_key="YOUR_API_KEY")  # Replace with your own Google Maps API key

In [8]:
from ipywidgets.embed import embed_minimal_html # Allows us to create a separate file for the Google map

markers = gmaps.marker_layer(tuple_list_500)    # Create markers for each tuple/coordinate
markermap = gmaps.Map()                         # Create a GMap variable
markermap.add_layer(markers)                    # Add the layer of markers to GMap

embed_minimal_html('MarkerMap1.html', views=[markermap])
print("*** Check your 'Metadata Part 5' folder to find the new HTML file. ***")

*** Check your 'Metadata Part 5' folder to find the new HTML file. ***


**Question 1:** Look at the marker map at various zoom levels. What do you notice about the graph? Comment on anything interesting you see and try to summarize "the good" and "the bad" in this visualization.

Your Answer: It is entirely possible that two people reported the same pothole, meaning that the graph represents each pothole complaint rather than each pothole. Also, areas with a high population are more likely to have someone report a pothole, which means that potholes in those areas are more likely to get fixed. This raises the moral question of whether areas with higher populations are more deserving of having their potholes fixed than areas with lower populations. Some of the markers aren't necessarily in Chicago but in the Chicago suburbs, meaning that these complaints are useless because they're outside of the Chicago administration.

### Google Maps API to Create a Heatmap

Instead of markers, let's make a heat map:
#### WARNING: Adding more than 500 marker points could potentially crash your kernel!  To combat this, we are again using the list of 500 random entries from the original tuple_list.

In [9]:
from ipywidgets.embed import embed_minimal_html # Allows us to create a separate file for the Google map

heatm = gmaps.Map()
heatm.add_layer(gmaps.heatmap_layer(tuple_list_500))

embed_minimal_html('MarkerMap2.html', views=[heatm])
print("*** Check your 'Metadata Part 5' folder to find the new HTML file. ***")

*** Check your 'Metadata Part 5' folder to find the new HTML file. ***


**Question 2:** Look at the heatmap at various zoom levels. What do you notice about the graph? Comment on anything interesting you see and try to summarize "the good" and "the bad" in this visualization.

Your Answer: This graph is better than the other one because it is better at conveying the density of complaints in certain areas. For example, the heat map shows that the Lower West Side has a high density of complaints compared to other areas. However, the map has the same issue as the last one: it conveys the number of complaints rather than the number of actual potholes.

### Task 1: Find your own dataset!

You are going to create a marker map **and** a heatmap from a dataset you have found. For Task 1, find a dataset with location data (GPS coordinates!). Fill in the following:

_Name:_ Locations of Adult Female Polar Bears

_Date:_ 10 April 2019

_Source for Data Set:_ Alaska Science Center

_URL for Data Set:_ https://alaska.usgs.gov/products/data.php?dataid=174

_Description of Data Set:_ This dataset includes the GPS locations of 9 adult female polar bears tracked by the Alaska Science Center. Each bear's location was tracked numerous times so each bear has more than one location to their name.

_File Format for Data Set:_ .csv

_Age of Data Set:_ The newest data is 3 years old, as the data is from 2014-2016.

### Task 2: Show some entries from your dataset

Import your data set as a Pandas Data Frame, then show the last 10 entries:

In [None]:
# Your code here
# Import pandas module
import pandas as pd

# Read in the csv file (comma separated values)
bears = pd.read_csv("bears.csv")

# Show the last 10 entries of the file
bears[-10:]

### Task 3: Create a list of tuples

Use your dataset to create a list of tuples (a list of DD coordinates) representing the locations in your dataset:
#### WARNING: Adding more than 500 marker points could potentially crash your kernel!  To combat this, create a list of 500 random entries from the original list of tuples.

In [66]:
# Your code here
import numpy as np

# Assumes a DataFrame named `operations` with these two columns is already loaded
lat = list(operations["Takeoff Latitude"])
lon = list(operations["Takeoff Longitude"])

# Pair each latitude with its longitude, skipping rows with a missing coordinate
tuple_list = [(la, lo) for la, lo in zip(lat, lon)
              if not (np.isnan(la) or np.isnan(lo))]

# Draw up to 500 distinct indices. np.random.choice returns an array, so look up
# each index one at a time instead of indexing the list with the whole array
sample_size = min(500, len(tuple_list))
indices = np.random.choice(len(tuple_list), size=sample_size, replace=False)
tuple_list_500 = [tuple_list[i] for i in indices]

print(tuple_list_500[:10])
print(len(tuple_list_500))

### Task 4: Create a marker map from your data

Use the Google Maps API to create a marker map using your list of tuples from above.

In [12]:
# Your code here


### Task 5: Create a heatmap from your data

Use the Google Maps API to create a **heatmap** using your list of tuples from above.

*Note: The Google Maps API can struggle with heatmaps that have more than 1000 datapoints. If your map is not working, try reducing your list to fewer tuples (try creating a list with just the most recent 100 entries in the dataset). Once this works, you can always add in a few more tuples!*
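
One way to build the "most recent 100 entries" list from the note above is to sort by a date column and keep the tail. This is only a sketch: `records` is a made-up frame, and the column names and date format are assumptions borrowed from the pothole data, so adjust them to match your own dataset.

```python
import pandas as pd

# Made-up records, for illustration only; dates are stored as text, as in a CSV
dates = pd.date_range("2018-01-01", periods=300, freq="D")
records = pd.DataFrame({
    "COMPLETION DATE": dates.strftime("%m/%d/%Y %I:%M:%S %p"),
    "LATITUDE": [41.0 + i / 1000 for i in range(300)],
    "LONGITUDE": [-87.0 - i / 1000 for i in range(300)],
})

# Shuffle the rows so the example does not rely on file order
records = records.sample(frac=1, random_state=0).reset_index(drop=True)

# Parse the text dates so that sorting is chronological, not alphabetical
records["COMPLETION DATE"] = pd.to_datetime(
    records["COMPLETION DATE"], format="%m/%d/%Y %I:%M:%S %p"
)

# Keep the 100 most recent rows
recent = records.sort_values("COMPLETION DATE").tail(100)

# Convert them to (lat, lon) tuples for the map layer
recent_tuples = list(recent[["LATITUDE", "LONGITUDE"]].itertuples(index=False, name=None))

print(len(recent_tuples))
```

If the map works with 100 points, raise the number passed to `tail()` a little at a time.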

In [13]:
# Your code here


### Task 6: Comment on what you see

Look at your marker map and your heatmap at various zoom levels. Comment on anything interesting or notable that you see. 

Your Answer: 

### Task 7: Brainstorm further study

If you had more time and resources, what else would you like to explore using the GPS data in this dataset?

Your Answer: 