<a href="https://colab.research.google.com/github/stjohn/datasci/blob/main/openDataWeek/MappingOpenData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Folium: Mapping NYC OpenData

In this section, we will use Python with the [folium](https://python-visualization.github.io/folium/) library to create HTML maps of:
* the publicly available WiFi hotspots across NYC,
* 311 calls tallied by NYC OpenData, and
* traffic collisions, from the GIS coordinates in NYC OpenData.

## Folium

[Folium](https://python-visualization.github.io/folium/) is a Python package that uses the JavaScript Leaflet.js library to make interactive maps. Instead of popping up a matplotlib window, folium creates an HTML document that you can view interactively in a browser. In a notebook, the map is displayed inline when its variable is the last line of a cell. If you save the map to an .html file, open that file in a web browser to see your map.


To get started, we need to install and then load folium:

In [1]:
# Install folium package
!pip install folium



In [2]:
# Import the folium module
import folium

For our first program, let's make a simple map:

In [5]:
#Create a map, centered (0,0), and zoomed out a bit:
mapWorld = folium.Map(location=[0, 0],zoom_start=2)

In [6]:
#Display the map:
mapWorld

## Map of New York City

Let's make another map, focused on New York City. To do that, when we set up the map object, we set the location to New York City and increase the zoom level:

In [16]:
mapNYC = folium.Map(location=[40.75, -74.125], zoom_start=10)

We will add in a marker for Hunter College:

In [17]:
folium.Marker(location = [40.768731, -73.964915], popup = "Hunter College").add_to(mapNYC)

<folium.map.Marker at 0x7f5f5e19ec50>

And display our map:

In [18]:
mapNYC

## Plotting from Files

We can combine the mapping of folium with the tools we have used for CSV files.

Let's make an interactive map of the WiFi locations across the city. We can use wifiLocations.csv, which we downloaded from [NYC OpenData](https://data.cityofnewyork.us/City-Government/NYC-Wi-Fi-Hotspot-Locations/yjub-udmw/data), and store it in a pandas DataFrame:



In [39]:
import pandas as pd
url = 'https://raw.githubusercontent.com/stjohn/datasci/main/openDataWeek/wifiLocations.csv'
wifi = pd.read_csv(url)
#Drop any rows with missing values:
wifi = wifi.dropna()

We'll print out the WiFi locations to make sure that they were all read in:


In [40]:
wifi

Unnamed: 0,OBJECTID,Borough,Type,Provider,Name,Location,Latitude,Longitude,X,Y,...,NTAName,CounDist,Postcode,BoroCD,CT2010,BCTCB2010,BIN,BBL,DOITT_ID,"Location (Lat, Long)"
0,998,MN,Free,LinkNYC - Citybridge,mn-05-123662,179 WEST 26 STREET,40.745968,-73.994039,9.859017e+05,211053.130644,...,Midtown-Midtown South,3,10001,105,95,1009500,0,0,1425,"New York\n(40.74596800000, -73.99403900000)"
1,999,MN,Free,LinkNYC - Citybridge,mn-05-123789,25 EAST 29 STREET,40.744614,-73.985069,9.883873e+05,210559.946684,...,Midtown-Midtown South,2,10016,105,74,1007400,1016929,1008590024,1426,"New York\n(40.74461400000, -73.98506900000)"
3,1001,MN,Free,LinkNYC - Citybridge,mn-05-133359,201 WEST 48 STREET,40.759971,-73.984342,9.885878e+05,216155.033448,...,Midtown-Midtown South,4,10036,105,125,1012500,1076195,1010200046,1428,"New York\n(40.75997100000, -73.98434200000)"
4,1002,MN,Free,LinkNYC - Citybridge,mn-05-133361,1600 Broadway,40.760413,-73.984541,9.885327e+05,216316.036881,...,Midtown-Midtown South,4,10019,105,125,1012500,1087187,1010207502,1429,"New York\n(40.76041300000, -73.98454100000)"
5,1003,MN,Free,LinkNYC - Citybridge,mn-05-133505,1668 Broadway,40.762593,-73.983077,9.889380e+05,217110.488540,...,Midtown-Midtown South,4,10019,105,131,1013100,1024818,1010230029,1430,"New York\n(40.76259300000, -73.98307700000)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2989,3164,MN,Free,LinkNYC - Citybridge,mn-03-123802,237 1 AVENUE,40.731219,-73.982827,9.890095e+05,205679.853521,...,East Village,2,10003,103,40,1004000,1006522,1004550033,2850,"New York\n(40.73121904000, -73.98282700000)"
2993,3168,MN,Free,LinkNYC - Citybridge,mn-04-108085,100 WEST 26 STREET,40.744847,-73.991677,9.865562e+05,210644.607558,...,Hudson Yards-Chelsea-Flatiron-Union Square,3,10001,104,91,1009100,1085978,1008010034,2854,"New York\n(40.74484704000, -73.99167705000)"
2995,3170,QU,Free,LinkNYC - Citybridge,qu-01-108362,29-24 30 AVENUE,40.767433,-73.922687,1.005666e+06,218882.596491,...,Old Astoria,22,11102,401,73,4007300,4007621,4005920025,2856,"Queens\n(40.76743258200, -73.92268747930)"
3002,3177,MN,Free,LinkNYC - Citybridge,mn-06-133709,522 2 AVENUE,40.741711,-73.977973,9.903539e+05,209502.611406,...,Murray Hill-Kips Bay,2,10016,106,70,1007000,1020614,1009350001,2863,"New York\n(40.74171074000, -73.97797283000)"


Note: the code above reads our copy of wifiLocations.csv from GitHub. If you downloaded your own copy, pass its file name to read_csv() in place of the URL.
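
Calling `dropna()` with no arguments removes a row if any of its columns is missing, which is why some index values are skipped in the table above. If only the coordinates matter for mapping, one alternative is to restrict the check to those columns. The sketch below uses a small made-up table (not the real WiFi data) to show the difference:

```python
import pandas as pd

# Toy stand-in for the WiFi table; all values are made up for illustration.
toy = pd.DataFrame({
    "Location": ["Site A", "Site B", "Site C"],
    "Latitude": [40.70, None, 40.76],
    "Longitude": [-74.00, -73.98, -73.98],
    "BIN": [None, 1000001, 1000002],
})

# With no arguments, a row is dropped if ANY column is missing.
strict = toy.dropna()

# Restricting to the coordinate columns keeps every row that can be mapped.
mappable = toy.dropna(subset=["Latitude", "Longitude"])

print(len(toy), len(strict), len(mappable))
```

Here "Site A" is missing only its `BIN`, so it survives the second call but not the first.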

Next, let's set up a map, centered on Hunter College, using a different set of tiles, or background map:

In [41]:
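# The default tiles are OpenStreetMap. Older folium releases also bundle Stamen Terrain,
# Stamen Toner, Mapbox Bright, Mapbox Control Room, and other tile sets. If 'Stamen Toner'
# raises an error in your folium version, try another built-in set such as 'CartoDB positron'.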
mapWIFI = folium.Map(location=[40.768731, -73.964915],tiles='Stamen Toner',
    zoom_start=13)

Our map with Stamen Toner tiles:

In [42]:
mapWIFI

### Challenge:
* Try some of the other tile options, such as 'Stamen Watercolor' and 'Stamen Terrain', to change your map presentation.

Now that we have our map, let's add in the locations of WIFI from our DataFrame.

We're going to make a list of latitude, longitude, and names of the WIFI locations and for each one, add a marker to our map:

In [43]:
for lat,lon,name in zip(wifi['Latitude'],wifi['Longitude'],wifi['Location']):
  #Make a marker & include name as a popup:
  newMarker = folium.Marker(location=[lat,lon],popup=name)
  #Add the marker to the map:
  newMarker.add_to(mapWIFI)

The code above loops over triples of latitude, longitude, and street address (the 'Location' column), creates a marker for each, and adds that marker to the map. It repeats for each row until we have markers for every WiFi location in our DataFrame.

Lastly, let's view our map with the markers for WIFI:

In [44]:
mapWIFI

## 311 Data

Many of the datasets in NYC OpenData have similar structure. For example, the 311 calls recorded by the city also store the GIS location as "Latitude" and "Longitude". So, our program above can be used, with minimal modification, to make a map of 311 calls.

We have downloaded a small subset of the 311 calls in test311.csv (200 calls from March 5, 2018).


In [46]:
url = 'https://raw.githubusercontent.com/stjohn/datasci/main/openDataWeek/test311.csv'
df311 = pd.read_csv(url)

df311

Unnamed: 0,Unique Key,Created Date,Closed Date,Agency,Agency Name,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,...,Bridge Highway Name,Bridge Highway Direction,Road Ramp,Bridge Highway Segment,Garage Lot Name,Ferry Direction,Ferry Terminal Name,Latitude,Longitude,Location
0,38609671,03/05/2018 12:00:27 AM,03/05/2018 12:36:45 AM,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,10039.0,287 WEST 147 STREET,...,,,,,,,,40.823865,-73.940314,"(40.82386486541371, -73.94031412064453)"
1,38606248,03/05/2018 12:00:46 AM,,HPD,Department of Housing Preservation and Develop...,HEAT/HOT WATER,ENTIRE BUILDING,RESIDENTIAL BUILDING,10458.0,2566 BAINBRIDGE AVENUE,...,,,,,,,,40.863162,-73.892889,"(40.86316224947803, -73.89288947475694)"
2,38605322,03/05/2018 12:00:48 AM,,NYPD,New York City Police Department,Noise - Commercial,Loud Music/Party,Store/Commercial,11225.0,568 ROGERS AVENUE,...,,,,,,,,40.657970,-73.953261,"(40.65796987599603, -73.95326099424778)"
3,38609871,03/05/2018 12:00:49 AM,,HPD,Department of Housing Preservation and Develop...,PLUMBING,STEAM PIPE/RISER,RESIDENTIAL BUILDING,10467.0,3555 OLINVILLE AVENUE,...,,,,,,,,40.879074,-73.867634,"(40.87907449581643, -73.86763383020705)"
4,38605192,03/05/2018 12:00:52 AM,03/05/2018 12:56:59 AM,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11222.0,5 BLUE SLIP,...,,,,,,,,40.735583,-73.959767,"(40.73558338244384, -73.95976658433176)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
207,38607586,03/05/2018 02:03:51 AM,,NYPD,New York City Police Department,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10009.0,81 AVENUE A,...,,,,,,,,40.725066,-73.984270,"(40.72506594449667, -73.98426995490264)"
208,38608147,03/05/2018 02:05:04 AM,,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,10075.0,448 EAST 78 STREET,...,,,,,,,,40.771183,-73.952080,"(40.771182550177485, -73.9520803950299)"
209,38606711,03/05/2018 02:09:44 AM,,NYPD,New York City Police Department,Illegal Parking,Blocked Hydrant,Street/Sidewalk,11210.0,531 EAST 23 STREET,...,,,,,,,,40.636957,-73.955560,"(40.636956720790096, -73.9555600427676)"
210,38608336,03/05/2018 02:11:51 AM,,NYPD,New York City Police Department,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11226.0,1236 ROGERS AVENUE,...,,,,,,,,40.639530,-73.951289,"(40.639529600748716, -73.95128855589117)"


### Challenges

* Make an HTML map of the 311 calls in df311, using 'Complaint Type' instead of 'Location' for the popup text.
* Using the pandas commands from the previous section, what is the most common 311 call for the test data?
* Download 311 data for your favorite neighborhood (filter on 'Incident Zip') and make a map. Where do most complaints occur?
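
One detail to watch when filtering on 'Incident Zip': in the table above the ZIP codes appear as decimals (10039.0), most likely because pandas stores a numeric column with missing values as floats. A small sketch on made-up rows, assuming the same column names as df311:

```python
import pandas as pd

# Made-up rows that mimic the 311 columns; the missing ZIP makes the column float.
calls = pd.DataFrame({
    "Complaint Type": ["Type A", "Type B", "Type C", "Type A"],
    "Incident Zip": [10039.0, 10458.0, None, 10039.0],
})

# Comparing against the integer 10039 still matches the float 10039.0.
in_zip = calls[calls["Incident Zip"] == 10039]

print(len(in_zip))
```

Comparing against the string "10039" would match nothing here, so it is worth checking the column's type with `calls.dtypes` first.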

In [None]:
#Make your HTML Map:


In [None]:
#Add in markers from df311:


In [None]:
#Display your map:


## Collision Data

NYC OpenData also has data for all collisions reported to the police:

https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95

Since the files are quite large, use the "Filter" option to choose your favorite date, then "Export" (in CSV format) all collisions for that day.

Or, you can use a small set of collisions that we have already downloaded:

In [48]:
import pandas as pd
url = 'https://raw.githubusercontent.com/stjohn/datasci/main/openDataWeek/collisionsThHunterBday.csv'
collisions = pd.read_csv(url)
#Drop rows that are missing location:
collisions = collisions.dropna(subset = ['LATITUDE','LONGITUDE'])
collisions

Unnamed: 0,DATE,TIME,BOROUGH,ZIP CODE,LATITUDE,LONGITUDE,LOCATION,ON STREET NAME,CROSS STREET NAME,OFF STREET NAME,...,CONTRIBUTING FACTOR VEHICLE 2,CONTRIBUTING FACTOR VEHICLE 3,CONTRIBUTING FACTOR VEHICLE 4,CONTRIBUTING FACTOR VEHICLE 5,UNIQUE KEY,VEHICLE TYPE CODE 1,VEHICLE TYPE CODE 2,VEHICLE TYPE CODE 3,VEHICLE TYPE CODE 4,VEHICLE TYPE CODE 5
2,10/18/2016,8:10,STATEN ISLAND,10312.0,40.540551,-74.193197,"(40.5405508, -74.1931974)",RATHBUN AVENUE,NIPPON AVENUE,,...,,,,,3542705,PASSENGER VEHICLE,,,,
3,10/18/2016,8:10,BROOKLYN,11238.0,40.686455,-73.968107,"(40.6864547, -73.9681074)",,,107 GREENE AVENUE,...,Unspecified,,,,3542406,PASSENGER VEHICLE,BICYCLE,,,
4,10/18/2016,8:10,BROOKLYN,11234.0,40.626174,-73.922234,"(40.6261739, -73.9222336)",,,5515 AVENUE K,...,,,,,3543613,PASSENGER VEHICLE,,,,
6,10/18/2016,8:10,BROOKLYN,11218.0,40.631276,-73.976049,"(40.6312756, -73.976049)",18 AVENUE,EAST 2 STREET,,...,,,,,3548212,PASSENGER VEHICLE,,,,
7,10/18/2016,8:10,BROOKLYN,11217.0,40.682737,-73.972234,"(40.6827366, -73.9722339)",,,718 ATLANTIC AVENUE,...,Unspecified,,,,3542676,PICK-UP TRUCK,PASSENGER VEHICLE,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
585,10/18/2016,0:10,QUEENS,11412.0,40.692471,-73.759412,"(40.6924712, -73.7594116)",192 STREET,LINDEN BOULEVARD,,...,Driver Inattention/Distraction,,,,3542103,SPORT UTILITY / STATION WAGON,PASSENGER VEHICLE,,,
586,10/18/2016,0:10,MANHATTAN,10019.0,40.767889,-73.981512,"(40.7678891, -73.9815125)",COLUMBUS CIRCLE,CENTRAL PARK SOUTH,,...,Driver Inattention/Distraction,,,,3542262,TAXI,TAXI,,,
587,10/18/2016,0:10,MANHATTAN,10016.0,40.742854,-73.977207,"(40.7428535, -73.977207)",EAST 31 STREET,2 AVENUE,,...,Unspecified,,,,3542608,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
588,10/18/2016,0:10,BROOKLYN,11222.0,40.729501,-73.953958,"(40.7295015, -73.9539577)",MANHATTAN AVENUE,MILTON STREET,,...,Unspecified,,,,3542246,SPORT UTILITY / STATION WAGON,TAXI,,,


### Challenges

* Write a program that uses pandas to find the top three contributing factors for the primary vehicle of collisions ("CONTRIBUTING FACTOR VEHICLE 1") in your file.
A sample run for New Year's Day in 2016 is:
```
Enter CSV file name:  collisionsNewYears2016.csv
Top three contributing factors for collisions:
Driver Inattention/Distraction    136
Unspecified                       119
Following Too Closely              37
```
* Modify the folium program above to create a map with markers for all the traffic collisions from the input file.
Hint: For this data set, the names of the columns are "LATITUDE" and "LONGITUDE" (unlike the previous map problem, where the data was stored with "Latitude" and "Longitude").
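
One way to handle the different column names, shown here on made-up rows rather than the real collision file, is to rename the columns so that code written for "Latitude" and "Longitude" can be reused unchanged:

```python
import pandas as pd

# Made-up rows using the collision file's upper-case column names.
toy_collisions = pd.DataFrame({
    "LATITUDE": [40.54, None],
    "LONGITUDE": [-74.19, -73.97],
    "BOROUGH": ["STATEN ISLAND", "BROOKLYN"],
})

# Rename to match the names used in the WiFi and 311 examples.
renamed = toy_collisions.rename(
    columns={"LATITUDE": "Latitude", "LONGITUDE": "Longitude"}
)

# Keep only the rows that have both coordinates.
renamed = renamed.dropna(subset=["Latitude", "Longitude"])

print(list(renamed.columns))
```

Alternatively, the marker loop can simply refer to `collisions['LATITUDE']` and `collisions['LONGITUDE']` directly.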


## What's Next?

Enjoyed the workshop? Come join us at [Hunter College](https://hunter.cuny.edu)!
We integrate NYC OpenData into our curriculum, starting with the [first computer science course](https://huntercsci127.github.io/s22.html).