# Background Reading
> [Coordinate Reference System](https://en.wikipedia.org/wiki/Spatial_reference_system) <br>
> [EPSG Codes](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset) <br>
> [New York EPSG](https://epsg.io/2263) <br>
> [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) <br>
> [Spatial Query](https://en.wikipedia.org/wiki/Spatial_database#Spatial_query) <br>
> [GeoPandas](https://geopandas.org/) <br>
> [Apache Sedona](http://sedona.apache.org/api/sql/GeoSparkSQL-Function/) <br>
> [Databricks Notebooks and Magics](https://docs.databricks.com/notebooks/notebooks-use.html#language-magic) <br>
> [Databricks Visualisations](https://docs.databricks.com/notebooks/visualizations/index.html)

# Data

|data|url|
|-----|---|
|New York Taxi Zones shapefile|https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip|
|New York Roads geojson|https://data.cityofnewyork.us/api/geospatial/svwp-sbcd?method=export&format=GeoJSON|
|New York Accidents csv|https://data.ny.gov/api/views/e8ky-4vqe/rows.csv?accessType=DOWNLOAD&sorting=true|
|New York Municipalities geojson|https://services5.arcgis.com/GfwWNkhOj9bNBqoJ/arcgis/rest/services/NYC_Municipal_Court_Districts/FeatureServer/0/query?where=1=1&outFields=*&outSR=4326&f=pgeojson|
|New York DoT Road Markers Shapefile|http://gis.ny.gov/gisdata/fileserver/?DSID=1112&file=ReferenceMarker.zip|
|New York Yellow Taxi Journeys Hive Table |taxi_journeys|

# Code Snippets

|description|snippet|
|-----|---|
|install geopandas|```!pip install geopandas pygeos descartes```|
|load taxi zones| ```geopandas.read_file('https://s3.amazonaws.com/nyc-tlc/misc/taxi_zones.zip')```|
|query taxi journeys - SQL|```SELECT * FROM taxi_journeys LIMIT 1```|
|query taxi journeys - pandas|```spark.sql('SELECT * FROM taxi_journeys LIMIT 1').toPandas()```|
|describe taxi journeys schema - sql|```DESCRIBE taxi_journeys```|
|useful magics|```!pip, %conda, %sql, %sh, %scala```|
|smaller taxi journeys table if larger is an issue|```spark.sql('SELECT * FROM taxi_journeys_sm LIMIT 1').toPandas()```|
|plot two spatial datasets|```import geopandas```<br>```ax1 = geopandas.read_file({dataset1}).plot()```<br>```ax2 = geopandas.read_file({dataset2}).plot(ax=ax1)```|

# Challenges

# Challenge 1 - Longest Roads
> How many roads are there in the dataset? <br>
> How many results do you get for a query where the road name is 3 AV in Brooklyn? <br>
> Can you estimate the length of 3 AV in Brooklyn ([Wikipedia states approx. 6 mi, 31,680 ft](https://en.wikipedia.org/wiki/Third_Avenue_(Brooklyn)))? <br>
> Can you show a visualisation of the road within the Brooklyn taxi zones, to verify the result? <br>
> Can you comment on how best to accurately verify the length of the road?
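
One way to cross-check a length estimate without choosing a projection is to sum great-circle distances along the road's vertices. The sketch below assumes the road geometries are GeoJSON `LineString` or `MultiLineString` objects in longitude/latitude, and treats the Earth as a sphere, which is an approximation. `haversine_m` and `line_length_ft` are illustrative helper names, not part of any library.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius; spherical approximation
FEET_PER_METRE = 1 / 0.3048    # international foot


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def line_length_ft(geometry):
    """Length in feet of a GeoJSON LineString or MultiLineString geometry dict."""
    if geometry["type"] == "LineString":
        parts = [geometry["coordinates"]]
    elif geometry["type"] == "MultiLineString":
        parts = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")

    total_m = 0.0
    for part in parts:
        # Walk consecutive vertex pairs; only lon (index 0) and lat (index 1) are used.
        for start, end in zip(part, part[1:]):
            total_m += haversine_m(start[0], start[1], end[0], end[1])
    return total_m * FEET_PER_METRE
```

Summing `line_length_ft` over every segment returned by the 3 AV query gives one estimate. With geopandas, an alternative is to reproject with `to_crs(epsg=2263)`, a projected CRS measured in feet, and sum the `.length` of the same segments. If the two figures disagree noticeably with each other or with the Wikipedia value, duplicated or parallel segments in the query result are worth checking.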

# Challenge 2 - Accident Hotspots
> In 2018 and 2019, which municipalities recorded the highest number of accidents? <br>
> Can you estimate which Department of Transport managed roads have the highest number of accidents recorded (where it is possible to deduce)? <br>
> Can you design choropleth visualisations for the answers above? <br>
> Can you show accident hotspots in the top 10 percent (at or above the 90th percentile)? <br>
> Are accidents correlated with variables such as time of day, and reported conditions?
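
One possible way to define a hotspot is to bin accident locations into a square grid and keep the cells whose count is at or above the 90th percentile. The sketch below assumes the accident locations are already available as projected `(x, y)` pairs (for example in feet after reprojecting to EPSG:2263). The function names, the cell size and the nearest-rank percentile rule are illustrative choices, not something prescribed by the datasets.

```python
import math
from collections import Counter


def grid_counts(points, cell_size):
    """Count points per square grid cell, keyed by (column, row) index."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )


def top_percentile_cells(counts, percentile=90):
    """Return the cells whose count is at or above the given percentile.

    Uses the nearest-rank definition of a percentile over the non-empty cells.
    """
    values = sorted(counts.values())
    if not values:
        return {}
    rank = max(1, math.ceil(percentile * len(values) / 100))
    threshold = values[rank - 1]
    return {cell: n for cell, n in counts.items() if n >= threshold}
```

The resulting cell indices can be turned back into square polygons and plotted over the municipalities layer. Smaller cells give sharper hotspots but noisier counts, so it is worth trying more than one cell size.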

# Challenge 3 - Yellow Taxi Journeys 2018-2020
> From 2018-2020, which hour of the day had the most recorded demand for yellow taxis? <br>
> Which yellow taxi zones have the best tippers? <br>
> Can you show a choropleth for tipping? <br>
> For data that has been collected, has the COVID-19 outbreak impacted yellow taxi demand? <br>
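
The aggregations behind the first two questions can be prototyped on a small sample before running them on the full table. The sketch below works on a list of row dictionaries, such as the records of a small pandas sample. The column names `tpep_pickup_datetime`, `PULocationID`, `tip_amount` and `fare_amount` are assumptions borrowed from the public TLC yellow taxi data dictionary; check them against `DESCRIBE taxi_journeys` before relying on them. Measuring "best tippers" as tips divided by fares per pickup zone is also just one interpretation.

```python
from collections import Counter, defaultdict
from datetime import datetime


def busiest_pickup_hour(rows, timestamp_key="tpep_pickup_datetime"):
    """Return (hour, trips) for the hour of day with the most pickups."""
    # Timestamps are assumed to be ISO-like strings, e.g. "2019-03-01 18:05:00".
    counts = Counter(datetime.fromisoformat(row[timestamp_key]).hour for row in rows)
    return counts.most_common(1)[0]


def tip_share_by_zone(rows, zone_key="PULocationID",
                      tip_key="tip_amount", fare_key="fare_amount"):
    """Return total tips divided by total fares for each zone."""
    tips = defaultdict(float)
    fares = defaultdict(float)
    for row in rows:
        tips[row[zone_key]] += row[tip_key]
        fares[row[zone_key]] += row[fare_key]
    # Skip zones with no fare revenue to avoid dividing by zero.
    return {zone: tips[zone] / fares[zone] for zone in fares if fares[zone] > 0}
```

The per-zone dictionary can be joined to the taxi zones layer on its zone identifier to drive a choropleth. On the full table the same grouping is likely better expressed in SQL so that Spark does the aggregation.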

# Challenge 4 - Cookie Cutter Challenge
[Image of Manhattan](https://lmvuk.blob.core.windows.net/mapathon/manhattan.tif) <br>
[WKT polygons](https://lmvuk.blob.core.windows.net/mapathon/wkt.txt)
> Can you load the image and describe its CRS? <br>
> Can you cookie-cut the parts of the image that fall within the Manhattan taxi zones? <br>
> Can you cookie-cut the image with the polygon composed of the series of WKT geometries? What image does this give you?
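
Cookie cutting amounts to building a boolean mask that is true for every pixel whose centre lies inside the polygon, then keeping only those pixels. The sketch below illustrates the idea in plain Python under several stated assumptions: the WKT is a single `POLYGON`, only its outer ring is used (holes and `MULTIPOLYGON` are ignored), and the polygon is already expressed in pixel coordinates. For the real GeoTIFF the polygon and the image must first share a CRS, and a raster library would normally do this step. All function names here are hypothetical.

```python
import re


def parse_wkt_polygon(wkt):
    """Return the outer ring of a simple WKT POLYGON as a list of (x, y) tuples."""
    match = re.match(r"\s*POLYGON\s*\(\(([^()]*)\)", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("expected a WKT POLYGON")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()[:2]
        ring.append((float(x), float(y)))
    return ring


def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # the edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(wkt, width, height):
    """Boolean mask (list of rows) that is True where a pixel centre is inside."""
    ring = parse_wkt_polygon(wkt)
    return [
        [point_in_ring(col + 0.5, row + 0.5, ring) for col in range(width)]
        for row in range(height)
    ]
```

For example, `polygon_mask("POLYGON ((1 1, 3 1, 3 3, 1 3, 1 1))", 4, 4)` marks the central 2 x 2 block of a 4 x 4 grid. Applying the mask to the image array (setting everything outside to a nodata value) yields the cut-out.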