# A Casual Analysis of Public Transportation in New York City

### Author: Nathaniel del Rosario, A17562063

### Introduction, Question(s), & Hypothesis

New York City's public transportation system is arguably one of the best in North America, offering several modes of travel, most commonly metro, ride share, and bike. However, it is not a perfect system. Compared to Tokyo's public transportation infrastructure, for example, NYC's system is less expansive and leaves more areas underserved. This context raises two questions: just how underserved are parts of New York City by public transportation, and do these underserved areas show effects in other domains?

I hypothesize that areas underserved by the NYC metro show correlated effects in other modes of transportation, specifically higher ride share and bike usage. If such a correlation appears, the next question becomes “is there causation as well?” Answering these questions is the goal of this project.

This question is important because answering it involves using population, ridership, geospatial, and tract data to help people not only understand their commute but also identify potential causal links between different events and transportation accessibility. On average, people spend at least an hour commuting to and from work and school, which is a large share of the day (1/16 of your waking hours if you get a full 8 hours of sleep!). Additionally, public transportation companies can benefit greatly from this analysis, as they can adjust their strategy to appeal more to commuters and plan where to expand service to those who are underserved. Lastly, the average citizen would benefit from this information because it could convince them to take public transportation instead of contributing to the growing problem of traffic congestion in major metropolitan areas.

### Related Work

Scarlett T. Jin, Hui Kong & Daniel Z. Sui (2019) Uber, Public Transit, and Urban Transportation Equity: A Case Study in New York City, The Professional Geographer, 71:2, 315-330, DOI: 10.1080/00330124.2018.1531038

The authors find that the distribution of Uber services is highly unequal. Their correlation analysis shows that there tend to be fewer Uber pickups in low-income areas.


Tang, J.; Gao, F.; Liu, F.; Zhang, W.; Qi, Y. Understanding Spatio-Temporal Characteristics of Urban Travel Demand Based on the Combination of GWR and GLM. Sustainability 2019, 11, 5525. https://doi.org/10.3390/su11195525


The results suggest that most taxi trips are concentrated in a fraction of the geographical area. Variables including road density, subway accessibility, Uber vehicles, points of interest (POIs), commercial area, taxi-related accidents, and commuting time have significant effects on travel demand.

### Packages & Libraries

The packages and libraries used for this analysis are ArcGIS Online (including all ArcGIS features and analysis functions), Python, pandas, NumPy, Matplotlib, GeoPandas, and Shapely (Point geometry).

### Data Sources

In [None]:
import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

import arcgis
from arcgis.gis import GIS
from arcgis import geometry
from arcgis.features import GeoAccessor, GeoSeriesAccessor, FeatureLayerCollection, FeatureSet, FeatureCollection, FeatureLayer
from arcgis.features.use_proximity import create_buffers
from IPython.display import display
import os

gis = GIS("https://ucsdonline.maps.arcgis.com/home/index.html", "dsc170wi24_", "")

In [None]:
m = gis.map('New York, NY')

In [None]:
# get transport feature layers
metro_stops_fl = gis.content.get('d52e004c3bda4397ae2145257ede1200')
rideshare_fl = gis.content.get('072e86100593482887a99aaaac8b2ada')
bike_lanes_fl = gis.content.get('8aff6fb97ef546679e97b1696bfbf052')
bike_lane_low_income_intersect_fl = gis.content.get('dc2e07a9af82464e94318c7dc71fc084')
bike_station_low_income_intersect_fl = gis.content.get('f0679d1e4ca44350abed2a48eecb7eb9')

# get income layers
low_income_binary_fl = gis.content.get('9bb695ac4b874286ab6645e4196f19bb')
income_dist_fl = gis.content.get('00847778292e466082388a18230f41ba')
gentrification_fl = gis.content.get('f8f47e4166d34862a6d340d8e2dcb55f')

m.add_layer(metro_stops_fl)
m.add_layer(rideshare_fl)
m.add_layer(bike_lanes_fl)
m

In [None]:
# append to each feature service: /0/query?where=1%3D1&outFields=*&f=geojson
# Income: https://services1.arcgis.com/HmwnYiJTBZ4UkySc/arcgis/rest/services/NYCMedianIncomeDistributions_WFL1/FeatureServer/0/query?where=1%3D1&outFields=*&f=geojson
# The rest are uploaded to github.com/natdosan/dsc170finalproj

In [None]:
# Load Data into GeoDataFrames
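The URL pattern described in the comments above can be wrapped in a small helper. This is a minimal sketch rather than the project's actual loading code: `build_query_url` and `load_layer` are hypothetical names, and it assumes each layer is a hosted feature service whose first sublayer (`/0`) holds the data. Feature services usually cap the number of records returned per query, so a large layer may need paging.

```python
# Query suffix taken from the notebook comment: return every feature as GeoJSON.
QUERY_SUFFIX = "/0/query?where=1%3D1&outFields=*&f=geojson"

# Income layer URL from the notebook comment, without the query suffix.
INCOME_SERVICE = (
    "https://services1.arcgis.com/HmwnYiJTBZ4UkySc/arcgis/rest/services/"
    "NYCMedianIncomeDistributions_WFL1/FeatureServer"
)


def build_query_url(service_url: str) -> str:
    """Append the GeoJSON query suffix to a FeatureServer URL (hypothetical helper)."""
    return service_url.rstrip("/") + QUERY_SUFFIX


def load_layer(service_url: str):
    """Download one layer into a GeoDataFrame. Needs network access."""
    import geopandas as gpd  # imported lazily so the URL helper has no dependencies

    return gpd.read_file(build_query_url(service_url))


income_url = build_query_url(INCOME_SERVICE)
print(income_url)
```

Calling `load_layer(INCOME_SERVICE)` would then fetch the income layer as a GeoDataFrame.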

### Analysis

First, we define public transportation as metro, bike, and Uber (ride share). Walking and driving count as non-public transportation. Keep this definition in mind through each analysis step.

In [None]:
# Create Rideshare Choropleth
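One way this step could look in GeoPandas is sketched below, assuming the rideshare layer holds pickup points and that some polygon layer (tracts or similar) defines the zones. The toy squares, the column names `zone_id` and `pickups`, and both helper functions are illustrative assumptions, not the project's data or code. The `predicate` keyword of `sjoin` requires a reasonably recent GeoPandas release.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point, box


def count_points_per_zone(zones, points, zone_id="zone_id"):
    """Return a copy of `zones` with a `pickups` column counting points inside each polygon."""
    joined = gpd.sjoin(points, zones[[zone_id, "geometry"]], how="inner", predicate="within")
    counts = joined.groupby(zone_id).size()
    out = zones.copy()
    # Zones with no matching points get a count of 0 instead of NaN.
    out["pickups"] = out[zone_id].map(counts).fillna(0).astype(int)
    return out


def plot_choropleth(zones, column="pickups"):
    """Shade each zone by the value in `column`."""
    fig, ax = plt.subplots(figsize=(6, 6))
    zones.plot(column=column, cmap="OrRd", legend=True, edgecolor="grey", ax=ax)
    ax.set_axis_off()
    ax.set_title("Rideshare pickups per zone")
    return ax


# Toy data: two unit squares and three pickup points (not real NYC data).
zones = gpd.GeoDataFrame(
    {"zone_id": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
pickups = gpd.GeoDataFrame(
    {"pickup_id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

zone_counts = count_points_per_zone(zones, pickups)
print(zone_counts[["zone_id", "pickups"]])
```

In a notebook cell, `plot_choropleth(zone_counts)` would draw the map.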

In [None]:
# Look at Metro, Bike, Rideshare Intersections -> Create Buffers for each
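The notebook imports `create_buffers` from ArcGIS for this step. The sketch below swaps in a GeoPandas equivalent so it can run locally, so treat it as an alternative rather than the project's method. It assumes point layers in longitude/latitude and reprojects to UTM zone 18N (EPSG:32618) so that the buffer radius is in meters. The 800 m radius, the coordinates, and the helper name `buffer_in_meters` are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point
from shapely.ops import unary_union

METRIC_CRS = "EPSG:32618"  # WGS 84 / UTM zone 18N, units in meters; covers NYC


def buffer_in_meters(gdf, radius_m, metric_crs=METRIC_CRS):
    """Reproject to a metric CRS and replace each geometry with a buffer of radius_m."""
    projected = gdf.to_crs(metric_crs)
    projected["geometry"] = projected.geometry.buffer(radius_m)
    return projected


# Hypothetical coordinates (longitude, latitude), not taken from the project layers.
metro = gpd.GeoDataFrame(
    {"name": ["stop_1"]},
    geometry=[Point(-73.9855, 40.7580)],
    crs="EPSG:4326",
)
bikes = gpd.GeoDataFrame(
    {"name": ["dock_near", "dock_far"]},
    geometry=[Point(-73.9840, 40.7590), Point(-73.9000, 40.7000)],
    crs="EPSG:4326",
)

metro_buffers = buffer_in_meters(metro, radius_m=800)
# Merge all metro buffers into one service area, then flag bike docks inside it.
service_area = unary_union(list(metro_buffers.geometry))
bikes["near_metro"] = bikes.to_crs(METRIC_CRS).within(service_area).values
print(bikes[["name", "near_metro"]])
```

Buffering directly in EPSG:4326 would measure the radius in degrees, which is why the reprojection comes first.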

In [None]:
# Look at income vs these intersections (income vs metro, income vs bike, income vs rideshare)
# Does there appear to be any correlation?
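A first check for correlation could look like the sketch below, assuming the spatial steps have already produced one row per tract with an income value and an accessibility count. The table values and the column names `median_income` and `metro_stops` are made up for illustration. Spearman correlation is computed as the Pearson correlation of the ranks, which avoids an extra dependency.

```python
import pandas as pd

# Hypothetical per-tract table; the numbers are illustrative only.
tracts = pd.DataFrame(
    {
        "median_income": [30_000, 45_000, 60_000, 80_000, 120_000],
        "metro_stops": [1, 2, 4, 5, 9],
    }
)


def income_correlations(df, income_col, access_col):
    """Pearson (linear) and Spearman (rank) correlation between two columns."""
    pearson = df[income_col].corr(df[access_col])
    # Spearman is the Pearson correlation of the ranked values.
    spearman = df[income_col].rank().corr(df[access_col].rank())
    return {"pearson": pearson, "spearman": spearman}


result = income_correlations(tracts, "median_income", "metro_stops")
print(result)
```

A high coefficient here would only indicate association. It would not by itself answer the causation question raised in the introduction.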

In [None]:
# Design Score / Metric for each Greater NYC administrative boundary
# Low values point to lower income / lower accessibility / farther distance from metro, bike, Uber buffers
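One possible shape for such a score is sketched below: min-max normalize income and each distance, flip the distances so that farther means lower, and take an equal-weight average. The equal weighting, the column names, and the sample values are all assumptions made for illustration. The notebook does not specify the actual metric.

```python
import pandas as pd


def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to 0.5."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.5, index=series.index)
    return (series - series.min()) / span


def accessibility_score(df, income_col, distance_cols):
    """Equal-weight mean of normalized income and inverted normalized distances."""
    parts = [min_max(df[income_col])]
    # Invert distances so that being farther from service lowers the score.
    parts += [1 - min_max(df[col]) for col in distance_cols]
    return pd.concat(parts, axis=1).mean(axis=1)


# Hypothetical values per administrative area (distances in meters).
areas = pd.DataFrame(
    {
        "area_id": ["A", "B", "C"],
        "median_income": [40_000, 70_000, 100_000],
        "dist_metro": [2000, 1000, 0],
        "dist_bike": [1500, 750, 0],
        "dist_rideshare": [1000, 500, 0],
    }
)

areas["score"] = accessibility_score(
    areas, "median_income", ["dist_metro", "dist_bike", "dist_rideshare"]
)
print(areas[["area_id", "score"]])
```

With this construction, area A (lowest income, farthest from every mode) scores lowest and area C scores highest, matching the intended reading that low values point to lower income and lower accessibility.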

In [None]:
# Finally, Choropleth of Score by Administrative Boundary
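Mapping the score mainly requires joining the score table back onto the boundary polygons. The sketch below assumes a shared key column, here the hypothetical `area_id`, and a score already scaled to [0, 1]. The boundaries are toy squares, not NYC geometry.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def attach_scores(boundaries, scores, key="area_id"):
    """Left-join scores onto boundaries so areas without a score are kept as NaN."""
    return boundaries.merge(scores, on=key, how="left")


def plot_score_map(gdf, column="score"):
    """Choropleth on a fixed 0 to 1 color scale; unscored areas are drawn in grey."""
    ax = gdf.plot(
        column=column,
        cmap="viridis",
        vmin=0,
        vmax=1,
        legend=True,
        edgecolor="white",
        missing_kwds={"color": "lightgrey"},
    )
    ax.set_axis_off()
    ax.set_title("Score by administrative boundary")
    return ax


# Toy boundaries and scores (illustrative only).
boundaries = gpd.GeoDataFrame(
    {"area_id": ["A", "B", "C"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)
score_table = pd.DataFrame({"area_id": ["A", "B"], "score": [0.2, 0.9]})

scored = attach_scores(boundaries, score_table)
print(scored[["area_id", "score"]])
```

In a notebook cell, `plot_score_map(scored)` would render the map. Fixing `vmin` and `vmax` keeps colors comparable if the map is redrawn for different subsets.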

In [None]:
# If time: regression to predict score
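If time allows, a baseline could be an ordinary least squares fit, sketched here with NumPy only. The two features (imagined as normalized income and stop density) and the synthetic target are assumptions chosen so the fit can be checked by eye. Nothing here reflects real results.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to new feature rows."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


# Synthetic features, one row per area (hypothetical values).
X = np.array(
    [
        [0.2, 0.1],
        [0.4, 0.5],
        [0.6, 0.3],
        [0.8, 0.9],
        [1.0, 0.7],
    ]
)
# Synthetic target built from a known linear rule, so the fit should recover it.
y = 0.1 + 0.5 * X[:, 0] + 0.3 * X[:, 1]

coef = fit_linear(X, y)
print(coef)
```

Because the target is generated from a known linear rule, the recovered coefficients should match that rule closely. On real scores, held-out areas would be needed to judge predictive quality.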

### Summary & Results

### Discussion

### Conclusions & Future Work