All of the following analyses were generated in the Jupyter notebooks located in the notebooks folder. The primary notebooks, which contain most of the analyses, are PGH ML Dataset, PGHCenpy, and NYC Analyses; other analyses are scattered throughout the remaining notebooks.
Preliminary models trained on a variety of features can be found in the models folder. You can view some preliminary results of our Random Forest model here.
The bike sharing program in Pittsburgh, PA is called Healthy Ride and is advertised as the public bike share system for Pittsburgh, PA. It is powered by NextBike, a German company that provides bikes and infrastructure for bike sharing programs. Healthy Ride was established in Pittsburgh in May 2015 and has since expanded to 100 stations and 550 bikes. Below you can see various stages of the program over the past few years.
Figure: Healthy Ride in 2015, 2018, and 2020
Several datasets and APIs were used to create the following visualizations. These datasets are not released in this repo, but I can provide access to the datasets I used upon request. I also provide direct links to the public datasets below.
Datasets
- Healthy Ride Station Locations
- Healthy Ride Trip Data
- ACS 2019 5-year estimates for employment, population, and commuting trends. Note: all census tracts were queried; here is a sample of 6 of them
- Points of interest from OpenStreetMap, queried with the Overpass API (Overpass Turbo)
- Transit Score, Bike Score, and Walk Score from the Walk Score API
- Poor housing conditions
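The sketch below shows one way to pull points of interest from the Overpass API. It builds an Overpass QL query for OpenStreetMap nodes that carry a given tag inside a bounding box, then encodes that query as a request URL. Several details are illustrative assumptions and are not the queries used in the notebooks: the `amenity=cafe` tag, the rough Pittsburgh bounding box, and the helper names `build_overpass_query` and `build_request_url`. The public `overpass-api.de` interpreter is one of several Overpass instances.

```python
from urllib.parse import urlencode

# One of several public Overpass API instances.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# Hypothetical, approximate bounding box around Pittsburgh as (south, west, north, east).
PITTSBURGH_BBOX = (40.36, -80.10, 40.50, -79.86)


def build_overpass_query(key, value, bbox, timeout=25):
    """Return an Overpass QL query for OSM nodes tagged key=value inside bbox."""
    south, west, north, east = bbox
    return (
        f"[out:json][timeout:{timeout}];"
        f'node["{key}"="{value}"]({south},{west},{north},{east});'
        "out body;"
    )


def build_request_url(query):
    """Encode the query as the `data` parameter of a GET request to the interpreter."""
    return OVERPASS_URL + "?" + urlencode({"data": query})


cafe_query = build_overpass_query("amenity", "cafe", PITTSBURGH_BBOX)
cafe_url = build_request_url(cafe_query)
```

You can fetch `cafe_url`, for example with `urllib.request.urlopen`. The response is JSON whose `elements` list holds the matching nodes along with their `lat`, `lon`, and `tags`.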
A short note about the demographics data for Pittsburgh: I had difficulty accessing Allegheny County census tract data from the cenpy Python library. For the race attributes, I was only able to get data for the census tracts considered part of the Pittsburgh metro area. As a result, the maps visualizing race by census tract look slightly different from the rest of the Pittsburgh maps.
ACS 2019 5-year estimate for population 16 years or older
Downtown and the university campuses fall within census tracts that have a large population.
ACS 2019 5-year estimate for race by census tract
The figure on the left shows that in most Pittsburgh census tracts, a high percentage of the population is White. The figure on the right shows a cluster of yellow and orange census tracts in the middle of Pittsburgh, representing tracts where a high percentage of the population is Black or African American. Of the 100 bike stations, approximately 9 are within marginalized census tracts.
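Counting how many stations fall inside a set of census tracts is a point-in-polygon problem. The sketch below applies the standard ray-casting test in plain Python to toy square "tracts". The tract ids, coordinates, and the choice of which tracts count as marginalized are all hypothetical. With real tract shapefiles, a spatial join such as `geopandas.sjoin` is the more practical route.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if point (x, y) lies inside polygon.

    polygon is a list of (x, y) vertices; the last vertex connects back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the crossing lies to the right of the point
                inside = not inside
    return inside


def count_stations_in_flagged_tracts(stations, tracts, flagged):
    """Count stations (lon, lat) that fall inside any tract whose id is in flagged."""
    count = 0
    for lon, lat in stations:
        if any(point_in_polygon(lon, lat, tracts[tract_id]) for tract_id in flagged):
            count += 1
    return count


# Hypothetical toy "tracts": two unit squares side by side; only "B" is flagged.
tracts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
stations = [(0.5, 0.5), (1.5, 0.5), (1.5, 0.2), (3.0, 3.0)]
n_flagged = count_stations_in_flagged_tracts(stations, tracts, {"B"})
```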
ACS 2019 5-year estimates for commuting to work
The main takeaway from these visualizations is that bike stations tend to be located in census tracts where 40% or more of the population drives alone to work. Very few bike stations fall within census tracts where a majority of the population takes public transportation or walks to work.
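Once each station has been assigned to a tract, the share of stations in tracts above a commuting threshold reduces to a simple count. This sketch assumes a hypothetical per-tract `drive_alone_share`, computed as drove-alone workers divided by total workers, and uses toy numbers. The 40% cutoff comes from the text. If you flip the comparison, the same pattern applies to the travel-time threshold.

```python
def share_of_stations_at_or_above(station_tract, tract_value, threshold):
    """Fraction of stations whose tract value is >= threshold.

    station_tract maps station id -> tract id (e.g. from a spatial join);
    tract_value maps tract id -> a share between 0 and 1. Tracts missing from
    tract_value are treated as below the threshold.
    """
    if not station_tract:
        return 0.0
    hits = sum(
        1
        for tract in station_tract.values()
        if tract_value.get(tract, 0.0) >= threshold
    )
    return hits / len(station_tract)


# Hypothetical toy data: share of workers who drive alone to work, per tract.
drive_alone_share = {"t1": 0.55, "t2": 0.42, "t3": 0.18}
station_tract = {"s1": "t1", "s2": "t1", "s3": "t2", "s4": "t3"}
share = share_of_stations_at_or_above(station_tract, drive_alone_share, 0.40)
```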
ACS 2019 5-year estimates for mean travel time to work (minutes)
Mean travel time includes time spent waiting for buses and sitting in traffic. Overall, it is unrealistic to expect someone to replace a car ride of more than 25 minutes in the city with a bike ride. Looking at where stations are allocated, a significant number of stations are located in census tracts with a mean travel time to work of less than 30 minutes.
ACS 2019 5-year estimates for income
It is vital that low-income residents have reliable access to affordable, on-demand transportation. However, a significant number of bike stations are located in census tracts where income is well above $50,000.
Allegheny County Poor Housing Conditions
Bike Score and Walk Score from walkscore.com
The figure on the left shows which regions of Pittsburgh are suitable for biking, and the figure on the right shows which are suitable for walking. To learn how the scores are calculated for a particular area, see Walk Score's methodology.
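You query the Walk Score API over HTTP by passing a location and an API key. The sketch below only builds the request URL. The parameter names (`format`, `address`, `lat`, `lon`, `transit`, `bike`, `wsapikey`) follow Walk Score's public documentation as I understand it, so check them against the current API docs before use. The address, coordinates, key, and the helper name `walkscore_url` are placeholders.

```python
from urllib.parse import urlencode

WALKSCORE_ENDPOINT = "https://api.walkscore.com/score"


def walkscore_url(address, lat, lon, api_key):
    """Build a Walk Score request that also asks for Transit Score and Bike Score."""
    params = {
        "format": "json",
        "address": address,
        "lat": lat,
        "lon": lon,
        "transit": 1,  # ask for Transit Score as well
        "bike": 1,     # ask for Bike Score as well
        "wsapikey": api_key,
    }
    return WALKSCORE_ENDPOINT + "?" + urlencode(params)


# Placeholder location (approximately downtown Pittsburgh) and placeholder key.
url = walkscore_url("Pittsburgh PA", 40.4406, -79.9959, "YOUR_KEY")
```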
Visualization Coming Soon
We are training a variety of supervised learning models on different sets of features from our data to identify the best one for demand prediction. Some preliminary, unsuccessful results can be found in the notebook linked below. Going forward, we will modify the dataset and the features in the training set to include monthly outflow, weather data, and elevation data.
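Monthly outflow is one of the planned features. You can derive it from the trip data by counting the trips that start at each station in each month. The column names (`From station id`, `Starttime`) and the timestamp format below are assumptions about the Healthy Ride CSV layout, so check them against the downloaded files. The sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumed Healthy Ride column names and timestamp format; verify against the real header.
STATION_COL = "From station id"
TIME_COL = "Starttime"
TIME_FMT = "%m/%d/%Y %H:%M"


def monthly_outflow(csv_text):
    """Count trips leaving each station per calendar month.

    Returns a Counter keyed by (station_id, "YYYY-MM").
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.strptime(row[TIME_COL], TIME_FMT)
        counts[(row[STATION_COL], start.strftime("%Y-%m"))] += 1
    return counts


# Made-up sample rows in the assumed layout.
sample = """Trip id,Starttime,From station id
1,1/3/2019 08:15,1000
2,1/20/2019 17:40,1000
3,2/1/2019 09:05,1000
4,2/2/2019 12:00,1012
"""
outflow = monthly_outflow(sample)
```

For full-size trip files, you can pass an open file object directly to `csv.DictReader` instead of wrapping a string.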
Relationship between predicted demand and ground truth (Preliminary Results)
Figure: 2018 outflow, 2019 outflow, 2020 outflow, and predicted demand using the mean
If only historical data is used to predict demand, the model learns to replicate the biases present in that data, so the predicted demand looks nearly the same as the historical outflow. But how do we know that regions without historical data don't need this resource?
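The "predicted demand using mean" baseline can be sketched as the average of each station's yearly outflow, assuming that is how the map was produced. The toy numbers make the bias concrete. A location with no trip history, such as the hypothetical `C` below, never receives any predicted demand.

```python
def mean_baseline(outflow_by_year):
    """Predict demand per station as the mean of its yearly outflow.

    outflow_by_year maps year -> {station_id: trips}. A station absent from a
    year's data is counted as 0 trips for that year.
    """
    years = list(outflow_by_year)
    stations = set()
    for counts in outflow_by_year.values():
        stations.update(counts)
    return {
        station: sum(outflow_by_year[y].get(station, 0) for y in years) / len(years)
        for station in stations
    }


# Hypothetical toy yearly outflow for two existing stations.
history = {
    2018: {"A": 900, "B": 120},
    2019: {"A": 1000, "B": 150},
    2020: {"A": 650, "B": 90},
}
predicted = mean_baseline(history)

# A candidate location with no history gets no demand at all.
demand_at_c = predicted.get("C", 0.0)
```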
Predicting Outflow using spatial & infrastructural attributes & demographics
By using infrastructural attributes and demographics alongside historical data, we can identify new regions to explore instead of only reinforcing where stations already exist.
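To see why attribute-based features help, here is a deliberately simple stand-in. It fits an ordinary least squares line that relates one hypothetical tract attribute (Bike Score) to station outflow, then applies the fit to a tract that has no station. This is not the Random Forest model from the models folder, and the numbers are invented. It only shows that a model trained on attributes can assign demand to places with no trip history.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Hypothetical training data: Bike Score and yearly outflow for tracts with stations.
bike_scores = [40, 60, 80]
outflows = [200, 400, 600]
intercept, slope = fit_line(bike_scores, outflows)

# A tract with no station (and so no trip history) still gets a prediction.
candidate_prediction = predict(intercept, slope, 70)
```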
The bike sharing program in New York City, NY is called Citi Bike and is known as the nation's largest bike sharing program, with 19,000 bikes and over 1,000 stations. Citi Bike is powered by Lyft. Citi Bike launched in NYC in 2013 with approximately 600 bike stations around Manhattan. Below you can see various stages of the program over the past few years.
Figure: Citi Bike in December 2013, August 2015, February 2017, September 2018, December 2019, October 2020, and January 2021
Several datasets and APIs were used to create the following visualizations. These datasets are not released in this repo, but I can provide access to the datasets I used upon request. I also provide direct links to the public datasets below.
Datasets
- Citi Bike Trip Data
- ACS 2019 5-year estimates using the cenpy Python library
- Walk Score API (coming soon)
- Overpass API (coming soon)
ACS 2019 5-year estimate for race by census tract
These visualizations follow trends similar to those seen in Pittsburgh, PA.
More demographic analyses will be added soon. In the meantime, check out the NYC Analyses notebook to see all of them.
Coming Soon
The bike sharing program in Chicago, IL is called Divvy and is also powered by Lyft.