Understanding the Analyses

All of the following analyses were generated in the Jupyter notebooks located in the notebooks folder. Most of the analyses are in the PGH ML Dataset, PGHCenpy, and NYC Analyses notebooks; the rest are scattered throughout the other notebooks.

Preliminary models that are being trained on a variety of features can be found in the models folder. You can view some preliminary results of our Random Forest model here.


Pittsburgh, PA

The bike sharing program in Pittsburgh, PA is called Healthy Ride and is advertised as the public bike share system for Pittsburgh, PA. It is powered by NextBike, a company in the UK that provides bikes and infrastructure for bike sharing programs. Healthy Ride was first established in Pittsburgh in May 2015 and has since expanded to 100 stations and 550 bikes. Below you can see various stages of the program over the past few years.

2015 2018 2020

Analyses

Several datasets and APIs were used to create the following visualizations. These datasets are not released on this repo, but I can provide access to the datasets I have used upon request. I also provide direct links to the public datasets below.

Datasets

Demographics

A short note about the demographics data for Pittsburgh: I had difficulty accessing Allegheny County census tract data through the cenpy Python library. For the race attributes, I was able to get data for the census tracts that are considered part of the Pittsburgh metro area. As a result, the maps visualizing race by census tract look slightly different from the rest of the Pittsburgh maps.

ACS 2019 5 year estimate for population 16 years or older

Downtown and the university campuses fall within census tracts that have a large population.

ACS 2019 5 year estimate for race by census tract

The figure on the left shows that a high percentage of the population in a majority of the census tracts in Pittsburgh is White. The figure on the right shows a cluster of yellow and orange census tracts in the middle of Pittsburgh, representing tracts where a high percentage of the population is Black or African American. Out of the 100 bike stations, approximately 9 are within marginalized census tracts.

Commuting to Work

ACS 2019 5 year estimates for commuting to work

The main point to take away from these visualizations is that bike stations tend to exist in census tracts where 40% or more of the population travels to work alone in their own vehicle. It is also evident that very few bike stations fall within census tracts where a majority of the population uses some form of public transportation or walks to get to work.
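The tally behind this observation can be sketched as follows. This is a minimal illustration rather than the notebook code: it assumes each station has already been matched to a census tract, and the tract IDs, the `drive_alone_share` values, and the function name `stations_by_threshold` are all hypothetical.

```python
from collections import Counter

# Hypothetical inputs: share of commuters who drive alone per tract (0 to 1),
# and the tract each station falls in. Real values would come from the ACS
# commuting estimates and a spatial join of station coordinates to tracts.
drive_alone_share = {"tract_a": 0.55, "tract_b": 0.42, "tract_c": 0.18}
station_tract = {"s1": "tract_a", "s2": "tract_a", "s3": "tract_b", "s4": "tract_c"}


def stations_by_threshold(station_tract, share_by_tract, threshold=0.40):
    """Count stations in tracts at or above the threshold versus below it."""
    counts = Counter()
    for station, tract in station_tract.items():
        key = "at_or_above" if share_by_tract[tract] >= threshold else "below"
        counts[key] += 1
    return dict(counts)


summary = stations_by_threshold(station_tract, drive_alone_share)
```

The same function could be reused with a public transportation or walking share to check the second observation.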

ACS 2019 5 year estimates for mean travel time to work (minutes)

Mean travel time is calculated by including time waiting for buses as well as being stuck in traffic. Overall, it is unrealistic to expect an individual to substitute a bike ride for a >25 minute car ride in the city. Looking at where stations are allocated, a significant number of stations are located in census tracts that have a mean travel time to work of less than 30 minutes.

Household Income

ACS 2019 5 year estimates for income

It is vital that low-income citizens have reliable access to affordable, on-demand transportation. However, a significant number of bike stations are allocated in census tracts that are well above the poverty line in PA of $50,000.

Poor Housing Conditions

Allegheny County Poor Housing Conditions

Caption here

Infrastructure

Bike Score and Walk Score from walkscore.com

The figure on the left shows regions of Pittsburgh that are suitable or not for biking. The figure on the right shows regions of Pittsburgh that are suitable or not for walking. To see more about how the scores are decided for a particular area, check out Walk Score's methodology.

Points of Interest

Visualization Coming Soon

Demand Prediction

Models (preliminary results - work in progress)

We are training a variety of supervised learning models on a variety of features from our data to identify the best one for demand prediction. You can find some preliminary findings that were unsuccessful in the notebook linked below. Moving forward we will be modifying the dataset and number of features used in the training set to include monthly outflow, weather data, and elevation data.

Historical Data

Relationship between predicted demand and ground truth Preliminary Results

2018 Outflow 2019 Outflow 2020 Outflow Predicted Demand using mean

It is evident here that a model trained solely on historical data learns to replicate the biases present in that data, so the predicted demand looks nearly the same as the historical outflow. But how do we know that regions without historical data don't need this resource?
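A mean-based baseline makes this limitation concrete. The sketch below is an assumption about what "Predicted Demand using mean" could look like, not the code from the notebooks; the station names, outflow numbers, and the function `predict_mean_outflow` are hypothetical.

```python
from statistics import mean

# Hypothetical yearly outflow (trips started) per existing station.
outflow = {
    "station_1": {2018: 1200, 2019: 1500, 2020: 900},
    "station_2": {2018: 300, 2019: 360, 2020: 240},
}


def predict_mean_outflow(history):
    """Predict each station's demand as the mean of its past yearly outflow."""
    return {station: mean(years.values()) for station, years in history.items()}


predicted = predict_mean_outflow(outflow)

# A location without a station has no history, so this baseline cannot
# produce any estimate for it; the lookup simply returns None.
unserved_prediction = predicted.get("unserved_location")
```

The predictions preserve the ranking of the existing stations and say nothing about unserved areas, which is why features other than past outflow are needed.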

Spatially Sensitive (preliminary)

Predicting Outflow using spatial & infrastructural attributes & demographics

Using infrastructural attributes, historical data, and demographics we can identify new regions to explore without being biased.


New York City, NY

The bike sharing program in New York City, NY is called Citi Bike and is known as the nation's largest bike sharing program, with 19,000 bikes and over 1,000 stations. Citi Bike is powered by Lyft. Citi Bike was first established in NYC in 2013 with approximately 600 bike stations around Manhattan. Below you can see various stages of the program over the past few years.

Bike station locations: December 2013, August 2015, February 2017, September 2018, December 2019, October 2020, January 2021

Analyses

Several datasets and APIs were used to create the following visualizations. These datasets are not released on this repo, but I can provide access to the datasets I used upon request. I also provide direct links to the public datasets below.

Datasets

  • Citi Bike Trip Data
  • ACS 2019 5 year estimates using cenpy Python library
  • WalkScore API (coming soon)
  • OverPass API (coming soon)

Demographics

ACS 2019 5 year estimate for race by census tract

These visualizations follow trends similar to those seen in Pittsburgh, PA.

More demographic analyses will be added soon. In the meantime, check out the NYC Analyses notebook to see all of them.

Infrastructure

Coming Soon

Points of Interest

Coming Soon

Demand Prediction

Coming Soon

Historical Data

Coming Soon

Spatially Sensitive

Coming Soon


Chicago, IL (Work in Progress)

The bike sharing program in Chicago, IL is called Divvy and is also powered by Lyft.


Chattanooga, TN (New Addition as of March 30th)