Skip to content

feiut/DataHack20190413

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DataHack20190413

Data Hackthon - Dataset (BCycle.csv & Beijing Geolife.csv)

Team Members

  • Binglin(Mark) Zhang
  • Zijian(Steven) Wang
  • Fei He

Objectives

Introducing BCycle into Beijing in 2008.
How would it leverage the efficieny?
Would it improve the air condition?
What kind of memeberships will people purchase?

Background Information from Data

Beijing Geolife dataset

Beijing Geolife.csv includes the trajectories of participants. Their locations in measure of Longitude and Latitude would be recorded every 3~5 seconds.

Data File Description

Five random samples from the Geolife dataset
Beijing 5 samples Participants from Beijing were travelling along the roads and might cluster in certain area.

Austin BCycle Dataset

Austin BCycle1 Austin BCycle2 Check out / Return counts density in each Austin Kiosk Austin Checkout Austin Return

Idea I

Beijing might has similar population who need BCycle as Austin BCyclers have.
Why not introduce BCycle into Beijing.

Algorithm

By using K Means on the data of Beijing Geolife, we cluster the footprints into 100 groups. And each group has a center which represents the potential kiosk station.

100 stations
100 kiosks

Closer Look
closer look
Some kiosks are only 10 meters from each other, which indicates these locations need more bikes and a potential higher frequency of CheckOut/Return.

Idea II

Will BCycle in Beijing benefit people?

Austin BCycle data tells us

Two Distribution Graph

We were thinking if there is a correlation between Distance vs. Duration?
We tried to predict duration by distance, but find big discrepancy.
Then we attempted to visualize this:

The number of times bikes were used (vs. weekday & hour in a day)
People tended to borrow a bike on weekend and at 11:00 - 20:00.

So we also visualized this:
Duration (vs. Weekday & Check Out time) [average]

But the average was problematic because of the outliers like 1000 hour duration (probably a missing bike or ruined bike)
So we visualized it again with the median:
Duration (vs. Weekday & Check Out time) [median]

Algorithm

We build a model (M2.1) to predict the duration of a bike ride (by distance, weekday, check out time), by using xgboost.

Implementation of the Model in potential Beijing BCycle kiosks

User000 in Beijing If this person is using BCycle, his/her route will looks like this (generating by his original on-foot data):

The we use the model M2.1 to predict the duration by using BCycle, which is 19.93 minutes.
Comparing to the original time duration, which is 49.24 minutes, the BCycle highly elevate the efficiency.

Future Ideas

  • The current Geolife data is confusing because of the unclear specification of the transportation type. We can in the future to calculate if the BCycle is time-saving for bus-commuter or private car driver, since the public transportation in Beijing takes extra time to wait and busy car traffic always causes traffic jam.
  • By visulizing the membership and counts and duration, we are thinking if we can predict the membership type that Beijing people might want to purchase.


About

Data Hackthon - Dataset (BCycle&Beijing Geolife)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 100.0%