GitHub - feiut/DataHack20190413: Data Hackthon

DataHack20190413

Data Hackathon - Datasets (BCycle.csv & Beijing Geolife.csv)

Team Members

Binglin(Mark) Zhang
Zijian(Steven) Wang
Fei He

Objectives

Explore introducing BCycle into Beijing in 2008.
How would it improve travel efficiency?
Would it improve air quality?
What kinds of memberships would people purchase?

Background Information from Data

Beijing Geolife dataset

Beijing Geolife.csv contains the trajectories of participants. Each participant's location, given as longitude and latitude, was recorded every 3 to 5 seconds.

Five random samples from the Geolife dataset
Participants in Beijing travelled along the roads and may cluster in certain areas.

Austin BCycle Dataset

Density of check-out / return counts at each Austin kiosk

Idea I

Beijing may have a population whose needs are similar to those of Austin BCycle riders.
So why not introduce BCycle into Beijing?

Algorithm

Using K-Means on the Beijing Geolife data, we cluster the footprints into 100 groups. Each group has a center, which represents a potential kiosk station.
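The clustering step can be illustrated with a minimal pure-Python sketch of Lloyd's algorithm (the standard K-Means iteration). This is not the code from the team's notebooks, which may well have used a library implementation; the function name `kmeans`, its parameters, and the sample coordinates are assumptions made for illustration. It also treats latitude and longitude as plain Euclidean coordinates, which is only a rough approximation over a city-sized area.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster (lat, lon) points into k groups and return the k centers.

    Hypothetical helper: plain Lloyd's algorithm with random initial centers.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for i, group in enumerate(groups):
            if group:
                lat = sum(p[0] for p in group) / len(group)
                lon = sum(p[1] for p in group) / len(group)
                new_centers.append((lat, lon))
            else:
                new_centers.append(centers[i])  # keep an empty group's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


# Made-up footprints for illustration only (not taken from Beijing Geolife.csv).
footprints = [(39.90, 116.40), (39.91, 116.42), (39.98, 116.31), (40.00, 116.33)]
candidate_kiosks = kmeans(footprints, k=2)
```

With the real data, `points` would hold every recorded footprint and `k` would be 100. Because the initial centers are random, different seeds can give different station layouts.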

100 stations

Closer Look

Some kiosks are only 10 meters from each other, which suggests that these locations need more bikes and may see a higher frequency of check-outs and returns.
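One way to find such near-duplicate kiosks is to measure the great-circle distance between every pair of cluster centers. The sketch below uses the haversine formula; the helper names `haversine_m` and `close_pairs`, the 10-meter default threshold, and the sample coordinates are illustrative assumptions rather than the team's code.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def close_pairs(centers, max_distance_m=10.0):
    """Return (i, j, distance) for every pair of centers within max_distance_m."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(centers), 2):
        d = haversine_m(a, b)
        if d <= max_distance_m:
            pairs.append((i, j, d))
    return pairs


# Made-up centers: the first two are a few meters apart, the third is far away.
centers = [(39.90000, 116.40000), (39.90005, 116.40000), (40.00000, 116.30000)]
nearby = close_pairs(centers)
```

Pairs returned by `close_pairs` could either be merged into a single larger station or kept as a hint that the area deserves extra capacity.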

Idea II

Will BCycle in Beijing benefit people?

Austin BCycle data tells us

Two distribution graphs

We wondered whether distance and duration are correlated.
We tried to predict duration from distance, but found a large discrepancy.
So we visualized the following:

The number of times bikes were used (vs. weekday & hour in a day)
People tended to borrow bikes on weekends and between 11:00 and 20:00.

So we also visualized this:
Duration (vs. Weekday & Check Out time) [average]

But the average was misleading because of outliers such as a 1000-hour duration (probably a missing or damaged bike).
So we visualized it again using the median:
Duration (vs. Weekday & Check Out time) [median]
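The effect of a single extreme trip on the two statistics can be seen in a small sketch. The function name `summarize_durations`, the tuple layout, and the sample trips are assumptions for illustration; the durations are invented, with one trip set to 60000 minutes (1000 hours) to mimic the outlier described above.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_durations(trips):
    """Group (weekday, checkout_hour, duration_minutes) rows and summarize each group."""
    buckets = defaultdict(list)
    for weekday, hour, minutes in trips:
        buckets[(weekday, hour)].append(minutes)
    return {
        key: {"mean": mean(values), "median": median(values)}
        for key, values in buckets.items()
    }


# Invented trips: three ordinary rides and one 1000-hour outlier in the same slot.
trips = [
    ("Sat", 14, 20),
    ("Sat", 14, 25),
    ("Sat", 14, 30),
    ("Sat", 14, 60000),
]
summary = summarize_durations(trips)
```

In this made-up slot the mean is pulled above 15000 minutes by the one outlier, while the median stays at 27.5 minutes, close to the ordinary rides.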

Algorithm

We built a model (M2.1) that predicts the duration of a bike ride from distance, weekday, and check-out time, using xgboost.
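As a rough sketch of how the three inputs might be prepared, the snippet below turns one trip into a numeric feature row. The function name `trip_features`, the timestamp format, and the choice of kilometers are assumptions; the actual columns and preprocessing behind M2.1 are not described here.

```python
from datetime import datetime


def trip_features(distance_km, checkout_time):
    """Build a [distance, weekday, check-out hour] row for one trip.

    Assumes checkout_time looks like "2019-04-13 14:30:00" (hypothetical format).
    Weekday is encoded as 0 = Monday ... 6 = Sunday.
    """
    ts = datetime.strptime(checkout_time, "%Y-%m-%d %H:%M:%S")
    return [distance_km, ts.weekday(), ts.hour]


# One invented trip: 2.5 km, checked out on a Saturday afternoon.
row = trip_features(2.5, "2019-04-13 14:30:00")
```

A list of such rows, paired with the observed durations, is the kind of input a gradient-boosting regressor such as `xgboost.XGBRegressor` accepts through its `fit(X, y)` method.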

Implementation of the Model in potential Beijing BCycle kiosks

User000 in Beijing: if this person used BCycle, his/her route would look like this (generated from the original on-foot data):

Then we use model M2.1 to predict the duration with BCycle, which is 19.93 minutes.
Compared with the original duration of 49.24 minutes, BCycle greatly improves efficiency.
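The size of the improvement for this one trip follows from simple arithmetic on the two durations. The helper name `time_saving` is an assumption for illustration; the inputs are the two figures quoted above.

```python
def time_saving(original_min, predicted_min):
    """Return (minutes saved, percent reduction, speed-up factor)."""
    saved = original_min - predicted_min
    percent = saved / original_min * 100
    speedup = original_min / predicted_min
    return saved, percent, speedup


# User000: 49.24 minutes originally vs. 19.93 minutes predicted with BCycle.
saved, percent, speedup = time_saving(49.24, 19.93)
```

For User000 this gives roughly 29 minutes saved, a reduction of about 60%, or a trip about 2.5 times faster. It is a single example, not an average over all participants.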

Future Ideas

The current Geolife data is confusing because the transportation type is not clearly specified. In the future, we can calculate whether BCycle saves time for bus commuters or private car drivers, since public transportation in Beijing involves extra waiting time and heavy car traffic often causes traffic jams.
By visualizing membership, counts, and duration, we are considering whether we can predict the membership type that people in Beijing might want to purchase.
