Predicting bike share availability with H2O and Spark on AWS EMR

Tools Used:

Datasets:

station.csv - Data about each station where users can pick up or return bikes.
status.csv - The number of bikes and docks available at a given station for a given minute.
trips.csv - Data about individual bike trips.
weather.csv - Data about the weather on a specific day for certain zip codes.
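The four files can be combined into one training table per status reading. Below is a minimal plain-Python sketch. It is not necessarily how this project's notebooks prepare the data.

It rests on these assumptions:
- The column names are the ones listed in the code comment (`id`, `dock_count`, `station_id`, `bikes_available`, `time`, `date`, `zip_code`, `max_temperature_f`).
- The date formats are the two constants shown.
- station.csv has no zip code column, so the station-to-zip mapping `station_zip` is a hypothetical input the caller supplies.

Verify the column names and date formats against the real CSV headers before relying on the sketch.

```python
import csv
from datetime import date, datetime
from typing import Dict, Iterable, List, Optional, Tuple

# ASSUMED column names (verify against the real CSV headers):
#   station.csv: id, dock_count
#   status.csv:  station_id, bikes_available, time
#   weather.csv: date, zip_code, max_temperature_f
STATUS_TIME_FORMAT = "%Y/%m/%d %H:%M:%S"  # assumption
WEATHER_DATE_FORMAT = "%m/%d/%Y"  # assumption


def load_rows(path: str) -> List[dict]:
    """Read a CSV file into a list of dicts keyed by header name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def build_features(
    stations: Iterable[dict],
    status: Iterable[dict],
    weather: Iterable[dict],
    station_zip: Dict[int, str],  # HYPOTHETICAL station_id -> zip code (string)
) -> List[dict]:
    """Join each status reading with station capacity and same-day weather."""
    dock_count = {int(s["id"]): int(s["dock_count"]) for s in stations}

    # Index daily max temperature by (day, zip code); blank cells become None.
    max_temp: Dict[Tuple[date, str], Optional[float]] = {}
    for w in weather:
        day = datetime.strptime(w["date"], WEATHER_DATE_FORMAT).date()
        raw = w["max_temperature_f"]
        max_temp[(day, w["zip_code"])] = float(raw) if raw else None

    rows = []
    for s in status:
        sid = int(s["station_id"])
        if sid not in dock_count:
            continue  # skip readings for stations missing from station.csv
        ts = datetime.strptime(s["time"], STATUS_TIME_FORMAT)
        rows.append({
            "station_id": sid,
            "time": ts,
            "dock_count": dock_count[sid],
            "hour": ts.hour,
            "weekday": ts.weekday(),  # Monday == 0
            "max_temperature_f": max_temp.get((ts.date(), station_zip.get(sid, ""))),
            "bikes_available": int(s["bikes_available"]),  # prediction target
        })
    return rows
```

In each output row, `bikes_available` is the target and the other fields are candidate features. A table of this shape could then be loaded into H2O or Spark for modelling.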

Predict the number of bikes available at a given station with:

Data Pipeline:

Prediction Result:
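The README does not say how predictions were scored. For a time-indexed target, one common approach (assumed here) is to hold out the most recent readings and report root-mean-squared error (RMSE). The sketch below uses plain Python rather than the H2O or Spark ML evaluation APIs. `time_key` and `train_frac` are illustrative parameters.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(
    rows: Sequence[dict], time_key: str, train_frac: float = 0.8
) -> Tuple[List[dict], List[dict]]:
    """Sort rows by time and hold out the most recent fraction for testing."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    ordered = sorted(rows, key=lambda r: r[time_key])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

Splitting by time rather than at random keeps later readings out of the training set, which mirrors predicting future availability.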

Run Time Comparison on different AWS EMR Clusters:

Group members: Esther Liu, Marine Lin, Akankasha, Lexie
