fastforwardlabs/workshop_ml_at_scale

Machine Learning at Scale

Flight analytics and cancellation prediction with sparklyr and pyspark

This project supports the end-to-end ML at Scale workshop. It builds an API that predicts the likelihood of a flight being cancelled, based on historical flight data. The original dataset comes from Kaggle. The workshop presents both the pyspark and sparklyr implementations and covers:

  • Data Science and Exploration
  • ML Model Building and Optimisation
  • ML Model Training
  • ML Model Serving
  • Deploying an Application
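
As a rough illustration of the model-serving step, here is a minimal sketch of a Flask endpoint that returns a cancellation probability. It is not the workshop's code. The feature names (`dep_delay_minutes`, `distance_miles`), the placeholder weights, and the `/predict` route are all assumptions made for this example, and the logistic scorer only stands in for the Spark-trained model.

```python
import math

# Hypothetical coefficients for illustration only; the workshop's real model
# is trained with Spark and is not reproduced here.
PLACEHOLDER_WEIGHTS = {"dep_delay_minutes": 0.01, "distance_miles": -0.0002}
PLACEHOLDER_BIAS = -3.0
REQUIRED_FIELDS = tuple(PLACEHOLDER_WEIGHTS)


def predict_cancellation(features):
    """Return a cancellation probability between 0 and 1 for one flight record."""
    missing = [name for name in REQUIRED_FIELDS if name not in features]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    score = PLACEHOLDER_BIAS + sum(
        PLACEHOLDER_WEIGHTS[name] * float(features[name])
        for name in REQUIRED_FIELDS
    )
    # The logistic function maps the linear score to a probability.
    return 1.0 / (1.0 + math.exp(-score))


def create_app():
    """Build a Flask app that exposes the predictor at an assumed /predict route."""
    # Imported here so the predictor above can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Treat a missing or malformed JSON body as an empty record.
        payload = request.get_json(silent=True) or {}
        if not isinstance(payload, dict):
            return jsonify({"error": "expected a JSON object"}), 400
        try:
            probability = predict_cancellation(payload)
        except (TypeError, ValueError) as err:
            return jsonify({"error": str(err)}), 400
        return jsonify({"cancellation_probability": probability})

    return app
```

To try it locally, call `create_app().run(port=8080)` and POST a JSON body such as `{"dep_delay_minutes": 45, "distance_miles": 800}` to `/predict`. Keeping the scoring function separate from the web layer makes it possible to swap in a real trained model without changing the endpoint.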

Setup Required

All Users

!chmod 777 cdsw-build.sh

Python Users

!pip3 install flask

R Users

Install these R packages:

  • sparklyr
  • psych
  • ggthemes
  • leaflet

Related Content

http://blog.cloudera.com/blog/2017/02/analyzing-us-flight-data-on-amazon-s3-with-sparklyr-and-apache-spark-2-0/
