Flight analytics and cancellation prediction with sparklyr and pyspark
This project accompanies the end-to-end ML at Scale workshop. It builds an API that predicts the likelihood of a flight being cancelled, based on historical flight data. The original dataset comes from Kaggle. The workshop shows both the pyspark and sparklyr implementations and covers:
- Data Science and Exploration
- ML Model Building and Optimisation
- ML Model Training
- ML Model Serving
- Deploying an Application
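Setup: in a Python session, make the build script executable and install Flask (the `!` prefix runs each line as a shell command):
!chmod 777 cdsw-build.sh
!pip3 install flask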
Install these R packages:
sparklyr, psych, ggthemes, leaflet
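Flask, installed in the setup step, is used to serve the cancellation prediction API. Below is a minimal sketch of such an endpoint, not the workshop's actual serving code. `create_app`, the `/predict` route and the feature names in `REQUIRED_FIELDS` are hypothetical, and `predict_proba` stands in for whatever function wraps the trained pyspark or sparklyr model.

```python
from flask import Flask, jsonify, request

# Hypothetical input fields, for illustration only; the workshop's real
# model features may differ.
REQUIRED_FIELDS = ("carrier", "origin", "dest", "month", "dep_hour")


def create_app(predict_proba):
    """Build a Flask app around a scoring function (hypothetical helper).

    predict_proba is any callable that takes a dict of flight features and
    returns the probability (0.0 to 1.0) that the flight is cancelled, for
    example a thin wrapper around a trained Spark ML pipeline.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
        features = request.get_json(silent=True) or {}
        # Reject requests that lack any of the expected fields.
        missing = [f for f in REQUIRED_FIELDS if f not in features]
        if missing:
            return jsonify({"error": "missing fields", "fields": missing}), 400
        # Delegate scoring to the injected model function.
        prob = float(predict_proba(features))
        return jsonify({"cancellation_probability": prob})

    return app
```

To try it locally, pass any callable that returns a probability, e.g. `create_app(lambda features: 0.1).run(port=8080)`, then POST a JSON body containing the five fields to `/predict`. Injecting the model as a callable keeps the HTTP layer independent of Spark, so the endpoint can be tested without a running Spark session.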
Related content: http://blog.cloudera.com/blog/2017/02/analyzing-us-flight-data-on-amazon-s3-with-sparklyr-and-apache-spark-2-0/