This project is an example Scala Spark project that uses the Kaggle Titanic dataset.
- Clone the repository to a local directory
- cd into the repository directory
- Compile the project with sbt using:
sbt package
- Train model pipeline using:
spark-submit --class ModelTrain --master local[*] --driver-memory 4G target/scala-2.11/scalasparktitanicproject_2.11-1.0.jar
- Generate predictions with the trained model using:
spark-submit --class ModelPredict --master local[*] --driver-memory 4G target/scala-2.11/scalasparktitanicproject_2.11-1.0.jar
Predictions are saved as a CSV file in the predictions directory in the project root.
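
To check the output after the prediction step, a small shell helper like the one below may be handy. It is only a sketch: `show_predictions` is a hypothetical name, not part of this project, and it assumes the job leaves one or more `*.csv` files somewhere under the predictions directory (Spark's CSV writer usually produces `part-*.csv` files inside an output directory, but the exact layout here is not confirmed).

```shell
# Hypothetical helper (not part of the project): print the first lines of
# every CSV file found under the predictions directory.
# Assumption: the prediction job writes *.csv files under ./predictions.
show_predictions() {
  dir="${1:-predictions}"   # directory to inspect; defaults to ./predictions
  if [ ! -d "$dir" ]; then
    echo "No predictions directory at: $dir" >&2
    return 1
  fi
  # List CSV files in a stable order and show a short preview of each one.
  find "$dir" -type f -name '*.csv' | sort | while IFS= read -r f; do
    echo "== $f"
    head -n 5 "$f"
  done
}
```

Run `show_predictions` from the project root, or pass another directory as the first argument if the output was written elsewhere.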