The Salute Project
Automatic data preparation for machine learning
In Big Data processing, most of the time is spent preparing data for use by advanced machine learning tools or by people building reports. As a result, insights remain locked away for too long.
Salute can process any type of file (text, image, video, audio, etc.) and generate an output file containing all the generated features, ready to be loaded into a machine learning or reporting tool.
Salute is built on Spark, so it can process very large files.
The recommended way to run Salute is to submit its jar with spark-submit:
<spark_home>/bin/spark-submit target/salute-0.1-SNAPSHOT.jar <input_file> <output_dir>
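If you run Salute often, a small wrapper can check the arguments and the Spark location before submitting. The following is a minimal sketch for a POSIX shell, not part of Salute itself: the function name run_salute and the SALUTE_JAR and DRY_RUN environment variables are hypothetical. It assumes SPARK_HOME points at your Spark installation and that the jar has already been built into target/. Like the command above, it passes no --class option.

```shell
# run_salute: hypothetical helper wrapping the spark-submit command above.
# SALUTE_JAR (optional) overrides the default jar path; DRY_RUN=1 prints the
# command instead of executing it, which is useful for checking paths.
run_salute() {
  jar="${SALUTE_JAR:-target/salute-0.1-SNAPSHOT.jar}"

  # Salute expects exactly two arguments: the input file and the output directory.
  if [ "$#" -ne 2 ]; then
    echo "usage: run_salute <input_file> <output_dir>" >&2
    return 1
  fi

  # spark-submit lives under the Spark installation directory.
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "run_salute: SPARK_HOME is not set" >&2
    return 1
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$SPARK_HOME/bin/spark-submit $jar $1 $2"
  else
    "$SPARK_HOME/bin/spark-submit" "$jar" "$1" "$2"
  fi
}
```

For example, SPARK_HOME=/opt/spark DRY_RUN=1 run_salute data/input.txt out/ only prints the command it would run. Dropping DRY_RUN actually submits the job.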