The Salute Project

Automatic data preparation for machine learning

The need

In Big Data processing, most of the time is spent preparing data for use by machine learning tools or by people building reports. This preparation work keeps insights locked away for too long.

The answer

Salute can process any type of file (text, image, video, audio, etc.) and generates an output file containing the extracted features, ready to be loaded into a machine learning or reporting tool.

The technology

Salute is built on Apache Spark and can process very large files.

Running Salute

The best way to run Salute is to submit the packaged jar with spark-submit, passing the input file and the output directory:

<spark_home>/bin/spark-submit target/salute-0.1-SNAPSHOT.jar <input_file> <output_dir>
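
As a rough illustration of how the placeholders in the command above get filled in, here is a minimal shell sketch. The helper name `salute_cmd` is hypothetical and not part of Salute, and the example paths (`/opt/spark`, `data/input.txt`, `out`) are placeholders. It only prints the command, so you can inspect it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Salute): prints the spark-submit
# command line from this README with the three placeholders filled in.
salute_cmd() {
  # Expect exactly: <spark_home> <input_file> <output_dir>
  if [ "$#" -ne 3 ]; then
    echo "usage: salute_cmd <spark_home> <input_file> <output_dir>" >&2
    return 1
  fi
  spark_home=$1
  input_file=$2
  output_dir=$3
  # The jar path is the one given in this README; it is relative to the
  # project root, so run the printed command from there.
  printf '%s/bin/spark-submit target/salute-0.1-SNAPSHOT.jar %s %s\n' \
    "$spark_home" "$input_file" "$output_dir"
}

# Example with placeholder paths (assumptions, adjust to your setup):
salute_cmd /opt/spark data/input.txt out
```

The jar under `target/` is presumably produced by a Maven build, since the repository contains a `pom.xml`. That is an assumption, so check the project's build setup before relying on it.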