TransmogrifAI Hello World for SBT

First, download Spark 2.4.5.

Define the SPARK_HOME environment variable:

export SPARK_HOME=your_spark_home_dir
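For example, assuming you download the pre-built Spark 2.4.5 package for Hadoop 2.7 from the Apache archive and unpack it in your home directory (adjust the paths to your own layout):

# download and unpack the pre-built Spark 2.4.5 distribution
wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
tar -xzf spark-2.4.5-bin-hadoop2.7.tgz -C ~
# point SPARK_HOME at the unpacked directory
export SPARK_HOME=~/spark-2.4.5-bin-hadoop2.7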

Run TitanicSimple:

./sbt "sparkSubmit \
    --class com.salesforce.hw.OpTitanicSimple \
    -- $PWD/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv"

Titanic model

Train

./sbt "sparkSubmit \
    --class com.salesforce.hw.titanic.OpTitanic -- \
    --run-type=train --model-location=/tmp/titanic-model \
    --read-location Passenger=$PWD/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv"

Score

./sbt "sparkSubmit \
    --class com.salesforce.hw.titanic.OpTitanic -- \
    --run-type=score --model-location=/tmp/titanic-model \
    --read-location Passenger=$PWD/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv \
    --write-location /tmp/titanic-scores"

Evaluate

./sbt "sparkSubmit \
    --class com.salesforce.hw.titanic.OpTitanic -- \
    --run-type evaluate \
    --model-location /tmp/titanic-model \
    --read-location Passenger=$PWD/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv \
    --write-location /tmp/titanic-eval \
    --metrics-location /tmp/titanic-metrics"

Boston house model

Train

./sbt "sparkSubmit \
    --class com.salesforce.hw.boston.OpBoston -- \
    --run-type=train --model-location=/tmp/boston-model \
    --read-location BostonHouse=$PWD/src/main/resources/BostonDataset/housing.data"

Score

./sbt "sparkSubmit \
    --class com.salesforce.hw.boston.OpBoston -- \
    --run-type=score --model-location=/tmp/boston-model \
    --read-location BostonHouse=$PWD/src/main/resources/BostonDataset/housing.data \
    --write-location /tmp/boston-scores"

Evaluate

./sbt "sparkSubmit \
    --class com.salesforce.hw.boston.OpBoston -- \
    --run-type evaluate \
    --model-location /tmp/boston-model \
    --read-location BostonHouse=$PWD/src/main/resources/BostonDataset/housing.data \
    --write-location /tmp/boston-eval \
    --metrics-location /tmp/boston-metrics"

Iris model

Train

./sbt "sparkSubmit \
    --class com.salesforce.hw.iris.OpIris -- \
    --run-type=train --model-location=/tmp/iris-model \
    --read-location Iris=$PWD/src/main/resources/IrisDataset/iris.data"

Score

./sbt "sparkSubmit \
    --class com.salesforce.hw.iris.OpIris -- \
    --run-type=score --model-location=/tmp/iris-model \
    --read-location Iris=$PWD/src/main/resources/IrisDataset/bezdekIris.data \
    --write-location /tmp/iris-scores"

Evaluate

./sbt "sparkSubmit \
    --class com.salesforce.hw.iris.OpIris -- \
    --run-type evaluate \
    --model-location /tmp/iris-model \
    --read-location Iris=$PWD/src/main/resources/IrisDataset/bezdekIris.data \
    --write-location /tmp/iris-eval \
    --metrics-location /tmp/iris-metrics"

Data Preparation

./sbt "sparkSubmit \
    --class com.salesforce.hw.dataprep.JoinsAndAggregates -- \
    $PWD/src/main/resources/EmailDataset/Clicks.csv \
    $PWD/src/main/resources/EmailDataset/Sends.csv"

./sbt "sparkSubmit \
    --class com.salesforce.hw.dataprep.ConditionalAggregation -- \
    $PWD/src/main/resources/WebVisitsDataset/WebVisits.csv"

Verify the Results

Look for the output file(s) in the location you specified. For instance, you can use avro-tools to inspect the score files (on macOS, install it with brew install avro-tools).

Beyond that, the best way to verify the results is to look through the logs generated during the run; they contain detailed information about the features, the processing steps, and the reliability of the model.
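For example, if you ran the Titanic scoring step above with --write-location /tmp/titanic-scores, you could inspect the Avro output like this (the part-file name is illustrative; the actual name will differ):

# list the score files written by Spark
ls /tmp/titanic-scores
# dump the Avro records as JSON and show the first few
avro-tools tojson /tmp/titanic-scores/part-00000-*.avro | head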

Generate your own workflow

Experiment with changing the features or exploring more models in any of the provided workflows.

See how high you can get your auROC!
