This tutorial covers generating dummy binary variables from categorical variables, which
may be a required step before applying certain machine learning algorithms.
Dependent script
Check out the project avenir. Take the script and from the project and place it
in the ../lib directory with respect to the directory containing
Build and Deployment
Please refer to spark_dependency.txt
Generate input data
./ genInput <num_leads> <output_file>
num_leads = number of sales leads. This should be a few thousand so that each categorical variable
value appears in the data set at least once
output_file = output file name
Copy the output file to the Spark input directory as specified in
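
The reason for asking for a few thousand leads can be checked with a small experiment. The following is a minimal Python sketch, not the project's generator: the field names and value lists are hypothetical, and the real input schema may differ. It draws random leads and reports any categorical value that never appeared in the generated rows.

```python
import random

# Hypothetical categorical fields; the real generator's schema may differ.
FIELDS = {
    "source": ["web", "referral", "trade_show", "cold_call"],
    "region": ["north", "south", "east", "west"],
    "industry": ["retail", "finance", "health", "tech", "other"],
}

def gen_leads(num_leads, seed=None):
    """Return num_leads rows, each a list of randomly chosen categorical values."""
    rng = random.Random(seed)
    return [[rng.choice(vals) for vals in FIELDS.values()] for _ in range(num_leads)]

def missing_values(rows):
    """Return {field: set of values that never appear in rows}."""
    missing = {}
    for idx, (name, vals) in enumerate(FIELDS.items()):
        seen = {row[idx] for row in rows}
        missing[name] = set(vals) - seen
    return missing

rows = gen_leads(2000, seed=42)
uncovered = missing_values(rows)
print({name: sorted(vals) for name, vals in uncovered.items()})
```

If any set in the result is non-empty, the data set is too small for the later steps to see every value, and num_leads should be increased.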
Run unique value finder Spark job
./ uniqueValues
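
Conceptually, this step collects the distinct values of each categorical column. The sketch below shows that idea in plain Python rather than Spark. The record layout, the column positions and the function name unique_values are assumptions made for illustration, not the job's actual implementation.

```python
from collections import defaultdict

def unique_values(rows, cat_columns):
    """Map each categorical column index to its sorted list of distinct values."""
    found = defaultdict(set)
    for row in rows:
        for col in cat_columns:
            found[col].add(row[col])
    return {col: sorted(vals) for col, vals in found.items()}

# Hypothetical comma separated records: id, source, region, score
lines = [
    "L001,web,north,12",
    "L002,referral,south,30",
    "L003,web,east,7",
]
rows = [line.split(",") for line in lines]
uniques = unique_values(rows, cat_columns=[1, 2])
print(uniques)
```

Sorting the values gives each one a stable position, which matters when the values are later turned into binary fields.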
Run binary variable generator Spark job
./ binValGen
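
What this step produces can be illustrated with a short Python sketch, again in plain Python rather than Spark. It assumes a hypothetical comma separated record and a mapping from column index to known distinct values. Each categorical column is replaced by one 0/1 field per known value, and all other columns pass through unchanged.

```python
def to_binary(row, uniques):
    """Replace each categorical column in row with one 0/1 field per known value."""
    out = []
    for col, value in enumerate(row):
        if col in uniques:
            # One binary field per known value, in the order given by uniques.
            out.extend("1" if value == known else "0" for known in uniques[col])
        else:
            out.append(value)
    return out

# Hypothetical result of the unique value step: column index -> distinct values
uniques = {1: ["referral", "web"], 2: ["east", "north", "south"]}
row = "L001,web,north,12".split(",")
encoded = to_binary(row, uniques)
print(",".join(encoded))  # prints L001,0,1,0,1,0,12
```

This sketch emits one field per value. Dropping one field per variable is a common variant for linear models, where the full set of dummies is collinear with the intercept.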
The script should be changed as necessary, depending on your environment.
Configuration parameters are in dvg.conf. Make changes as necessary.
