This tutorial covers generating dummy binary variables from categorical variables, which
may be a required step before applying certain machine learning algorithms.
Dependent script
Check out the avenir project. Take the script and from the project and place it
in the ../lib directory with respect to the directory containing
Build and Deployment
Please refer to spark_dependency.txt
Generate input data
./ genInput <num_leads> <output_file>
num_leads = number of sales leads. It should be a few thousand so that each categorical variable
value appears in the data set at least once
output_file = output file name
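As a rough check on the "few thousand" guidance, the sketch below bounds the chance that some value of a categorical variable never shows up in the generated data. It assumes, purely for illustration, that each value is drawn independently and uniformly; the actual distribution used by the generator is not described here, and the function name is hypothetical.

```python
def missing_value_bound(num_values: int, num_leads: int) -> float:
    """Upper bound on P(at least one value never appears), via the union bound.

    Assumption (illustrative only): each lead picks one of num_values
    values independently and uniformly at random.
    """
    if num_values < 1 or num_leads < 0:
        raise ValueError("num_values must be >= 1 and num_leads must be >= 0")
    # Probability that one particular value is absent from all leads.
    p_one_missing = (1.0 - 1.0 / num_values) ** num_leads
    # Union bound over all values, capped at 1 since it is a probability.
    return min(1.0, num_values * p_one_missing)


if __name__ == "__main__":
    for n in (5, 50, 2000):
        print(n, missing_value_bound(10, n))
```

Under this assumption the bound shrinks quickly as num_leads grows. If some values are much rarer than others, more leads would be needed than the uniform case suggests.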
Copy the output file to the Spark input directory as specified in
Run unique value finder Spark job
./ uniqueValues
Run binary variable generator Spark job
./ binValGen
The script should be changed as necessary depending on your environment
Configuration parameters are in dvg.conf. Make changes as necessary.
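Conceptually, the two Spark jobs above make two passes over the data: one collects the distinct values of each categorical column, and the other expands each such column into one 0/1 column per distinct value. The plain-Python sketch below illustrates that idea on a tiny in-memory data set. It is not the avenir Spark implementation; the column indices, sample records, and function names are all hypothetical.

```python
from typing import Dict, List, Sequence


def unique_values(rows: Sequence[Sequence[str]],
                  cat_cols: Sequence[int]) -> Dict[int, List[str]]:
    """Pass 1: collect the sorted distinct values of each categorical column."""
    found = {col: set() for col in cat_cols}
    for row in rows:
        for col in cat_cols:
            found[col].add(row[col])
    return {col: sorted(vals) for col, vals in found.items()}


def to_binary(row: Sequence[str], uniques: Dict[int, List[str]]) -> List[str]:
    """Pass 2: replace each categorical field with one 0/1 field per value."""
    out: List[str] = []
    for col, value in enumerate(row):
        if col in uniques:
            out.extend("1" if value == v else "0" for v in uniques[col])
        else:
            out.append(value)  # non-categorical fields pass through unchanged
    return out


# Hypothetical records: id, lead source, company size, contact count.
rows = [
    ["L001", "web", "small", "12"],
    ["L002", "referral", "large", "40"],
    ["L003", "web", "medium", "7"],
]
uniques = unique_values(rows, cat_cols=[1, 2])
encoded = [to_binary(r, uniques) for r in rows]
for rec in encoded:
    print(",".join(rec))
```

The first pass is what makes the output width fixed: every record gets the same set of binary columns, in the same order, regardless of which values it contains. That is also why the input should contain every categorical value at least once.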