Configuration

Each configuration parameter is documented with its name, a description, and its default value in parentheses, if any.
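
To illustrate how the defaults in parentheses can be read, the sketch below overlays user-supplied settings on a few of the documented defaults. It assumes settings arrive as key=value lines, which this page does not state. The load_settings helper, the DEFAULTS table and the schema path are hypothetical and are not part of the toolkit.

```python
# Hypothetical helper: the key=value file layout is an assumption.
DEFAULTS = {
    "tabular.input": "true",         # BayesianDistribution MR
    "batch.size": "1000",            # BaggingSampler MR
    "split.algorithm": "giniIndex",  # ClassPartitionGenerator MR
}


def load_settings(text, defaults=DEFAULTS):
    """Parse key=value lines and fall back to the documented defaults."""
    settings = dict(defaults)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# The schema path here is a made-up example value.
settings = load_settings(
    "batch.size=500\nfeature.schema.file.path=/user/me/schema.json"
)
```

Parameters without a documented default, such as feature.schema.file.path, have no fallback and would need to be supplied explicitly.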

BayesianDistribution MR

tabular.input

Value is true if input format is tabular, false if text (true)

BayesianPredictor MR

feature.schema.file.path

Schema JSON file HDFS path

bp.predict.class.cost

Cost values for cost based arbitrator

bp.predict.class

Class attribute values. Generally these are obtained from the schema

class.prob.diff.threshold

Class attribute probability difference threshold for prediction purposes (-1)

output.feature.prob.only

Set to true if only feature probability needs to be output (false)

BaggingSampler MR

batch.size

Batch size for random shuffling (1000)

CategoricalCorrelation MR

first.set.attributes

Attribute ordinal list for the first set

second.set.attributes

Attribute ordinal list for the second set

ClassPartitionGenerator MR

feature.schema.file.path

Schema JSON file HDFS path

split.attribute.selection.strategy

Attribute selection strategy for splitting. Choices are 1. userSpecified: attributes specified by the user 2. all: all attributes; an attribute may be split multiple times at different levels 3. notUsedYet: only attributes not split yet will be split 4. random: attributes will be selected randomly

split.algorithm

Splitting algorithm. Choices are 1. entropy 2. giniIndex 3. hellingerDistance 4. classConfidenceRatio (giniIndex)
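
As a reminder of what the entropy and giniIndex choices measure, here is a minimal sketch of both impurity scores over class counts, plus the size-weighted impurity of a candidate split. These are the textbook definitions; how the job itself applies them is not described on this page, and the function names are illustrative. hellingerDistance and classConfidenceRatio are not covered.

```python
import math


def entropy(class_counts):
    """Shannon entropy (in bits) of a class count distribution."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def gini_index(class_counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def weighted_impurity(partitions, impurity):
    """Size-weighted impurity over the partitions produced by a split."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * impurity(p) for p in partitions)
```

A split that separates the classes perfectly, such as [[5, 0], [0, 5]], has a weighted impurity of zero under either measure.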

output.split.prob

Set to true if additional split probability is to be output (false)

parent.info

Parent node information content

CramerCorrelation MR

feature.schema.file.path

Schema JSON file HDFS path

source.attributes

Ordinal list for source attributes

dest.attributes

Ordinal list for destination attributes

correlation.scale

Correlation scale (1000)
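
For orientation, the sketch below computes the standard Cramér's V statistic from a contingency table of two categorical attributes and converts it to an integer using a scale factor. Treating correlation.scale as a multiplier applied to the 0 to 1 statistic is an assumption, and the job's exact formula may differ. The function names are illustrative.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


def scaled(value, scale=1000):
    # Assumption: the scale converts the 0..1 statistic into an integer.
    return int(round(value * scale))
```

With the default scale of 1000, perfectly associated attributes would be reported as 1000 and independent ones as 0.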

HeterogeneityReductionCorrelation MR

heterogeneity.algorithm

Heterogeneity algorithm. Choices are 1. gini 2. uncertainty

MutualInformation MR

feature.schema.file.path

Schema JSON file HDFS path

output.mutual.info

Set to true to output mutual info (false)

mutual.info.score.algorithms

Mutual info scoring algorithms. Choices are 1. mutual.info.maximization 2. mutual.info.selection 3. joint.mutual.info 4. double.input.symmetric.relevance 5. min.redundancy.max.relevance (mutual.info.maximization)

mutual.info.redundancy.factor

Redundancy factor for the algorithm mutual.info.selection (1.0)
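
The scoring algorithms above all build on mutual information between attributes and the class. The sketch below computes that basic quantity from (feature value, class value) pairs using the standard definition. It does not reproduce any of the listed scoring algorithms, and the function name is illustrative.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """I(X;Y) in bits from a list of (feature_value, class_value) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        ratio = count * n / (x_counts[x] * y_counts[y])
        mi += p_xy * math.log2(ratio)
    return mi
```

A feature that determines a balanced binary class exactly carries 1 bit of information about it, while an independent feature carries 0.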

UnderSamplingBalancer MR

class.attr.ord

Class attribute ordinal

distr.batch.size

Sampling batch size (500)

FeatureCondProbJoiner MR

feature.cond.prob.split.prefix

Feature conditional probability input file prefix (condProb)

NearestNeighbor MR

validation.mode

Set to true if in validation mode (false)

class.condition.weighted

Set to true if class conditional probability weighting is to be applied (false)

prediction.mode

The mode of prediction. Choices are 1. classification 2. regression (classification)

regression.method

The method of regression. Choices are 1. average 2. median 3. linearRegression 4. multiLinearRegression (average)

top.match.count

Number of nearest neighbors (10)

kernel.function

Type of kernel function for calculating score distance. Choices are 1. none 2. linearMultiplicative 3. linearAdditive 4. gaussian 5. sigmoid (none)

kernel.param

Parameter associated with kernel function (-1)

output.class.distr

Set to true if class conditional probability distribution is to be output (false)

inverse.distance.weighted

Set to true if score is to be inverse distance weighted (false)

decision.threshold

Threshold value for score ratio threshold based classification (-1.0)

use.cost.based.classifier

Set to true for cost based classifier (false)

class.attribute.values

Comma-separated class attribute values. Needed if use.cost.based.classifier is true and prediction.mode is classification

misclassification.cost

Misclassification cost. Needed if use.cost.based.classifier is true and prediction.mode is classification
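
To show how top.match.count, inverse.distance.weighted and output.class.distr relate, here is a minimal classification sketch over precomputed neighbor distances. The 1/distance weighting with a small floor is an assumption about how inverse distance weighting is applied. Kernel functions and cost-based classification are not covered, and the function name is illustrative.

```python
from collections import defaultdict


def knn_classify(neighbors, top_match_count=10, inverse_distance_weighted=False):
    """neighbors: list of (distance, class_value) pairs.

    Returns the predicted class and the class distribution over the
    nearest top_match_count neighbors.
    """
    nearest = sorted(neighbors, key=lambda n: n[0])[:top_match_count]
    scores = defaultdict(float)
    for distance, class_value in nearest:
        if inverse_distance_weighted:
            # Assumption: weight is 1/distance, floored to avoid division by zero.
            weight = 1.0 / max(distance, 1e-9)
        else:
            weight = 1.0
        scores[class_value] += weight
    total = sum(scores.values())
    distribution = {c: s / total for c, s in scores.items()}
    predicted = max(distribution, key=distribution.get)
    return predicted, distribution
```

With neighbors [(1.0, "a"), (4.0, "b"), (5.0, "b")], the unweighted vote picks "b", while inverse distance weighting picks "a" because the single close neighbor outweighs the two distant ones.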

HiddenMarkovModelBuilder MR

skip.field.count

Number of fields to skip from the beginning of each record (0)

sub.field.delim

Sub field delimiter between state and observation

partially.tagged

Set to true if only some of the observations are tagged with states

window.function

Window function when partially.tagged is true

model.states

List of states

model.observations

List of observations

MarkovStateTransitionModel MR

skip.field.count

Number of fields to skip from the beginning of each record (0)

model.states

Comma-separated list of states

trans.prob.scale

Transition probability scale
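
The sketch below shows one way the three parameters could fit together: leading fields are skipped, transitions between consecutive states are counted, and each row is normalized to integer probabilities. Treating trans.prob.scale as an integer multiplier is an assumption, and the value 1000 is only an example since no default is documented. The function name is illustrative.

```python
def transition_matrix(records, states, skip_field_count=0, scale=1000):
    """Build a scaled state transition matrix from state sequences.

    records: list of field lists; the first skip_field_count fields
    (for example an ID) are ignored and the rest are states.
    """
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for record in records:
        seq = record[skip_field_count:]
        for prev, cur in zip(seq, seq[1:]):
            counts[index[prev]][index[cur]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions stay all zero.
        matrix.append([c * scale // total if total else 0 for c in row])
    return matrix
```

For a single record ["cust1", "L", "L", "H", "L"] with one skipped field and states ["L", "H"], the result is [[500, 500], [1000, 0]].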

ViterbiStatePredictor MR

skip.field.count

Number of fields to skip from the beginning of each record (1)

id.field.ordinal

Id field ordinal (0)

output.state.only

Set to true if only states need to be output (true)

sub.field.delim

Sub field delimiter (:)

hmm.model.path

HMM file HDFS path
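
For readers unfamiliar with the algorithm behind this job, here is a minimal sketch of Viterbi decoding over an HMM held in plain dictionaries. The layout of the HMM file at hmm.model.path is not described on this page, so the start_prob, trans_prob and emit_prob structures are assumptions made for the example.

```python
def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """Return the most likely state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [
        {s: (start_prob[s] * emit_prob[s][observations[0]], None) for s in states}
    ]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_prob[p][s] * emit_prob[s][obs], p)
                for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))
```

For long sequences the product of probabilities can underflow, so a practical version would usually sum log probabilities instead.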

AuerDeterministic MR

current.round.num

Current round number

det.algorithm

Auer deterministic algorithm. Choices are 1. AuerUBC1 (AuerUBC1)

count.ordinal

Count field ordinal

reward.ordinal

Reward field ordinal

group.item.count.path

HDFS path for file containing group batch size
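
As background for the count and reward fields, the sketch below applies the standard UCB1 rule from Auer et al.: each item is scored by its average reward plus an exploration bonus of sqrt(2 ln n / n_i), and untried items are selected first. Treating the reward field as a cumulative total and rewards as lying in the range 0 to 1 are assumptions. The function name is illustrative.

```python
import math


def ucb1_select(items, current_round_num):
    """items: dict of item -> (trial_count, total_reward). Returns the chosen item."""
    # Try every untried item before applying the bound.
    for item, (count, _) in items.items():
        if count == 0:
            return item

    def upper_bound(item):
        count, total_reward = items[item]
        average = total_reward / count
        bonus = math.sqrt(2.0 * math.log(current_round_num) / count)
        return average + bonus

    return max(items, key=upper_bound)
```

An item with few trials gets a large bonus, so it can be chosen over an item with a higher average reward that has already been tried many times.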

GreedyRandomBandit MR

current.round.num

Current round number

random.selection.prob

Initial probability for random selection (0.5)

prob.reduction.algorithm

Random selection probability reduction algorithm. Choices are 1. linear 2. logLinear 3. AuerGreedy (linear)

prob.reduction.constant

Random selection probability reduction constant (1.0)

count.ordinal

Count field ordinal

reward.ordinal

Reward field ordinal

auer.greedy.constant

Auer greedy constant. Needed when prob.reduction.algorithm is AuerGreedy (5.0)

group.item.count.path

HDFS path for file containing group batch size
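
The sketch below illustrates the general idea of a decaying random selection probability: explore with a probability that shrinks as rounds advance, otherwise pick the item with the best average reward. The linear and logLinear decay formulas used here are assumptions, since this page does not define them, and AuerGreedy is not covered. The function names are illustrative.

```python
import math
import random


def selection_prob(round_num, initial_prob=0.5, constant=1.0, algorithm="linear"):
    """Random selection probability for a round (decay formulas are assumptions)."""
    if algorithm == "linear":
        prob = initial_prob * constant / round_num
    elif algorithm == "logLinear":
        prob = initial_prob * constant * math.log(round_num) / round_num
    else:
        raise ValueError("unsupported algorithm: " + algorithm)
    return min(prob, initial_prob)  # never exceed the initial probability


def select_item(avg_rewards, round_num, rng=random, **kwargs):
    """Explore with the decayed probability, otherwise exploit the best item."""
    if rng.random() < selection_prob(round_num, **kwargs):
        return rng.choice(sorted(avg_rewards))
    return max(avg_rewards, key=avg_rewards.get)
```

Passing a seeded random.Random instance as rng makes the selection reproducible.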

RandomFirstGreedyBandit MR

current.round.num

Current round number

exploration.count.strategy

Strategy for exploration counts. Choices are 1. simple 2. pac (simple)

exploration.count.factor

Exploration count factor. Needed when exploration.count.strategy is simple (2)

pac.reward.diff

Reward difference. Needed when exploration.count.strategy is pac (0.2)

pac.prob.diff

Probability difference. Needed when exploration.count.strategy is pac (0.2)

group.item.count.path

HDFS path for file containing group batch size

SoftMaxBandit MR

current.round.num

Current round number

temp.constant

Temperature constant (1.0)

count.ordinal

Item trial count ordinal

reward.ordinal

Item reward ordinal

group.item.count.path

HDFS path for file containing group batch size
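
To make the role of the temperature constant concrete, here is a minimal sketch of softmax (Boltzmann) selection: each item's probability is proportional to exp(average reward / temperature). Using a fixed temperature that does not change with the round number is an assumption, and the function names are illustrative.

```python
import math
import random


def softmax_probs(avg_rewards, temp_constant=1.0):
    """Selection probability for each item from its average reward."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(avg_rewards.values())
    weights = {
        item: math.exp((reward - top) / temp_constant)
        for item, reward in avg_rewards.items()
    }
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}


def softmax_select(avg_rewards, temp_constant=1.0, rng=random):
    """Draw one item according to the softmax probabilities."""
    probs = softmax_probs(avg_rewards, temp_constant)
    items = list(probs)
    return rng.choices(items, weights=[probs[i] for i in items], k=1)[0]
```

A high temperature flattens the probabilities toward uniform exploration, while a low temperature concentrates selection on the item with the best average reward.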

ReinforcementLearnerTopology Storm Topology

spout.threads

Concurrency for spout (1)

bolt.threads

Concurrency for bolts (1)

num.workers

Number of worker processes (1)

RedisSpout Storm Spout

redis.server.host

Redis server host

redis.server.port

Redis server port

redis.event.queue

Redis event queue

redis.reward.queue

Redis reward queue

log.message.count.interval

Interval, in number of messages, between log messages

ReinforcementLearnerBolt Storm Bolt

reinforcement.learner.type

Reinforcement learning algorithm

reinforcement.learrner.actions

Comma-separated list of actions for the learner

reinforcement.learrner.action.writer

Action output Redis queue