This code implements a random forest in Python 3. It uses three forms of randomization, which are at the heart of the strength of random forests. Randomization keeps the algorithm from overfitting as hundreds or thousands of stumps or trees are added, and allows it to build a complex, nonlinear representation of the underlying distribution of the class variables.
The first form of randomization used here is random subsampling of the data with replacement, known as bagging (bootstrap aggregating). Because each tree is trained on its own bootstrap sample, the trees are decorrelated and the variance of the ensemble is reduced; bagging is also extremely useful when dealing with large datasets, since each tree only has to see a sample of the rows.
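As a minimal sketch of this step (the names `bootstrap_sample` and `sample_ratio` are illustrative and not taken from this code), a bag can be drawn with replacement like so:

```python
import random

def bootstrap_sample(rows, sample_ratio=1.0):
    """Draw a bag: a random sample of the rows, with replacement."""
    n_sample = round(len(rows) * sample_ratio)
    return [random.choice(rows) for _ in range(n_sample)]
```

Sampling with replacement means some rows appear several times in a bag while others are left out entirely, which is what makes each tree's training set different.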
The second form of randomization is random feature selection. That is, if the underlying data has hundreds or thousands of input features, at each level of the stump or tree only a random subset of all the features is considered when deciding on an optimal split point.
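A sketch of that sampling step, again with illustrative names:

```python
import random

def candidate_features(n_features, n_candidates):
    """Pick a random subset of feature indices to consider at one split."""
    return random.sample(range(n_features), n_candidates)
```

A common default for classification is to set the number of candidates to roughly the square root of the total feature count.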
The final form of randomization comes from random selection of the split points to try within a single feature. Once the random features are selected, a number of random split points (for real-valued data) are sampled, and a metric such as entropy is evaluated at each one to decide which split is best at this level of the tree, given the bagged dataset and the random features under consideration.
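The sketch below shows one way this evaluation could look, assuming rows are lists of real values and labels are class values; `n_thresholds` and the function names are illustrative, not taken from this code:

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_random_split(rows, labels, feature, n_thresholds=10):
    """Try a handful of random thresholds on one feature and return the one
    giving the lowest weighted entropy of the two child nodes."""
    values = [row[feature] for row in rows]
    best_threshold, best_score = None, float('inf')
    for threshold in random.sample(values, min(n_thresholds, len(values))):
        left = [lab for row, lab in zip(rows, labels) if row[feature] < threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] >= threshold]
        if not left or not right:
            continue  # degenerate split: everything on one side
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score
```

Evaluating only a random handful of thresholds, rather than every observed value, trades a little split quality for speed and injects further diversity between trees.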
Random forests are inherently parallelizable: each tree can be trained independently on its own bag of data, with its own random features and random split points. This greatly reduces the amount of time required for training and testing the model. Python's multiprocessing.Pool is used to parallelize the creation of trees in the forest.
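The parallel driver could be as simple as the sketch below. Since the actual tree-growing routine isn't shown in this section, `train_tree` here returns a placeholder instead of a real tree; only the Pool usage is the point:

```python
import random
from multiprocessing import Pool

def train_tree(args):
    """Train one tree on its own bootstrap sample of the dataset.

    The return value stands in for a call to the real tree-growing
    routine (e.g. build_tree(sample)), which isn't shown here.
    """
    seed, rows = args
    rng = random.Random(seed)                              # independent RNG per task
    sample = [rng.choice(rows) for _ in range(len(rows))]  # draw this tree's bag
    return len(sample)                                     # placeholder result

if __name__ == '__main__':
    dataset = [[random.random(), random.random()] for _ in range(100)]
    with Pool() as pool:
        # Give each task a distinct seed so the bags (and trees) differ.
        forest = pool.map(train_tree, [(seed, dataset) for seed in range(10)])
```

Seeding each task separately matters: without it, forked workers can inherit the same random state and produce identical bags, which would defeat the randomization the forest relies on.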