RandomForest

Random Forest Implementation on Spark

Random forests combine bagging with random feature selection. In a typical decision tree, each node is split using the best split among all variables. In a random forest, each node is split using the best split among a random subset of the available features. Although this sounds counterintuitive, it often yields better accuracy than many other classifiers. The method is also easy to use, since it requires only two parameters: the number of features in each random subset and the number of trees in the forest.
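The two sources of randomness described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in this repository: the helper names `bootstrap_sample` and `sample_feature_subset`, the toy rows, and the parameter values are all assumptions made for the example.

```python
import random

def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) rows with replacement (hypothetical helper)."""
    return [rng.choice(rows) for _ in rows]

def sample_feature_subset(num_features, subset_size, rng):
    """Pick the candidate feature indices considered for one node split."""
    return sorted(rng.sample(range(num_features), subset_size))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = [(5.1, 3.5, 1.4, 0.2), (6.2, 2.9, 4.3, 1.3), (7.7, 3.0, 6.1, 2.3)]  # toy data
num_trees = 3            # parameter 1: number of trees in the forest
features_per_split = 2   # parameter 2: size of each random feature subset

for tree_id in range(num_trees):
    sample = bootstrap_sample(rows, rng)
    candidates = sample_feature_subset(len(rows[0]), features_per_split, rng)
    print(tree_id, len(sample), candidates)
```

Each tree sees its own resampled copy of the data, and each split only searches the sampled feature indices rather than all of them.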

PMML (Predictive Model Markup Language) uses XML to define data mining models. It covers a wide range of models, including linear regression, decision trees, association rules and Naive Bayes. The structure of the XML is governed by an XML schema. In brief, a PMML document is an XML document whose root element is PMML. At the time of writing, Spark does not have any implementation to convert its machine learning models into PMML. We have implemented a framework to convert a Random Forest model trained in Apache Spark into PMML.
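To make the document shape concrete, here is a minimal sketch that builds a PMML skeleton with Python's standard `xml.etree.ElementTree`. It is not the converter from this repository. The function name `build_pmml_skeleton`, the field names, and the choice of PMML version 4.2 are assumptions for illustration, and the output is deliberately incomplete (for example, it has no `MiningSchema` and no trees), so it would not pass schema validation as-is.

```python
import xml.etree.ElementTree as ET

def build_pmml_skeleton(feature_names, target_name):
    """Build an illustrative, incomplete PMML document (hypothetical helper)."""
    # The root element of every PMML document is <PMML>.
    root = ET.Element("PMML", {"version": "4.2",
                               "xmlns": "http://www.dmg.org/PMML-4_2"})
    ET.SubElement(root, "Header", {"description": "Random forest (illustrative)"})

    # The data dictionary declares every field the model refers to.
    dictionary = ET.SubElement(root, "DataDictionary")
    for name in feature_names:
        ET.SubElement(dictionary, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})
    ET.SubElement(dictionary, "DataField",
                  {"name": target_name, "optype": "categorical", "dataType": "string"})

    # An ensemble of trees can be expressed as a MiningModel whose
    # Segmentation combines the member models by majority vote.
    model = ET.SubElement(root, "MiningModel", {"functionName": "classification"})
    ET.SubElement(model, "Segmentation", {"multipleModelMethod": "majorityVote"})
    return root

root = build_pmml_skeleton(["x1", "x2"], "label")
print(ET.tostring(root, encoding="unicode"))
```

In a full document, each tree of the forest would appear as a `Segment` containing a `TreeModel` inside the `Segmentation` element.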

This Random Forest implementation uses Apache Spark RDDs to keep chunks of data in memory, which is what makes Spark so efficient at data processing. Apache Spark has an existing implementation of decision trees and regression trees, which we build our forests on. We have also implemented a utility program that converts the Spark-trained Random Forest model into PMML, a standard XML format for representing machine learning models.
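As a rough illustration of how a forest sits on top of individual trees, the sketch below combines the predictions of several trees by majority vote, the usual rule for classification forests. It does not use Spark: the trees are plain Python functions standing in for trained decision tree models, and `forest_predict` is a hypothetical name chosen for this example.

```python
from collections import Counter

def forest_predict(trees, features):
    """Return the class predicted by the most trees (hypothetical helper)."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]

# Stand-ins for trained decision trees: each maps a feature tuple to a label.
trees = [
    lambda x: "yes" if x[0] > 2.0 else "no",
    lambda x: "yes" if x[1] > 1.0 else "no",
    lambda x: "yes" if x[0] + x[1] > 5.0 else "no",
]

print(forest_predict(trees, (3.0, 0.5)))
```

For regression trees, the same idea applies with the vote replaced by an average of the individual tree outputs.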
