RF-BigDataCS: A cost-sensitive approach for Random Forest Partial MapReduce algorithm to deal with Imbalanced Big Data.

Inspired by the Mahout Random Forest Partial implementation, we built a new version of Random Forest that can classify imbalanced big data. The Mahout Partial implementation (RF-BigData) builds multiple trees, each from a different portion of the data. The algorithm has two phases: the first creates the model, and the second estimates the classes of a dataset using the previously learned model.

In the first phase, the Random Forest is built from the original training set following a MapReduce procedure. This process is illustrated in Figure 1 and consists of three steps: Initial, Map and Final. Once the forest is built, the classification phase estimates the class of each sample in a dataset. This process is illustrated in Figure 2 and also consists of three steps: Initial, Map and Final.

To adapt the cost-sensitive version of Random Forest to the Mahout environment, the cost-sensitive operations need to be included in the basic Random Forest implementation.

Figure 1: A flowchart of how the building of the Random Forest is organized in the RF-BigDataCS algorithm.

Figure 2: A flowchart of how the classifying step is organized in the RF-BigDataCS algorithm.
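
The two phases described above can be sketched in a few lines of plain Python. This is an illustrative, single-process sketch only, not the Mahout or RF-BigDataCS code: the function names, the one-split "stump" used in place of a full tree, the cost-weighted leaf labelling rule, and the majority vote are all assumptions made for the example. Rows are assumed to be `(features, label)` pairs, `costs` is a hypothetical per-class misclassification cost table, and the dataset is assumed to be non-empty with at least as many rows as partitions.

```python
import random
from collections import Counter

def split_into_partitions(rows, n_parts):
    """Initial step: cut the dataset into contiguous blocks, one per mapper."""
    size = -(-len(rows) // n_parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def weighted_label(rows, costs):
    """Assumed cost-sensitive rule: pick the class with the largest cost-weighted count."""
    totals = Counter()
    for _, label in rows:
        totals[label] += costs[label]
    # sorted() makes tie-breaking deterministic
    return max(sorted(totals), key=totals.get)

def build_stump(rows, costs, rng):
    """Toy stand-in for a tree: one random feature, split at its mean value."""
    feature = rng.randrange(len(rows[0][0]))
    threshold = sum(x[feature] for x, _ in rows) / len(rows)
    left = [r for r in rows if r[0][feature] <= threshold]
    right = [r for r in rows if r[0][feature] > threshold]
    fallback = weighted_label(rows, costs)
    return (feature, threshold,
            weighted_label(left, costs) if left else fallback,
            weighted_label(right, costs) if right else fallback)

def map_build(partition, trees_per_mapper, costs, seed):
    """Map step (building): each mapper grows trees from its own block only."""
    rng = random.Random(seed)
    trees = []
    for _ in range(trees_per_mapper):
        sample = [rng.choice(partition) for _ in partition]  # bootstrap sample
        trees.append(build_stump(sample, costs, rng))
    return trees

def build_forest(rows, n_parts, trees_per_mapper, costs):
    """Final step (building): gather the trees from every mapper into one forest."""
    forest = []
    for i, part in enumerate(split_into_partitions(rows, n_parts)):
        forest.extend(map_build(part, trees_per_mapper, costs, seed=i))
    return forest

def predict_one(forest, x):
    """Majority vote over all trees (an assumed aggregation rule)."""
    votes = Counter(left if x[f] <= t else right for f, t, left, right in forest)
    return max(sorted(votes), key=votes.get)

def classify(forest, rows, n_parts):
    """Classification phase: split, predict per block (Map), concatenate (Final)."""
    predictions = []
    for part in split_into_partitions(rows, n_parts):
        predictions.extend(predict_one(forest, x) for x, _ in part)
    return predictions
```

For example, with `costs = {"neg": 1.0, "pos": 5.0}`, a leaf holding three `"neg"` rows and one `"pos"` row is labelled `"pos"` by `weighted_label`, which is one simple way a cost table can shift decisions toward the minority class. In a real MapReduce job the loops over partitions would run as parallel map tasks rather than sequentially.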

References

S. Río, V. López, J.M. Benítez, F. Herrera. On the use of MapReduce for Imbalanced Big Data using Random Forest. Information Sciences 285 (2014) 112-137. doi: 10.1016/j.ins.2014.03.043
