fastAdaboost

fastAdaboost is a blazingly fast implementation of adaboost for R. It uses C++ code in the backend to provide an implementation that is about 50 times faster than the native R libraries currently available, which is especially useful when your data is large. fastAdaboost presently works only for binary classification tasks. It implements Freund and Schapire's Adaboost.M1 and Zhu et al.'s SAMME.R (real adaboost) algorithms.

Install

Install from CRAN:

install.packages("fastAdaboost")

Install from GitHub:

devtools::install_github("souravc83/fastAdaboost")

Quick Demo

library("fastAdaboost")
set.seed(9999)

num_each <- 1000
fakedata <- data.frame(X=c(rnorm(num_each, 0, 1), rnorm(num_each, 1.5, 1)),
                       Y=c(rep(0, num_each), rep(1, num_each)))
fakedata$Y <- factor(fakedata$Y)
#run adaboost with 10 boosting iterations (nIter)
test_adaboost <- adaboost(Y~X, fakedata, 10)
pred <- predict(test_adaboost, newdata=fakedata)
print(paste("Adaboost Error on fakedata:",pred$error))
#> [1] "Adaboost Error on fakedata: 0.1225"
print(table(pred$class,fakedata$Y))
#>    
#>       0   1
#>   0 848  93
#>   1 152 907

test_real_adaboost <- real_adaboost(Y~X, fakedata, 10)
pred_real <- predict(test_real_adaboost,newdata=fakedata)
print(paste("Real Adaboost Error on fakedata:", pred_real$error))
#> [1] "Real Adaboost Error on fakedata: 0.1105"
print(table(pred_real$class,fakedata$Y))
#>    
#>       0   1
#>   0 906 127
#>   1  94 873
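
Note that the errors above are computed on the same data the models were trained on. For an out-of-sample estimate, hold out part of the data; predict() reports $error whenever the true labels are present in newdata, as in the demo above. A minimal sketch (the 70/30 split and seed are illustrative):

#hold out 30% of fakedata as a test set
set.seed(123)
test_idx <- sample(nrow(fakedata), size = 0.3*nrow(fakedata))
train_data <- fakedata[-test_idx, ]
test_data <- fakedata[test_idx, ]

holdout_fit <- adaboost(Y~X, train_data, 10)
holdout_pred <- predict(holdout_fit, newdata=test_data)
print(paste("Held-out Adaboost Error:", holdout_pred$error))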

Performance Benchmarking

How fast is fastAdaboost compared to native R implementations? I used the microbenchmark package to compare the running times of fastAdaboost with adabag, one of the most popular native R libraries implementing the Adaboost algorithm. The benchmark indicates that fastAdaboost is about 45-50 times faster than the R-based implementation, a huge benefit when data sizes are large.

library(microbenchmark)
library(adabag)
library(MASS)

data(bacteria)

#using fastAdaboost
print(
  microbenchmark(
    boost_obj <- adaboost(y~., bacteria, 10),
    pred <- predict(boost_obj, bacteria)
  )
)
#> Unit: milliseconds
#>                                        expr      min       lq    mean
#>  boost_obj <- adaboost(y ~ ., bacteria, 10) 58.01665 58.69384 60.6658
#>        pred <- predict(boost_obj, bacteria) 26.91593 27.41415 29.5689
#>    median       uq      max neval cld
#>  59.20298 60.13180 74.54155   100   b
#>  27.91902 32.50484 37.58375   100  a

#using adabag
print(
  microbenchmark(
    adabag_obj <- boosting(y~., bacteria, boos=F, mfinal=10),
    pred_adabag <- predict(adabag_obj, bacteria)
  )
)
#> Unit: milliseconds
#>                                                            expr        min
#>  adabag_obj <- boosting(y ~ ., bacteria, boos = F, mfinal = 10) 2497.55208
#>                    pred_adabag <- predict(adabag_obj, bacteria)   34.50564
#>          lq       mean     median         uq       max neval cld
#>  2659.99737 2848.80065 2809.39769 2988.49017 3629.1527   100   b
#>    35.72336   45.21379   37.16913   42.22947  242.7932   100  a
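
To see how the gap behaves at larger data sizes, you can rerun the comparison on a bigger synthetic dataset, reusing the generator from the Quick Demo. A sketch (sizes are illustrative; timings will vary by machine, and the adabag run will take considerably longer):

#time training on a larger synthetic dataset
num_each <- 10000
bigdata <- data.frame(X=c(rnorm(num_each, 0, 1), rnorm(num_each, 1.5, 1)),
                      Y=factor(c(rep(0, num_each), rep(1, num_each))))
print(system.time(adaboost(Y~X, bigdata, 10)))
print(system.time(boosting(Y~., bigdata, boos=F, mfinal=10)))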
