PCA transformation included in the PMML provided by r2pmml #6

Open
dalpozz opened this issue Mar 10, 2016 · 3 comments

@dalpozz

dalpozz commented Mar 10, 2016

I'd like to apply PCA before training a classifier, and include both the PCA transformation and the classifier in the PMML document generated by the r2pmml package.
There is already an R package called pmmlTransformations that does a similar job, but I see that this is already possible in the Python version, "sklearn2pmml", so I was wondering whether this feature will become available in r2pmml.

@vruusmann
Member

Sure, if your model has specific data pre-processing needs, then it would be desirable to have a way of including them in the PMML document.

The main problem is that R lacks proper abstractions in this area. So, every transformation has to be specified and implemented separately. In Python/Scikit-Learn you have everything collected nicely together into the sklearn.preprocessing package.

Can you provide example R code showing how you use the PCA transformation in your workflow? The obvious candidate would be the preProcess function of the caret package.

@dalpozz
Author

dalpozz commented Mar 11, 2016

Here is the R code:

# Load some toy data
library(unbalanced)
data("ubIonosphere")

# Train a random forest with caret after applying PCA
library(caret)
fit <- caret::train(Class ~ ., data = ubIonosphere, preProcess = "pca", method = "rf", ntree = 200)

# Save the model as PMML (including the pre-processing method)
library(r2pmml)
r2pmml(fit, "fit.pmml")

@vruusmann
Member

The r2pmml function now takes an optional preProcess argument.

Data pre-processing can also be done in standalone mode; it doesn't need to be coupled to the train function. For example:

library("caret")

data(iris)
iris.preProcess = preProcess(iris, method = c("range"))

r2pmml(.., preProcess = iris.preProcess, ..)
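
For readers unfamiliar with standalone pre-processing in caret, here is a minimal sketch of fitting and applying a "range" transformation on its own, without train. It assumes the caret package is installed; the variable names predictors and scaled are illustrative and not part of the original example, and restricting the input to the four numeric columns of iris is a choice made here for clarity.

```r
library(caret)

data(iris)

# Keep only the numeric predictors; Species is the target (illustrative choice)
predictors <- iris[, 1:4]

# Learn the per-column minimum and maximum from the data
iris.preProcess <- preProcess(predictors, method = c("range"))

# Apply the learned transformation; each column is rescaled to [0, 1]
scaled <- predict(iris.preProcess, predictors)

# Show the minimum and maximum of every rescaled column
print(sapply(scaled, range))
```

The fitted preProcess object holds everything needed to repeat the transformation on new data, which is presumably why it can be handed over separately from the model.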

The current implementation supports range, scale, center and medianImpute transformations. Other transformations (including pca) should become available shortly.
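
As background on what a pca transformation involves, the sketch below uses base R's prcomp (not caret, and not r2pmml's actual implementation) to show that PCA scores are a linear combination of centered and scaled inputs. The variable names are illustrative. It assumes centering and scaling are both applied, which matches caret's documented behaviour for method = "pca".

```r
data(iris)

# Numeric predictors only (illustrative choice)
predictors <- iris[, 1:4]

# Fit PCA on centered and scaled inputs
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Reproduce the scores by hand: standardize, then multiply by the rotation matrix
standardized <- scale(predictors, center = pca$center, scale = pca$scale)
scores <- standardized %*% pca$rotation

# The manual result should match the scores that prcomp computed
stopifnot(isTRUE(all.equal(unname(scores), unname(pca$x))))
```

So an exported PCA step only needs three pieces of information per input column: its center, its scale, and its loadings on each retained component.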
