Add .toPMML spark methods for MLlib into sparklyr #60

Closed
TaylorAndrew opened this issue Jul 1, 2016 · 19 comments

Comments

@TaylorAndrew

I'm trying to export my models as PMML. In Scala one would use

myModel.toPMML("./myModelPMML.xml")

I'm not currently seeing a way to do this, nor a way to 'inject' raw Scala so that I could do it myself, similar to how I can use ft_sql_transformer() to submit raw SQL to Spark SQL instead of using dplyr verbs.

Are either of these two options possible currently, or are they on the horizon?

@kevinushey
Contributor

You should be able to do this 'by hand', with something like:

model <- ml_kmeans(<...>)
sparkapi::invoke(model$.model, "toPMML", "./myModelPMML.xml")

Note that sparkapi::invoke basically gives you access to the Scala API for any Spark object (represented as a proxy jobj on the R side) so this should get you going in the right direction.

Is this a common enough action that it would make sense to expose a function / API on the sparklyr side?

@TaylorAndrew
Author

I'll keep playing with sparkapi::invoke(), but so far I haven't been able to get it to work.

Instead of saving to XML, I tried the simplest option: printing to the console.

I created a kmeans cluster model:

kmeans <- ml_kmeans(kmeans_tbl, 3)

sparkapi::invoke(kmeans$.model, ".toPMML") 

This should conceptually print the PMML model to the console, but instead I get:

 Warning in readLines(log) :
 cannot open file 'C:\Users\taylo\AppData\Local\Temp\Rtmp6hiGlI\filea7c7b2172b8_spark.log': Permission denied
 Show Traceback
 Error: failed to invoke spark command (unknown reason)

I'm sure in many use cases people will keep the model in Spark and make new predictions there, in which case predict() works just fine interactively in R (or the model can be used from an application that connects to Spark). But for people who need to export the model to be scored with ADAPA, easy access to the PMML via sparklyr would be great.

@kevinushey
Contributor

Gotcha -- it sounds like we should think of implementing something like ml_model_save(..., format = "PMML") and ml_model_load(...), perhaps?

I'm less sure about the Spark log problem. Perhaps that file is being created by Spark, which is running with elevated or alternate permissions, and hence isn't user accessible. @javierluraschi, does that sound correct?

@jjallaire
Contributor

The log problem is a red herring: on Windows we can't currently access the
log because it's locked exclusively by the Spark shell (we need to find a
workaround for this).

In terms of saving and loading models, I believe that Spark 2.0 now enables
you to save and load models to e.g. HDFS in some sort of native form, so we
should be sure to support this alongside PMML.

@javierluraschi
Collaborator

@TaylorAndrew The method name should not start with a "." in the call. Try this instead: sparkapi::invoke(kmeans$.model, "toPMML").

@jjallaire
Contributor

Javier, shouldn't our model objects implement the spark_jobj S3 method so
that accessing the .model property directly isn't required (i.e. just pass
the kmeans object directly to invoke)?

@javierluraschi
Collaborator

@jjallaire yes, that would be a good addition. Opened: #61

@kevinushey
Contributor

Hrm, to my surprise, even though this appears to be exported on the Scala side, it seems like we can't access it from the RBackendHandler. Perhaps because PMMLExportable is part of the so-called 'developer API'?

We might have to dig a bit more to make this work, unfortunately.

@kevinushey
Contributor

Ahhh, I know what's going on. We use the ml package of algorithms for model fitting, not mllib. Unfortunately, the models returned by routines in the ml package do not implement the PMMLExportable interface, and so the toPMML methods are not available on them.

It looks like it's still possible to save these ML model objects, just not as PMML. :-/

@javierluraschi added this to the feature requests milestone Jul 19, 2016
@javierluraschi
Collaborator

Marking as a feature request since, as Kevin mentions, sparklyr is based on the newer ml library, which does not support this yet.

BTW, wouldn't it be possible to get the predictive model and manually map the output into the pmml package?

@kevinushey
Contributor

Maybe, but I would strongly prefer filing a feature request on the Spark side to add this, rather than trying to implement it ourselves.

@javierluraschi
Collaborator

K, marking as feature request here as well.

@ElianoMarques

This would be a good feature to add. So would the ability to save and load Spark models, which is one of the main features in Spark 2.0.

@ElianoMarques

@kevinushey @javierluraschi perhaps the easiest way to do this would be to leverage these two projects:
https://github.com/jpmml/jpmml-converter
https://github.com/jpmml/jpmml-sparkml

@javierluraschi removed this from the feature requests milestone May 10, 2017
@michalrudko

@kevinushey Are there any updates on exporting sparklyr models to PMML? Is this already supported by the package?

@kevinushey
Contributor

Unfortunately no. As far as I can see, these still haven't been ported from mllib to ml on the Spark side. See e.g.

https://issues.apache.org/jira/browse/SPARK-11171
https://issues.apache.org/jira/browse/SPARK-11237
https://issues.apache.org/jira/browse/SPARK-11239

As far as I can see, the work here has unfortunately fallen off the radar somewhat.

@kevinykuo
Collaborator

@mrjoseph84 Just curious, what's your use case with PMML?

@michalrudko

michalrudko commented Jun 23, 2017

@kevinushey Thanks for the update and links. I will be monitoring these issues.
@kevinykuo We are developing sparklyr scripts in analytical sandpits. We are planning to use PMML as a way to export predictive models written in sparklyr to visualization tools like MicroStrategy, Shiny, D3, etc.

@kevinykuo
Collaborator

Closing this since this is unlikely to be supported in Spark and we support saving/loading pipelines and models now.
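
For anyone landing here later, the native save/load route mentioned above might look roughly like the sketch below. It is only an illustration under some assumptions: a local Spark installation is available, the installed sparklyr is recent enough to provide ml_save() and ml_load(), and the iris data, column choice, and temporary path are placeholders rather than anything from this thread. Note that this writes Spark's own model format, not PMML.

```r
library(sparklyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Copy a small built-in data set to Spark; dots in column names
# become underscores (Petal.Length -> Petal_Length).
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)

# Fit a k-means model with three clusters on two numeric columns.
model <- ml_kmeans(iris_tbl, ~ Petal_Length + Petal_Width, k = 3)

# Persist the fitted model in Spark's native format (not PMML).
# The path is a hypothetical placeholder; on a cluster this would
# typically be an HDFS or other shared location.
model_path <- file.path(tempdir(), "kmeans_model")
ml_save(model, model_path, overwrite = TRUE)

# Reload the saved model and use it to score data again.
reloaded <- ml_load(sc, model_path)
scored <- ml_transform(reloaded, iris_tbl)

spark_disconnect(sc)
```

Scoring with the reloaded model still requires a Spark session, so this does not cover the ADAPA-style use case from the original request, where the model has to be consumed outside Spark.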
