Add .toPMML spark methods for MLlib into sparklyr #60

TaylorAndrew · 2016-07-01T03:10:26Z

I'm trying to export my models as PMML. In Scala, one would use

myModel.toPMML("./myModelPMML.xml")

I don't currently see a way to do this, nor a way to 'inject' raw Scala so that I could do it myself, similar to how I can use ft_sql_transformer() to submit raw SQL to Spark SQL instead of using dplyr verbs.

Is either of these two options possible currently, or are they on the horizon?


kevinushey · 2016-07-01T04:27:45Z

You should be able to do this 'by hand', with something like:

model <- ml_kmeans(<...>)
sparkapi::invoke(model$.model, "toPMML", "./myModelPMML.xml")

Note that sparkapi::invoke basically gives you access to the Scala API for any Spark object (represented as a proxy jobj on the R side) so this should get you going in the right direction.

Is this a common enough action that it would make sense to expose a function / API on the sparklyr side?

TaylorAndrew · 2016-07-01T06:16:53Z

I'll keep playing with sparkapi::invoke(), but so far I haven't been able to get it to work.

Instead of saving to XML, I tried the simplest option: printing to the console.

I created a kmeans cluster model:

kmeans <- ml_kmeans(kmeans_tbl, 3)

sparkapi::invoke(kmeans$.model, ".toPMML")

This should, in principle, print the PMML model to the console, but instead I get:

 Warning in readLines(log) :
 cannot open file 'C:\Users\taylo\AppData\Local\Temp\Rtmp6hiGlI\filea7c7b2172b8_spark.log': Permission denied
 Error: failed to invoke spark command (unknown reason)

I'm sure that in many use cases people will keep the model in Spark and make new predictions there, in which case predict() works just fine interactively in R (or the model can be used from any application that can connect to Spark). But for people who need to export the model to be scored with ADAPA, easy access to the PMML via sparklyr would be great.

kevinushey · 2016-07-01T06:29:09Z

Gotcha -- it sounds like we should think of implementing something like ml_model_save(..., format = "PMML") and ml_model_load(...), perhaps?

I'm less sure about the Spark log problem -- perhaps that file is being created by Spark, which is running with elevated or alternate permissions and hence isn't accessible to the user. @javierluraschi, does that sound correct?

jjallaire · 2016-07-01T13:12:52Z

The log problem is a red herring: on Windows we can't currently access the
log because it's locked exclusively by the Spark shell (we need to find a
workaround for this).

In terms of saving and loading models, I believe that Spark 2.0 now enables
you to save and load models to e.g. HDFS in some sort of native form, so we
should be sure to support this alongside PMML.


javierluraschi · 2016-07-01T14:02:25Z

@TaylorAndrew The method name should not start with a `.`. Try this instead: sparkapi::invoke(kmeans$.model, "toPMML")

jjallaire · 2016-07-01T14:04:13Z

Javier, shouldn't our model objects implement the spark_jobj S3 method so
that accessing the .model property directly isn't required (i.e. just pass
the kmeans object directly to invoke)?


javierluraschi · 2016-07-01T14:32:06Z

@jjallaire yes, that would be a good addition. Opened: #61

kevinushey · 2016-07-01T22:11:05Z

Hrm, to my surprise, even though this appears to be exported on the Scala side, it seems like we can't access it from the RBackendHandler. Perhaps because PMMLExportable is part of the so-called 'developer API'?

We might have to dig a bit more to make this work, unfortunately.

kevinushey · 2016-07-01T22:42:18Z

Ahhh, I know what's going on. We use the ml package of algorithms for model fitting, not mllib. Unfortunately, the models returned by routines in the ml package do not implement the PMMLExportable interface, so those methods are not available.

It looks like it's still possible to save these ML model objects; just not as PMML. :-/

javierluraschi · 2016-07-19T16:40:48Z

Marking as feature request since as Kevin mentions, we are based on the newer ml library that does not support this yet.

BTW, wouldn't it be possible to extract the fitted model and manually map it into the pmml package?

kevinushey · 2016-07-19T17:04:14Z

Maybe, but I would strongly prefer filing a feature request on the Spark side to add this, rather than trying to implement it ourselves.

javierluraschi · 2016-07-21T16:25:23Z

K, marking as feature request here as well.

ElianoMarques · 2016-07-28T19:37:50Z

This would be a good feature to add. Also the ability to save and load spark models, which is one of the main features in 2.0.

ElianoMarques · 2016-10-31T23:39:26Z

@kevinushey @javierluraschi perhaps the easiest way to do this would be to leverage https://github.com/jpmml/jpmml-converter and https://github.com/jpmml/jpmml-sparkml?

michalrudko · 2017-06-22T13:41:12Z

@kevinushey are there any updates on exporting sparklyr models to PMML? Is this already supported by the package?

kevinushey · 2017-06-22T16:54:42Z

Unfortunately no. As far as I can see, these still haven't been ported from mllib to ml on the Spark side. See e.g.

https://issues.apache.org/jira/browse/SPARK-11171
https://issues.apache.org/jira/browse/SPARK-11237
https://issues.apache.org/jira/browse/SPARK-11239

As far as I can see, the work here has unfortunately fallen off the radar somewhat.

kevinykuo · 2017-06-22T20:51:48Z

@mrjoseph84 just curious what's your use case with pmml?

michalrudko · 2017-06-23T12:00:59Z

@kevinushey Thanks for the update and links - I will be monitoring these issues
@kevinykuo We are developing sparklyr scripts in analytical sandpits. We are planning to use PMML as a way to export predictive models written in sparklyr to visualization tools like MicroStrategy, Shiny, D3, etc.

kevinykuo · 2017-10-26T19:27:22Z

Closing this since this is unlikely to be supported in Spark and we support saving/loading pipelines and models now.

javierluraschi added the ml label Jul 1, 2016

javierluraschi added this to the feature requests milestone Jul 19, 2016

javierluraschi added the featurerequest label May 10, 2017

javierluraschi removed this from the feature requests milestone May 10, 2017

kevinykuo closed this as completed Oct 26, 2017
