Correct conversion of Spark model stages into MLeap local models #261

tovbinm · 2019-04-03T03:31:04Z

Related issues

Previously, Spark model stages were converted into MLeap local models without all the necessary metadata, which resulted in exceptions. For example, converting a Spark StringIndexerModel into an MLeap model expected ml_attr metadata to be present in the transformed DataFrame.
Reflecting the apply method on MLeap models did not work correctly, since many models have more than one apply method.
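
To illustrate the reflection problem described above, here is a minimal sketch that uses only the Scala and Java standard libraries. `ToyModel` and `applyMethods` are hypothetical names made up for this example; they are not MLeap classes. The point is only that looking up `apply` by name can return more than one candidate when a class overloads it.

```scala
import java.lang.reflect.Method

object ReflectApplyDemo {
  // Hypothetical stand-in for a model with overloaded `apply` methods
  // (an assumption for illustration, not a real MLeap class).
  class ToyModel {
    def apply(value: String): Int = value.length
    def apply(values: Seq[String]): Seq[Int] = values.map(_.length)
  }

  // Collect every public method named `apply` on the given object.
  def applyMethods(model: AnyRef): Seq[Method] =
    model.getClass.getMethods.toSeq.filter(_.getName == "apply")
}
```

With two overloads, a lookup such as `applyMethods(model).head` picks whichever method the JVM happens to list first, because `getMethods` does not guarantee any order.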

Describe the proposed solution

Provide a transformed DataFrame with all the necessary metadata to allow creation of MLeap local models.
Get rid of reflection for the apply method on MLeap models and explicitly convert MLeap models into scoring methods.
Added tests.
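
The explicit conversion proposed above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from this PR: `LengthModel`, `ScaleModel` and `toScoringFunction` are hypothetical names, and the `Array[Any] => Any` function shape is assumed for the example. Each supported model type gets its own case, so the overload of `apply` is chosen at compile time by the cast instead of being looked up by reflection.

```scala
object ExplicitScoringDemo {
  // Hypothetical stand-ins for local models (assumptions, not MLeap classes).
  class LengthModel {
    def apply(value: String): Int = value.length
    def apply(values: Seq[String]): Seq[Int] = values.map(_.length)
  }
  class ScaleModel(factor: Double) {
    def apply(value: Double): Double = value * factor
  }

  // Convert a model into a scoring function over positional inputs.
  // The cast on x(0) selects the intended `apply` overload statically.
  def toScoringFunction(model: Any): Array[Any] => Any = model match {
    case m: LengthModel => x => m.apply(x(0).asInstanceOf[Seq[String]])
    case m: ScaleModel => x => m.apply(x(0).asInstanceOf[Double])
    // Anything not listed explicitly is rejected up front.
    case m => throw new RuntimeException(s"Unsupported model: ${m.getClass.getName}")
  }
}
```

The trade-off of this design is that every supported model type must be listed explicitly, and an unlisted type fails when the scoring function is built.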

Describe alternatives you've considered
N/A

codecov · 2019-04-03T18:07:35Z

Codecov Report

Merging #261 into master will decrease coverage by 0.3%.
The diff coverage is 29.62%.

@@            Coverage Diff             @@
##           master     #261      +/-   ##
==========================================
- Coverage   86.67%   86.36%   -0.31%     
==========================================
  Files         317      318       +1     
  Lines       10403    10447      +44     
  Branches      322      552     +230     
==========================================
+ Hits         9017     9023       +6     
- Misses       1386     1424      +38

Impacted Files	Coverage Δ
...om/salesforce/op/local/OpWorkflowRunnerLocal.scala	`100% <ø> (ø)`	⬆️
.../op/features/types/FeatureTypeSparkConverter.scala	`99.11% <100%> (+0.01%)`	⬆️
.../com/salesforce/op/local/MLeapModelConverter.scala	`5.12% <5.12%> (ø)`
...com/salesforce/op/local/OpWorkflowModelLocal.scala	`97.36% <92.3%> (-2.64%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 394b4bd...1a3d9fa.

leahmcguire · 2019-04-03T20:39:03Z

So now we have to spin up Spark to use this Spark-less scoring? Could you get the necessary info another way, e.g. by serializing the DataFrame schema along with the model?

leahmcguire · 2019-04-03T20:31:33Z

local/src/main/scala/com/salesforce/op/local/MLeapModelConverter.scala

+    case m: VectorSlicerModel => x => m.apply(x(0).asInstanceOf[Vector])
+    case m: WordLengthFilterModel => x => m.apply(x(0).asInstanceOf[Seq[String]])
+    case m: WordToVectorModel => x => m.apply(x(0).asInstanceOf[Seq[String]])
+    case m => throw new RuntimeException(s"Unsupported MLeap model: ${m.getClass.getName}")


So every wrapped Spark stage has to be in this list? We should add that to the docs on wrapping...

tovbinm · Apr 3, 2019

I currently added all the stages from the features package. We could also add models from the classification, regression and recommendation packages, but we already have the first two covered as our own OpTransformer stages, so I did not see much point in adding them.

tovbinm · Apr 3, 2019

So to your question: for right now I think we have everything covered, except recommenders, which I am planning to add once we are ready.

leahmcguire · Apr 5, 2019

Can you add a TODO with the classification and regression models? I don't know that this will be much use without them...

tovbinm · Apr 5, 2019

Adding those is very easy. The thing is, we already have classification and regression models as OpTransformers, so MLeap won't be used to run them.

leahmcguire · Apr 5, 2019

good point :-)

tovbinm · 2019-04-03T21:28:49Z

@leahmcguire we used to do this before as well, when loading Spark stages. Now I have just explicitly exposed the ability for users to control the Spark session lifecycle.

The only way to avoid a Spark session is to export our models into MLeap format, which is indeed a possibility, and I am open to discussing it.

As of right now, local scoring assumes the model format as we have it now (i.e. JSON + Parquet files).

tovbinm added 3 commits April 1, 2019 22:56

MLeap fixes: 1) contruct dataframe with schema + meta 2) use explicit…

90f32a2

… models instead of reflection

Fixes

a77e53b

tests

05f0830

tovbinm requested a review from leahmcguire as a code owner April 3, 2019 03:31

tovbinm added the ready for review label Apr 3, 2019

tovbinm added 2 commits April 2, 2019 20:32

Update README.md

5067d5a

Merge branch 'master' into mt/mleap-models

9cd9973

forgot to commit a test

4b13d4d

tovbinm requested a review from wsuchy April 3, 2019 18:13

leahmcguire reviewed Apr 3, 2019

tovbinm and others added 4 commits April 3, 2019 14:38

Merge branch 'mt/mleap-models' of github.com:salesforce/TransmogrifAI…

31f4982

… into mt/mleap-models

Merge branch 'master' into mt/mleap-models

4811ebd

Update README.md

e5fc929

Update README.md

b03e76c

leahmcguire approved these changes Apr 5, 2019

tovbinm added 2 commits April 5, 2019 15:38

Merge branch 'master' into mt/mleap-models

f9c27e5

Merge branch 'master' into mt/mleap-models

1a3d9fa

tovbinm merged commit 3a1a2d8 into master Apr 6, 2019

tovbinm deleted the mt/mleap-models branch April 6, 2019 01:27

tovbinm mentioned this pull request Apr 10, 2019

Release 0.5.2 #277

Merged
