PUBDEV-4266: Adds support for converting word2vec model to a Frame #935

michalkurka · 2017-03-21T00:14:48Z

This PR adds support for converting a word2vec model into a Frame.

It uses algorithm-specific Rapids and introduces a new Rapids discovery mechanism. Rapids expressions can now take models as arguments.

For example:

fr <- .newExpr("word2vec.to.frame", w2v.model@model_id)

converts a word2vec model to an H2O frame.

mmalohlava

Michal, I like the idea. Do you want to make it fully automatic via dedicated JVM SPIs?

Or should AbstractRegister do that?

But I really like it. We have been thinking about Rapids extensibility for a long time, but nobody tried it; you are the first one!

mmalohlava · 2017-03-21T17:44:06Z

h2o-algos/src/main/java/hex/api/RegisterAlgos.java


 public class RegisterAlgos extends water.api.AbstractRegister {
  // Register the algorithms and their builder handlers:
  @Override public void register(String relativeResourcePath) throws ClassNotFoundException {
+    // Register algorithm-specific Rapids
+    RapidsInit.registerAlgoRapids();


Could we do that automatically via SPI? Or do you have a reason to register them explicitly?

mmalohlava · 2017-03-21T17:48:10Z

h2o-algos/src/main/java/water/rapids/RapidsInit.java

+public class RapidsInit {
+
+  public static void registerAlgoRapids() {
+    Env.init(new AstWord2VecToFrame());


Thinking about it more:

If we go the SPI way, we need some interface or abstract class here, and something like an EnvContext that would permit registration:

public class W2VRapidsInit implements RegisterRapids {
  @Override
  public void registerAlgoRapids(EnvContext envContext) {
    envContext.init(new AstWord2VecToFrame());
  }
}

where the EnvContext implementation is just a dummy class:

class DummyEnvContext implements EnvContext {
  @Override
  public void init(AstOp astOp) {
    Env.init(astOp); // static call
  }
}

WDYT?
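The proposal above can be sketched in a self-contained form. This is only an illustration under assumptions: `AstOp`, `EnvContext`, and `RegisterRapids` are reduced to hypothetical stand-ins (the real H2O types are not shown in this thread), and `MapEnvContext` records primitives in a map instead of delegating to the static `Env.init`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SpiRegistrationSketch {
  // Hypothetical stand-in for a Rapids primitive; only its name matters here.
  interface AstOp {
    String name();
  }

  // Narrow registration surface handed to each extension.
  interface EnvContext {
    void init(AstOp astOp);
  }

  // Extension point that an algorithm module would implement.
  interface RegisterRapids {
    void registerAlgoRapids(EnvContext envContext);
  }

  // Assumption: records primitives by name instead of calling a static Env.init.
  static class MapEnvContext implements EnvContext {
    final Map<String, AstOp> prims = new LinkedHashMap<>();

    @Override
    public void init(AstOp astOp) {
      prims.put(astOp.name(), astOp);
    }
  }

  static class W2VRapidsInit implements RegisterRapids {
    @Override
    public void registerAlgoRapids(EnvContext envContext) {
      // The lambda implements AstOp.name().
      envContext.init(() -> "word2vec.to.frame");
    }
  }

  // Lets every provider register its primitives against the same context.
  // With JVM SPI, the providers could come from
  // java.util.ServiceLoader.load(RegisterRapids.class), which is an Iterable.
  static void registerAll(Iterable<RegisterRapids> providers, EnvContext ctx) {
    for (RegisterRapids provider : providers) {
      provider.registerAlgoRapids(ctx);
    }
  }
}
```

Passing the context in, rather than letting providers call a static method, keeps the extension code independent of how primitives are actually stored.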

Thank you for the feedback. I made this just to have something quickly.

We can use SPI; however, we need to keep in mind that the Rapids depend on the availability of the algo (which means the corresponding ModelBuilder needs to be registered).

I don't really like the way I implemented this. I think a slightly cleaner way is to let the ModelBuilder return the Rapids (or a factory for them) for the particular algorithm, and to register them in the loop in RegisterAlgos.

Yes, agreed: we need to connect the registration of Rapids operations with the registration of the algo.
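The alternative discussed here, where each builder reports its own Rapids and they are registered in the same loop as the algos, might look roughly like the sketch below. All types are hypothetical stand-ins: the `rapids()` method on `ModelBuilder` and the `primitive name -> algo name` registry are assumptions made for illustration, not the actual H2O API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BuilderRapidsSketch {
  // Hypothetical stand-in for a Rapids primitive.
  interface AstPrimitive {
    String name();
  }

  // Hypothetical builder contract: each algo reports the Rapids it ships with.
  interface ModelBuilder {
    String algoName();

    // Most algos have no algo-specific Rapids, so the default is empty.
    default List<AstPrimitive> rapids() {
      return Collections.emptyList();
    }
  }

  static class Word2VecBuilder implements ModelBuilder {
    @Override
    public String algoName() {
      return "word2vec";
    }

    @Override
    public List<AstPrimitive> rapids() {
      AstPrimitive toFrame = () -> "word2vec.to.frame";
      return Collections.singletonList(toFrame);
    }
  }

  // Placeholder builder without any Rapids of its own.
  static class PlainBuilder implements ModelBuilder {
    @Override
    public String algoName() {
      return "plain";
    }
  }

  // Registers Rapids in the same loop that would register the algos,
  // so a primitive can only exist when its algo is present.
  static Map<String, String> registerAlgos(List<ModelBuilder> builders) {
    Map<String, String> primToAlgo = new LinkedHashMap<>();
    for (ModelBuilder builder : builders) {
      // (registration of the algo itself would happen here)
      for (AstPrimitive prim : builder.rapids()) {
        primToAlgo.put(prim.name(), builder.algoName());
      }
    }
    return primToAlgo;
  }
}
```

Because the primitives are obtained from the builder, the dependency mentioned above (Rapids require the algo to be available) holds by construction.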

mmalohlava · 2017-04-11T23:18:55Z

h2o-algos/src/main/java/hex/word2vec/Word2VecModel.java

+  }
+
+  private static class ConvertToFrameTask extends MRTask<ConvertToFrameTask> {
+    private Word2VecModel _model;


The model key should be enough, no? Then pre-fetch the model in setupLocal.
(Note: I am not sure right now whether the serialization layer has an optimization that replaces models with their keys.)

Furthermore, if you decide to go with the Model, we can replace the _model reference with null in the map call (to avoid transferring the model back).
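The suggestion to ship only the key and resolve the model on each node can be illustrated with plain Java serialization. This is an analogy only: H2O uses its own serialization and `MRTask` lifecycle, and the `STORE` map, `ConvertTask`, and the `setupLocal` method below are hypothetical stand-ins rather than the classes in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class KeyOnlyTaskSketch {
  // Stand-in for a node-local key/value store (an assumption, not the H2O DKV).
  static final Map<String, float[]> STORE = new ConcurrentHashMap<>();

  static class ConvertTask implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String modelKey;   // small, travels with the task
    private transient float[] model; // never serialized, resolved per node

    ConvertTask(String modelKey) {
      this.modelKey = modelKey;
    }

    // Analogous to a per-node setup hook: look the model up by key.
    void setupLocal() {
      model = STORE.get(modelKey);
    }

    boolean hasModel() {
      return model != null;
    }
  }

  // Simulates sending the task to another node and receiving it there.
  static ConvertTask roundTrip(ConvertTask task) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(task);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (ConvertTask) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

After a round trip the copy carries only the key, and it regains access to the model once `setupLocal` runs against the local store.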

Silly mistake, thanks for catching that. Fixed.

mmalohlava · 2017-04-11T23:23:17Z

h2o-core/src/main/java/water/rapids/Env.java

@@ -288,6 +290,9 @@ static void init(AstPrimitive ast, String name) {
    init(new AstSeq());
    init(new AstSeqLen());

+    // Custom (eg. algo-specific)
+    for (AstPrimitive prim : PrimsService.INSTANCE.getAllPrims())
+      init(prim);


Nice, thanks for that, Michal! BTW: later I will need to document all the extension points we have (tracked here: https://0xdata.atlassian.net/browse/PUBDEV-4272).
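The body of `PrimsService` is not shown in this thread. One plausible way to back a `getAllPrims()` call like the one in the diff is the JDK's `java.util.ServiceLoader`; the sketch below assumes that approach, and the `Prim` interface and class name are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class PrimsServiceSketch {
  // Hypothetical stand-in for the primitive type that extensions implement.
  public interface Prim {
    String name();
  }

  static final PrimsServiceSketch INSTANCE = new PrimsServiceSketch();

  // Looks up implementations listed in META-INF/services files on the classpath.
  private final ServiceLoader<Prim> loader = ServiceLoader.load(Prim.class);

  private PrimsServiceSketch() {}

  // Returns every primitive contributed by an extension module.
  // With no provider configuration on the classpath, the list is empty.
  synchronized List<Prim> getAllPrims() {
    List<Prim> prims = new ArrayList<>();
    for (Prim prim : loader) {
      prims.add(prim);
    }
    return prims;
  }
}
```

With this design an extension only needs to ship a provider configuration file naming its primitive classes; the core registration loop stays unchanged.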

mmalohlava · 2017-04-11T23:29:11Z

h2o-core/src/main/java/water/rapids/PrimsService.java

+ * PrimService manages access to non-core Rapid primitives.
+ * This includes algorithm specific rapids & 3rd party rapids.
+ */
+class PrimsService {


mmalohlava · 2017-04-11T23:33:02Z

h2o-py/h2o/model/word_embedding.py

+        :returns: a frame representing learned word embeddings.
+        """
+        return h2o.H2OFrame._expr(expr=ExprNode("word2vec.to.frame", self))


What will happen if the backend Rapids primitive is not available? Should we check for that at the client level and report it? I am planning to expose a capability API as part of modularization (https://0xdata.atlassian.net/browse/PUBDEV-4280).

A capability API makes sense, but it should be applied to a larger unit, e.g. the whole word2vec algo. The Rapids primitive "word2vec.to.frame" should always be available if word2vec itself is available. I don't think we need to check in individual methods like this one.

We could probably check just in the constructor of H2OWord2vecEstimator, what do you think?

Yup, agreed; capabilities are high-level units.

> We could probably check just in the constructor of H2OWord2vecEstimator, what do you think?

Yes.

michalkurka · 2017-04-12T15:42:48Z

@h2o-ops please test!

michalkurka · 2017-04-12T17:35:29Z

@h2o-ops please test

mmalohlava

👍

michalkurka · 2017-04-13T19:57:46Z

@h2o-ops please test

michalkurka · 2017-04-13T20:07:35Z

@h2o-ops please test!

michalkurka requested a review from mmalohlava March 21, 2017 00:14

mmalohlava reviewed Mar 21, 2017

PUBDEV-4266: Adds support for converting word2vec model to a Frame

63b5495

michalkurka force-pushed the michalk_algo-rapids branch from 5c5427c to 63b5495 Compare April 10, 2017 23:44

michalkurka changed the title ~~WIP/Proposal, a mechanism for supporting model-specific Rapids~~ PUBDEV-4266: Adds support for converting word2vec model to a Frame Apr 10, 2017

michalkurka assigned mmalohlava Apr 10, 2017

mmalohlava reviewed Apr 11, 2017

fix redundant model serialization

df79e2e

mmalohlava approved these changes Apr 12, 2017

michalkurka merged commit c7e2775 into master Apr 14, 2017

h2o-ops mentioned this pull request May 15, 2023

Add support for converting word2vec model to a Frame #11155

Closed
