
[ML] add new multi custom processor for data frame analytics and model inference #67362

Merged

Conversation

benwtrent (Member)

This adds the multi custom feature processor to data frame analytics and inference.

The multi processor allows custom processors to be chained together and use the outputs from one processor as the inputs to another.
Example:

{
    "multi_encoding": {
        "processors": [
            {
                "ngram_encoding" : {
                    "field": "foo",
                    "feature_prefix": "f",
                    "n_grams": [1],
                    "length": 2   
                }
            },
            {
                "one_hot_encoding": {
                    "field": "f.11", //OUTPUT from earlier ngram_encoding
                    "hot_map": {"a": "col_a"}
                }
            },
            {
                "one_hot_encoding": {
                    "field": "some_additional_doc_field", // Some other outside field
                    "hot_map": {"cat": "col_cat"}
                }
            }
        ]
    }
}

This definition requires the input fields ["foo", "some_additional_doc_field"] and produces the output fields ["f.12", "col_a", "col_cat"]. f.12 is included because it is an output of ngram_encoding that is not consumed by any later processor in the array.
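The input/output resolution described above can be sketched as follows. This is a hedged Python illustration, not the actual Java implementation in Elasticsearch; the processor tuples simply mirror the JSON example, assuming the ngram encoder emits `f.11` and `f.12` from `foo`:

```python
# Sketch (not the real Elasticsearch code) of how a multi processor could
# derive its overall required inputs and final outputs from a chain.
def required_inputs(processors):
    """Fields the document must supply: inputs not produced by an earlier processor."""
    produced, required = set(), []
    for inputs, outputs in processors:
        for field in inputs:
            if field not in produced and field not in required:
                required.append(field)
        produced.update(outputs)
    return required

def final_outputs(processors):
    """Outputs not consumed by any later processor; consumed ones are internal."""
    outputs = []
    for i, (_, outs) in enumerate(processors):
        consumed_later = set()
        for inputs, _ in processors[i + 1:]:
            consumed_later.update(inputs)
        outputs.extend(o for o in outs if o not in consumed_later)
    return outputs

# Mirrors the JSON example: (input_fields, output_fields) per processor.
chain = [
    (["foo"], ["f.11", "f.12"]),                   # ngram_encoding
    (["f.11"], ["col_a"]),                         # one_hot_encoding
    (["some_additional_doc_field"], ["col_cat"]),  # one_hot_encoding
]
print(required_inputs(chain))  # ['foo', 'some_additional_doc_field']
print(final_outputs(chain))    # ['f.12', 'col_a', 'col_cat']
```

Note that `f.11` drops out of both lists: it is produced by the first processor and consumed by the second, so it is internal to the chain.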

@elasticmachine (Collaborator)

Pinging @elastic/ml-core (:ml)

@dimitris-athanasiou (Contributor) left a comment

Looks good! A few comments to get through.

(p, c, n) -> lenient ?
    p.namedObject(LenientlyParsedPreProcessor.class, n, PreProcessor.PreProcessorParseContext.DEFAULT) :
    p.namedObject(StrictlyParsedPreProcessor.class, n, PreProcessor.PreProcessorParseContext.DEFAULT),
(multiBuilder) -> multiBuilder.setOrdered(true),
Contributor

I do not understand why we need to call setOrdered. I also do not understand what orderedModeCallback is all about; it seems to me it's just a way to set some additional property on the created objects. And I find it strange that the other version of declareNamedObjects, the one that doesn't take an orderedModeCallback, passes in one that throws if the value is an array, given that it's declareNamedObjects plural. Not really this PR's issue, but I would like to clear this up. Could you help us with this @nik9000 please?

benwtrent (Member, Author)

@dimitris-athanasiou a user can declare named objects in two ways:

  • a hashmap keyed by the named object names (unordered)
  • an array of objects (ordered)

If I did not pass this callback (and subsequently check in the builder that it was called), the user could pass an unordered hashmap, which we don't allow.

Example of the unordered form:

{
    "multi_encoding": {
        "processors": {
            "ngram_encoding": {
                "field": "foo",
                "feature_prefix": "f",
                "n_grams": [1],
                "length": 2
            },
            "one_hot_encoding": {
                "field": "f.11", //OUTPUT from earlier ngram_encoding
                "hot_map": {"a": "col_a"}
            }
        }
    }
}

^ This is a valid named objects collection, but obviously, its order is not guaranteed. We need to throw in this case.

As for making the declareNamedObjects array parsing more clear, that is probably best for another PR.
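The ordered-only constraint above can be sketched like this. A hedged Python illustration, not the real parser (which uses XContent parsing via declareNamedObjects with an orderedModeCallback; the `parse_processors` helper here is invented): only the array form is accepted, since JSON guarantees the order of array elements but not of object keys.

```python
def parse_processors(multi_encoding):
    """Accept only the ordered (array) form of a named-objects collection."""
    processors = multi_encoding["processors"]
    if isinstance(processors, dict):
        # The map form is a valid named-objects collection, but JSON does
        # not guarantee object key order, so a chain that feeds one
        # processor's outputs into the next cannot rely on it.
        raise ValueError("[processors] must be an array of single-entry objects")
    if not isinstance(processors, list):
        raise ValueError("[processors] must be an array")
    return processors

ordered = {"processors": [{"ngram_encoding": {"field": "foo"}},
                          {"one_hot_encoding": {"field": "f.11"}}]}
unordered = {"processors": {"ngram_encoding": {"field": "foo"},
                            "one_hot_encoding": {"field": "f.11"}}}

print(len(parse_processors(ordered)))  # 2
try:
    parse_processors(unordered)
except ValueError as e:
    print("rejected:", e)
```

The array form keeps each processor in its own single-entry object, so iteration order is well defined; the map form is rejected outright rather than silently reordered.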

@dimitris-athanasiou (Contributor) left a comment

LGTM

@benwtrent benwtrent merged commit cb34ca6 into elastic:master Jan 15, 2021
@benwtrent benwtrent deleted the feature/ml-inference-multi-pre-processor branch January 15, 2021 17:43
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 15, 2021
…l inference (elastic#67362)

This adds the multi custom feature processor to data frame analytics and inference.

The `multi_encoding` processor allows custom processors to be chained together and use the outputs from one processor as the inputs to another.
benwtrent added a commit that referenced this pull request Jan 19, 2021
…d model inference (#67362) (#67595)

alyokaz pushed a commit to alyokaz/elasticsearch that referenced this pull request Mar 10, 2021
…l inference (elastic#67362)
