New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] add new multi custom processor for data frame analytics and model inference #67362
[ML] add new multi custom processor for data frame analytics and model inference #67362
Conversation
Pinging @elastic/ml-core (:ml) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! A few comments to get through.
...rest-high-level/src/main/java/org/elasticsearch/client/ml/inference/preprocessing/Multi.java
Show resolved
Hide resolved
...rest-high-level/src/main/java/org/elasticsearch/client/ml/inference/preprocessing/Multi.java
Outdated
Show resolved
Hide resolved
...high-level/src/test/java/org/elasticsearch/client/ml/inference/preprocessing/NGramTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/io/stream/StreamInput.java
Outdated
Show resolved
Hide resolved
(p, c, n) -> lenient ? | ||
p.namedObject(LenientlyParsedPreProcessor.class, n, PreProcessor.PreProcessorParseContext.DEFAULT) : | ||
p.namedObject(StrictlyParsedPreProcessor.class, n, PreProcessor.PreProcessorParseContext.DEFAULT), | ||
(multiBuilder) -> multiBuilder.setOrdered(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why we need to setOrdered
. I think I also do not understand what orderedModeCallback
is all about, it seems to me it's just a way to set some additional property on the created objects. And I find it strange that the other version of declareNamedObjects
that doesn't take an orderedModeCallback
passes in one that throws if the value is an array given it's declareNamedObjects
plural. Not really this PR's issue but I would like to clear this up. Could you help us with this @nik9000 please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitris-athanasiou a user can declare named objects in two ways:
- a hashmap keyed by the named object names (unordered)
- an array of objects (ordered)
If I did not pass this call back (and subsequently check that it was called in the builder), the user could pass an unordered hashmap, which we don't allow.
Example of unordered :
{
"multi_encoding": {
"processors":
{
"ngram_encoding" : {
"field": "foo",
"feature_prefix": "f",
"n_grams": [1],
"length": 2
},
"one_hot_encoding": {
"field": "f.11", //OUTPUT from earlier ngram_encoding
"hot_map": {"a": "col_a"}
}
}
}
^ This is a valid named objects collection, but obviously, its order is not guaranteed. We need to throw in this case.
As for making the declareNamedObjects array parsing more clear, that is probably best for another PR.
...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/preprocessing/Multi.java
Outdated
Show resolved
Hide resolved
...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/preprocessing/Multi.java
Outdated
Show resolved
Hide resolved
...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/preprocessing/Multi.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…l inference (elastic#67362) This adds the multi custom feature processor to data frame analytics and inference. The `multi_encoding` processor allows custom processors to be chained together and use the outputs from one processor as the inputs to another.
…l inference (elastic#67362) This adds the multi custom feature processor to data frame analytics and inference. The `multi_encoding` processor allows custom processors to be chained together and use the outputs from one processor as the inputs to another.
This adds the
multi
custom feature processor to data frame analytics and inference.The
multi
processor allows custom processors to be chained together and use the outputs from one processor as the inputs to another.Example:
This definition has the required input fields of ["foo", "some_additional_doc_field"] and has the output fields of ["f.12", "col_a", "col_cat"].
f.12
is included as it is an output ofngram_encoding
but is not used by any other processor in the array.