Change how type is stored in an enrich policy. #45789

Merged: 4 commits merged on Aug 23, 2019


@martijnvg martijnvg commented Aug 21, 2019

A policy type controls how the enrich index is created and
how the query is executed against the match field. Currently there
is a single policy type (exact_match). In the near future
more policy types will be added, and different policies may have
different configuration options.

For this reason, the type should be a JSON object instead of a string field:

{
   "exact_match": {
      ...
   }
}

instead of:

{
  "type": "exact_match",
  ...
}

This makes streaming parsing of enrich policies easier: in the
new format, the parsing code knows ahead of time which configuration fields
to expect. In the latter format that is not possible if the type field
is not the first field.
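
To illustrate the streaming argument, here is a minimal sketch, not the code in this PR. It stands in for a streaming parser with plain lists of fields in document order. The class name, the `ALLOWED` table and both method names are hypothetical; only the field names `indices`, `match_field`, `enrich_fields` and `query` come from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PolicyFormatSketch {

    // Hypothetical table of configuration fields allowed per policy type.
    static final Map<String, Set<String>> ALLOWED = Map.of(
        "exact_match", Set.of("indices", "match_field", "enrich_fields", "query"));

    static void check(Set<String> allowed, String name) {
        if (!allowed.contains(name)) {
            throw new IllegalArgumentException("unexpected field [" + name + "]");
        }
    }

    // New format: the type is the key of the wrapping object, so each field
    // can be checked the moment it is read and nothing is buffered.
    static void parseNewFormat(String typeKey, List<String> fieldNames) {
        Set<String> allowed = ALLOWED.get(typeKey);
        if (allowed == null) {
            throw new IllegalArgumentException("unknown policy type [" + typeKey + "]");
        }
        for (String name : fieldNames) {
            check(allowed, name);
        }
    }

    // Old format: fields read before "type" have to be held back.
    // Returns how many fields were buffered before the type became known.
    static int parseOldFormat(List<Map.Entry<String, String>> fields) {
        List<String> buffered = new ArrayList<>();
        Set<String> allowed = null;
        int bufferedCount = 0;
        for (Map.Entry<String, String> field : fields) {
            if (field.getKey().equals("type")) {
                allowed = ALLOWED.get(field.getValue());
                if (allowed == null) {
                    throw new IllegalArgumentException("unknown policy type [" + field.getValue() + "]");
                }
                // Only now can the held-back fields be interpreted.
                for (String name : buffered) {
                    check(allowed, name);
                }
                bufferedCount = buffered.size();
                buffered.clear();
            } else if (allowed == null) {
                buffered.add(field.getKey());
            } else {
                check(allowed, field.getKey());
            }
        }
        if (allowed == null) {
            throw new IllegalArgumentException("missing [type] field");
        }
        return bufferedCount;
    }
}
```

With the old format, the amount of buffering depends on where `type` happens to appear; with the new format it is always zero.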

Relates to #32789

@martijnvg martijnvg added the >non-issue and :Data Management/Ingest Node labels Aug 21, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@@ -68,7 +70,24 @@ private static void declareParserOptions(ConstructingObjectParser<?, ?> parser)
     }

     public static EnrichPolicy fromXContent(XContentParser parser) throws IOException {
-        return PARSER.parse(parser, null);
+        Token token = parser.currentToken();
Member Author

@martijnvg martijnvg Aug 21, 2019

Not super happy with this parsing code, but in order to use an object parser, another class would need to be introduced for policy types, and since there is currently only one policy type, this seems like overkill to me. I think the important thing here is that the policy format is now future-proof.

@@ -268,16 +296,39 @@ public void writeTo(StreamOutput out) throws IOException {
@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
builder.startObject(policy.type);
{
builder.field(NAME.getPreferredName(), name);
Member Author

Should name appear outside the policy type object? I assumed not, because otherwise the name and the policy type would sit at the same level.

Member

I feel like name should stay within the policy's definition. The object above this will only have a few fields that are considered valid: the existing policy types. On top of that, I think the previously decided direction is to keep other metadata about the policy (like the es version it was created under) contained in the definition part.
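
For reference, a rough sketch of the shape being discussed, with the name nested inside the policy type object. This is not the `toXContent` implementation from the diff: the class, the tiny JSON renderer and the `namedPolicy` helper are hypothetical, and the field key `name` is an assumption about what `NAME.getPreferredName()` returns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NamedPolicySketch {

    // Minimal JSON rendering that only handles string values and nested maps.
    static String toJson(Map<String, Object> map) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : map.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            first = false;
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) v;
                sb.append(toJson(nested));
            } else {
                sb.append('"').append(v).append('"');
            }
        }
        return sb.append("}").toString();
    }

    // The outer object holds a single key (the policy type); the name and the
    // rest of the definition live inside it.
    static String namedPolicy(String type, String name, String matchField) {
        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("name", name); // assumed key
        definition.put("match_field", matchField);
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put(type, definition);
        return toJson(outer);
    }
}
```

Calling `namedPolicy("exact_match", "my_policy", "email")` yields `{"exact_match":{"name":"my_policy","match_field":"email"}}`, so the outer object only ever contains a policy type key.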

Member

@jbaiera jbaiera left a comment

LGTM

@martijnvg
Member Author

@elasticmachine run elasticsearch-ci/default-distro

@martijnvg
Member Author

@elasticmachine run elasticsearch-ci/bwc

"indices": "users",
"match_field": "email",
"enrich_fields": ["first_name", "last_name", "address", "city", "zip", "state"]
}
Contributor

Do we have an example that includes the query? (It's fine if it is not part of this PR, but we probably want it in there somewhere.)

Member Author

No, there is not. It makes sense to cover that in a dedicated section, together with the fact that reference data can be read from multiple indices.

@@ -129,6 +129,16 @@ public void testDeleteExistingPipeline() throws Exception {
assertOK(client().performRequest(getRequest));
}

public static String generatePolicySource(String index) throws IOException {
Contributor

nit: can you add a random boolean to include a query field with a match_all? (I know the behavior is the same as the default, but it will help catch a regression since we rarely (ever?) test parsing the query field.)
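
A sketch of what that suggestion could look like, assuming the helper builds the source by hand. The actual `generatePolicySource` in the PR is not shown in this thread, so the body, the field values and the `Random`-based overload below are all hypothetical; only the method name, the new `exact_match` wrapper and the field names come from the discussion.

```java
import java.util.Random;

final class PolicySourceSketch {

    // Builds a policy source in the new format. When includeQuery is true, an
    // explicit match_all query is added, which behaves like the default but
    // exercises parsing of the query field.
    static String generatePolicySource(String index, boolean includeQuery) {
        StringBuilder source = new StringBuilder();
        source.append("{\"exact_match\":{");
        source.append("\"indices\":\"").append(index).append("\",");
        source.append("\"match_field\":\"email\",");
        if (includeQuery) {
            source.append("\"query\":{\"match_all\":{}},");
        }
        source.append("\"enrich_fields\":[\"first_name\",\"last_name\"]");
        source.append("}}");
        return source.toString();
    }

    // Randomized variant: roughly half of the generated sources carry a query.
    static String generatePolicySource(String index, Random random) {
        return generatePolicySource(index, random.nextBoolean());
    }
}
```

Keeping the boolean overload separate makes both branches easy to assert on directly, while the randomized overload is what the tests would normally call.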

Contributor

@jakelandis jakelandis left a comment

LGTM. A couple of suggestions, but nothing to block merging here.

@martijnvg martijnvg merged commit f14874c into elastic:enrich Aug 23, 2019
martijnvg added a commit that referenced this pull request Aug 23, 2019

4 participants