Update persistent state document in the index the document belongs to #51751

Merged

przemekwitek
Contributor

Currently, the persistent state document is always indexed into the index that the alias ".ml-state-write" points to.
We plan for this alias-to-index mapping to change over time (as we introduce rollover).
Therefore, we need to make sure we don't end up with two copies of the state document in different indices.
This PR achieves that by:

  • searching for the doc
  • either indexing it into the current write index or updating it in its current index
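
The two steps above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the class `StateIndexResolver`, the method `resolveTargetIndex`, and the concrete index names in the usage note are hypothetical, and the in-memory map stands in for a real search across the state indices.

```java
import java.util.Map;
import java.util.Set;

public class StateIndexResolver {

    /**
     * Decides where a state document should be written.
     *
     * @param docId         id of the state document
     * @param docIdsByIndex hypothetical stand-in for a search: index name -> ids stored there
     * @param writeIndex    index the write alias currently points to
     * @return the index that already holds the document, or writeIndex if none does
     */
    public static String resolveTargetIndex(String docId,
                                            Map<String, Set<String>> docIdsByIndex,
                                            String writeIndex) {
        for (Map.Entry<String, Set<String>> entry : docIdsByIndex.entrySet()) {
            if (entry.getValue().contains(docId)) {
                // The document already exists: update it in place to avoid a second copy.
                return entry.getKey();
            }
        }
        // The document is new: index it into the current write index.
        return writeIndex;
    }
}
```

For example, if a document was first written to a hypothetical index ".ml-state-000001" and the alias has since rolled over to ".ml-state-000002", the sketch still returns ".ml-state-000001" for that document and ".ml-state-000002" for any document it has not seen before.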

To determine the document's ID, the bytes from the C++ stream must be parsed (but only up to the first newline character).
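
A minimal sketch of that parsing step, assuming the first line of the stream is a bulk action metadata object of the form {"index":{"_id":"..."}}. The class name `FirstLineIdExtractor` and the regex-based approach are assumptions made for illustration; the actual change may well use a proper JSON parser, and this sketch does not handle escaped quotes inside the ID.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstLineIdExtractor {

    // Matches "_id":"<value>" on the metadata line (no support for escaped quotes).
    private static final Pattern ID_PATTERN = Pattern.compile("\"_id\"\\s*:\\s*\"([^\"]+)\"");

    /** Reads bytes only up to the first newline and extracts the "_id" value. */
    public static String extractDocId(byte[] bytes) {
        int end = 0;
        while (end < bytes.length && bytes[end] != '\n') {
            end++;
        }
        // Only the first line is decoded; the (potentially large) state body is left untouched.
        String firstLine = new String(bytes, 0, end, StandardCharsets.UTF_8);
        Matcher matcher = ID_PATTERN.matcher(firstLine);
        if (!matcher.find()) {
            throw new IllegalStateException("Could not extract \"_id\" from: " + firstLine);
        }
        return matcher.group(1);
    }
}
```

Stopping at the first newline keeps the cost independent of the size of the state document that follows the metadata line.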

Relates #29938

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)


stateProcessor.process(stream);
Exception e = expectThrows(IllegalStateException.class, () -> stateProcessor.process(stream));
Contributor

This looks like a dangerous change. Previously, if the C++ code wrote nothing but whitespace, it was silently ignored. Now it will cause the state processor to throw an exception and never process any subsequent state.

I think the behaviour should be changed back to what it was for the pure whitespace case. It's fine to throw an exception if a bulk metadata JSON object is invalid. But if it's not present at all then I think we should maintain the previous behaviour of treating it as a no-op. Otherwise a very careful audit of the C++ code will be required to find what situations it might write blank state.

Contributor Author

Good catch, done.
Please note that I'm now skipping leading blank lines, in case the actual request body follows them.
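
The behaviour discussed in this thread could look roughly like the sketch below: leading blank lines are skipped, and input that contains nothing but whitespace yields an empty result so the caller can treat it as a no-op instead of throwing. The class `LeadingBlankLineSkipper` and its method are hypothetical and operate on a fully buffered byte array for simplicity, which the real stream-based code presumably does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class LeadingBlankLineSkipper {

    /**
     * Returns the first line that is not blank, or Optional.empty() if the
     * input consists solely of whitespace (the no-op case).
     */
    public static Optional<String> firstNonBlankLine(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                return Optional.of(trimmed);
            }
        }
        // Nothing but whitespace: signal "nothing to process" rather than failing.
        return Optional.empty();
    }
}
```

A caller would then throw only when a non-blank first line is present but is not a valid bulk metadata object, which matches the distinction drawn in the review comment above.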

@przemekwitek przemekwitek force-pushed the ml_state_rollover_persistent_state branch from 889df4f to 9095369 Compare February 3, 2020 09:05
Contributor

@droberts195 droberts195 left a comment

LGTM, but could you please add some comments to make it clearer to future maintainers why we are doing this?

@przemekwitek przemekwitek force-pushed the ml_state_rollover_persistent_state branch from 9095369 to 8a97833 Compare February 10, 2020 12:22
Contributor

@droberts195 droberts195 left a comment

LGTM

4 participants