Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

removes task_manager's sorting of documents by _id #54407

Closed
wants to merge 3 commits into from

Conversation

@pmuellr
Copy link
Contributor

pmuellr commented Jan 9, 2020

resolves #52517

Not sure if these fixes are completely kosher, specifically if there are any
down-stream effects.

Summary

Summarize your PR. If it involves visual changes include a screenshot or gif.

Checklist

Use strikethroughs to remove checklist items you don't feel are applicable to this PR.

For maintainers

resolves #52517

Not sure if these fixes are completely kosher, specifically if there are any
down-stream effects.
@elasticmachine

This comment has been minimized.

Copy link
Contributor

elasticmachine commented Jan 9, 2020

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr pmuellr requested review from gmmorris and mikecote Jan 9, 2020
@pmuellr

This comment has been minimized.

Copy link
Contributor Author

pmuellr commented Jan 9, 2020

Note. You can use the env var

KBN_ES_SNAPSHOT_USE_UNVERIFIED=1

when running FTS to test with an ES that by default does not allow _id field data.

If you run current master (not this PR) with that flag, you'll see failures in TM. With this branch, both FT and jest tests pass. The flag is not needed for jest tests, just FTS (not needed for FTR either).

@@ -234,7 +234,7 @@ describe('TaskStore', () => {

expect(args).toMatchObject({
body: {
sort: [{ 'task.taskType': 'desc' }, { _id: 'desc' }],

This comment has been minimized.

Copy link
@tylersmalley

tylersmalley Jan 9, 2020

Member

It looks like there is an updated_at. I wonder if that would work if sorting in a descending order is required.

{
    "_index" : ".kibana_task_manager_1",
    "_id" : "task:oss_telemetry-vis_telemetry",
    "_score" : 1.0,
    "_source" : {
        "migrationVersion" : {
            "task" : "7.6.0"
        },
        "task" : {
        "taskType" : "vis_telemetry",
        "retryAt" : null,
        "runAt" : "2020-01-10T08:00:00.000Z",
        "startedAt" : null,
        "state" : """{"runs":1}""",
        "params" : "{}",
        "ownerId" : null,
        "scheduledAt" : "2020-01-09T22:07:28.092Z",
        "attempts" : 1,
        "status" : "idle"
        },
        "references" : [ ],
        "updated_at" : "2020-01-09T22:07:31.278Z",
        "type" : "task"
    }
}

This comment has been minimized.

Copy link
@peterschretlen

peterschretlen Jan 9, 2020

Contributor

Looking at the docs for search_after (which is what is used for pagination)
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-search-after, it says

A field with one unique value per document should be used as the tiebreaker of the sort specification. Otherwise the sort order for documents that have the same sort values would be undefined and could lead to missing or duplicate results. The _id field has a unique value per document but it is not recommended to use it as a tiebreaker directly.

So it looks like we're using _id for this tiebreaking, which isn't recommended anyway because there are no doc_values.

The recommendation is to duplicate the content of the _id. Or we generate another unique value. Not sure something like updated_at would be close enough to unique.

This comment has been minimized.

Copy link
@pmuellr

pmuellr Jan 10, 2020

Author Contributor

Ya, looking at the mappings.json, I didn't see anything stable enough to use as an alt-stabilizer. updated_at would probably be pretty good, but it's not part of the TM mappings, I think part of the base SO mappings - I guess that's probably good enough.

I'm hoping that we're not really dependent on the ordering here, but if we are, updated_at might be a good replacement ...

@tylersmalley

This comment has been minimized.

Copy link
Member

tylersmalley commented Jan 10, 2020

@elasticmachine merge upstream

@pmuellr pmuellr added this to To-Do 7.6 in Make it Action Jan 10, 2020
@kibanamachine

This comment has been minimized.

Copy link

kibanamachine commented Jan 10, 2020

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pmuellr pmuellr marked this pull request as ready for review Jan 11, 2020
@@ -427,18 +427,13 @@ export class TaskStore {
}
}

// function changed to stop sorting by _id: https://github.com/elastic/kibana/issues/52517

This comment has been minimized.

Copy link
@peterschretlen

peterschretlen Jan 11, 2020

Contributor

I think we should leave a note here and on the fetch() method in task_manager.ts that pagination is likely not reliable because we do not have a unique value per document for use with search_after.

Pagination doesn't seem to be used at the moment, but we should leave hints for people that they shouldn't expect it to work if they try in the future. Maybe open & reference an issue, that we can address if/when pagination is ever needed (for example if we add a task management UI, which is a possibility).

This comment has been minimized.

Copy link
@gmmorris

gmmorris Jan 13, 2020

Contributor

I don't know what the original thinking here was, perhaps @tsullivan can shed some light, but as it seems to refer to pagination and we don't currently use pagination for anything in TM, it's probably best to just remove all references to pagination and if one day we need it, we can figure out how best to support it.

No?

},
SortByRunAtAndRetryAt
);
return SortByRunAtAndRetryAt;

This comment has been minimized.

Copy link
@gmmorris

gmmorris Jan 13, 2020

Contributor

The sort value that we set here based on the _id is to facilitate the run_now feature, not pagination, and isn't an actual sort by _id, so shouldn't be a problem with this breaking change in ES.

The idea was to place the Tasks that were fetched by run_now before tasks whose runAt has expired. Removing that means that if you have 100 expired tasks, the task you've asked to run_now will run after them and the client will end up waiting for them to complete before the task they're waiting for.

I think we can remove the sort by _id in fetch, but need to keep the sortByIdsThenByScheduling as is.

Copy link
Contributor

gmmorris left a comment

We should revert he change to sortByIdsThenByScheduling as it's needed for run_now (and in fact, I need to check why removing this didn't cause a test to fail), but we should be able to remove the sort in fetch

@tylersmalley

This comment has been minimized.

Copy link
Member

tylersmalley commented Jan 13, 2020

We sync'd this morning - Gidi and Patrick are going to reach out to the ES team to ensure there isn't another way we should be accomplishing this while still avoiding aggregating/sorting on _id. If there isn't, they will move forward with removing this from 8.0 to unblock the teams waiting for an updated ES while they work through a better solution.

@pmuellr

This comment has been minimized.

Copy link
Contributor Author

pmuellr commented Jan 15, 2020

superceded by #54765

@pmuellr pmuellr closed this Jan 15, 2020
@mikecote mikecote removed this from To-Do 7.6 in Make it Action Jan 15, 2020
@pmuellr pmuellr deleted the pmuellr:task-manager/no-sort-by-id branch Jan 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.