Rework history updating #1788
Conversation
… times of contents and only stop checking when history has no more running jobs
Awesome start - thanks for working on this @carlfeberhard!
Oh this will be awesome!
This is ready(-ish) now and would be good as a prereq for #1374.
@carlfeberhard Cool, I'll take a look today.
Went ahead and played with it for a few minutes and it all looks good to me. Running a new workflow in a large history, I see a significant reduction in data sent over the wire on update due to update_time, and everything still seems to work fine. There was one minor problem in the last merge/rebuild where a conflict wasn't resolved and left markers; I'm committing the rebuilt client and merging now.
@dannon Thanks!
(1) to fetch the contents data for only those contents (datasets and dataset_collections) that have been updated since the last time the contents were polled (using update_time). Hopefully, this is significantly more lightweight, though that's difficult to say without a production db, given the cost of the union in the contents query. The method may not be reliable and needs some testing; there are alternative ways we might do this.
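A minimal sketch of the update_time idea, assuming a Galaxy-style filtered contents endpoint (the `q`/`qv` query-parameter names and the helper itself are illustrative, not necessarily the PR's exact code): on the first poll everything is fetched, and afterwards the server is asked only for contents changed since the last poll.

```javascript
// Hypothetical helper: build a history-contents URL that only requests items
// whose update_time is at or after the last poll, so unchanged contents are
// not re-sent over the wire on every update cycle.
function contentsUrl(historyId, lastPolled) {
    const base = "/api/histories/" + historyId + "/contents";
    // first poll: no filter, fetch the full contents once
    if (!lastPolled) {
        return base;
    }
    // subsequent polls: filter server-side on update_time >= lastPolled
    return base + "?q=update_time-ge&qv=" + encodeURIComponent(lastPolled);
}
```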
(2) to capture all output datasets (finally restoring/improving on ye olde force_history_refresh tool parameter). The history model will now poll the ids of any running jobs associated with it and restart updating if it finds any.
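The job check can be sketched as a small predicate over the jobs associated with the history; the state names below are Galaxy-style, but the exact set of non-terminal states is an assumption here, not taken from the PR.

```javascript
// Hypothetical sketch: decide whether the history should restart updating,
// based on the states of jobs associated with it. Any job still in a
// non-terminal state means more outputs may appear, so polling resumes.
const NON_TERMINAL_STATES = new Set(["new", "upload", "waiting", "queued", "running"]);

function shouldResumeUpdating(jobs) {
    // true if any associated job has not yet reached a terminal state
    return jobs.some(job => NON_TERMINAL_STATES.has(job.state));
}
```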
(1) is repeated every four seconds until no contents are still considered unfinished; then (2) is checked and, if jobs are running, updating starts back at (1); if not, updates stop. Ideally, the two ajax calls would both be made every four seconds using a batch API call, but here they happen separately in the callback version of a nested loop.
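The two-step loop above can be sketched as follows. The fetchers, the scheduler, and the stop callback are injected so the loop can be driven synchronously; in the client they would wrap the two ajax calls and a four-second `setTimeout`. All names here are illustrative, not the PR's actual functions.

```javascript
// Hypothetical sketch of the nested polling loop:
//   (1) poll contents until nothing is unfinished,
//   (2) then check running jobs; if any exist, start back at (1).
function pollHistory({ fetchContents, fetchJobIds, schedule, onStop }) {
    function step() {
        const contents = fetchContents();
        const anyUnfinished = contents.some(c => !c.finished);
        if (anyUnfinished) {
            // (1) contents still changing: poll again after the delay
            schedule(step);
        } else if (fetchJobIds().length > 0) {
            // (2) jobs still running: restart updating at (1)
            schedule(step);
        } else {
            // nothing unfinished and no running jobs: stop updating
            onStop();
        }
    }
    step();
}
```

Because `schedule` is injected, a test can run the loop to completion immediately with `schedule: fn => fn()` instead of waiting four seconds between steps.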