Skip to content

DATAUP-679: Update job to queued directly after submit #430

Merged
MrCreosote merged 3 commits intodevelopfrom
dev-unbatch_queued_update
Dec 20, 2021
Merged

DATAUP-679: Update job to queued directly after submit #430
MrCreosote merged 3 commits intodevelopfrom
dev-unbatch_queued_update

Conversation

@MrCreosote
Copy link
Copy Markdown
Member

Description of PR purpose/changes

If many jobs are submitted in a batch, the jobs can start running before
being updated to queued, which means the jobs never get a queued timestamp.

Jira Ticket / Github Issue

  • Added the Jira Ticket to the title of the PR e.g. (DATAUP-69 Adds a PR template)

Testing Instructions

  • Details for how to test the PR:
  • Tests pass in Github Actions and locally
  • Changes available by spinning up a local test suite

Dev Checklist:

  • My code follows the guidelines at https://sites.google.com/truss.works/kbasetruss/data-upload-project/development
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • [n/a] I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • [n/a] Any dependent changes have been merged and published in downstream modules
  • I have run Black and Flake8 on changed Python Code manually or with git precommit (and the Github Actions build passes)

Updating Version and Release Notes (if applicable)

If many jobs are submitted in a batch, the jobs can start running before
being updated to queued, which means the jobs never get a queued timestamp.
@MrCreosote MrCreosote requested a review from bio-boris December 20, 2021 04:48
@MrCreosote
Copy link
Copy Markdown
Member Author

MrCreosote commented Dec 20, 2021

I realize from our discussion on the prior PR that there is an improvement that could be made re Kafka queuing - all the kafka messages get sent after all the jobs are queued (which is no different than before). This could be improved by sending the message right after each job, which reduces the chance the job will switch to running prior to sending the message. If that happens the queued message won't get sent based on the current code.

Again this is no worse than before.

Edit: alternate approach is to record whether there was an update or not for each job (looking at modified_count in the result) and only sent the message for those that were modified. That removes the race condition

@MrCreosote
Copy link
Copy Markdown
Member Author

Another thing we could do is switch https://github.com/kbase/execution_engine2/blob/develop/lib/execution_engine2/sdk/EE2Runjob.py#L989 to use the new mongo method. If we also do the fix above we could probably use the same method for the mongo update & kafka message for both batch and normal jobs.

@MrCreosote
Copy link
Copy Markdown
Member Author

Another another thing I could do is delete the batch queued method in MongoUtil if you want, since it's now unused.

Copy link
Copy Markdown
Collaborator

@bio-boris bio-boris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MrCreosote MrCreosote merged commit 18408b2 into develop Dec 20, 2021
@MrCreosote MrCreosote deleted the dev-unbatch_queued_update branch December 20, 2021 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants