Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race on task dispatch #5071

Closed
mdellweg opened this issue Feb 20, 2024 · 2 comments · Fixed by #5072
Closed

Possible race on task dispatch #5071

mdellweg opened this issue Feb 20, 2024 · 2 comments · Fixed by #5072
Labels

Comments

@mdellweg
Copy link
Member

We identified that there may be a possible race when dispatching new tasks.
With two processes adding a new task they become visible to the workers after finishing their transaction, but the created_at fields may actually report a different order. This can lead to two workers picking up conflicting tasks concurrently.
This may be even worse on distributed systems where you can have a clockscrew on top.

We believe the window to already be very small, so any attempt to fix is should be like a vacuum seal.

Current working idea: Wrap adding a task to the table in a transaction-based-exclusive-wait-for advisory-lock. Check that the current latest task is actually from the past and add the new task afterwards.

@dralley
Copy link
Contributor

dralley commented Feb 20, 2024

Wild suggestion here, perhaps not even something we can do currently:

UUIDv7 embeds a timestamp and is designed to have a monotonically increasing unique value. Theoretically we could work with the PK instead of raw timestamps (so long as no ancient tasks utilizing UUIDv4 PKs remain) to obtain a complete table ordering. Postgresql has functions for generating UUIDs (but not UUIDv7 currently, you need an extension for it), so if [0] we were able to utilize those for sourcing the PK then we could mostly [1] eliminate clock skew as an issue, as only the database timestamp would matter

[0] we currently "expect" the PK to have been generated in Python prior to any attempts to save IIRC
[1] assuming the database isn't itself split across a blue/green deployment, and/or nobody is messing with the clocks on those systems. I'm not sure how time would be handled in that scenario.

@mdellweg
Copy link
Member Author

Wild suggestion here, perhaps not even something we can do currently:

UUIDv7 embeds a timestamp and is designed to have a monotonically increasing unique value. Theoretically we could work with the PK instead of raw timestamps (so long as no ancient tasks utilizing UUIDv4 PKs remain) to obtain a complete table ordering. Postgresql has functions for generating UUIDs (but not UUIDv7 currently, you need an extension for it), so if [0] we were able to utilize those for sourcing the PK then we could mostly [1] eliminate clock skew as an issue, as only the database timestamp would matter

[0] we currently "expect" the PK to have been generated in Python prior to any attempts to save IIRC [1] assuming the database isn't itself split across a blue/green deployment, and/or nobody is messing with the clocks on those systems. I'm not sure how time would be handled in that scenario.

I'm not entirely convinced this would solve the issue, which is basically that there is a time gap between creating the task and committing the transaction it is in. And while once all tasks are committed (after say 5 s) we do have a definite order, by that time we'd hope some worker would have started on it already.

mdellweg added a commit to mdellweg/pulpcore that referenced this issue Feb 21, 2024
This will acquire a transaction based exclusive advisory lock before
creating a new task object in the database and check the actual validity
of the new tasks "pulp_created" field.

fixes pulp#5071
mdellweg added a commit to mdellweg/pulpcore that referenced this issue Feb 21, 2024
This will acquire a transaction based exclusive advisory lock before
creating a new task object in the database and check the actual validity
of the new tasks "pulp_created" field.

fixes pulp#5071
mdellweg added a commit to mdellweg/pulpcore that referenced this issue Feb 21, 2024
This will acquire a transaction based exclusive advisory lock before
creating a new task object in the database and check the actual validity
of the new tasks "pulp_created" field.

fixes pulp#5071
mdellweg added a commit that referenced this issue Feb 26, 2024
This will acquire a transaction based exclusive advisory lock before
creating a new task object in the database and check the actual validity
of the new tasks "pulp_created" field.

fixes #5071
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants