Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency issues when multiple patches happen for the same resources #1277

Conversation

rafaelweingartner
Copy link
Contributor

An issue was found when Ceilometer notification is stopped, and samples accumulate in the RabbitMQ. These samples might contain the same set of attributes that would trigger one revision. However, due to a concurrency issue, when Ceilometer notification is started back again, all of them are processed in almost the same moment, and then pushed back to Gnocchi API. Then, in Gnocchi API the detection to see if the resource needs a revision is executed outside of a semaphore control (e.g. read/write transaction with the DB with a table being locked). Therefore, the code detects that a revision is needed, and generated multiple revisions, which have the same content but different revision start and end.

This patches aims to resolve that situation. The solution adopted is to check again in the transaction (when the lock was acquired), before the revision is created, if the revision is really needed. It it is not needed, a log entry is generated and we stop the processing by returnign the resource data.

gnocchi/rest/api.py Outdated Show resolved Hide resolved
@tobias-urdin
Copy link
Contributor

This makes sense, it's probably very hard to test the race condition there, but do we have a test for the normal case of a new revision to verify this doesn't break normal behavior? (either unit or a gabbit)

Also please squash (cleanup) the commits into one (or more, whatever is preferred) but provide good commit messages.

"concurrency issue that might happen. Therefore, "
"no revision is going to be generated at this "
"time.", data_to_update)
create_revision = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if it makes sense to just move the check revision logic to https://github.com/gnocchixyz/gnocchi/blob/master/gnocchi/indexer/sqlalchemy.py#L922-L923. that way it would be behind the lock created by for_update.

that said, now it's always locking and i've no idea the impact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, it's probably very hard to test the race condition there, but do we have a test for the normal case of a new revision to verify this doesn't break normal behavior? (either unit or a gabbit)

Also please squash (cleanup) the commits into one (or more, whatever is preferred) but provide good commit messages.

Thanks for the review. I squashed the commits. Also, regarding tests, there are already gabbi tests that cover resource revisions, and all of them are passing just fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chungg I agree with you. Ideally, I Would have done that. However, I implemented this way to be less intrusive as possible in the code base.

I can move the decision to generate revision to the update_resource method only, if you prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this, I'll let @chungg respond and approve if he's OK with it.

An issue was found when Ceilometer notification is stopped, and samples accumulate in the RabbitMQ. These samples might contain the same set of attributes that would trigger one revision. However, due to a concurrency issue, when Ceilometer notification is started back again, all of them are processed in almost the same moment, and then pushed back to Gnocchi API. Then, in Gnocchi API the detection to see if the resource needs a revision is executed outside of a semaphore control (e.g. read/write transaction with the DB with a table being locked). Therefore, the code detects that a revision is needed, and generated multiple revisions, which have the same content but different revision start and end.

This patches aims to resolve that situation. The solution adopted is to check again in the transaction (when the lock was acquired), before the revision is created, if the revision is really needed. It it is not needed, a log entry is generated and we stop the processing by returnign the resource data.
@rafaelweingartner rafaelweingartner force-pushed the fix-concurrency-issue-gnocchi-api-patch branch from 4c360af to 52ff6f2 Compare October 17, 2022 19:34
@rafaelweingartner
Copy link
Contributor Author

Hello guys,
Is there something missing here?

@mergify mergify bot merged commit 63773fd into gnocchixyz:master Jan 2, 2023
@rafaelweingartner
Copy link
Contributor Author

Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants