Concurrency issues when multiple patches happen for the same resources #1277

rafaelweingartner · 2022-10-14T12:39:55Z

An issue was found when the Ceilometer notification agent is stopped and samples accumulate in RabbitMQ. These samples might contain the same set of attributes, which should trigger a single revision. However, when the Ceilometer notification agent is started again, all of them are processed at almost the same moment and pushed to the Gnocchi API. In the Gnocchi API, the check that decides whether the resource needs a revision is executed outside of any semaphore control (e.g. a read/write transaction with the DB with the table locked). Therefore, every concurrent request detects that a revision is needed, and multiple revisions are generated, which have the same content but different revision start and end.

This patch aims to resolve that situation. The solution adopted is to check again inside the transaction (once the lock has been acquired), before the revision is created, whether the revision is really needed. If it is not needed, a log entry is generated and we stop the processing by returning the resource data.
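To illustrate the idea, here is a minimal sketch of the check-then-act race and of the re-check under the lock. It is not the Gnocchi code: `ResourceStore`, `patch_unsafe`, `patch_safe` and `run_concurrently` are hypothetical names, and a `threading.Lock` stands in for the database lock.

```python
import threading
import time


class ResourceStore:
    """Hypothetical in-memory stand-in for the indexer (not Gnocchi code)."""

    def __init__(self, attributes):
        self.attributes = dict(attributes)
        self.revisions = []            # archived previous attribute sets
        self._lock = threading.Lock()  # plays the role of the DB lock

    def needs_revision(self, new_attributes):
        # A revision is needed only if at least one attribute would change.
        return any(self.attributes.get(k) != v
                   for k, v in new_attributes.items())

    def patch_unsafe(self, new_attributes):
        # The check runs outside the lock, so concurrent callers can all
        # conclude that a revision is needed.
        if not self.needs_revision(new_attributes):
            return self.attributes
        time.sleep(0.05)  # widen the race window for the demonstration
        with self._lock:
            self.revisions.append(dict(self.attributes))
            self.attributes.update(new_attributes)
        return self.attributes

    def patch_safe(self, new_attributes):
        if not self.needs_revision(new_attributes):
            return self.attributes
        time.sleep(0.05)
        with self._lock:
            # Re-check now that the lock is held: another caller may have
            # already applied the same attributes.
            if not self.needs_revision(new_attributes):
                return self.attributes
            self.revisions.append(dict(self.attributes))
            self.attributes.update(new_attributes)
        return self.attributes


def run_concurrently(method, new_attributes, workers=8):
    """Call `method` from several threads released at the same moment."""
    barrier = threading.Barrier(workers)

    def worker():
        barrier.wait()
        method(new_attributes)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With `patch_safe`, only the first caller to acquire the lock archives a revision; the others find nothing left to change and return the current data. With `patch_unsafe`, the number of duplicate revisions depends on thread timing.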

gnocchi/rest/api.py

tobias-urdin · 2022-10-16T21:06:20Z

This makes sense. It's probably very hard to test the race condition there, but do we have a test for the normal case of a new revision, to verify this doesn't break normal behavior? (either a unit test or a gabbi test)

Also please squash (cleanup) the commits into one (or more, whatever is preferred) but provide good commit messages.

chungg · 2022-10-17T19:19:17Z

gnocchi/indexer/sqlalchemy.py

+                             "concurrency issue that might happen. Therefore, "
+                             "no revision is going to be generated at this "
+                             "time.", data_to_update)
+                    create_revision = False


i wonder if it makes sense to just move the check revision logic to https://github.com/gnocchixyz/gnocchi/blob/master/gnocchi/indexer/sqlalchemy.py#L922-L923. that way it would be behind the lock created by for_update.

that said, now it's always locking and i've no idea the impact.
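As a rough sketch of that alternative (take the lock first, then decide), the example below uses SQLite from the Python standard library. This is an assumption made for illustration only: SQLite has no `SELECT ... FOR UPDATE`, so `BEGIN IMMEDIATE` is used as an analog that acquires the write lock before the row is read. The table names and the `init_db`, `update_resource` and `demo` helpers are hypothetical and do not mirror the Gnocchi schema.

```python
import os
import sqlite3
import tempfile
import threading


def init_db(path):
    # Hypothetical schema: one current row plus a history table.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE resource (id TEXT PRIMARY KEY, flavor TEXT)")
    conn.execute("CREATE TABLE resource_history (id TEXT, flavor TEXT)")
    conn.execute("INSERT INTO resource VALUES ('r1', 'small')")
    conn.commit()
    conn.close()


def update_resource(path, resource_id, flavor):
    """Return True if a revision was created, False otherwise."""
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        # Take the write lock *before* reading, so the comparison and the
        # revision insert happen under the same lock.
        conn.execute("BEGIN IMMEDIATE")
        row = conn.execute(
            "SELECT flavor FROM resource WHERE id = ?", (resource_id,)
        ).fetchone()
        create_revision = row is not None and row[0] != flavor
        if create_revision:
            conn.execute(
                "INSERT INTO resource_history VALUES (?, ?)",
                (resource_id, row[0]),
            )
            conn.execute(
                "UPDATE resource SET flavor = ? WHERE id = ?",
                (flavor, resource_id),
            )
        conn.execute("COMMIT")
        return create_revision
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()


def demo(workers=8):
    """Send the same update from several threads; count the revisions."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "indexer.db")
        init_db(path)
        results = []

        def worker():
            results.append(update_resource(path, "r1", "large"))

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        conn = sqlite3.connect(path)
        count = conn.execute(
            "SELECT COUNT(*) FROM resource_history").fetchone()[0]
        conn.close()
        return count, results
```

The trade-off is the one raised above: every update now serializes on the lock, even when no revision ends up being needed.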

rafaelweingartner · 2022-10-17

Thanks for the review. I squashed the commits. Regarding tests, there are already gabbi tests that cover resource revisions, and all of them are passing.

@chungg I agree with you. Ideally, I would have done that. However, I implemented it this way to be as unintrusive as possible in the code base.

I can move the decision to generate a revision into the update_resource method only, if you prefer.

tobias-urdin · 2022-10-20

I'm fine with this. I'll let @chungg respond and approve if he's OK with it.

rafaelweingartner · 2022-12-26T14:35:43Z

Hello guys,
Is there something missing here?

rafaelweingartner · 2023-01-02T13:42:13Z

Thanks!!

pedro-martins reviewed Oct 14, 2022

chungg reviewed Oct 17, 2022

rafaelweingartner force-pushed the fix-concurrency-issue-gnocchi-api-patch branch from 4c360af to 52ff6f2 Compare October 17, 2022 19:34

tobias-urdin approved these changes Jan 2, 2023

mergify bot merged commit 63773fd into gnocchixyz:master Jan 2, 2023
