Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix rare race condition when rebloged status is deleted #17693

Merged

Conversation

ClearlyClaire
Copy link
Contributor

In some cases, it's possible for a reblog to be created and not marked as discarded while the underlying status is discarded, causing status.reblog? to be true yet status.reblog and status.proper to be nil.

Note that this does not retroactively fix such records nor does it fix the call sites that use status.reblog or status.proper.

@Gargron
Copy link
Member

Gargron commented Mar 9, 2022

Might it make sense to add a database constraint for this instead of a post-transaction check? Or would that not work because the status is not discarded in our transaction? Maybe a pessimistic lock on the original status would help?

@ClearlyClaire
Copy link
Contributor Author

ClearlyClaire commented Mar 9, 2022

I don't think we can write a database constraint that depends on another row than the one being created/updated.

As for a pessimistic lock, that's a good question. I think it would require using the lock in all call sites that create reblogs (2 currently, I think) and in most case would require an extra query anyway. I'm also not completely sure about the performance implications.

This solution is akin to an optimistic lock, although it does an extra query. I've checked how Rails does optimistic locking internally, and it does something smarter by using adding to the WHERE clause in the UPDATE it issues. That's of course only valid for updates, which is not our case here. Taking inspiration from this, the optimal solution might be something like using a subquery in the INSERT INTO but I don't think ActiveRecord would allow to do that without overriding some internal methods.

EDIT: To expand on that last remark, currently, reblogging does something like the following query:

INSERT INTO "statuses" ("created_at", "updated_at", "reblog_of_id", "conversation_id", "local", "account_id") VALUES ($1, $2, $3, $4, $5, $6) RETURNING "id"  [["created_at", "2022-03-09 12:16:28.957537"], ["updated_at", "2022-03-09 12:16:28.957537"], ["reblog_of_id", 107925936721658285], ["conversation_id", 244], ["local", true], ["account_id", 1]]

I think it would be ideal if we could do something like the following, but I do not think this is possible without heavy overriding of some low-level private Rails methods:

INSERT INTO "statuses" ("created_at", "updated_at", "reblog_of_id", "conversation_id", "local", "account_id", "discarded") SELECT ($1, $2, $3, $4, $5, $6, reblog.discarded) FROM statuses reblog WHERE reblog.id = $3 RETURNING "id"  [["created_at", "2022-03-09 12:16:28.957537"], ["updated_at", "2022-03-09 12:16:28.957537"], ["reblog_of_id", 107925936721658285], ["conversation_id", 244], ["local", true], ["account_id", 1]]

@ClearlyClaire ClearlyClaire force-pushed the fixes/deleted_at-race-condition branch 2 times, most recently from ac21a6c to 7216d63 Compare March 9, 2022 15:33
@ClearlyClaire ClearlyClaire force-pushed the fixes/deleted_at-race-condition branch from 92073b6 to ba8e933 Compare March 9, 2022 16:14
@ClearlyClaire
Copy link
Contributor Author

I have pushed a commit changing the approach entirely, by overriding the internal Rails method responsible for building the final SQL query on a create! or create call in order to use a INSERT INTO "statuses" … SELECT (…) FROM "statuses" WHERE … query instead of a INSERT INTO "statuses" … VALUES … query. The inserted values are the same and are passed into the query pretty much in the same way, but the WHERE clause ensures the reblogged status is not discarded.

I believe this approach has minimal overhead since it's a very simple query, SQL-wise and does not result in any additional query.

As for overriding the internal _insert_record class method, I have found no other way to change the SQL query resulting from a create! call, and defining another method to replace call! would essentially require the same work in addition to managing hooks and validations properly, so I think this is the simplest and most maintainable way to do that specific SQL trick, although I do understand it's maintainability is not ideal.

@Gargron Gargron merged commit 2a56a89 into mastodon:main Mar 9, 2022
ClearlyClaire added a commit to ClearlyClaire/mastodon that referenced this pull request Mar 9, 2022
ClearlyClaire added a commit to ClearlyClaire/mastodon that referenced this pull request Mar 9, 2022
Gargron pushed a commit that referenced this pull request Mar 9, 2022
Gargron pushed a commit that referenced this pull request Mar 9, 2022
@mjankowski
Copy link
Contributor

I have a rails 7 WIP branch going, and ran into this monkey patch.

The underlying rails code did change in 7, so we'll need to update the monkey patch. Before I dig into it, is the background here (and need for this patch) that you could have a timing issue and race condition where sometime after a new status is started to be created (and where it's reblogging another status) but before its written to DB, the original status being reblogged is discarded, in which case the newly saved status will be saved as though it's reblogged a status but that original status is now nil?

The naive thing here would be to add a rails model level validation so that during save the new status notices that the other one has been discarded and adds an error (and then stops the reblog service) so that doesn't happen.

Of course that doesn't handle the race condition perfectly, because that validation could pass just fine and then a moment later the other status gets discarded and there's not db-level constraints here enforcing the rule.

The next thing I can think of would be to not only add the validation, but also add an after_commit callback which reloads the referenced reblogged status, and then either destroys or also discards the just-created record running the callback. I think this would happen in advance of the reblog service proceeding as well, so you'd avoid having distributed the new status.

All that said ... I'd love for some guidance along either option of:

  • We tried everything here and this patch does need to exist, so we should update it for rails 7 compatibility
  • It's possible there's another approach and we could refactor this to lose the patch in advance of rails 7 update

In that latter case ... it'd be great to know which specs are relevant here. I'm guessing this linked one? https://github.com/mastodon/mastodon/pull/17732/files

@mjankowski mjankowski mentioned this pull request Mar 24, 2023
@mjankowski
Copy link
Contributor

@ClearlyClaire low priority, but would love to get your input on the question above if possible so I can prepare this for Rails 7 upgrade in either of the proposed directions, or something else.

@ClearlyClaire
Copy link
Contributor Author

The underlying rails code did change in 7, so we'll need to update the monkey patch. Before I dig into it, is the background here (and need for this patch) that you could have a timing issue and race condition where sometime after a new status is started to be created (and where it's reblogging another status) but before its written to DB, the original status being reblogged is discarded, in which case the newly saved status will be saved as though it's reblogged a status but that original status is now nil?

What happens is that there can be race conditions in which posts can be marked as discarded (that is, soft-deleted, the record is still here but with a non-nil deleted_at column) between the time it has been selected for being reblogged and the reblog actually gets written to the database, resulting in a “non-discarded” (deleted_at is nil) reblog of a “discarded” status (deleted_at is not nil).

For most of the codebase, a discarded status is handled like a deleted one, but while we have database constraints that forbid a reblog of a deleted status, we can't have such a database constraint for soft-deletions.

The naive thing here would be to add a rails model level validation so that during save the new status notices that the other one has been discarded and adds an error (and then stops the reblog service) so that doesn't happen.

This would at best narrow the race condition window but it would still be open.

The next thing I can think of would be to not only add the validation, but also add an after_commit callback which reloads the referenced reblogged status, and then either destroys or also discards the just-created record running the callback. I think this would happen in advance of the reblog service proceeding as well, so you'd avoid having distributed the new status.

I think this would work, and be easier to understand and significantly more maintainable. But that's also an extra query in a hot path, so I'd prefer keeping the current approach if it is not too difficult to port.

@ClearlyClaire
Copy link
Contributor Author

In that latter case ... it'd be great to know which specs are relevant here. I'm guessing this linked one? https://github.com/mastodon/mastodon/pull/17732/files

Yes, this is the relevant test case, although it might be a bit too loose if we end up changing the logic, as something like your earlier proposal of adding a Rails validation would pass the test but still have a race condition window open.

@mjankowski
Copy link
Contributor

I think this would work, and be easier to understand and significantly more maintainable. But that's also an extra query in a hot path, so I'd prefer keeping the current approach if it is not too difficult to port.

Sounds good, I'll take that approach in the Rails 7 PR branch.

I also see that you did a quick refactor and spec improvement there, which I'll update PR against as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants