Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent similar versions that only differ in build metadata from being published #6518

Merged
merged 2 commits into from
May 30, 2023

Conversation

Turbo87
Copy link
Member

@Turbo87 Turbo87 commented May 23, 2023

Resolves #1059

cargo does not handle these situations particularly well and the SemVer specification is a little ambiguous about how to handle build metadata.

We still have roughly 600 problematic versions in the database, so we can't introduce a unique index yet, but the query appears to be fast enough according to my local testing:

EXPLAIN SELECT * FROM versions WHERE num = '1.0.0' AND crate_id = 463;
Index Scan using unique_num on versions  (cost=0.42..8.45 rows=1 width=190)
  Index Cond: ((crate_id = 463) AND ((num)::text = '1.0.0'::text))

vs.

EXPLAIN SELECT * FROM versions WHERE split_part(num, '+', 1) = '1.0.0' AND crate_id = 463;
Bitmap Heap Scan on versions  (cost=4.57..79.36 rows=1 width=190)
  Recheck Cond: (crate_id = 463)
"  Filter: (split_part((num)::text, '+'::text, 1) = '1.0.0'::text)"
  ->  Bitmap Index Scan on unique_num  (cost=0.00..4.57 rows=19 width=0)
        Index Cond: (crate_id = 463)

The cost is certainly higher, but the actual wall time for these queries appears to be roughly similar in the end. Since the publish endpoint is only receiving few requests per minute this should not result in any issues AFAICT.

…g published

`cargo` does not handle these situations particularly well and the semver specification is a little ambiguous about how to handle build metadata. We still have roughly 600 problematic versions in the database, so we can't introduce a unique index yet.
@Turbo87 Turbo87 added C-bug 🐞 Category: unintended, undesired behavior A-backend ⚙️ labels May 23, 2023
@Turbo87 Turbo87 requested a review from a team May 23, 2023 14:14
Copy link
Contributor

@LawnGnome LawnGnome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good. 👍

For what it's worth, playing around with this locally seemed to throw up some bigger deltas between the old query and the new one. With a cold cache, the old query executed in 12 ms for me, whereas it took 48 ms for the new query to execute. (I tested this with both the crate ID you used, and the one that has the most actual versions; the delta was similar in both cases.)

I can get the difference back below the noise floor by adding an index on the computed field, which I've pushed to 757dce5. My guess is that you're right that this PR isn't likely to cause problems anyway, but you could cherry pick that in if you want to add the index for more assurance.

@Turbo87
Copy link
Member Author

Turbo87 commented May 24, 2023

For what it's worth, playing around with this locally seemed to throw up some bigger deltas between the old query and the new one. With a cold cache, the old query executed in 12 ms for me, whereas it took 48 ms for the new query to execute.

yep, I remember getting similar numbers here, though for the publish endpoint I guess this would probably be acceptable?

I can get the difference back below the noise floor by adding an index on the computed field, which I've pushed to 757dce5.

As I mentioned in 757dce5#r114738341, the performance actually got worse for me locally when I added this index. If I add an index on the crate_id column and the computed split_part() result I got significantly better results.

Also, I am a bit scared about adding indices in migrations. Locally, adding the index took about three seconds, but I've been bitten by this in the past already, where in production it took a lot longer. Since migrations are currently running after shutdown of the old app version and startup of the new app version it would effectively cause multi-second downtime for us if we add the index in a migration. We can use CREATE INDEX CONCURRENTLY to avoid the table lock when creating the index, but that unfortunately also does not solve the migration issue. I'm happy to brainstorm on this issue separately :)

@Turbo87
Copy link
Member Author

Turbo87 commented May 30, 2023

Assuming that this was discussed in the team meeting last week (where I unfortunately couldn't attend) and since there are apparently no objections I will merge and deploy this change now. Since it does not involve any database migrations we can always revert this if we decide that this is the wrong step forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backend ⚙️ C-bug 🐞 Category: unintended, undesired behavior
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Should publishes with versions only differing in metadata be allowed?
2 participants