Prevent similar versions that only differ in build metadata from being published #6518

Turbo87 · 2023-05-23T14:14:01Z

Resolves #1059

cargo does not handle these situations particularly well and the SemVer specification is a little ambiguous about how to handle build metadata.

We still have roughly 600 problematic versions in the database, so we can't introduce a unique index yet, but the query appears to be fast enough according to my local testing:

EXPLAIN SELECT * FROM versions WHERE num = '1.0.0' AND crate_id = 463;

Index Scan using unique_num on versions  (cost=0.42..8.45 rows=1 width=190)
  Index Cond: ((crate_id = 463) AND ((num)::text = '1.0.0'::text))

vs.

EXPLAIN SELECT * FROM versions WHERE split_part(num, '+', 1) = '1.0.0' AND crate_id = 463;

Bitmap Heap Scan on versions  (cost=4.57..79.36 rows=1 width=190)
  Recheck Cond: (crate_id = 463)
  Filter: (split_part((num)::text, '+'::text, 1) = '1.0.0'::text)
  ->  Bitmap Index Scan on unique_num  (cost=0.00..4.57 rows=19 width=0)
        Index Cond: (crate_id = 463)

The cost is certainly higher, but the actual wall time for these queries appears to be roughly similar in the end. Since the publish endpoint receives only a few requests per minute, this should not result in any issues AFAICT.
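For readers less familiar with `split_part()`, the comparison done by the second query can be sketched in Rust. This is only an illustration of the idea, not the code in this PR: the helper names `strip_build_metadata` and `find_conflict` are hypothetical, and the existing versions are passed in as a plain slice instead of being read from the `versions` table.

```rust
/// Returns the part of a version string before the first `+`,
/// which is what `split_part(num, '+', 1)` yields in the SQL query.
/// A version without build metadata is returned unchanged.
fn strip_build_metadata(num: &str) -> &str {
    // `split` always yields at least one item, so the fallback is never hit.
    num.split('+').next().unwrap_or(num)
}

/// Hypothetical helper: returns the first already-published version that
/// differs from `new_version` only in build metadata (or not at all).
fn find_conflict<'a>(existing: &[&'a str], new_version: &str) -> Option<&'a str> {
    let base = strip_build_metadata(new_version);
    existing
        .iter()
        .copied()
        .find(|v| strip_build_metadata(v) == base)
}
```

Under these assumptions, publishing `1.0.0+bar` when `1.0.0+foo` already exists would be reported as a conflict, while `1.2.0+foo` would not conflict with `1.0.0+foo` or `1.1.0`.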


LawnGnome

The implementation looks good. 👍

For what it's worth, playing around with this locally seemed to throw up some bigger deltas between the old query and the new one. With a cold cache, the old query executed in 12 ms for me, whereas it took 48 ms for the new query to execute. (I tested this with both the crate ID you used, and the one that has the most actual versions; the delta was similar in both cases.)

I can get the difference back below the noise floor by adding an index on the computed field, which I've pushed to 757dce5. My guess is that you're right that this PR isn't likely to cause problems anyway, but you could cherry-pick that commit if you want to add the index for more assurance.

Turbo87 · 2023-05-24T08:03:22Z

> For what it's worth, playing around with this locally seemed to throw up some bigger deltas between the old query and the new one. With a cold cache, the old query executed in 12 ms for me, whereas it took 48 ms for the new query to execute.

Yep, I remember getting similar numbers here, though for the publish endpoint I guess this would probably be acceptable?

> I can get the difference back below the noise floor by adding an index on the computed field, which I've pushed to 757dce5.

As I mentioned in 757dce5#r114738341, the performance actually got worse for me locally when I added this index. When I instead added an index on both the crate_id column and the computed split_part() result, I got significantly better results.

Also, I am a bit scared about adding indices in migrations. Locally, adding the index took about three seconds, but I've been bitten by this in the past already, where in production it took a lot longer. Since migrations are currently running after shutdown of the old app version and startup of the new app version it would effectively cause multi-second downtime for us if we add the index in a migration. We can use CREATE INDEX CONCURRENTLY to avoid the table lock when creating the index, but that unfortunately also does not solve the migration issue. I'm happy to brainstorm on this issue separately :)

Turbo87 · 2023-05-30T10:14:42Z

Assuming that this was discussed in the team meeting last week (which I unfortunately couldn't attend), and since there are apparently no objections, I will merge and deploy this change now. Since it does not involve any database migrations, we can always revert it if we decide that this is the wrong step forward.

Turbo87 added 2 commits May 23, 2023 16:09

sql: Make split_part() available in Diesel queries

aa363b6

Turbo87 added C-bug 🐞 Category: unintended, undesired behavior A-backend ⚙️ labels May 23, 2023

Turbo87 requested a review from a team May 23, 2023 14:14

LawnGnome approved these changes May 23, 2023


Turbo87 merged commit 5c68d55 into rust-lang:master May 30, 2023
6 checks passed

Turbo87 deleted the build-metadata-block branch May 30, 2023 10:14

This was referenced May 30, 2023

Should publishes with versions only differing in metadata be allowed? #1059

Closed

Fix build metadata handling in SemVer versions #6451

Closed

MarijnS95 mentioned this pull request Jun 29, 2023

Checksum of yanked version causing crate download to fail rust-lang/cargo#11412

Closed

This was referenced Oct 5, 2023

If there's a version in the lock file only use that exact version rust-lang/cargo#12772

Merged

Do not call it "Downgrading" when difference is only build metadata rust-lang/cargo#12796

Merged

epage mentioned this pull request Nov 1, 2023

Cargo confuses checksums of crate versions with + in them rust-lang/cargo#7180

Closed
