Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/crates: Add version columns #4106

Closed
wants to merge 6 commits into from

Conversation

Turbo87
Copy link
Member

@Turbo87 Turbo87 commented Oct 29, 2021

A crate on our JSON API has the fields max_version, newest_version, and max_stable_version. These fields are currently calculated dynamically at the time of each request, and require us to load the list of all crate versions from the database.

The reverse dependencies query does something roughly like SELECT * FROM versions ORDER BY to_semver_no_prerelease(num) LIMIT 1, which ends up being essentially the same as max_stable_version, but this subquery is costing us quite a bit of performance.

This PR proposes to add three new columns to the crates table, one for each of these versions above. Instead of calculating the values on every request we will recalculate them whenever the list of versions changes. This is currently limited to three operations:

  • publishing a new version
  • yanking an existing version
  • unyanking an existing version

This PR also adds a fill-top-versions admin command that iterates through all the crates in the database and fills the columns with the correct values.

Since backfilling of the data is a requirement to adjust the API endpoints to the new columns, this part is not yet included in this PR.

Related:

@Turbo87 Turbo87 added A-backend ⚙️ C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear labels Oct 29, 2021
Copy link
Member

@pietroalbini pietroalbini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also missing updating the latest version on crate deletion.

Comment on lines +1 to +2
ALTER TABLE crates ADD COLUMN highest_version VARCHAR;
ALTER TABLE crates ADD COLUMN highest_stable_version VARCHAR;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a VARCHAR? Aren't we storing the versions.id?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also store the id, but the API returns the version number, not the ID, so we'd need another SELECT num FROM versions WHERE id = xxx query for the endpoints. it seemed easier to store the num directly and use that as a reference to the versions table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better to store the ID for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why that would be more consistent, to be honest. we already have a uniqueness constraint on crate_id, num in the versions table, and as mentioned on Discord we can add a foreign key constraint to these new columns too, which would ensure that the referenced versions actually exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already using version IDs as foreign keys in all the tables... I don't think an extra indexed query in the frontend is going to have much of an impact on performance.

@@ -0,0 +1,7 @@
ALTER TABLE crates ADD COLUMN highest_version VARCHAR;
ALTER TABLE crates ADD COLUMN highest_stable_version VARCHAR;
ALTER TABLE crates ADD COLUMN newest_version VARCHAR;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we're storing this? It should be possible to easily determine it with a query (while you can't do that with semver-related columns).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically just to avoid the unnecessary versions queries for the endpoints that return a crate. it's currently somewhat tightly coupled to the highest version fields through our use of the TopVersions struct.

Comment on lines +43 to +45
diesel::sql_query("ALTER TABLE crates DISABLE TRIGGER trigger_crates_set_updated_at")
.execute(&conn)
.context("Failed to disable the `set_updated_at` trigger")?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is this disabling the trigger globally or just for this transaction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a good question. I was assuming that it would only disable it for the transaction, but I'm not sure. can't find anything in the docs about this.

Comment on lines +35 to +40
let crates: Vec<Crate> = Crate::all()
.order(crates::id.asc())
.limit(page_size)
.offset(page_size * (page - 1))
.load(&conn)
.context("Failed to load crates")?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably easier and more performant here to do offset-pagination (WHERE id > last_id), but it shouldn't matter that much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, guess that could work too. it's running pretty fast already though, so probably not needed with our current number of crates.

Comment on lines +68 to +70
krate
.update_top_versions(&conn, &top_versions)
.context("Failed to update top versions in the database")?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change this to perform the UPDATE only if the update is actually needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I guess that should be possible

@bors
Copy link
Contributor

bors commented Oct 30, 2021

☔ The latest upstream changes (presumably #4108) made this pull request unmergeable. Please resolve the merge conflicts.

@Turbo87 Turbo87 force-pushed the top-versions branch 3 times, most recently from 8d5c1d9 to 4c716f4 Compare November 2, 2021 21:14
@pietroalbini
Copy link
Member

Was that just a rebase or was feedback addressed @Turbo87?

@Turbo87
Copy link
Member Author

Turbo87 commented Nov 7, 2021

just rebases so far

@bors
Copy link
Contributor

bors commented Nov 13, 2021

☔ The latest upstream changes (presumably #4120) made this pull request unmergeable. Please resolve the merge conflicts.

@Turbo87 Turbo87 marked this pull request as draft November 16, 2021 08:44
@Turbo87
Copy link
Member Author

Turbo87 commented Apr 18, 2024

closing in favor of #8484

@Turbo87 Turbo87 closed this Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backend ⚙️ C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants