
Set up the groundwork for a port to Diesel #589

Merged
merged 10 commits into rust-lang:master from sg-port-for-real-this-time on Mar 7, 2017

Conversation

sgrif
Contributor

@sgrif sgrif commented Mar 4, 2017

#588 (comment)

This PR changes the `users#show` endpoint to use Diesel instead of rust-postgres (and adds a test for that endpoint). I chose that endpoint as it seemed likely to touch as few things as possible, but enough to require setting up Diesel connection pooling, and testing.

I have also ported over the migrations to use Diesel's migration infrastructure. The migration files (except the final migration) were generated programmatically. Any non-reversible migrations were not dumped, as these contained procedural data updates which aren't relevant to setting up a new database.

`cargo run --bin migrate` will move all the entries in the old migrations table into the one used by Diesel. That function is brittle since it relies on Diesel internals. However, it really only matters for existing contributors (and one deploy to Heroku), so when that function breaks we can just delete it.

I've added an additional migration to make the schema compatible with `infer_schema!`, which doesn't support tables that have no primary key. I'm not using `infer_schema!` just yet, as it doesn't work with non-core types and there are some tsvector columns.

I re-ordered the columns on the `User` struct, as Diesel emits an explicit select clause and then fetches columns by index rather than by name. This means that the order has to match how it was defined to Diesel (which in the case of `infer_schema!` will be the database definition order). If we don't want this restriction, we can replace the `infer_schema!` call with manual `table!` calls (which can be automatically generated by `diesel print-schema`), and have the columns ordered there to match the struct.
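For concreteness, a manual definition for the `users` table would look roughly like this (a sketch only: the column names come from this PR's diff, but the exact types and the full column list are assumptions; the real definition is whatever `diesel print-schema` emits):

```rust
#[macro_use]
extern crate diesel;

// The column order here is what `Queryable` structs must match, since
// Diesel selects and reads columns by position rather than by name.
table! {
    users (id) {
        id -> Integer,
        gh_access_token -> Varchar,
        api_token -> Varchar,
        gh_login -> Varchar,
        name -> Nullable<Varchar>,
        gh_avatar -> Nullable<Varchar>,
    }
}
```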

Differences to note
-------------------

The rust-postgres connection is set to require TLS when the env var `HEROKU` is set. In Diesel, this would be to add `?sslmode=require` onto the database URL. I'm unsure if heroku-postgres URLs already have this or not, so I have not added any explicit code for that. It's a question that should be answered before this is deployed.

Additionally, I chose not to automatically wrap each request in a transaction. It's unnecessary most of the time, and seems like it was only done to make testing easier. Since Diesel has explicit support for "run tests in a transaction that never commits", we don't need that here.
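For reference, that test-transaction support is just a method on the connection. A minimal sketch (assuming the `TEST_DATABASE_URL` env var used elsewhere in this PR; this is not the PR's actual test harness):

```rust
use std::env;

use diesel::pg::PgConnection;
use diesel::prelude::*;

fn test_connection() -> PgConnection {
    let url = env::var("TEST_DATABASE_URL").expect("TEST_DATABASE_URL must be set");
    let conn = PgConnection::establish(&url).expect("failed to connect to the test database");
    // Open a transaction that is never committed, so every test starts from
    // a clean state and its writes vanish when the connection is dropped.
    conn.begin_test_transaction().expect("failed to begin test transaction");
    conn
}
```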

@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from 1689966 to 0296ec4 Compare March 4, 2017 19:04
@sgrif
Contributor Author

sgrif commented Mar 4, 2017

So just glancing through where the codebase is at now, a few notes in terms of feature support.

  • Diesel 0.11 has an API for the cross join used when the q parameter is present in crates#index, but it's going away in 0.12, and I'm not sure I'll have the replacement in that version
  • There are a lot of many-to-many associations. It's not that we don't support them; it's just a few lines of boilerplate instead of something like `#[belongs_to(Crate, through="crates_categories")]`
  • Whatever is going on here isn't something we can write one-to-one with the query builder. We may be able to restructure the query to something that can be written in the query builder.
  • OMG CAROL STAHP

None of this would block finishing the port, but we will have a few places where we'll have to fall back to raw SQL. And I probably have 2 more releases of Diesel worth of work to do to support all this without raw SQL. :P

@carols10cents
Member

I'm fine with having to fall back to raw sql sometimes. THEY'RE GOOD QUERIES, SGROF

@carols10cents
Member

Diesel 0.11 has an API for the cross join used when the q parameter is present in crates#index, but it's going away in 0.12, and I'm not sure I'll have the replacement in that version

but but https://github.com/rust-lang/crates.io/blob/master/src/krate.rs#L497

I'd like to refactor that part: one, because it could use some more structure, and two, to eventually support things like searching with a query within a keyword. I'll think about how we could rework this.

@sgrif
Contributor Author

sgrif commented Mar 4, 2017

Yes, everything except the `q` branch factors out nicely. Even that branch fits nicely in 0.11. I need to better flesh out the feature, though, which is why I killed it.

@carols10cents
Member

In Diesel, this would be to add `?sslmode=require` onto the database URL. I'm unsure if heroku-postgres URLs already have this or not, so I have not added any explicit code for that.

The `DATABASE_URL` env var set by Heroku does not have `?sslmode=require`, and from what I can tell, we shouldn't need it when querying the database from within Heroku. Their docs say to add `?sslmode=require` in application code, which sounds like what you're saying. We have a staging Heroku instance where we can try this out, although we had some issue with the prod instance that we didn't have with staging..... i forget exactly what though....

Additionally, I chose not to automatically wrap each request in a transaction. It's unnecessary most of the time, and seems like it was only done to make testing easier. Since Diesel has explicit support for "run tests in a transaction that never commits", we don't need that here.

There's at least one place I know crates.io depends on this: when uploading a new version, if the git update or S3 upload fails, we roll back the database additions. By "it's unnecessary most of the time", do you mean that Diesel has a way to wrap the requests where it is necessary in a transaction?

@alexcrichton, do you know of any other places besides new crate and tests where we're relying on each request being wrapped in a transaction?

@sgrif
Contributor Author

sgrif commented Mar 5, 2017

Yes. We can wrap in a transaction where it's needed (this PR does that in `NewUser#create_or_update`). Anywhere that I see more than one query where at least one has side effects, I assume it needs a transaction.
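A rough sketch of that pattern (a hypothetical helper, not the PR's actual `NewUser` code, and the exact error plumbing varies a bit across Diesel versions):

```rust
use diesel::pg::PgConnection;
use diesel::prelude::*;
use diesel::result::Error;

// Several statements with side effects that must succeed or fail together.
fn create_or_update(conn: &PgConnection) -> Result<(), Error> {
    conn.transaction(|| {
        // Insert or update the user row, then any follow-up writes go here;
        // returning Err from this closure rolls the whole transaction back.
        Ok(())
    })
}
```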

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

I've updated to require SSL on Heroku. (Probably not even required, since the default value is `prefer`, which will use SSL if the server supports it, but who am I to argue with their docs.)
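A minimal sketch of the idea (a hypothetical helper, not necessarily how this PR wires it up):

```rust
use std::env;

// Append sslmode=require to the connection string when running on Heroku,
// mirroring what the rust-postgres code keyed off the HEROKU env var.
fn database_url() -> String {
    let mut url = env::var("DATABASE_URL").expect("DATABASE_URL must be set");
    if env::var("HEROKU").is_ok() && !url.contains("sslmode") {
        url.push_str("?sslmode=require");
    }
    url
}
```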

@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from 8a61fc2 to d2cfce3 Compare March 6, 2017 14:00
@alexcrichton
Member

@alexcrichton, do you know of any other places besides new crate and tests where we're relying on each request being wrapped in a transaction?

Hm. Well, I wrote the whole web app on this assumption, so we may want to leave it for now :). In general there are lots of transactional points, such as:

  • Crate is inserted into db
  • Crate is uploaded to S3
  • Crate is uploaded to GitHub

We currently rely on database transactions so that if GitHub fails, the insertion is rolled back. The same goes for tons of API calls and such: the database is updated willy-nilly on the assumption that a failure down the line will roll back the updates.


Also, yeah, I believe SSL is required on Heroku nowadays (I found it hard to find docs on that...). The ironic part is that although SSL is required, they're not using valid certificates, so it's still MitM-happy all over the place. I emailed them a while back about this and they said yeah, that's what most database drivers do, ignore certs. ;_;

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

To be clear, I have no issue with wrapping everything in a transaction; I just would rather do it in the requests that perform updates than around every request implicitly. Since our transactions are lexically scoped, it's difficult to establish one lazily if we need to wrap each request.

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Returning an error would still roll back the transaction; we just wouldn't wrap endpoints like `crates#summary` that perform only reads.

@alexcrichton
Member

Ah yeah, so if errors still roll back transactions then that's fine by me; it was the only intention of the setup. (I thought that was standard "write a web app" practice, right?)

Unfortunately crates.io doesn't have a lexical scope for the transaction; you can see that with the bit of unsafe code around managing the postgres transaction (I think it's still there, at least...)

Also, FWIW, one speed bump I hit with Heroku + Diesel in the past was that Diesel needed a db connection at compile time, but with Heroku you don't have that (only at runtime). Is that `#[derive]` just not needed, though?

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

I thought that was standard "write a web app" practice, right?

Errors rolling back definitely is. I haven't seen "wrap all requests in a transaction" too often though.

Unfortunately crates.io doesn't have a lexical scope for the transaction; you can see that with the bit of unsafe code around managing the postgres transaction (I think it's still there, at least...)

We can do it in a middleware, but it would require eagerly acquiring the database connection. And yeah, we'd have to do something similar to what you're doing with the postgres transaction (and rely on Diesel internals) to avoid that.

Also, FWIW, one speed bump I hit with Heroku + Diesel in the past was that Diesel needed a db connection at compile time, but with Heroku you don't have that (only at runtime)

`infer_schema!` is the only thing that needs a database connection. We have a CLI command you can use instead (`diesel print-schema`); I will update this to not require a connection at compile time.

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Also, I thought you could get a DB connection at build time with Heroku now? Maybe I'm mistaken.

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Asked my friend who works at Heroku:

db connections are available at build time; however, if it's the first push for a Rails app, a database may not be provisioned yet

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

@alexcrichton So we can use `infer_schema!` if we want, or we can choose not to. I don't have a strong opinion one way or the other. Not using it just means there's one file that has to get updated when the schema changes (even if it's generated with a command).

@alexcrichton
Member

Oh interesting, how do you roll back without a transaction? (I thought transactions were how that was done).

Acquiring a db connection eagerly is probably ok, we can always investigate other routes if it turns out to be a problem.

Ah yeah and that makes sense (first deploy not having a db). I'm fine either way w/ crates.io, but it does mean that redeployment may be difficult if we ever try to do it...

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Oh interesting, how do you roll back without a transaction? (I thought transactions were how that was done).

Sorry, to be more clear: you do use a transaction, just one that's explicit (or implicit at a lower level; e.g., Rails wraps every `.save` call in a transaction, so anything that happens in callbacks is already wrapped. If you are doing `foo.save && bar.save` in a controller action, you would want to wrap that in `ActiveRecord::Base.transaction do`).

Acquiring a db connection eagerly is probably ok, we can always investigate other routes if it turns out to be a problem.

It was a problem in tests, but I can dig in and see if I can figure out why.

Ah yeah and that makes sense (first deploy not having a db). I'm fine either way w/ crates.io, but it does mean that redeployment may be difficult if we ever try to do it...

I'll go with the manual file for now, since getting that TsVector column to generate properly from `infer_schema!` requires a feature that's in 0.12.

@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from d2cfce3 to a3decb3 Compare March 6, 2017 18:19
@sgrif
Contributor Author

sgrif commented Mar 6, 2017

I've rebased and updated to not connect to the database at compile time.

@alexcrichton
Member

👍

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Ah, I've figured out what the issue with tests was. I'll go ahead and update this to retain the auto transaction wrapping for now. I'll continue to explicitly wrap anything that looks like it should be in a transaction though, so we can easily go to a lazy connection if we decide it's necessary.

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Actually I think that's best left for a separate PR, since this one is quite large already. Are there any other concerns I need to address?

This isn't strictly necessary, but inferring the `crates` table requires
a feature that Diesel won't have until 0.12, and this has the fewest
unanswered questions.
@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from a3decb3 to f5f3d85 Compare March 7, 2017 16:03
@carols10cents
Member

`cargo run --bin migrate` will move all the entries in the old migrations table into the one used by Diesel. That function is brittle since it relies on Diesel internals. However, it really only matters for existing contributors (and one deploy to Heroku), so when that function breaks we can just delete it.

So I ran `cargo run --bin migrate` locally and it just says "The `migrate` binary is no longer used. Use `diesel migration run` and `diesel migration revert` instead."... is that what you expected to happen?

Also, could you update the README with new setup instructions? Based on the Travis changes, it looks like people need to `cargo install diesel_cli` and then run `diesel migration run` instead of `migrate`?

@carols10cents
Member

Looks like it needs to be `cargo install diesel_cli --debug --no-default-features --features postgres`; `cargo install diesel_cli` failed for me because of something about MySQL.

@sgrif
Contributor Author

sgrif commented Mar 7, 2017

Yeah, by default it installs with all the supported backends, so if you don't have MySQL and SQLite installed you'll want `--no-default-features --features postgres`.

@sgrif
Contributor Author

sgrif commented Mar 7, 2017

is that what you expected to happen?

Yes. It also migrated your old migrations table over to the Diesel one.

@carols10cents
Member

oops, found another spot: could you update the Procfile too?

I've left the original migration binary in there for now, since we need
to migrate the database to use Diesel's migration infrastructure on the
first deploy. After this commit has been deployed it can be removed.

use app::{App, RequestApp};
use util::{CargoResult, LazyCell, internal};

// Existing pool of rust-postgres connections (r2d2-postgres).
pub type Pool = r2d2::Pool<PCM>;
pub type Config = r2d2::Config<pg::Connection, r2d2_postgres::Error>;
type PooledConnnection = r2d2::PooledConnection<PCM>;
// New Diesel connection pool, added alongside the old one for the port.
pub type DieselPool = r2d2::Pool<ConnectionManager<PgConnection>>;
type DieselPooledConn = r2d2::PooledConnection<ConnectionManager<PgConnection>>;
Member

Nice!

@@ -0,0 +1,171 @@
// This file can be regenerated with `diesel print-schema`
Member

YAYYYY A SCHEMA!!!!!!

pub gh_access_token: String,
pub api_token: String,
pub gh_login: String,
pub name: Option<String>,
pub avatar: Option<String>,
Member

Wait, `Queryable` cares about the order of the fields on `User`, but it doesn't care that the field here is named `avatar` while the column in the db is named `gh_avatar`?

Contributor Author

Correct. We will generate an explicit select clause so we know the order that the fields come back in, and then access them by order. This is because we can really only represent rows as a tuple; we can't express "has a field called `name` that is of this type" in the type system. We also don't assume that your `Queryable` structs are one-to-one with a database table. That said, I'm looking at ways to optionally allow by-name instead of by-index.
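Concretely, the struct from the diff above works like this (the `id` field and its type are an assumption; only the field order has to line up with the column order, the names are free):

```rust
#[derive(Queryable)]
pub struct User {
    pub id: i32,
    pub gh_access_token: String,
    pub api_token: String,
    pub gh_login: String,
    pub name: Option<String>,
    // Filled from the `gh_avatar` column purely by position in the select
    // clause; the field name itself never reaches the database.
    pub avatar: Option<String>,
}
```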

Member

Ah, ok, gotcha.

@carols10cents
Member

Seems to work locally, just have that one question about gh_avatar tho... but let's try deploying to staging!!

@carols10cents
Member

wait how does heroku know it needs diesel-cli?

@sgrif
Contributor Author

sgrif commented Mar 7, 2017

It doesn't. I'm making a buildpack now.

@carols10cents
Member

lolololol

@@ -42,6 +43,7 @@ addons:

env:
global:
- DATABASE_URL=postgres://postgres:@localhost/cargo_registry_test
- TEST_DATABASE_URL=postgres://postgres:@localhost/cargo_registry_test
Member

oh so if Travis is using `DATABASE_URL` now, do we need to set `TEST_DATABASE_URL` anymore?

Contributor Author

`DATABASE_URL` is just used by Diesel for migrations, by convention. The tests still run using `TEST_DATABASE_URL`.

Contributor Author

I was also using `DATABASE_URL` for compilation before, but since we don't need that now, I can just specify `--database-url` instead.

Member

You mean just for `diesel database setup`, not migrations, right? Because it looks like `src/tests/all.rs` takes care of the migrations and uses `TEST_DATABASE_URL`? I just want to make sure I understand :)

Contributor Author

Yes. `diesel database setup` runs migrations as well.

Member

ah i see

We'll want to drop `schema_migrations` sometime soonish, but for now
let's make sure that everything just works. We don't want to incur
downtime if we have to roll back.
@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from 8a245a9 to 203555f Compare March 7, 2017 18:17
R2D2 tries to immediately populate the pool, and we're running out of
connections when it does so. Since Diesel is only used on one endpoint,
we don't need 10 connections.
@carols10cents
Member

omg it's working on staging, hold onto your butts

@carols10cents carols10cents merged commit b0a7a8d into rust-lang:master Mar 7, 2017
@sgrif sgrif deleted the sg-port-for-real-this-time branch March 7, 2017 19:29