
Set up the groundwork for a port to Diesel #589

Merged
merged 10 commits into rust-lang:master from sg-port-for-real-this-time on Mar 7, 2017

Conversation

sgrif
Contributor

@sgrif sgrif commented Mar 4, 2017

#588 (comment)

This PR changes the `users#show` endpoint to use Diesel instead of rust-postgres (and adds a test for that endpoint). I chose that endpoint as it seemed likely to touch as few things as possible, but enough to require setting up Diesel connection pooling, and testing.

I have also ported over the migrations to use Diesel's migration infrastructure. The migration files (except the final migration) were generated programmatically. Any non-reversible migrations were not dumped, as these contained procedural data updates which aren't relevant to setting up a new database.

`cargo run --bin migrate` will move all the entries in the old migrations table into the one used by Diesel. That function is brittle since it relies on Diesel internals. However, it really only matters for existing contributors (and one deploy to Heroku), so when that function breaks we can just delete it.

I've added an additional migration to make the schema compatible with `infer_schema!`, which doesn't support tables that have no primary key. I'm not using `infer_schema!` just yet, as it doesn't work with non-core types and there are some tsvector columns.

I re-ordered the columns on the `User` struct, as Diesel emits an explicit select clause and then fetches columns by index rather than by name. This means that the order has to match how it was defined to Diesel (which in the case of `infer_schema!` will be the database definition order). If we don't want this restriction, we can replace the `infer_schema!` call with manual `table!` calls (which can be automatically generated by `diesel print-schema`), and have the columns ordered there to match the struct.
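For concreteness, a manual definition for the `users` table would look roughly like this (a sketch only: the column names come from this PR's diff, but the exact types and the full column list are assumptions; the real definition is whatever `diesel print-schema` emits):

```rust
#[macro_use]
extern crate diesel;

// The column order here is what `Queryable` structs must match, since
// Diesel selects and reads columns by position rather than by name.
table! {
    users (id) {
        id -> Integer,
        gh_access_token -> Varchar,
        api_token -> Varchar,
        gh_login -> Varchar,
        name -> Nullable<Varchar>,
        gh_avatar -> Nullable<Varchar>,
    }
}
```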

Differences to note
-------------------

The rust-postgres connection is set to require TLS when the env var `HEROKU` is set. In Diesel, this would be to add `?sslmode=require` onto the database URL. I'm unsure if heroku-postgres URLs already have this or not, so I have not added any explicit code for that. It's a question that should be answered before this is deployed.

Additionally, I chose not to automatically wrap each request in a transaction. It's unnecessary most of the time, and seems like it was only done to make testing easier. Since Diesel has explicit support for "run tests in a transaction that never commits", we don't need that here.
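For reference, that test-transaction support is just a method on the connection. A minimal sketch (assuming the `TEST_DATABASE_URL` env var used elsewhere in this PR; this is not the PR's actual test harness):

```rust
use std::env;

use diesel::pg::PgConnection;
use diesel::prelude::*;

fn test_connection() -> PgConnection {
    let url = env::var("TEST_DATABASE_URL").expect("TEST_DATABASE_URL must be set");
    let conn = PgConnection::establish(&url).expect("failed to connect to the test database");
    // Open a transaction that is never committed, so every test starts from
    // a clean state and its writes vanish when the connection is dropped.
    conn.begin_test_transaction().expect("failed to begin test transaction");
    conn
}
```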

@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from 1689966 to 0296ec4 Compare March 4, 2017 19:04
@sgrif
Contributor Author

sgrif commented Mar 4, 2017

So just glancing through where the codebase is at now, a few notes in terms of feature support.

  • Diesel 0.11 has an API for the cross join used when the q parameter is present in crates#index, but it's going away in 0.12, and I'm not sure I'll have the replacement in that version
  • There are a lot of many-to-many associations. It's not that we don't support them; it's just a few lines of boilerplate instead of something like `#[belongs_to(Crate, through="crates_categories")]`
  • Whatever is going on here isn't something we can write one-to-one with the query builder. We may be able to restructure the query to something that can be written in the query builder.
  • OMG CAROL STAHP

None of this would block finishing the port, but we will have a few places where we'll have to fall back to raw SQL. And I probably have 2 more releases of Diesel worth of work to do to support all this without raw SQL. :P

@carols10cents
Member

I'm fine with having to fall back to raw sql sometimes. THEY'RE GOOD QUERIES, SGROF

@carols10cents
Member

Diesel 0.11 has an API for the cross join used when the q parameter is present in crates#index, but it's going away in 0.12, and I'm not sure I'll have the replacement in that version

but but https://github.com/rust-lang/crates.io/blob/master/src/krate.rs#L497

I'd like to refactor that part: one, because it could use some more structure, and two, to eventually support things like searching with a query within a keyword. I'll think about how we could rework this.

@sgrif
Contributor Author

sgrif commented Mar 4, 2017

Yes, everything except the `q` branch factors out nicely. Even that branch fits nicely in 0.11. I need to better flesh out the feature, though, which is why I killed it.

@carols10cents
Member

In Diesel, this would be to add `?sslmode=require` onto the database URL. I'm unsure if heroku-postgres URLs already have this or not, so I have not added any explicit code for that.

The `DATABASE_URL` env var set by Heroku does not have `?sslmode=require`, and from what I can tell, we shouldn't need it when querying the database from within Heroku. Their docs say to add `?sslmode=require` in application code, which sounds like what you're saying. We have a staging Heroku instance where we can try this out, although we had some issue with the prod instance that we didn't have with staging..... i forget exactly what though....

Additionally, I chose not to automatically wrap each request in a transaction. It's unnecessary most of the time, and seems like it was only done to make testing easier. Since Diesel has explicit support for "run tests in a transaction that never commits", we don't need that here.

There's at least one place I know crates.io depends on this: when uploading a new version, if the git update or S3 upload fails, we roll back the database additions. By "it's unnecessary most of the time", do you mean that Diesel has a way to wrap the requests where it is necessary in a transaction?

@alexcrichton, do you know of any other places besides new crate and tests where we're relying on each request being wrapped in a transaction?

@sgrif
Contributor Author

sgrif commented Mar 5, 2017

Yes. We can wrap in a transaction where it's needed (this PR does that in `NewUser#create_or_update`). Anywhere that I see more than one query where at least one has side effects, I assume it needs a transaction.
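A rough sketch of that pattern (a hypothetical helper, not the PR's actual `NewUser` code, and the exact error plumbing varies a bit across Diesel versions):

```rust
use diesel::pg::PgConnection;
use diesel::prelude::*;
use diesel::result::Error;

// Several statements with side effects that must succeed or fail together.
fn create_or_update(conn: &PgConnection) -> Result<(), Error> {
    conn.transaction(|| {
        // Insert or update the user row, then any follow-up writes go here;
        // returning Err from this closure rolls the whole transaction back.
        Ok(())
    })
}
```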

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

I've updated to require SSL on Heroku. (Probably not even required, since the default value is `prefer`, which will use SSL if the server supports it, but who am I to argue with their docs.)
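A minimal sketch of the idea (a hypothetical helper, not necessarily how this PR wires it up):

```rust
use std::env;

// Append sslmode=require to the connection string when running on Heroku,
// mirroring what the rust-postgres code keyed off the HEROKU env var.
fn database_url() -> String {
    let mut url = env::var("DATABASE_URL").expect("DATABASE_URL must be set");
    if env::var("HEROKU").is_ok() && !url.contains("sslmode") {
        url.push_str("?sslmode=require");
    }
    url
}
```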

@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from 8a61fc2 to d2cfce3 Compare March 6, 2017 14:00
@alexcrichton
Member

@alexcrichton, do you know of any other places besides new crate and tests where we're relying on each request being wrapped in a transaction?

Hm. Well, I wrote the whole web app on this assumption, so we may want to leave it for now :). In general there are lots of transactional points, such as:

  • Crate is inserted into db
  • Crate is uploaded to S3
  • Crate is uploaded to GitHub

We currently rely on database transactions so that if GitHub fails, the insertion is rolled back. The same goes for tons of API calls and such: the database is updated willy-nilly on the assumption that a failure down the line will roll back the updates.


Also, yeah, I believe SSL is required on Heroku nowadays (I found it hard to find docs on that...). The ironic part is that although SSL is required, they're not using valid certificates, so it's still MitM-happy all over the place. I emailed them a while back about this and they said yeah, that's what most database drivers do, ignore certs. ;_;

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

To be clear, I have no issue with wrapping everything in a transaction; I just would rather do it in the requests that perform updates than around every request implicitly. Since our transactions are lexically scoped, it's difficult to establish one lazily if we need to wrap each request.

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Returning an error would still roll back the transaction; we just wouldn't wrap endpoints like `crates#summary` that perform only reads.

@alexcrichton
Member

Ah yeah, so if errors still roll back transactions then that's fine by me; it was the only intention of the setup. (I thought that was standard "write a web app" practice, right?)

Unfortunately crates.io doesn't have a lexical scope for the transaction; you can see that with the bit of unsafe code around managing the postgres transaction (I think it's still there, at least...)

Also, FWIW, one speed bump I hit with Heroku + Diesel in the past was that Diesel needed a db connection at compile time, but with Heroku you don't have that (only at runtime). Is that `#[derive]` just not needed, though?

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

I thought that was standard "write a web app" practice, right?

Errors rolling back definitely is. I haven't seen "wrap all requests in a transaction" too often though.

Unfortunately crates.io doesn't have a lexical scope for the transaction; you can see that with the bit of unsafe code around managing the postgres transaction (I think it's still there, at least...)

We can do it in a middleware, but it would require eagerly acquiring the database connection. And yeah, we'd have to do something similar to what you're doing with the postgres transaction (and rely on Diesel internals) to avoid that.

Also, FWIW, one speed bump I hit with Heroku + Diesel in the past was that Diesel needed a db connection at compile time, but with Heroku you don't have that (only at runtime)

`infer_schema!` is the only thing that needs a database connection. We have a CLI command you can use instead (`diesel print-schema`); I will update this to not require a connection at compile time.

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Also, I thought you could get a DB connection at build time with Heroku now? Maybe I'm mistaken.

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Asked my friend who works at Heroku:

db connections are available at build time; however, if it's the first push for a Rails app, a database may not be provisioned yet

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

@alexcrichton So we can use `infer_schema!` if we want, or we can choose not to. I don't have a strong opinion one way or the other. Not using it just means there's one file that has to get updated when the schema changes (even if it's generated with a command).

@alexcrichton
Member

Oh interesting, how do you roll back without a transaction? (I thought transactions were how that was done).

Acquiring a db connection eagerly is probably ok, we can always investigate other routes if it turns out to be a problem.

Ah yeah and that makes sense (first deploy not having a db). I'm fine either way w/ crates.io, but it does mean that redeployment may be difficult if we ever try to do it...

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Oh interesting, how do you roll back without a transaction? (I thought transactions were how that was done).

Sorry, to be more clear: you do use a transaction, just one that's explicit (or implicit at a lower level; e.g., Rails wraps every `.save` call in a transaction, so anything that happens in callbacks is already wrapped. If you are doing `foo.save && bar.save` in a controller action, you would want to wrap that in `ActiveRecord::Base.transaction do`).

Acquiring a db connection eagerly is probably ok, we can always investigate other routes if it turns out to be a problem.

It was a problem in tests, but I can dig in and see if I can figure out why.

Ah yeah and that makes sense (first deploy not having a db). I'm fine either way w/ crates.io, but it does mean that redeployment may be difficult if we ever try to do it...

I'll go with the manual file for now, since getting that TsVector column to generate properly from `infer_schema!` requires a feature that's in 0.12.

@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from d2cfce3 to a3decb3 Compare March 6, 2017 18:19
@sgrif
Contributor Author

sgrif commented Mar 6, 2017

I've rebased and updated to not connect to the database at compile time.

@alexcrichton
Member

👍

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Ah, I've figured out what the issue with tests was. I'll go ahead and update this to retain the auto transaction wrapping for now. I'll continue to explicitly wrap anything that looks like it should be in a transaction though, so we can easily go to a lazy connection if we decide it's necessary.

@sgrif
Contributor Author

sgrif commented Mar 6, 2017

Actually I think that's best left for a separate PR, since this one is quite large already. Are there any other concerns I need to address?

This isn't strictly necessary, but inferring the `crates` table requires
a feature that Diesel won't have until 0.12, and this has the fewest
unanswered questions.
@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from a3decb3 to f5f3d85 Compare March 7, 2017 16:03
@carols10cents
Member

`cargo run --bin migrate` will move all the entries in the old migrations table into the one used by Diesel. That function is brittle since it relies on Diesel internals. However, it really only matters for existing contributors (and one deploy to Heroku), so when that function breaks we can just delete it.

So I ran `cargo run --bin migrate` locally and it just says "The `migrate` binary is no longer used. Use `diesel migration run` and `diesel migration revert` instead."... is that what you expected to happen?

Also, could you update the README with new setup instructions? Based on the Travis changes, it looks like people need to `cargo install diesel_cli` and then run `diesel migration run` instead of `migrate`?

@carols10cents
Member

Looks like it needs to be `cargo install diesel_cli --debug --no-default-features --features postgres`; `cargo install diesel_cli` failed for me because of something about MySQL.

@sgrif
Contributor Author

sgrif commented Mar 7, 2017

Yeah, by default it installs with all the supported backends, so if you don't have MySQL and SQLite installed you'll want `--no-default-features --features postgres`.

@sgrif
Contributor Author

sgrif commented Mar 7, 2017

is that what you expected to happen?

Yes. It also migrated your old migrations table over to the Diesel one.

@carols10cents
Member

oops, found another spot: could you update the Procfile too?

I've left the original migration binary in there for now, since we need
to migrate the database to use Diesel's migration infrastructure on the
first deploy. After this commit has been deployed it can be removed.

use app::{App, RequestApp};
use util::{CargoResult, LazyCell, internal};

// Existing pool of rust-postgres connections (r2d2-postgres).
pub type Pool = r2d2::Pool<PCM>;
pub type Config = r2d2::Config<pg::Connection, r2d2_postgres::Error>;
type PooledConnnection = r2d2::PooledConnection<PCM>;
// New Diesel connection pool, added alongside the old one for the port.
pub type DieselPool = r2d2::Pool<ConnectionManager<PgConnection>>;
type DieselPooledConn = r2d2::PooledConnection<ConnectionManager<PgConnection>>;
Member

Nice!

@@ -0,0 +1,171 @@
// This file can be regenerated with `diesel print-schema`
Member

YAYYYY A SCHEMA!!!!!!

pub gh_access_token: String,
pub api_token: String,
pub gh_login: String,
pub name: Option<String>,
pub avatar: Option<String>,
Member

Wait, `Queryable` cares about the order of the fields on `User`, but it doesn't care that the field here is named `avatar` while the column in the db is named `gh_avatar`?

Contributor Author

Correct. We will generate an explicit select clause so we know the order that the fields come back in, and then access them by order. This is because we can really only represent rows as a tuple; we can't express "has a field called `name` that is of this type" in the type system. We also don't assume that your `Queryable` structs are one-to-one with a database table. That said, I'm looking at ways to optionally allow by-name instead of by-index.
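Concretely, the struct from the diff above works like this (the `id` field and its type are an assumption; only the field order has to line up with the column order, the names are free):

```rust
#[derive(Queryable)]
pub struct User {
    pub id: i32,
    pub gh_access_token: String,
    pub api_token: String,
    pub gh_login: String,
    pub name: Option<String>,
    // Filled from the `gh_avatar` column purely by position in the select
    // clause; the field name itself never reaches the database.
    pub avatar: Option<String>,
}
```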

Member

Ah, ok, gotcha.

@carols10cents
Member

Seems to work locally, just have that one question about gh_avatar tho... but let's try deploying to staging!!

@carols10cents
Member

wait how does heroku know it needs diesel-cli?

@sgrif
Contributor Author

sgrif commented Mar 7, 2017

It doesn't. I'm making a buildpack now.

@carols10cents
Member

lolololol

@@ -42,6 +43,7 @@ addons:

env:
global:
- DATABASE_URL=postgres://postgres:@localhost/cargo_registry_test
- TEST_DATABASE_URL=postgres://postgres:@localhost/cargo_registry_test
Member

oh so if Travis is using `DATABASE_URL` now, do we need to set `TEST_DATABASE_URL` anymore?

Contributor Author

`DATABASE_URL` is just used by Diesel for migrations, by convention. The tests still run using `TEST_DATABASE_URL`.

Contributor Author

I was also using `DATABASE_URL` for compilation before, but since we don't need that now, I can just specify `--database-url` instead.

Member

You mean just for `diesel database setup`, not migrations, right? Because it looks like `src/tests/all.rs` takes care of the migrations and uses `TEST_DATABASE_URL`? I just want to make sure I understand :)

Contributor Author

Yes. `diesel database setup` runs migrations as well.

Member

ah i see

We'll want to drop `schema_migrations` sometime soonish, but for now
let's make sure that everything just works. We don't want to incur
downtime if we have to roll back.
@sgrif sgrif force-pushed the sg-port-for-real-this-time branch from 8a245a9 to 203555f Compare March 7, 2017 18:17
R2D2 tries to immediately populate the pool, and we're running out of
connections when it does so. Since Diesel is only used on one endpoint,
we don't need 10 connections.
@carols10cents
Member

omg it's working on staging, hold onto your butts

@carols10cents carols10cents merged commit b0a7a8d into rust-lang:master Mar 7, 2017
@sgrif sgrif deleted the sg-port-for-real-this-time branch March 7, 2017 19:29