Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async Diesel middleware and example #309

Closed
colinbankier opened this issue Mar 27, 2019 · 11 comments · Fixed by #313
Closed

Async Diesel middleware and example #309

colinbankier opened this issue Mar 27, 2019 · 11 comments · Fixed by #313
Assignees
Labels

Comments

@colinbankier
Copy link
Collaborator

The current "under development" Diesel middleware provides a "blocking" api to use Diesel.
Some previous work was done (and abandoned) to provide a worker CpuPool middleware to allow not blocking request handling when using diesel.

I have been experimenting with continuing this, but with a different approach. Although Diesel doesn't directly support async, we can still build an async application around it using tokio_threadpool::blocking. This removes the need to explicitly manage a separate CpuPool, and looks like it will be a simpler implementation and api.

I'm opening this issue to solicit discussion around this approach and coordinate efforts in pushing Diesel support forward.
The new experimental api so far provides a repo abstraction that handles the DB connection pool and wraps tokio_threadpool::blocking. Usage looks something like:

    let repo = Repo::borrow_from(&state);
    let user = .....;
    // Returns a future able to be chained into a `HandlerFuture`
    repo.run(move |conn| {
        diesel::insert_into(users::table)
            .values(&user)
            .get_result(&conn)
    }) 

Previous work this would supersede includes:
#227
#77
#126

@colinbankier colinbankier self-assigned this Mar 27, 2019
@whitfin
Copy link
Contributor

whitfin commented Mar 27, 2019

This is definitely a better approach IMO, but my only concern is that I'm not sure we should really push Diesel that much if it truly won't support async... if we push synchronous database calls into a Gotham environment, the whole thing is going to run much slower.

I mean, people are opting into that by using Diesel, but I just want to make sure we make it clear (even in the API) that that's what they're doing.

@colinbankier
Copy link
Collaborator Author

colinbankier commented Mar 28, 2019

I think that using a relational DB from a web app is so common, and Diesel is the defacto standard in rust to do that, Gotham would be lacking not to provide good support and examples for it.

Also, correct me if I misunderstand what tokio_threadpool::blocking gives us - but for most purposes I understand it gives a close enough approximation of being truly async - with a few caveats.

  • It won't block the tokio event loop handling other requests
  • The default concurrent max blocking sections is 100 - which seems ample for many use-cases - and can be increased if needed.
  • There may be some slight overhead of tokio managing OS threads to handle this, but it's done automatically and we don't need to think about it.

It doesn't seem that Diesel natively supporting async is anywhere in sight. Are there drawbacks to the proposed approach that I'm missing?

@whitfin
Copy link
Contributor

whitfin commented Mar 28, 2019

Sure, I don't disagree with that! Just that I'm not entirely sure how we go about this.

I don't know tokio_threadpool explicitly. If it's just a thread pool to run the Diesel stuff on with an async-looking API, then yeah, that seems fine (I think). I misunderstood and thought that it was placing the calls on the main Tokio runtime.

@jdno
Copy link

jdno commented Apr 14, 2019

Thanks @colinbankier for working on Diesel support. I've spent the weekend exploring Gotham a bit, and like it a lot. I did get blocked eventually, though, when I tried to work with a database. An official middleware, or a new example would be much appreciated.

Is there anything I can do to help?

jdno added a commit to jdno/venja-rust that referenced this issue Apr 14, 2019
Gotham has had unofficial support for Diesel for a while. A half
finished middleware exists in its Git repository, and a work-in-progress
pull request for an example using this middleware was closed recently.

Official support seems to be a bit further down the road, as different
ways to enable proper async support are currently being discussed (see
gotham-rs/gotham#309).

To get unblocked until official support arrives, the existing middleware
implementation has been copied into the project. The goal is to remove
it asap, but this is blocked by the above issue in Gotham.
@colinbankier
Copy link
Collaborator Author

Great to hear this would be useful @jdno.
I have a WIP branch here https://github.com/colinbankier/gotham/tree/diesel-tokio-blocking-middleware
with example https://github.com/colinbankier/gotham/tree/diesel-tokio-blocking-middleware/examples/diesel.
It pretty much just needs some tests and documentation filled in before opening a PR - hopefully I can find time to do that this week.
If you wanted to try out the branch in the meantime feel free - and thoughts so far always appreciated.

@jdno
Copy link

jdno commented Apr 15, 2019

Awesome, I'll check out the branch somewhere this week. Maybe I can contribute something to the documentation.

@jdno
Copy link

jdno commented Apr 19, 2019

I found some time today to look at your code and the examples, and refactor my app to use the new middleware. The proof of concept endpoint I've built is a simple health check that prints the current environment (e.g. development or production), and does a database query. With the async middleware, this is how my handler looks now:

pub fn check(state: State) -> Box<HandlerFuture> {
    let app_state = AppState::borrow_from(&state).clone();
    let repo = Repository::borrow_from(&state).clone();

    let future = repo
        .run(move |connection| sql_query("SELECT 1").execute(&connection))
        .map_err(|e| e.into_handler_error())
        .then(move |result| {
            let postgres_status = match result {
                Ok(_) => Status::Pass,
                Err(_) => Status::Fail,
            };

            let health = Health {
                environment: app_state.config.env.clone(),
                postgres: postgres_status,
            };

            let response = create_response(
                &state,
                StatusCode::OK,
                mime::APPLICATION_JSON,
                serde_json::to_string(&health).expect("Failed to serialize health"),
            );

            future::ok((state, response))
        });

    Box::new(future)
}

It should be said that I am quite new to Rust, so this is probably not the best implementation. And most of my struggles could just be related to my lack of experience with futures and tokio. Having said that, these are my observations:

  1. The code feels a lot more verbose. My previous handler implementation called a helper function check_postgres(conn: &PgConnection) -> Status to do the database query, and then simple created the response with create_response. Simple and easy to read. I have to play with this more to see how I can split the web logic (build response, return future) from the business logic (query database, return health).
  2. I don't know how I would implement an endpoint that needs to do multiple database queries.
  3. I am a bit concerned about all the cloning that's going on. This feels like a code smell to me. With my current knowledge, I know too little to see if/how cloning could be avoided.

On the positive side, it was pretty easy to set this up. And knowing that Diesel won't block the main thread is quite nice. 🙂

@colinbankier
Copy link
Collaborator Author

Thanks for the great feedback @jdno.
I agree that things can be a bit more verbose - I think this is a little inevitable with an async api - as we now have the future to deal with as well as another layer of Result - i.e. one for query result, and one for the future. When async/await hits stable, this should make this easier - but of course happy for any suggestions on how to ease this in the current state. A good point to consider.
Cloning is not necessarily a smell - e.g. the db connection pool in Repo uses an Arc internally, so a clone creates a threadsafe reference to it, necessary to use it from a new future / async function.
I'll see if this is done more than necessary in your example.
I didn't get time to look at things further as hoped last week...hopefully in the coming week (or two :) )

@jdno
Copy link

jdno commented Apr 23, 2019

Cloning is not necessarily a smell - e.g. the db connection pool in Repo uses an Arc internally, so a clone creates a threadsafe reference to it, necessary to use it from a new future / async function.

This is a really good point. I saw that the examples use Arcs, but didn’t understand why until now.

I didn't get time to look at things further as hoped last week...hopefully in the coming week (or two :) )

No worries. I’m a week on vacation myself. When I’m back I’ll play a bit more with the code to see what improvements I can make to the code, and my understanding of it. 😅

@colinbankier
Copy link
Collaborator Author

I have opened #313 to get some more feedback and suggestions on this approach.

Thanks @jdno - based on your feedback I simplified the return type from run - to remove the nesting of a Result inside the Future - and just return the Result parts as the future's Item and Error. Hopefully this makes the api a little nicer.

I have a slightly larger app (also WIP) here: https://github.com/colinbankier/realworld-gotham
that uses it so far to try it out in a more "real" example, if that is useful.

@colinbankier
Copy link
Collaborator Author

colinbankier commented Apr 30, 2019

I don't know how I would implement an endpoint that needs to do multiple database queries.

Mostly this should be same as combining or chaining other sorts of futures.
e.g. If the queries are independent, create 2 separate Futures, and then Future::join them, and then chain a then on the end to build your response using both the results.
If the 2nd one depends on the first, you have to chain them with and_then or then, and finally build your response at the end of the chain. Happy to help with a concrete code example if you have one you're struggling with.
Async / await will make this sort of thing much easier when it hits stable :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants