
Graceful shutdown #528

Open
ivan770 opened this issue May 22, 2020 · 18 comments · May be fixed by #766

Comments

@ivan770

ivan770 commented May 22, 2020

Tide could store the JoinHandle of every request it handles, and await them all when the server is shutting down.

tide/src/server.rs

Lines 300 to 311 in 7d0b848

task::spawn(async move {
    let res = async_h1::accept(&addr, stream, |req| async {
        let res = this.respond(req).await;
        let res = res.map_err(|_| io::Error::from(io::ErrorKind::Other))?;
        Ok(res)
    })
    .await;

    if let Err(err) = res {
        log::error!("async-h1 error", { error: err.to_string() });
    }
});

async-std book
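
A minimal sketch of the idea, assuming the handles are kept behind a shared async Mutex; the Handles type and its methods are illustrative, not tide's actual code:

use std::sync::Arc;

use async_std::sync::Mutex;
use async_std::task::JoinHandle;

struct Handles(Arc<Mutex<Vec<JoinHandle<()>>>>);

impl Handles {
    // Record the handle of a freshly spawned request task.
    async fn track(&self, handle: JoinHandle<()>) {
        self.0.lock().await.push(handle);
    }

    // On shutdown, await every request task that is still alive.
    async fn join_all(&self) {
        // Take the handles out first so the lock isn't held across awaits.
        let handles: Vec<_> = self.0.lock().await.drain(..).collect();
        for handle in handles {
            handle.await;
        }
    }
}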

@Fishrock123
Member

I think the server only shuts down on a panic! or due to an external signal, such as SIGINT, or whatever the Windows equivalents are.

I suppose this could possibly be desirable for SIGTERM, which allows for such a thing. If you panic, I'm not sure it's ever really safe to continue.

@Fishrock123
Member

Oh I think I see what you are saying - essentially we should have a way to await all tasks which are still alive, I think?

@ivan770
Author

ivan770 commented May 22, 2020

Oh I think I see what you are saying - essentially we should have a way to await all tasks which are still alive, I think?

Yes. I mean, new requests would no longer be handled, but at least those that are still awaiting a response should be finished.

@mehcode
Contributor

mehcode commented May 27, 2020

Was just coming to make an issue for this. It's critical from a production standpoint that a load balancer or swarm manager can initiate a controlled or graceful shutdown.

Any requests that came in before the "shut down please" request should still finish. Icing on the cake would be a way to shut down with a timeout, but I think we can compose that ourselves if there is an API for shutdown.

The way I envision usage is a shutdown method on Server. As Server is cloneable, we could clone it, wait for SIGTERM, and call shutdown on the clone.
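
A purely hypothetical sketch of that usage; tide's Server has no such method today, so shutdown is stubbed out as an extension trait just so the example type-checks, and the SIGTERM wait is elided:

use async_std::task;

// Hypothetical graceful-shutdown method; tide does not provide this.
trait Shutdown {
    fn shutdown(&self);
}

impl Shutdown for tide::Server<()> {
    fn shutdown(&self) {
        unimplemented!("stop accepting connections, then drain in-flight requests")
    }
}

async fn run() -> std::io::Result<()> {
    let app = tide::new();

    // Server is cloneable, so hold a second handle for the signal task.
    let handle = app.clone();
    task::spawn(async move {
        // ...wait for SIGTERM here, e.g. via a signal-handling crate...
        handle.shutdown();
    });

    app.listen("localhost:8080").await
}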

@repnop

repnop commented Jun 11, 2020

Definitely agree with @mehcode. Having the ability to gracefully shut down the server is pretty important for me as well. 👍

@ririsoft
Contributor

ririsoft commented Jun 11, 2020

FWIW here is how I do it using the async-ctrlc crate:

use async_ctrlc::CtrlC;
use async_std::prelude::FutureExt;

async fn run() -> std::io::Result<()> {
    let ctrlc = async {
        CtrlC::new().expect("Cannot use CTRL-C handler").await;
        println!("termination signal received, stopping server...");
        Ok(())
    };

    let app = async {
        let app = tide::new();
        app.listen("localhost:8080").await
    };

    app.race(ctrlc).await?;
    println!("server stopped.");
    Ok(())
}

Please note that this will not wait for in-flight requests to complete; it is a "brutal graceful shutdown", if I dare say so. But at least it handles SIGTERM and CTRL-C shutdown (cross-platform), which is better than nothing.

@repnop

repnop commented Jun 12, 2020

I did try messing around with adding graceful shutdown to tide yesterday evening and even got tests working, but I ran into a bit of a problem that I'll outline in the hope that someone better at async code can figure out a better solution.

The actual waiting for requests to finish is pretty simple with the design I used: spawn a task that holds a Receiver (I used the futures-channel channel) and use it as a stream to await all job handles. When the channel is closed from the sender side, the task finishes the remaining jobs and breaks out of its loop, which signals that all in-flight requests have finished.

The other part seems just as simple: if someone has initiated a shutdown (I used an Arc<AtomicBool> here and set it to true in a shutdown method), stop listening for new connections, close the sender half of the channel, wait for the existing requests to finish (I also had it serve up 503s during this time), then return. This worked, except that using stream combinators to decide when to stop polling made the server wait for one more request after shutdown was initiated. I think that's because the TcpListener only notifies the waker when there is new data on the socket; only on the next poll cycle after that would it see that shutdown was true and exit as expected.

To get around this I used a select! and checked the value of shutdown whenever incoming.next() didn't resolve, but to stop it from looping infinitely there needed to be a task::yield_now().await on the false case, which seemed less than ideal, and that's where I ended my attempt.
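
For reference, a minimal sketch of that select!-plus-yield_now workaround, written against the futures crate; accept_loop and the bare shutdown flag are illustrative names, not the actual branch:

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

use async_std::net::TcpListener;
use async_std::task;
use futures::{select, FutureExt, StreamExt};

async fn accept_loop(listener: TcpListener, shutdown: Arc<AtomicBool>) {
    let mut incoming = listener.incoming();
    loop {
        select! {
            stream = incoming.next().fuse() => {
                match stream {
                    // A new connection arrived: spawn a handler for it
                    // and track its JoinHandle as described above.
                    Some(Ok(_stream)) => { /* handle the connection */ }
                    _ => break,
                }
            }
            // No connection is ready right now: check the flag, then
            // yield so this loop doesn't starve other tasks.
            default => {
                if shutdown.load(Ordering::SeqCst) {
                    break;
                }
                task::yield_now().await;
            }
        }
    }
}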

If anyone has thoughts on how to avoid the issue, do let me know and I can give them a try and report back. Or if anyone else wants to take a stab at implementing it themselves, by all means 👍

@yoshuawuyts
Member

I've thought about this a little in the past, and gathered some notes on this. What I think we need to do is use stop-token here:

tide/src/server.rs

Lines 334 to 335 in 293610b

let mut incoming = listener.incoming();
while let Some(stream) = incoming.next().await {
and here

tide/src/server.rs

Lines 292 to 293 in 293610b

let mut incoming = listener.incoming();
while let Some(stream) = incoming.next().await {

This allows us to stop receiving incoming requests and allow the server to gracefully exit once a token has been received.


However, stop-token is not quite the end of the road for us. What I think we need is tighter integration with Stream and Future. I've also gathered some notes on a design for async-std that we could use to make cancellation easier. If we had that, we could probably use it to enable cancellation in Tide as well:

let src = StopSource::new();
let token = src.token();

// Send a token to each app we initialize.
task::spawn(async move {
    let mut app = tide::new();
    app.stop_on(token.clone());
    app.listen("localhost:8080").await
});

// The app will stop receiving requests once we drop the `StopSource`.
// This can be done in response to e.g. a signal handler.

It's a bit of an open question how we'd bring this together today. We could probably define our own StopToken for the time being and implement the API outlined above; once we have this in async-std, we could rely on that instead. Hope this is helpful!

@GWBasic

GWBasic commented Nov 8, 2020

I'm learning Rust, so I tried to see if I could fix this: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1

Specifically, I solved @repnop's problem by using future::select, which awaits two futures and returns when the first completes: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1#diff-1f14b8b6953e2388b19efaf814862a89c0b57c45a6814f79ed373fde05d864d0R82

I then created a "CancelationToken" future that returns once it is canceled.
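
For anyone curious, such a token can be hand-rolled from a flag plus a stored Waker. A minimal sketch, with illustrative names rather than the branch's actual code:

use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

#[derive(Default)]
struct Shared {
    cancelled: bool,
    waker: Option<Waker>,
}

// A future that resolves once `cancel` has been called.
#[derive(Clone, Default)]
struct CancellationToken(Arc<Mutex<Shared>>);

impl CancellationToken {
    fn cancel(&self) {
        let mut shared = self.0.lock().unwrap();
        shared.cancelled = true;
        // Wake the future if it is currently being awaited.
        if let Some(waker) = shared.waker.take() {
            waker.wake();
        }
    }
}

impl Future for CancellationToken {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared = self.0.lock().unwrap();
        if shared.cancelled {
            Poll::Ready(())
        } else {
            // Store the waker so `cancel` can wake this future later.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}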

So far, I only know how to test tcp_listener, and I adjusted its tests to work by canceling gracefully: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1#diff-0f530b03216a301ca1004179731df8510170be421ff7abf545db673eca3bc4beR12

I haven't done anything with waiting for ongoing requests to complete, or with testing the other listeners. I must admit that I've tried to return futures from the listen function, but that leads to a lot of fighting with the borrow checker. That's a bit more than I can handle as a Rust novice. ;)

Anyway, if you believe my branch is in line with tide's overall design philosophy, I'm happy to make a pull request and change whatever you want. (Especially for testing the rest of the listeners.) Otherwise, this bug was a great learning exercise for me.

@Fishrock123
Member

@GWBasic Can you make at least a draft PR?

@GWBasic

GWBasic commented Nov 18, 2020

@Fishrock123 : See #746

(Note that some merge conflicts crept in.)

Looking forward to your feedback!

@GWBasic linked a pull request Dec 16, 2020 that will close this issue
@GWBasic

GWBasic commented Dec 16, 2020

@Fishrock123 : I had too many merge conflicts in #746, so I created #766

It uses futures-lite.

Thoughts? Again, I'm mostly interested in learning Rust, so if you'd rather do this in a different way, that's fine with me.

@pbzweihander
Contributor

I've also made #836. Please give any thoughts or suggestions!

@KoltesDigital

Any news?

@yoshuawuyts
Member

I think it's important to break this down into two parts:

  1. What would the design be if async Rust had all of its primitives implemented?
  2. What design can we implement now?

A design for Future Rust.

I've recently written about async cancellation. For Tide, the change we need to make is that our calls to task::spawn (e.g. in tcp_listener and unix_listener) implement "cancel on drop" semantics, so that when the Server instance is dropped, all tasks are cancelled.

We could do this by using e.g. the async-task-group crate instead of calling async_std::task::spawn directly. Within the spawned tasks we could make each existing request finish executing in an AsyncDrop destructor, so that by the time the server finishes dropping, all requests have finished executing.

Using (for example) the stop-token crate, we could then shut down a server like this:

use stop_token::prelude::*;
use std::time::{Duration, Instant};
use async_std::task;

// Create the server and register a route which takes some time to complete.
let mut server = tide::new();
server.at("/").get(|_| async move {
    task::sleep(Duration::from_secs(10)).await;
    Ok("hello")
});

// Start listening for incoming requests, but close the server after 3 secs.
// Instead of waiting for a duration we could instead also wait for an external signal.
// Either way, this should gracefully wait for all requests to terminate.
server
    .listen("localhost:8080")
    .timeout_at(Instant::now() + Duration::from_secs(3))
    .await?;

println!("finished cleaning up all requests!");

What can we do now?

Unfortunately we don't currently have access to async Drop, so we're limited in how we can implement this. The biggest downside is that instead of being able to use composable cancellation (e.g. when the Server is dropped, we gracefully shut down), we need to hard-code a shutdown mechanism into Tide.

Likely the easiest route would be to hard-code a Server::timeout_at method which takes a Deadline. The question is whether we would want to take a dependency on an external crate for this or provide the mechanism ourselves. I feel that providing it ourselves might be the most resilient option, but I'm keen to hear thoughts on this.


I hope this helps clarify what the current thinking is!

@KoltesDigital

Thanks!

I'm sure I read on a page linked from your blog that drop is not guaranteed to be called. What is your take on relying on it to shut down the server?

Maybe app.listen could yield a kind of RunningServerHandle on which .shutdown() could be called. I've built a lot of NodeJS servers with express, and those app objects have listen and close methods; I think it's more idiomatic to expose shutdown once listen has been called. Maybe this handle could also have a block method, in order to get back the current behavior, i.e. most users would simply change from app.listen(...).await?; to app.listen(...).block().await?;. Sadly it's a breaking change. A rough sketch of what I mean follows below.
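
All of these types and methods are hypothetical and do not exist in tide today:

use std::io;

// Hypothetical handle that `listen` could return; not tide's actual API.
struct RunningServerHandle;

impl RunningServerHandle {
    // Stop accepting new connections, then wait for in-flight requests.
    async fn shutdown(self) {
        unimplemented!()
    }

    // Run until the server exits on its own, i.e. today's behavior.
    async fn block(self) -> io::Result<()> {
        unimplemented!()
    }
}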

@GWBasic

GWBasic commented Jan 26, 2022

Hey, thanks for the mention. I'm leaving this thread: I only attempted #766 as a learning exercise.

Best of luck!

@yoshuawuyts
Member

yoshuawuyts commented Jan 28, 2022

I'm sure I read on a page linked from your blog that drop is not guaranteed to be called. What is your take on relying on it to shut down the server?

Drop is not guaranteed to be called because mem::forget is not unsafe, which means we cannot rely on it for soundness purposes. But this is far less about "what could we theoretically do" and more about "what is the default behavior". In cases like these we can absolutely expect Drop to just run, and we don't need to worry about it not working. [1]

Footnotes

[1] If Drop wasn't something we could rely on, abstractions such as Arc, Mutex, etc. would be so unreliable that they'd be pointless. But luckily we can rely on Drop for almost everything, except when we're talking about soundness.
