Comments to environment cleanup handlers #18

Closed
miyagawa opened this Issue Sep 6, 2011 · 23 comments

4 participants

@avar

Having started this whole thing I feel I should chime in, I really like the proposal, the API is simple and analogous to the existing $env API, and in particular having an existing ArrayRef instead of relying an object garbage collection makes this fast and robust.

@hdp
hdp commented Sep 7, 2011

I think that instead of an arrayref, it should be a coderef, e.g. $env->{'psgix.cleanup.register'}->(\&long_running_task). This would allow middleware to wrap the cleanup registration function for things like providing $SIG{__DIE__}, localizing application-specific globals, catching exceptions, etc.

@miyagawa
plack member

@hdp i'd say that sounds too complicated.

@hdp
hdp commented Sep 7, 2011

For example,

sub call {
    my $env = shift;

    my $catch_and_log = sub {
        my $code = shift;
        return (
            try { $code->() }
            catch { 
                $self->log("Errored! $_");
                die $_;
            }
        );
    };

    my $cleanup;
    if ($env->{'psgix.cleanup'} and
        my $orig = $env->{'psgix.cleanup.register') {
        $cleanup = sub {
            my @handlers = @_;
            return $orig->(map {
                my $handler = $_;
                sub { $catch_and_log->($handler) } 
            } @handlers);
        };
    }
    local $env->{'psgix.cleanup.register'} = $cleanup;

    my $res = $catch_and_log->(sub { $self->app->($env) });

    if (ref $res eq 'ARRAY') {
        return $res;
    }
    elsif (ref $res eq 'CODE') {
        return sub {
            my $respond = shift;
            local $env->{'psgix.cleanup.register'} = $cleanup;
            $catch_and_log->(sub { $res->($respond) });
        };
    }
}

This isn't particularly pretty code, and I haven't tested it, but it should be enough to show the sort of thing I'd like to make sure we can enable.

@hdp
hdp commented Sep 7, 2011

@miyagawa code using it would potentially be complicated, yes. The actual psgix.cleanup.register function does not need to be very complicated, e.g. the implementation in my middleware.

@miyagawa
plack member

Understood the intent, but particularly:

   local $env->{'psgix.cleanup.register'} = $cleanup;
   $catch_and_log->(sub { $res->($respond) });

There's no guarantee here that the wrapped sub {} is actually run when local() is being in effect, at least with the nonblocking servers. (To be fair that's always true with streaming response as well anyway)

Also, let's say we want to do this with StackTrace - but the response has already been sent, and there's no place to capture the errors and report to users. So this is not really a valid use case where we can think of.

As a bottom line i don't want to update existing middleware to support this - response_cb() already makes it a bit hard (although it's a simple wrapper) to write response mangling middleware, and I don't want to make it messier than the current.

@miyagawa
plack member

@kazuho asks what is the order of the execution of stacked cleanup handlers. It maybe a) FIFO b) LIFO or c) Unspecified. http://twitter.com/kazuho/status/111279294730342400

@hdp
hdp commented Sep 7, 2011

I don't understand why you say there's no guarantee local is in effect there; under what circumstances wouldn't it be?

Anyway, this would in no way require existing middleware to be psgix.cleanup.register friendly, first because the goal is to let (ultimately) applications use this for their own ends, not to affect the response, and second because many of the existing middleware would not make sense with it anyway (like StackTrace, as you mention).

My biggest problem with this is that exposing a plain arrayref is too-tight coupling. It means that you can't write a middleware or server whose 'cleanup' handler forks a new process, or serializes arguments and sends them off to some other server for handling, or does anything other than 'accumulate these in memory until the end of the request'. It also implies that there is a guarantee of ordering, which may make some optimizations impossible.

@miyagawa
plack member

I don't understand why you say there's no guarantee local is in effect there; under what circumstances wouldn't it be?

In case of Twiggy and Feersum, the streaming callback is registered using timer and such, so the local() will be out of scope when the actual callback runs.

@miyagawa
plack member

does anything other than 'accumulate these in memory until the end of the request'.

Because that's the goal of this proposal.

It means that you can't write a middleware or server whose 'cleanup' handler forks a new process, or serializes arguments and sends them off to some other server for handling

I fail to see why you can't write such middleware. Or are you saying that you can't write multiple of such middleware and combine them each other without side affects?

@hdp
hdp commented Sep 7, 2011

I fail to see why you can't write such middleware. Or are you saying that you can't write multiple of such middleware and combine them each other without side affects?

You can write them, but they are then stuck with the same semantics as the proposal (everything waits until the end).

I guess this is out of the scope you see for this proposal, and I seem to be the only dissenter, so I will stop trying to convince you.

@miyagawa
plack member

You can write them, but they are then stuck with the same semantics as the proposal (everything waits until the end).

Same as what? The important thing here is that it won't block browsers/clients, not other registered handlers.

In case of mod_perl cleanup handlers, if you register sub { sleep 1 } and sub { sleep 2 }, that will run in some order and will sleep in total 3 seconds, I think. And that's what we'll get here with the proposal as well, and that's ok.

@avar

One other thing to consider is that e.g. in apache you can call ->child_terminate in these cleanup handlers.

For Plack that would mean that psgix.harakiri.commit would have to be processed after psgix.cleanup.handlers has been run.

That should probably be added to the spec.

@stash

I think psgix.cleanup.handlers should be a regular arrayref and that the handlers should be executed FIFO: apps/middleware can then control the execution order (push and unshift) and it makes servers simpler (shift).

Regarding "Callbacks will be called once a request is complete", what does "complete" mean here? The response stream has been closed? Or when all bytes of the response written to the socket? The later may be useful for Feersum-using apps that want to detect if a response was delivered downstream in its entirety.

Is it worth specifing when the ref to each handler is dropped? Two options are "right after each handler executes" and "when the $env is destroyed".

I think it's a good comment about the relative order of psgix.harakiri.commit and the cleanup handlers. Should the server accept new connections while processing cleanup handlers if harakiri is to be committed?

In the case of Feersum implementation, what would be better: an "idle" handler or a "next tick" handler (i.e. right after current event handlers fire)? We could make Twiggy use the AnyEvent analogy.

For a future spec: If a server supports keep-alives, and there's a PSGI environment var associated with a connection (not a request), specifying a connection cleanup handler in addition to these request cleanup handlers would be good for consistency.

@miyagawa
plack member
@miyagawa
plack member

I think psgix.cleanup.handlers should be a regular arrayref and that the handlers should be executed FIFO: apps/middleware can then control the execution order (push and unshift) and it makes servers simpler (shift).

Agreed.

Regarding "Callbacks will be called once a request is complete", what does "complete" mean here?  The response stream has been closed? Or when all bytes of the response written to the socket?  The later may be useful for Feersum-using apps that want to detect if a response was delivered downstream in its entirety.

I usually make things like that vague enough to mean that it is specific to server implementations.

Is it worth specifing when the ref to each handler is dropped?  Two options are "right after each handler executes" and "when the $env is destroyed".

The latter, "when the $env is destroyed" had an issue with it since when it is destroyed isn't specified in the spec. The former makes
more sense IMO.

I think it's a good comment about the relative order of psgix.harakiri.commit and the cleanup handlers. Should the server accept new connections while processing cleanup handlers if harakiri is to be committed?

harakiri is usually (only?) useful in the case of preforking server handlers, so if harakiri is to be committed, I'd imagine th server
shouldn't accept new connections, but of course that wasn't clearly stated :)

In the case of Feersum implementation, what would be better: an "idle" handler or a "next tick" handler (i.e. right after current event handlers fire)?  We could make Twiggy use the AnyEvent analogy.

Makes sense, but that sounds like a different area of interest from this one - at least a bit too specific to non-blocking async implementations :)

At the very least, the original idea of "poor man's job processing" doesn't make much sense in the context of non-blocking servers, since running long_running_task may most likely block your server - and if there's a way to run it non-blockingly, then you don't need to do it in the cleanup handler, in the first place.

@hdp
hdp commented Sep 7, 2011

what does "complete" mean here?

I think, at least, it has to mean that the client will not be waiting for the execution of whatever coderefs are in psgix.cleanup.handlers. Beyond that is server-dependent.

@miyagawa
plack member

what does "complete" mean here?

I think, at least, it has to mean that the client will not be waiting for the execution of whatever coderefs are in psgix.cleanup.handlers. Beyond that is server-dependent.

+1

@stash

Thanks for the responses. I'm happy with how it stands.

@avar

What's the current status of the proposal? I.e. what needs to happen for it to be ratified?

@avar

Note that uwsgi has now implemented this: unbit/uwsgi@cf480e4

@avar

One change I'd like to make to this is that servers s/SHOULD/MUST/ implement cleanup handlers in such a way that cleanup handlers run before the harakiri checker.

I think the main reason for doing so is not so that the cleanup handlers can commit the harakiri flag (although that's handy too), but because the cleanup handlers might be doing other important things that should be done before the child is killed, e.g. sending out logging information over the network, executing piled-up database requests etc.

If you write an application that relies on that it's a giant PITA to handle the edge case of not being able to kill the child in the same request (e.g. because it ran out of memory).

From looking at how a few servers implement these mechanisms I've concluded that there's no good reason why this can't be a MUST instead of SHOULD.

@miyagawa
plack member

This is now in the spec with #26 thanks to @avar

@miyagawa miyagawa closed this Oct 10, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment