Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec ambiguity: environment instance #2144

Closed
lloeki opened this issue Jan 10, 2024 · 11 comments
Closed

Spec ambiguity: environment instance #2144

lloeki opened this issue Jan 10, 2024 · 11 comments

Comments

@lloeki
Copy link

lloeki commented Jan 10, 2024

The spec says this regarding the environment instance:

The environment must be an unfrozen instance of Hash that includes CGI-like headers. The Rack application is free to modify the environment.

But it does not say whether through the Rack stack the environment may or may not be dup'd to be passed to the next call.

Specifically these cases could be specified:

  1. The environment instance MUST NOT be the same across calls through the stack: this would ensure that information is forgotten at each step when bubbling up the Rack stack, providing isolation from lower levels to upper levels, at least for the environment hash itself, but not for the values themselves. This may be complemented by:

    • upon calling the next step, environment values MUST be assumed immutable (ideally, enforced via freeze or other mechanism)

    The benefit would be to assume functional style side-effect-freeness from the environment within a given middleware level. Annotated example:

    class MiddlewareA
      def call(env)
         env[:foo] = Foo.new.freeze
    
         thr = Thread.new do
           loop do
             env[:foo] # guaranteed to be the same as before
             env.key?(:bar) # guaranteed to be false
           end
         end
         thr.run
    
         app.call(env.dup)
    
         thr.join
    
         env[:foo] # guaranteed to be the same as before
         env.key?(:bar) # guaranteed to be false
      end
    end
    
    class MiddlewareB
      def call(env)
         env[:bar] = Bar.new.freeze # will be invisible to MiddlewareA
         env[:foo].mutate! # forbidden
         env[:foo] = wrap(env[:foo) # has no impact on MiddlewareA caller
         app.call(env.dup)
      end
    end
    
    # and so on
    
  2. The environment instance MUST be the same across calls through the stack. Mutation of existing values attached to environment keys or value instances themselves is allowed and MUST NOT be prevented.

    The benefit would be to ensure that information is kept when unwinding the Rack stack, providing a mechanism to bubble information to the upper levels. Annotated example:

    # designed to be used with MiddlewareB somewhere down the stack (not necessarily the immediate next one)
    class MiddlewareA
      def call(env)
         env[:foo] = Foo.new
    
         thr = Thread.new do
           loop do
             env[:foo] # may be mutated or outright swapped under this thread feet
             env.key?(:bar) # could be false or true
           end
         end
         thr.run
    
         app.call(env) # pass same env instance
    
         thr.join
    
         env[:foo] # known to be mutated / swapped by MiddlewareB
         env.key?(:bar) # true on happy path, gives access to bubbled up information
      end
    end
    
    # designed to be used with MiddlewareA somewhere up the stack (not necessarily the immediate previous one)
    class MiddlewareB
      def call(env)
         env[:bar] = Bar.new # guaranteed to be visible to MiddlewareA (and up, all the way)
         env[:foo].mutate! # allowed
         env[:foo] = Baz.new # allowed, guaranteed to be visible to MiddlewareA (and up, all the way)
         app.call(env) # still same env instance
      end
    end
    
    # and so on
    
  3. The environment instance MAY OR MAY NOT be the same across calls through the stack: this means the environment can be relied on when going down the Rack stack but CANNOT be relied on when unwinding, thus SHALL NOT be used to bubble up information when the Rack stack unwinds.

    The benefit is that it's a free for all, whoever wants to do anything can, but the consequence of that must be made very clear, no middleware can assume anything about what's going up above or below, including unanticipated whatever unanticipated middlewares might do.

    # cannot be designed to be used with MiddlewareB (unless guaranteed to be just next... which would make it the "same middleware, only split in two classes")
    # but also cannot assume any immutability guarantees
    # so suffers from both 1. and 2. drawbacks with none of the benefits
    class MiddlewareA
      def call(env)
         env[:foo] = Foo.new
    
         thr = Thread.new do
           loop do
             env[:foo] # may be mutated or outright swapped under this thread feet
             env.key?(:bar) # could be false or true
           end
         end
         thr.run
    
         app.call(env) # pass same env instance
    
         thr.join
    
         env[:foo] # could have been mutated or outright swapped, or could be the same depending on intermediate middlewares, no guarantee that information is being passed from below, can only assume that this quacks like Foo.
         env.key?(:bar) # could be false or true depending on intermediate middlewares, no guarantee that the information is being passed form below, must not be relied upon
      end
    end
    
    class MiddlewareB
      def call(env)
         env[:bar] = Bar.new # may or may not be visible to MiddlewareA (and up, all the way)
         env[:foo].mutate! # allowed, but may be useless, or may crash
         env[:foo] = Baz.new # allowed, may or may not be visible to MiddlewareA (and up, all the way)
         app.call(env) # pass same env instance
      end
    end
    
    # imagine whether or not this is between MiddlewareA and MiddlewareB
    class EvilMiddleware
      def call(env)
        if [true, false].sample # just to drive the point home
          app.call(env)
        else
          env[:foo].freeze # because why not, it's not forbidden by the spec
          app.call(env.dup) # because why not, it's not forbidden by the spec
        end
      end
    end
    
    # and so on
    

Currently:

  • the expectation about the environment hash is unspecified, which means the spec sits at 3., although the spec lacks this bit: "the Rack stack but CANNOT be relied on when unwinding, thus SHALL NOT be used to bubble up information when the Rack stack unwinds".
  • in practice I guess most (if not all) middlewares are developed with an expectation of 2., although I've seen some assuming bits of 1. here and there, or conversely failing to account for corner cases of 2., which results in occasional bugs.

WDYT? Is Rack's design intended to be 1., 2., or 3.?

@ioquatix
Copy link
Member

The environment represents the state of the request. This is mutable state and there isn't much that can be done to make this fully immutable, e.g. rack.input is not buffered (usually), rack.hijack can add rack.hijack_io to the environment (bad design IMHO) - I'm sure there are other cases too, where the state of the environment is manipulated.

I think 3 is closest to how it works today, however you are correct, whether the information can bubble up is not specified. Is there value in specifying it?

@dentarg
Copy link
Contributor

dentarg commented Jan 12, 2024

Seems unlikely the intention has ever been 1.? Seeing rack itself ships with the Rack::Config middleware (since 2009)

rack/test/spec_config.rb

Lines 12 to 22 in 3897649

describe Rack::Config do
it "accept a block that modifies the environment" do
app = Rack::Builder.new do
use Rack::Lint
use Rack::Config do |env|
env['greeting'] = 'hello'
end
run lambda { |env|
[200, { 'content-type' => 'text/plain' }, [env['greeting'] || '']]
}
end

@lloeki
Copy link
Author

lloeki commented Jan 17, 2024

@dentarg this example only passes down the Rack stack.

The question is whether we can rely on env carrying information up the stack:

    app = Rack::Builder.new do
      use Rack::Lint
      use Rack::ContentLength
      use Class.new do
        def initialize(app)
          @app = app
        end

        def call(env)
          status, headers, body = @app.call(env)

          # use bubbled up information that was set downstream
          headers.merge!('x-yup' => '1') if env['random']

          [status, headers, body]
        end
      end
      use Rack::Config do |env|
        env['greeting'] = 'hello'
        env['random'] = [true, false].sample
      end
      run lambda { |env|
        [200, {'Content-Type' => 'text/plain'}, [env['greeting'] || '']]
      }
    end

Or if the only reliable way is via the return value (which is limited in what it allows + can be mutated willy nilly by each middleware), in which case we have to resort to things like this:

    app = Rack::Builder.new do
      use Rack::Lint
      use Rack::ContentLength
      use Class.new do
        def initialize(app)
          @app = app
        end

        def call(env)
          status, headers, body = @app.call(env)

          # use bubbled up information that was set downstream
          headers.merge!('x-yup' => '1') if Thread.current['random']

          [status, headers, body]
        end
      end
      use Rack::Config do |env|
        env['greeting'] = 'hello'
        Thread.current['random'] = [true, false].sample
      end
      run lambda { |env|
        [200, {'Content-Type' => 'text/plain'}, [env['greeting'] || '']]
      }
    end

@lloeki
Copy link
Author

lloeki commented Jan 17, 2024

I think 3 is closest to how it works today, however you are correct, whether the information can bubble up is not specified. Is there value in specifying it?

I believe there is:

  • I've had the need a few times
  • RequestStore (with this note) exists for a reason that may be completely eschewed in most cases if there was such a guarantee as 2 (which basically boils down to Rack env must not be dup'd, and if replaced it should be wrapped in a way that would not preclude inserting arbitrary middlewares downstream of the replacement)

I'm fine either way, as long as this is clarified.

@lloeki
Copy link
Author

lloeki commented Jan 17, 2024

rack.input is not buffered (usually), rack.hijack can add rack.hijack_io

Fair point regarding this in case 1.:

This may be complemented by:

  • upon calling the next step, environment values MUST be assumed immutable (ideally, enforced via freeze or other mechanism)

Note that in case 1. I added this about the values being immutable only as an possibility ("this may be complemented by"), but what would matter much more is the env hash instance itself. Essentially for a middleware to mutate the env it would become mandatory to dup the env, modify it, and pass it downstream (freeze on call or dup on argument passing could possibly even be done by Rack itself?).

In the spec things like rack.hijack / rack.hijack_io seem to be about the interface between the web server and the Rack stack. Is there a known case where a deep end of the Rack stack sets this key and it therefore has to bubble up all the way to the web server? That would give credence to 2. (forbidding e.g dup before passing down), otherwise hijack could just not be guaranteed to work.

@lloeki
Copy link
Author

lloeki commented Jan 17, 2024

@ioquatix Looking into hijack (which I must admit I've never used) I found this bit of yours:

rack.hijack_io modified env and was hard to use if any request has done env.dup so it's practically useless - one less overhead to deal with.

This seems to exclude 2. as the possibility of env.dup has already been considered as being a thing in practice.

In practice too, 1. is not realistic today, as env is mutated in place and most middlewares do not dup it, thus it cannot be frozen.

To the best of my knowledge, rack.hijack_io which has been removed is the only one that precludes env.dup as they are mutually exclusive.

Both rack.hijack and rack.hijack? are provided by the web server via env for downstream to call.

In addition the Partial Hijack section describes an alternative way to bubble up information:

If rack.hijack? is present in env and truthy, an application may set the special response header rack.hijack to an object that responds to call, accepting a stream argument.

This gives additional credence to the rationale that env is to pass information down and the return value of call is to pass information up, even if it means inserting fake headers that ought to be removed.

Therefore the only possible practical state of the Rack 3.0 spec is 3: env.dup is allowed, ergo env CANNOT and MUST NOT be used to bubble up information across middlewares.

Any of this may or may not change in the future, but that would be a breaking change and thus a Rack 4 spec.

@lloeki
Copy link
Author

lloeki commented Jan 17, 2024

I think 3 is closest to how it works today, however you are correct, whether the information can bubble up is not specified. Is there value in specifying it?

@ioquatix given the amount of digging I had to do and the possibility of subtle breakage I definitely think there is value in specifying it, even if it's only a few words.

@jeremyevans
Copy link
Contributor

I don't think we should specify behavior here. While we could say that reusing env for additional calls is undefined behavior (explicitly undefined), I don't think it adds value over the current spec, where it is implicitly undefined.

@matthewd
Copy link
Contributor

I don't think we should specify behavior here

Agree; the spec does not enumerate things you can't assume, and beginning to do so only moves towards people feeling more free to assume anything that hasn't been explicitly excluded.

The nature of any spec of this sort is that it is very explicit about a small set of things you can rely upon, and anything else is definitionally not guaranteed.

@ioquatix
Copy link
Member

I would support the addition of a document that outlines best practices such as discussed here. In some ways the upgrade guide I wrote tried to encapsulate some of that knowledge, but specific to Rack 2 -> Rack 3 changes.

I think it's expected that env goes down the stack and the result goes up. In other words, I don't see that as surprising. However, while we do want to encourage this model, it's not a hard line drawn by the specification, in other words, there are cases where other approaches can work.

As an example, env.dup is common, as is using Rack::MockRequest for virtual requests. In the case of env.dup, you can definitely store rich objects which are thus shared between the main request and any sub-requests. It's reasonable to assume that some frameworks will depend on that. Arbitrary middleware should aim to be compatible by following that model. But I think the consensus here is that we shouldn't force them to. If someone wants bananas, they should get bananas.

@lloeki
Copy link
Author

lloeki commented Jan 24, 2024

However, while we do want to encourage this model, it's not a hard line drawn by the specification, in other words, there are cases where other approaches can work.

Okay I think I get the idea: the Rack spec is largely unconcerned with what people do with the Rack middleware stack, it merely specifies that the Rack stack and the web server can communicate using the protocol.

IIUC then a system or framework could build its Rack stack based on any of these approaches:

class Noop
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  end
end

class Freezer < Noop
  def call(env)
    super(env.freeze)
  end
end

class Duper < Noop
  def call(env)
    super(env.dup)
  end
end

class Wrapper < Noop
  def call(env)
    super(Wrap.new(env))
  end
end

class Bubbler < Noop
  def call(env)
    super.tap { |s, h, b| env.merge!('bubble' => true) }
  rescue Exception => e
    env[:raised] = e
    return [500, {}, 'Oopsies']
  end
end

Of course some of these options would be mutually exclusive but it's up to the Rack stack builder to make sure that they only use those that are not.

tl;dr: Fine by me, this is now all clear enough. Thank you all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants