Spec ambiguity: environment instance #2144

lloeki · 2024-01-10T11:42:08Z

The spec says this regarding the environment instance:

The environment must be an unfrozen instance of Hash that includes CGI-like headers. The Rack application is free to modify the environment.

But it does not say whether through the Rack stack the environment may or may not be dup'd to be passed to the next call.

Specifically these cases could be specified:

The environment instance MUST NOT be the same across calls through the stack: this would ensure that information is forgotten at each step when bubbling up the Rack stack, providing isolation from lower levels to upper levels, at least for the environment hash itself, but not for the values themselves. This may be complemented by:

upon calling the next step, environment values MUST be assumed immutable (ideally, enforced via freeze or other mechanism)

The benefit would be to assume functional style side-effect-freeness from the environment within a given middleware level. Annotated example:

class MiddlewareA
  def call(env)
     env[:foo] = Foo.new.freeze

     thr = Thread.new do
       loop do
         env[:foo] # guaranteed to be the same as before
         env.key?(:bar) # guaranteed to be false
       end
     end
     thr.run

     app.call(env.dup)

     thr.join

     env[:foo] # guaranteed to be the same as before
     env.key?(:bar) # guaranteed to be false
  end
end

class MiddlewareB
  def call(env)
     env[:bar] = Bar.new.freeze # will be invisible to MiddlewareA
     env[:foo].mutate! # forbidden
     env[:foo] = wrap(env[:foo) # has no impact on MiddlewareA caller
     app.call(env.dup)
  end
end

# and so on

The environment instance MUST be the same across calls through the stack. Mutation of existing values attached to environment keys or value instances themselves is allowed and MUST NOT be prevented.

The benefit would be to ensure that information is kept when unwinding the Rack stack, providing a mechanism to bubble information to the upper levels. Annotated example:

# designed to be used with MiddlewareB somewhere down the stack (not necessarily the immediate next one)
class MiddlewareA
  def call(env)
     env[:foo] = Foo.new

     thr = Thread.new do
       loop do
         env[:foo] # may be mutated or outright swapped under this thread feet
         env.key?(:bar) # could be false or true
       end
     end
     thr.run

     app.call(env) # pass same env instance

     thr.join

     env[:foo] # known to be mutated / swapped by MiddlewareB
     env.key?(:bar) # true on happy path, gives access to bubbled up information
  end
end

# designed to be used with MiddlewareA somewhere up the stack (not necessarily the immediate previous one)
class MiddlewareB
  def call(env)
     env[:bar] = Bar.new # guaranteed to be visible to MiddlewareA (and up, all the way)
     env[:foo].mutate! # allowed
     env[:foo] = Baz.new # allowed, guaranteed to be visible to MiddlewareA (and up, all the way)
     app.call(env) # still same env instance
  end
end

# and so on

The environment instance MAY OR MAY NOT be the same across calls through the stack: this means the environment can be relied on when going down the Rack stack but CANNOT be relied on when unwinding, thus SHALL NOT be used to bubble up information when the Rack stack unwinds.

The benefit is that it's a free for all, whoever wants to do anything can, but the consequence of that must be made very clear, no middleware can assume anything about what's going up above or below, including unanticipated whatever unanticipated middlewares might do.

# cannot be designed to be used with MiddlewareB (unless guaranteed to be just next... which would make it the "same middleware, only split in two classes")
# but also cannot assume any immutability guarantees
# so suffers from both 1. and 2. drawbacks with none of the benefits
class MiddlewareA
  def call(env)
     env[:foo] = Foo.new

     thr = Thread.new do
       loop do
         env[:foo] # may be mutated or outright swapped under this thread feet
         env.key?(:bar) # could be false or true
       end
     end
     thr.run

     app.call(env) # pass same env instance

     thr.join

     env[:foo] # could have been mutated or outright swapped, or could be the same depending on intermediate middlewares, no guarantee that information is being passed from below, can only assume that this quacks like Foo.
     env.key?(:bar) # could be false or true depending on intermediate middlewares, no guarantee that the information is being passed form below, must not be relied upon
  end
end

class MiddlewareB
  def call(env)
     env[:bar] = Bar.new # may or may not be visible to MiddlewareA (and up, all the way)
     env[:foo].mutate! # allowed, but may be useless, or may crash
     env[:foo] = Baz.new # allowed, may or may not be visible to MiddlewareA (and up, all the way)
     app.call(env) # pass same env instance
  end
end

# imagine whether or not this is between MiddlewareA and MiddlewareB
class EvilMiddleware
  def call(env)
    if [true, false].sample # just to drive the point home
      app.call(env)
    else
      env[:foo].freeze # because why not, it's not forbidden by the spec
      app.call(env.dup) # because why not, it's not forbidden by the spec
    end
  end
end

# and so on

Currently:

the expectation about the environment hash is unspecified, which means the spec sits at 3., although the spec lacks this bit: "the Rack stack but CANNOT be relied on when unwinding, thus SHALL NOT be used to bubble up information when the Rack stack unwinds".
in practice I guess most (if not all) middlewares are developed with an expectation of 2., although I've seen some assuming bits of 1. here and there, or conversely failing to account for corner cases of 2., which results in occasional bugs.

WDYT? Is Rack's design intended to be 1., 2., or 3.?

The text was updated successfully, but these errors were encountered:

ioquatix · 2024-01-11T23:09:12Z

The environment represents the state of the request. This is mutable state and there isn't much that can be done to make this fully immutable, e.g. rack.input is not buffered (usually), rack.hijack can add rack.hijack_io to the environment (bad design IMHO) - I'm sure there are other cases too, where the state of the environment is manipulated.

I think 3 is closest to how it works today, however you are correct, whether the information can bubble up is not specified. Is there value in specifying it?

dentarg · 2024-01-12T05:39:39Z

Seems unlikely the intention has ever been 1.? Seeing rack itself ships with the Rack::Config middleware (since 2009)

rack/test/spec_config.rb

Lines 12 to 22 in 3897649

    
           describe Rack::Config do 
        
             it "accept a block that modifies the environment" do 
        
               app = Rack::Builder.new do 
        
                 use Rack::Lint 
        
                 use Rack::Config do |env| 
        
                   env['greeting'] = 'hello' 
        
                 end 
        
                 run lambda { |env| 
        
                   [200, { 'content-type' => 'text/plain' }, [env['greeting'] || '']] 
        
                 } 
        
               end

lloeki · 2024-01-17T13:37:41Z

@dentarg this example only passes down the Rack stack.

The question is whether we can rely on env carrying information up the stack:

    app = Rack::Builder.new do
      use Rack::Lint
      use Rack::ContentLength
      use Class.new do
        def initialize(app)
          @app = app
        end

        def call(env)
          status, headers, body = @app.call(env)

          # use bubbled up information that was set downstream
          headers.merge!('x-yup' => '1') if env['random']

          [status, headers, body]
        end
      end
      use Rack::Config do |env|
        env['greeting'] = 'hello'
        env['random'] = [true, false].sample
      end
      run lambda { |env|
        [200, {'Content-Type' => 'text/plain'}, [env['greeting'] || '']]
      }
    end

Or if the only reliable way is via the return value (which is limited in what it allows + can be mutated willy nilly by each middleware), in which case we have to resort to things like this:

    app = Rack::Builder.new do
      use Rack::Lint
      use Rack::ContentLength
      use Class.new do
        def initialize(app)
          @app = app
        end

        def call(env)
          status, headers, body = @app.call(env)

          # use bubbled up information that was set downstream
          headers.merge!('x-yup' => '1') if Thread.current['random']

          [status, headers, body]
        end
      end
      use Rack::Config do |env|
        env['greeting'] = 'hello'
        Thread.current['random'] = [true, false].sample
      end
      run lambda { |env|
        [200, {'Content-Type' => 'text/plain'}, [env['greeting'] || '']]
      }
    end

lloeki · 2024-01-17T13:43:57Z

I think 3 is closest to how it works today, however you are correct, whether the information can bubble up is not specified. Is there value in specifying it?

I believe there is:

I've had the need a few times
RequestStore (with this note) exists for a reason that may be completely eschewed in most cases if there was such a guarantee as 2 (which basically boils down to Rack env must not be dup'd, and if replaced it should be wrapped in a way that would not preclude inserting arbitrary middlewares downstream of the replacement)

I'm fine either way, as long as this is clarified.

lloeki · 2024-01-17T13:54:33Z

rack.input is not buffered (usually), rack.hijack can add rack.hijack_io

Fair point regarding this in case 1.:

This may be complemented by:

upon calling the next step, environment values MUST be assumed immutable (ideally, enforced via freeze or other mechanism)

Note that in case 1. I added this about the values being immutable only as an possibility ("this may be complemented by"), but what would matter much more is the env hash instance itself. Essentially for a middleware to mutate the env it would become mandatory to dup the env, modify it, and pass it downstream (freeze on call or dup on argument passing could possibly even be done by Rack itself?).

In the spec things like rack.hijack / rack.hijack_io seem to be about the interface between the web server and the Rack stack. Is there a known case where a deep end of the Rack stack sets this key and it therefore has to bubble up all the way to the web server? That would give credence to 2. (forbidding e.g dup before passing down), otherwise hijack could just not be guaranteed to work.

lloeki · 2024-01-17T16:33:58Z

@ioquatix Looking into hijack (which I must admit I've never used) I found this bit of yours:

rack.hijack_io modified env and was hard to use if any request has done env.dup so it's practically useless - one less overhead to deal with.

This seems to exclude 2. as the possibility of env.dup has already been considered as being a thing in practice.

In practice too, 1. is not realistic today, as env is mutated in place and most middlewares do not dup it, thus it cannot be frozen.

To the best of my knowledge, rack.hijack_io which has been removed is the only one that precludes env.dup as they are mutually exclusive.

Both rack.hijack and rack.hijack? are provided by the web server via env for downstream to call.

In addition the Partial Hijack section describes an alternative way to bubble up information:

If rack.hijack? is present in env and truthy, an application may set the special response header rack.hijack to an object that responds to call, accepting a stream argument.

This gives additional credence to the rationale that env is to pass information down and the return value of call is to pass information up, even if it means inserting fake headers that ought to be removed.

Therefore the only possible practical state of the Rack 3.0 spec is 3: env.dup is allowed, ergo env CANNOT and MUST NOT be used to bubble up information across middlewares.

Any of this may or may not change in the future, but that would be a breaking change and thus a Rack 4 spec.

lloeki · 2024-01-17T16:38:26Z

I think 3 is closest to how it works today, however you are correct, whether the information can bubble up is not specified. Is there value in specifying it?

@ioquatix given the amount of digging I had to do and the possibility of subtle breakage I definitely think there is value in specifying it, even if it's only a few words.

jeremyevans · 2024-01-17T16:43:11Z

I don't think we should specify behavior here. While we could say that reusing env for additional calls is undefined behavior (explicitly undefined), I don't think it adds value over the current spec, where it is implicitly undefined.

matthewd · 2024-01-18T05:18:50Z

I don't think we should specify behavior here

Agree; the spec does not enumerate things you can't assume, and beginning to do so only moves towards people feeling more free to assume anything that hasn't been explicitly excluded.

The nature of any spec of this sort is that it is very explicit about a small set of things you can rely upon, and anything else is definitionally not guaranteed.

ioquatix · 2024-01-18T22:19:49Z

I would support the addition of a document that outlines best practices such as discussed here. In some ways the upgrade guide I wrote tried to encapsulate some of that knowledge, but specific to Rack 2 -> Rack 3 changes.

I think it's expected that env goes down the stack and the result goes up. In other words, I don't see that as surprising. However, while we do want to encourage this model, it's not a hard line drawn by the specification, in other words, there are cases where other approaches can work.

As an example, env.dup is common, as is using Rack::MockRequest for virtual requests. In the case of env.dup, you can definitely store rich objects which are thus shared between the main request and any sub-requests. It's reasonable to assume that some frameworks will depend on that. Arbitrary middleware should aim to be compatible by following that model. But I think the consensus here is that we shouldn't force them to. If someone wants bananas, they should get bananas.

lloeki · 2024-01-24T17:13:44Z

However, while we do want to encourage this model, it's not a hard line drawn by the specification, in other words, there are cases where other approaches can work.

Okay I think I get the idea: the Rack spec is largely unconcerned with what people do with the Rack middleware stack, it merely specifies that the Rack stack and the web server can communicate using the protocol.

IIUC then a system or framework could build its Rack stack based on any of these approaches:

class Noop
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  end
end

class Freezer < Noop
  def call(env)
    super(env.freeze)
  end
end

class Duper < Noop
  def call(env)
    super(env.dup)
  end
end

class Wrapper < Noop
  def call(env)
    super(Wrap.new(env))
  end
end

class Bubbler < Noop
  def call(env)
    super.tap { |s, h, b| env.merge!('bubble' => true) }
  rescue Exception => e
    env[:raised] = e
    return [500, {}, 'Oopsies']
  end
end

Of course some of these options would be mutually exclusive but it's up to the Rack stack builder to make sure that they only use those that are not.

tl;dr: Fine by me, this is now all clear enough. Thank you all!

lloeki mentioned this issue Jan 10, 2024

Add http.route tag to rack DataDog/dd-trace-rb#3345

Merged

2 tasks

ioquatix closed this as completed Jan 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spec ambiguity: environment instance #2144

Spec ambiguity: environment instance #2144

lloeki commented Jan 10, 2024 •

edited

ioquatix commented Jan 11, 2024

dentarg commented Jan 12, 2024

lloeki commented Jan 17, 2024

lloeki commented Jan 17, 2024 •

edited

lloeki commented Jan 17, 2024 •

edited

lloeki commented Jan 17, 2024

lloeki commented Jan 17, 2024

jeremyevans commented Jan 17, 2024

matthewd commented Jan 18, 2024

ioquatix commented Jan 18, 2024

lloeki commented Jan 24, 2024

Spec ambiguity: environment instance #2144

Spec ambiguity: environment instance #2144

Comments

lloeki commented Jan 10, 2024 • edited

ioquatix commented Jan 11, 2024

dentarg commented Jan 12, 2024

lloeki commented Jan 17, 2024

lloeki commented Jan 17, 2024 • edited

lloeki commented Jan 17, 2024 • edited

lloeki commented Jan 17, 2024

lloeki commented Jan 17, 2024

jeremyevans commented Jan 17, 2024

matthewd commented Jan 18, 2024

ioquatix commented Jan 18, 2024

lloeki commented Jan 24, 2024

lloeki commented Jan 10, 2024 •

edited

lloeki commented Jan 17, 2024 •

edited

lloeki commented Jan 17, 2024 •

edited