Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible to make gevent respect SCRIPT_NAME env var? #1667

Closed
bsolomon1124 opened this issue Sep 2, 2020 · 5 comments
Closed

Possible to make gevent respect SCRIPT_NAME env var? #1667

bsolomon1124 opened this issue Sep 2, 2020 · 5 comments
Labels
Type: Question User support and/or waiting for responses

Comments

@bsolomon1124
Copy link

  • gevent version:20.6.2
  • Python version: CPython 3.7.3 system
  • Operating System: Mac OS X 10.15.6

Background:

  • Nginx reverse proxy, proxy passes /foo to a Flask application root sitting behind gevent as wsgi app server
  • Flask ultimately uses SCRIPT_NAME from request.environ in generation of url_for() results
  • gevent looks to set this variable to the empty string and ignore it, meaning url_for() will be lacking the correct prefix
  • This means that using, e.g. href="{{ url_for('static', filename='favicon.ico') }}" breaks, since it generates an absolute path without the prefix from SCRIPT_PATH.

Case 1 (unexpected/non-working results, gevent):

hello.py:

from flask import Flask, redirect, url_for
from gevent.pywsgi import WSGIServer

app = Flask(__name__)
app.config["APPLICATION_ROOT"] = "/foo"

@app.route("/favicon.ico")
def favicon():
    to = url_for("static", filename="favicon.ico")
    import pdb; pdb.set_trace()
    return redirect(to)

if __name__ == "__main__":
    WSGIServer(('127.0.0.1', 5000),application=app).serve_forever()

Start:

$ SCRIPT_NAME='/foo' python hello.py

Result: to is /static/favicon.ico.

Case 2 (expected results, gunicorn):

hello.py:

from flask import Flask, redirect, url_for
from gevent.pywsgi import WSGIServer

app = Flask(__name__)
app.config["APPLICATION_ROOT"] = "/foo"

@app.route("/favicon.ico")
def favicon():
    to = url_for("static", filename="favicon.ico")
    import pdb; pdb.set_trace()
    return redirect(to)

Start:

$ SCRIPT_NAME=/foo gunicorn -b 127.0.0.1:5000 -w 2 hello:app

The url_for() call produces /foo/static/favicon.ico correctly.

@jamadden
Copy link
Member

jamadden commented Sep 3, 2020

Can you be specific about what change you're proposing, and how it relates to the WSGI specification? Note that gevent is in no way flask specific.

@bsolomon1124
Copy link
Author

bsolomon1124 commented Sep 3, 2020

Can you be specific about what change you're proposing, and how it relates to the WSGI specification? Note that gevent is in no way flask specific.

Yes, I know this 😏

I believe it relates to attaching SCRIPT_NAME from os.environ onto request.environ, similar to as is done here:

https://github.com/benoitc/gunicorn/blob/e4e20f273e95f505277a8dadf390bbdd162cfff4/gunicorn/http/wsgi.py#L180

It looks like gevent currently sets SCRIPT_NAME to an empty string indiscriminately. In pdb:

(Pdb) p request.environ["SCRIPT_NAME"]  # gunicorn
'/foo'
(Pdb) p request.environ["SCRIPT_NAME"]  # gevent
''

Again, I understand the concern in this repo is not with Flask, but stepping into url_for() above, I see:

(Pdb) p endpoint
'static'
(Pdb) p values
{'filename': 'favicon.ico'}
(Pdb) p url_adapter.build(endpoint, values, method=method, force_external=external)
'/foo/static/favicon.ico'

To bring it back to gevent, the recognition and management of SCRIPT_NAME seems to be standard across different wsgi app servers, so I'm wondering if there is a way to achieve this effect with gevent, or why the behavior seems to differ.

@jamadden
Copy link
Member

jamadden commented Dec 9, 2020

The WSGI PEP mostly imports many environment variables from the CGI spec. The CGI spec says of SCRIPT_NAME:

The SCRIPT_NAME string forms some leading part of the path component of the Script-URI derived in some implementation defined manner. No PATH_INFO segment (see section 4.1.5) is included in the SCRIPT_NAME value.

An empty value for SCRIPT_NAME is thus correct by that definition, as any path.startswith("") is always true; empty values are explicitly allowed.

The WSGI PEP tightens this slightly:

The initial portion of the request URL's "path" that corresponds to the application object, so that the application knows its virtual "location". This may be an empty string, if the application corresponds to the "root" of the server.

As far as gevent knows, there is only one application, and it is the root. There is no concept of any hierarchy of application objects or any relation to URL space or filesystem layout. So an empty SCRIPT_NAME is consistent.

the recognition and management of SCRIPT_NAME seems to be standard across different wsgi app servers,

I disagree. The "implementation defined manner" seems to be quite different across servers.

I agree that gunicorn will read a SCRIPT_NAME from the process's environment. It will also, for some reason, accept it in a request header.

In contrast, waitress requires the command-line argument --url-prefix, which invokes some complex behaviour; it too uses the empty string by default:

Setting this to anything except the empty string will cause the WSGI SCRIPT_NAME value to be the value passed minus any trailing slashes you add, and it will cause the PATH_INFO of any request which is prefixed with this value to be stripped of the prefix. Default is the empty string.

The standard library reference implementation also just hardcodes an empty string, presumably for similar reasons to gevent. (It does read the process environment and add it to the WSGI environment, but the hardcoded variables replace whatever was in the process environment.)

Given those examples, I don't see a clear consensus. It does seem like they all do something a bit differently. SCRIPT_NAME is not portable as defined.

But, in all cases it should be pretty simple to make it do what your application expects. The most portable way would be with a tiny bit of WSGI middleware:

def make_middleware(app):
   def middleware(environ, start_response):
        environ['SCRIPT_NAME'] = os.environ.get('SCRIPT_NAME', '')
        return app(environ, start_response)

app = make_middleware(app)

If you wish to rely directly on gevent, then one could also subclass WSGIHandler:

class MyHandler(WSGIHandler):
    def get_environ(self):
         environ = super().get_environ()
         environ['SCRIPT_NAME'] = os.environ.get('SCRIPT_NAME', '')

server = WSGIServer(":8080", app)
server.handler_class = MyHandler
server.serve_forever()

@jamadden jamadden added the Type: Question User support and/or waiting for responses label Dec 9, 2020
jamadden added a commit that referenced this issue Dec 9, 2020
@bsolomon1124
Copy link
Author

bsolomon1124 commented Dec 9, 2020

I disagree. The "implementation defined manner" seems to be quite different across servers.

Fair enough - admittedly, using Gunicorn as a single example in my original question means using "across servers" without additional evidence was probably an overextended conclusion.

As far as gevent knows, there is only one application, and it is the root. There is no concept of any hierarchy of application objects or any relation to URL space or filesystem layout. So an empty SCRIPT_NAME is consistent.

From my layman's perspective, this sounds correct but overly theoretical. A common structure is Nginx as reverse proxy -> Gevent -> Flask/Django. It seems like needing to subclass WSGIHandler and reassign a WSGIServer attribute to get that setup working when Nginx is doing URL path prefix routing isn't ideal. Yes, it's just a few lines of code, but it would be convenient for that functionality to be exposed in a more formalized way.

@jamadden
Copy link
Member

jamadden commented Dec 9, 2020

Again, I disagree. That’s not a common setup in my experience. Production use tends towards something reasonable, like Pyramid+gunicorn, which doesn’t suffer from this issue.

Nonetheless, it seems there is nothing reasonable for gevent to do here. This is a portability issue with other trivial solutions.

Thank you for your post!

@jamadden jamadden closed this as completed Dec 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Question User support and/or waiting for responses
Projects
None yet
Development

No branches or pull requests

2 participants