Gunicorn/gevent docker, log fixes, cache busting #371
Conversation
Updates the Docker configuration to use the gunicorn server with gevent workers by default. Making this happen brought a few other issues to light, which are also addressed here.

- Docker log output not immediately being flushed to stdout (#358): resolved by setting the `PYTHONUNBUFFERED` env var to `t` in the docker container
- When the WSGIRef server is selected, its access logs are written directly to stderr, rather than going through the logging machinery: resolved by adding a new `WsgiHandler` class and passing it to bottle's `run()` method when running the wsgi server. This required a new `ServerCheck` class to determine whether the wsgi server is selected when the `auto` option is used
- When using `gunicorn` along with the watchdog cache, package uploads were not being picked up by the watcher. Updated the `add_package` and `remove_package` methods on the `CachingFileBackend` to bust the cache
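A rough sketch of what such a `WsgiHandler` could look like, assuming the usual wsgiref approach (this is illustrative, not necessarily the PR's exact implementation): `wsgiref.simple_server.WSGIRequestHandler` inherits `log_message`, which writes access logs straight to stderr, so overriding it routes them through `logging` instead. Bottle's `WSGIRefServer` forwards a `handler_class` option to `wsgiref.simple_server.make_server`, which is why passing `{"handler_class": WsgiHandler}` can work.

```python
import logging
from wsgiref.simple_server import WSGIRequestHandler

# Logger name is an assumption for illustration
log = logging.getLogger("pypiserver.wsgi")


class WsgiHandler(WSGIRequestHandler):
    """Route wsgiref access logs through logging instead of stderr."""

    def log_message(self, format, *args):
        # BaseHTTPRequestHandler.log_message writes directly to
        # sys.stderr by default; emit via a standard logger instead.
        log.info(format, *args)
```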
nice :) I like the idea of getting gunicorn to work through the normal `--server` arg. However, I am a bit concerned about patching `sys.argv`. It seems like it's a big hammer that may produce unwanted side effects. Although, I don't have any other idea on how to do it. And we're running an app, not a library. So it's probably fine. 😅

Do you think it's possible to get a test to verify that we're actually running gunicorn?

Furthermore, do we need some tests to check the new behaviour of `main()` with regards to the server/extra_kwargs selection?
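On the `sys.argv` concern: one way to limit the blast radius would be a small context manager that restores the original value on exit (a sketch of the idea, not what the PR does):

```python
import contextlib
import sys


@contextlib.contextmanager
def patched_argv(*argv):
    """Temporarily replace sys.argv, restoring it on exit."""
    original = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = original
```

Usage would be something like `with patched_argv("gunicorn"): bottle.run(...)`. Since the server call blocks for the process lifetime, the restore mostly matters for error paths and tests.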
pypiserver/__main__.py
Outdated
```python
if config.server_method == "gunicorn":
    # When bottle runs gunicorn, gunicorn tries to pull its arguments from
    # sys.argv. Because pypiserver's arguments don't match gunicorn's,
    # this leads to errors.
    sys.argv = ["gunicorn"]

wsgi_kwargs = {"handler_class": WsgiHandler}

if config.server_method == "auto":
    expected_server = guess_auto_server()
    extra_kwargs = (
        wsgi_kwargs if expected_server is bottle.WSGIRefServer else {}
    )
    log.debug(
        "Server 'auto' selected. Expecting bottle to run '%s'. "
        "Passing extra keyword args: %s",
        expected_server.__name__,
        extra_kwargs,
    )
else:
    extra_kwargs = wsgi_kwargs if config.server_method == "wsgiref" else {}
    log.debug(
        "Running bottle with selected server '%s'", config.server_method
    )
```
It seems like you use `guess_auto_server()` only for determining what additional keywords to use. Isn't it easier to let it return a string? Such as:

```python
if config.server_method == "auto":
    config.server_method = guess_auto_server()
if config.server_method == "wsgiref":
    ...
log.debug("running bottle with args: ...")
```

or something like that
We could do that, but it would involve an additional layer of translation, since at the bottle layer, the `.adapters` list is the source of truth for which servers might be chosen, and it's already an array of `ServerAdapter` instances. So, if we wanted to use a string, we'd have to map those adapters back to our strings, which seems like it'd be adding an extra potential source of errors. Ideally bottle would provide some method to get the auto server instance itself, but the `AutoServer` only provides a `run()` method, unfortunately.
pypiserver/__main__.py
Outdated
```python
return next(
    filter(
        lambda s: getattr(ServerCheck, s.__name__)(),
        pypiserver.bottle.AutoServer.adapters,
    )
)
```
We're assuming a number of available servers here (the implemented methods of `ServerCheck`), so we're not really decoupled from whatever is defined in `AutoServer.adapters`. Since we also use the result only for identification of a `server_method` (see my other comment below), we may just hard-code a couple of `server_method` strings in here, which could simplify this function to something like this:

```python
def guess_auto_server():
    server_methods = [
        ('waitress', 'waitress'),
        ('paste', 'paste'),
        ('twisted', 'twisted.web'),
        ('cherrypy', 'cherrypy.wsgiserver'),
        ('wsgiref', 'wsgiref'),
    ]
    return next(s[0] for s in server_methods if can_import(s[1]))
```

Also, do you think it's a good idea to slap a try/except `StopIteration` on it to give a more useful error message? I guess the chances of it failing with a `StopIteration` are slim, because the `wsgiref` package should always be available, but idk, letting `StopIteration` propagate seems kinda strange.
The intent is to remain coupled to what's defined in `.adapters`, since that's bottle's source of truth on what can run. I'd like to be sure that if we pull in a bottle update that updates that list, we wind up with an error because we don't have a method for it in `ServerCheck`.

We can catch the error and raise something more specific, for sure. My thinking was in line with yours, like it should be impossible because `wsgiref` is in the standard library, but we might as well be safe.
Hmm, if I understand you correctly, you are worried that we might inadvertently run out of sync with the way bottle resolves `auto`. The way this function works now, in CI it goes through all items in `AutoServer.adapters`; because we don't install any third-party servers in CI, it always ends up at `WSGIRefServer`. When a new `ServerAdapter` is added or a name changes, we do not have any corresponding method in `ServerCheck`, and we get an error in CI (currently `StopIteration`), so that we know something has changed. Whenever the tests are run on another machine, behaviour is undefined, since any third-party servers might be installed.

This kind of sounds like a very implicit way of testing whether something has changed in bottle. It might be better to write an explicit test for this, if this is what you're worried about. When using strings, this would be something like reverse-mapping `bottle.server_names` and checking our list of strings against whatever `AutoServer.adapters` maps to.
Yep, you understand my concern perfectly. We vendor bottle, so we're insulated from unexpected changes, but I would at least like to know if something breaks. Mapping the servers to some other format seems (to me) like extra work for not a whole lot of gain, since we'd have to validate:

- our string mapping retains the same order as the bottle server mapping
- all of the bottle adapters are mapped to string representations
- we have no extra string representations that aren't represented in the bottle adapters

which I guess you could verify with something like `tuple(map(to_string_mapping, bottle.adapters)) == tuple(string_mapping_to_import_check_map)`.

We could do that in unit tests, which does reduce coupling to bottle at the application layer, and so is probably not a bad idea, but the coupling still exists anyway and remains intimately tied to bottle's adapter protocol.

I'll play with it a bit and see what it feels like. If nothing else, I'll throw a TODO in to improve the decoupling in a future commit.
something like this?

Somewhere in `__main__.py`:

```python
server_methods = [
    ('waitress', 'waitress'),
    ('paste', 'paste'),
    ('twisted', 'twisted.web'),
    ('cherrypy', 'cherrypy.wsgiserver'),
    ('wsgiref', 'wsgiref'),
]
```

Somewhere in the tests:

```python
import pypiserver.bottle


def test_bottle_auto_resolution_order():
    adapter_rmap = {
        val: key for key, val in pypiserver.bottle.server_names.items()
    }
    assert [adapter_rmap[a] for a in pypiserver.bottle.AutoServer.adapters] == [
        s[0] for s in server_methods
    ]
```
Okay, I added this (using an enum instead of strings) in bf4c9fa. The tests wound up being a bit awkward, because we need to do two separate import checks for CherryPy, so it required deduping, but I think it's reasonable.
Hmm, I don't quite understand the benefit of using `Enum` in this case. I get the idea, but the result doesn't necessarily look nice imho. But sure :)

I think you might be able to reduce the awkwardness by changing the signature of `_can_import` to `def _can_import(*names: str)` (adding a for loop in the body), letting the generator expression be `(s for s, *i in AUTO_SERVER_IMPORTS if _can_import(*i))`, and making the import entry `(AutoServer.CherryPy, "cheroot.wsgi", "cherrypy.wsgiserver")`, so you'd have unique entries in there.

The assert statements seem like they could then be replaced by a simple `==` assertion on the lowercased strings of the first `AUTO_SERVER_IMPORTS` items.
For typing, I prefer having a defined enumeration of options to a general `str`, since the variants of a string are unlimited. Returning an enum allows callers to potentially verify statically that they've covered all the cases. I think the code itself is fine enough. The tests are a bit awkward, but nothing too crazy.

The asserts can't be replaced by a regular `==`, because the enumeration names don't exactly match the `ServerAdapter` class names, by virtue of wanting to decouple them.
```diff
-CMD=("run" "-p" "${PYPISERVER_PORT:-$PORT}")
+# Use the gunicorn server by default, since it's more performant than
+# bottle's default server
+CMD=("run" "-p" "${PYPISERVER_PORT:-$PORT}" "--server" "gunicorn")
```
I know it's not technically part of this PR, but should we do a check here to see if `$PORT` is set, and give a deprecation warning if it is, before using it? (I guess we'd then also need to not set it in the `Dockerfile`.)
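A sketch of what that entrypoint check could look like; the function name, warning wording, and the 8080 fallback are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical entrypoint fragment: warn when only the legacy $PORT is set
resolve_port() {
    if [ -n "${PORT:-}" ] && [ -z "${PYPISERVER_PORT:-}" ]; then
        echo "WARNING: \$PORT is deprecated; set \$PYPISERVER_PORT instead" >&2
    fi
    # Prefer PYPISERVER_PORT, fall back to PORT, then a default
    echo "${PYPISERVER_PORT:-${PORT:-8080}}"
}
```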
I struggled a bit with this, because what if they have the env var `$PORT` set for some other reason, or as part of something else they've got running on a derived docker container?
Hmm, I see. What about not showing that warning when a $PYPISERVER_DISABLE_DEPRECATION_WARNINGS variable is set? or is that taking things too far 😅 ?
Hahaha, I think that's probably good actually, although then I'd have to go update the deprecation warning I'm showing for the old arg form as well
Yeah exactly. One would expect that to not only hide docker-specific deprecations, but also those coming from the python source
I would like to think a bit more about this and potentially do it in a separate PR, but I do think it's a good idea
```python
# When bottle runs gunicorn, gunicorn tries to pull its arguments from
# sys.argv. Because pypiserver's arguments don't match gunicorn's,
# this leads to errors.
sys.argv = ["gunicorn"]
```
How can we pass additional configuration to gunicorn? Only through a `gunicorn.conf.py`?

And just out of curiosity: how is the `gunicorn.conf.py` passed as a config file in the docker container? Or does gunicorn pick it up automatically from the user's `$HOME`?
Yeah, this is another thing I struggled with. There are a couple of ways, as it stands, that folks could configure gunicorn. One is setting the `GUNICORN_CMD_ARGS` env var, another is mounting a new gunicorn config file, and another would be setting certain env vars directly. But I couldn't figure out any sensible way to have them passed via the command line.

The `gunicorn.conf.py` is picked up automatically when it's in the current working directory when you're running gunicorn. We might consider adding something like a `--gunicorn-conf` option to allow people to specify some config file rather than using gunicorn's default search mechanisms, but since they can already do this with `GUNICORN_CMD_ARGS="-c my_config_file.py"`, it seemed like it wasn't strictly necessary.
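For instance, since gunicorn reads extra settings from the `GUNICORN_CMD_ARGS` environment variable, container users could tune workers and timeouts without touching pypiserver's command line (image tag, port, and option values here are illustrative):

```shell
# Pass gunicorn settings into the container via GUNICORN_CMD_ARGS
docker run -p 8080:8080 \
    -e GUNICORN_CMD_ARGS="--workers 4 --timeout 60" \
    pypiserver/pypiserver:latest
```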
Ah right, it's automatically picked up from the working dir. I had a look at the gunicorn docs but couldn't find it there. Thanks :)

I'd say adding a gunicorn-specific command line argument is not really the way to go. Maybe in the future there could be a way to pass arbitrary options to arbitrary servers, but I guess that's for another time :)
I threw a few extra comments in around this in the most recent commits. It'll also be something we want to document as we work on getting the docs up to snuff for v2
pypiserver/backend.py
Outdated
```python
        self.cache_manager.invalidate_root_cache(self.roots[0])

    def remove_package(self, pkg: PkgFile) -> None:
        if pkg.fn is not None:
```
is this ever called with `None`?
Honestly, I copied these straight out of the `SimpleFileBackend` and added the cache manager calls without thinking too much about it. I should probably update them to call `super().add_package(...)` and then update the ones there to address these concerns.
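The delegate-to-super shape described here could look roughly like this; every class and method signature below is a simplified stand-in for illustration, not pypiserver's real API:

```python
class SimpleFileBackend:
    """Stand-in base backend that just tracks package filenames."""

    def __init__(self, roots):
        self.roots = roots
        self.packages = set()

    def add_package(self, filename):
        self.packages.add(filename)

    def remove_package(self, filename):
        self.packages.discard(filename)


class CacheManager:
    """Stand-in cache manager that records invalidations."""

    def __init__(self):
        self.invalidated = []

    def invalidate_root_cache(self, root):
        self.invalidated.append(root)


class CachingFileBackend(SimpleFileBackend):
    """Delegate file handling to the base class, then bust the cache."""

    def __init__(self, roots, cache_manager):
        super().__init__(roots)
        self.cache_manager = cache_manager

    def add_package(self, filename):
        super().add_package(filename)
        self.cache_manager.invalidate_root_cache(self.roots[0])

    def remove_package(self, filename):
        super().remove_package(filename)
        self.cache_manager.invalidate_root_cache(self.roots[0])
```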
I don't know if this ever gets called with `pkg.fn` being `None`. It's allowed to be `None` according to the type definition of `PkgFile`, but it does seem like we'd need something like this. I'll dig in a bit and see if this check is really necessary, or if we can maybe change the type of the `PkgFile` class.
Haha, then it must've been my doing 🙄. I guess I did the `None`-check to make the typechecker happier.

I'm currently working on simplifying `PkgFile`, so I will have a look at it as well.
pypiserver/backend.py
Outdated
```python
    def remove_package(self, pkg: PkgFile) -> None:
        if pkg.fn is not None:
            os.remove(pkg.fn)
```
I think we need to catch `FileNotFoundError` here to be safe. (Or maybe even `OSError`, although that might be too broad a catch.)
So frustrating that python doesn't include which exceptions can be raised from a function in the stdlib documentation honestly
Yeah, I know, but I guess that'd be documentation that's very hard to maintain if it needs to list all possible exceptions.

Also: I meant `IOError` earlier, not `OSError`. It's still probably too broad an exception group to (silently) handle.
I added in a catch of `FileNotFoundError` with a log message and a pass, and a catch of `OSError`, which logs an exception and then re-raises 🤷
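That error-handling shape could be sketched as follows; the helper name is hypothetical, but the catch order matters because `FileNotFoundError` is a subclass of `OSError`:

```python
import logging
import os

log = logging.getLogger(__name__)


def remove_file(fn):
    """Remove a file, tolerating a missing file but surfacing other errors."""
    try:
        os.remove(fn)
    except FileNotFoundError:
        # Already gone; log and carry on
        log.warning("Tried to remove missing file: %s", fn)
    except OSError:
        # Permission problems etc.: log with traceback, then re-raise
        log.exception("Unexpected error removing file: %s", fn)
        raise
```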
I don't love it either. IMO the gunicorn programmatic interface shouldn't be trying to parse argv at all! If we were doing anything that running
Yeah, I think this shouldn't be too hard. I definitely verified myself manually (we get different logs), but it's worth a quick test
yep, can do
Also just noting re: the
Right, yeah that limits the usefulness of restoring the patched values
cool 👍
Went ahead and also in the most recent commits included
Force-pushed from a87e771 to bf4c9fa
Updates the Docker configuration to use the gunicorn server with gevent
workers by default. Hopefully this will resolve the symptoms seen when
running pypiserver with the non-production grade wsgiref server as
reported in #286.
Making this happen brought a few other issues to light, which are also
addressed here.

- Docker log output not immediately being flushed to stdout (#358): resolved by setting the `PYTHONUNBUFFERED` env var to `t` in the docker container
- When the WSGIRef server is selected, its access logs are written directly to stderr, rather than going through the logging machinery: resolved by adding a new `WsgiHandler` class and passing it to bottle's `run()` method when running the wsgi server. This required a new `ServerCheck` class to determine whether the wsgi server is selected when the `auto` option is used
- When using `gunicorn` along with the watchdog cache, package uploads were not being picked up by the watcher. Updated the `add_package` and `remove_package` methods on the `CachingFileBackend` to bust the cache
- Any change to code led to all the dependencies being reinstalled. Updated the file so that dependencies can be cached as a separate image from code.

Resolves #358