
Gunicorn/gevent docker, log fixes, cache busting #371

Merged (6 commits, Feb 7, 2021)

Conversation

@mplanchard (Contributor) commented Feb 7, 2021:

Updates the Docker configuration to use the gunicorn server with gevent
workers by default. Hopefully this will resolve the symptoms seen when
running pypiserver with the non-production-grade wsgiref server, as
reported in #286.

Making this happen brought a few other issues to light, which are also
addressed here.

- Docker log output not immediately being flushed to stdout (#358):
  resolved by setting the `PYTHONUNBUFFERED` env var to `t` in the
  docker container
- When the WSGIRef server is selected, its access logs are written
  directly to stderr, rather than going through the logging machinery:
  resolved by adding a new `WsgiHandler` class and passing it in to
  bottle's `run()` method when running the wsgi server. This required a
  new `ServerCheck` class to determine whether the wsgi server is
  selected when the `auto` option is used
- When using `gunicorn` along with the watchdog cache, package uploads
  were not being picked up by the watcher. Updated the `add_package`
  and `remove_package` methods on the `CachingFileBackend` to bust the
  cache
- The Dockerfile was not really amenable to local development, since any
  change to code led to all the dependencies being reinstalled. Updated the
  file so that dependencies can be cached as a separate image from the code
  (see the sketch below)
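
For illustration, a minimal sketch of the layer-caching idea (this is not the exact Dockerfile from this PR; the base image, paths, and tag names are assumptions):

```dockerfile
# Dependencies live in their own stage, so editing application code
# does not invalidate the cached dependency install.
FROM python:3.8-slim AS deps
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt

FROM deps AS app
# Flush Python's stdout/stderr immediately so docker logs appear in
# real time (any non-empty value enables unbuffered mode).
ENV PYTHONUNBUFFERED=t
COPY . /app
WORKDIR /app
```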

Resolves #358

@elfjes (Contributor) left a comment:

nice :) I like the idea of getting gunicorn to work through the normal --server arg. However, I am a bit concerned about patching sys.argv. It seems like it's a big hammer that may produce unwanted side effects. Although, I don't have any other idea on how to do it. And we're running an app, not a library. So it's probably fine. 😅

do you think it's possible to get a test to verify that we're actually running gunicorn?

furthermore, do we need some tests to check new behaviour of main() with regards to the server/extra_kwargs selection?

Comment on lines 219 to 243
if config.server_method == "gunicorn":
    # When bottle runs gunicorn, gunicorn tries to pull its arguments from
    # sys.argv. Because pypiserver's arguments don't match gunicorn's,
    # this leads to errors.
    sys.argv = ["gunicorn"]

wsgi_kwargs = {"handler_class": WsgiHandler}

if config.server_method == "auto":
    expected_server = guess_auto_server()
    extra_kwargs = (
        wsgi_kwargs if expected_server is bottle.WSGIRefServer else {}
    )
    log.debug(
        "Server 'auto' selected. Expecting bottle to run '%s'. "
        "Passing extra keyword args: %s",
        expected_server.__name__,
        extra_kwargs,
    )
else:
    extra_kwargs = wsgi_kwargs if config.server_method == "wsgiref" else {}
    log.debug(
        "Running bottle with selected server '%s'", config.server_method
    )

@elfjes (Contributor):

It seems like you use guess_auto_server() only for determining what additional keywords to use. Isn't it easier to let it return a string? such as

if config.server_method == "auto":
    config.server_method = guess_auto_server()

if config.server_method == "wsgiref":
    ...

log.debug("running bottle with args: ...")

or something like that

@mplanchard (Author):

We could do that, but it would involve an additional layer of translation, since at the bottle layer, the .adapters list is the source of truth for which servers might be chosen, and it's already an array of ServerAdapter instances. So, if we wanted to use a string, we'd have to map those adapters back to our strings, which seems like it'd be adding an extra potential source of errors. Ideally bottle would provide some method to get the auto server instance itself, but the AutoServer only provides a run() method, unfortunately.
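
For context, the vendored bottle's AutoServer looks roughly like this (paraphrased from bottle's source; note that it only exposes run(), never which adapter it actually picked):

```python
class AutoServer(ServerAdapter):
    # bottle tries each adapter in order and uses the first one whose
    # server package imports successfully.
    adapters = [WaitressServer, PasteServer, TwistedServer,
                CherryPyServer, WSGIRefServer]

    def run(self, handler):
        for sa in self.adapters:
            try:
                return sa(self.host, self.port, **self.options).run(handler)
            except ImportError:
                pass
```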

(A further review thread on pypiserver/__main__.py was marked outdated and resolved.)
Comment on lines 154 to 159
return next(
    filter(
        lambda s: getattr(ServerCheck, s.__name__)(),
        pypiserver.bottle.AutoServer.adapters,
    )
)
@elfjes (Contributor):

We're assuming a number of available servers here (the implemented methods of ServerCheck), so we're not really decoupled from whatever is defined in AutoServer.adapters anyway. Since we also use the result only to identify a server_method (see my other comment below), we may as well just hard-code a couple of server_method strings in here, which could simplify this function to something like this:

def guess_auto_server():
    server_methods = [
        ('waitress', 'waitress'),
        ('paste', 'paste'),
        ('twisted', 'twisted.web'),
        ('cherrypy', 'cherrypy.wsgiserver'),
        ('wsgiref', 'wsgiref'),
    ]
    return next(s[0] for s in server_methods if can_import(s[1]))

Also, do you think it's a good idea to slap a try/except StopIteration on it to give a more useful error message? I guess the chances of it failing with a StopIteration are slim, because the wsgiref package should always be available, but idk, letting StopIteration propagate seems kinda strange.
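
A minimal sketch of what that wrap could look like (names taken from the diff above; the replacement error type is an assumption):

```python
def guess_auto_server():
    try:
        return next(
            filter(
                lambda s: getattr(ServerCheck, s.__name__)(),
                pypiserver.bottle.AutoServer.adapters,
            )
        )
    except StopIteration:
        # Should be unreachable, since wsgiref ships with the stdlib,
        # but fail loudly and clearly if it ever happens.
        raise RuntimeError(
            "Unable to find an importable server adapter for 'auto'"
        ) from None
```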

@mplanchard (Author):

The intent is to remain coupled to what's defined in .adapters, since that's bottle's source of truth on what can run. I'd like to be sure that if we pull in a bottle update that updates that list, we wind up with an error because we don't have a method for it in ServerCheck.

We can catch the error and raise something more specific, for sure. My thinking was in line with yours, like it should be impossible because wsgiref is in the standard library, but we might as well be safe.

@elfjes (Contributor):

Hmm, if I understand you correctly, you are worried that the way we resolve auto might inadvertently drift out of sync with the way bottle does. The way this function works now, in CI it goes through all items in AutoServer.adapters, because we don't install any third-party servers in CI, so it always ends up at WSGIRefServer. When a new ServerAdapter is added or a name changes, we won't have any corresponding method in ServerCheck, and we'll get an error in CI (currently StopIteration), so we'll know something has changed. Whenever the tests are run on another machine, the behaviour is undefined, since any third-party servers might be installed.

This kind of sounds like a very implicit way of testing whether something has changed in bottle. It might be better to write an explicit test for this, if that's what you're worried about. When using strings, this would be something like reverse-mapping bottle.server_names and checking our list of strings against whatever AutoServer.adapters maps to.

@mplanchard (Author):

Yep, you understand my concern perfectly. We vendor bottle, so we're insulated from unexpected changes, but I would at least like to know if something breaks. Mapping the servers to some other format seems (to me) like extra work for not a whole lot of gain, since we'd have to validate:

- our string mapping retains the same order as the bottle server mapping
- all of the bottle adapters are mapped to string representations
- we have no extra string representations that aren't represented in the bottle adapters

which I guess you could verify with something like `tuple(map(to_string_mapping, bottle.adapters)) == tuple(string_mapping_to_import_check_map)`.

We could do that in unit tests, which does reduce coupling to bottle at the application layer, and so is probably not a bad idea, but the coupling still exists anyway and remains intimately tied to bottle's adapter protocol.

I'll play with it a bit and see what it feels like. If nothing else, I'll throw a TODO in to improve the decoupling in a future commit

@elfjes (Contributor):

something like this?

somewhere in __main__.py

server_methods = [
    ('waitress', 'waitress'),
    ('paste', 'paste'),
    ('twisted', 'twisted.web'),
    ('cherrypy', 'cherrypy.wsgiserver'),
    ('wsgiref', 'wsgiref'),
]

somewhere in the tests:

import pypiserver.bottle
def test_bottle_auto_resolution_order():
    adapter_rmap = {
        val: key for key, val in pypiserver.bottle.server_names.items()
    }
    assert [adapter_rmap[a] for a in pypiserver.bottle.AutoServer.adapters] == [
        s[0] for s in server_methods
    ]
    

@mplanchard (Author) commented Feb 7, 2021:

Okay, I added this (using an enum instead of strings) in bf4c9fa. The tests wound up being a bit awkward because we need to do two separate import checks for CherryPy, so it required deduping, but I think it's reasonable.

@elfjes (Contributor):

Hmm, I don't quite understand the benefit of using Enum in this case. I get the idea, but the result doesn't necessarily look nice imho. But sure :)

I think you might be able to reduce the awkwardness by changing the signature of `_can_import` to `def _can_import(*names: str)` (with a for loop in the body), letting the generator expression be `(s for s, *i in AUTO_SERVER_IMPORTS if _can_import(*i))`, and making the CherryPy entry `(AutoServer.CherryPy, "cheroot.wsgi", "cherrypy.wsgiserver")`, so you'd have unique entries in there (see the sketch below).

The assert statements seem like they could then be replaced by a simple `==` assertion against the lowercased first items of `AUTO_SERVER_IMPORTS`.
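
A sketch of that suggestion, assuming any-of semantics for the cheroot/cherrypy pair and an AutoServer enum along the lines of bf4c9fa (the names here are illustrative, not the merged code):

```python
import enum
import importlib


class AutoServer(enum.Enum):
    Waitress = enum.auto()
    Paste = enum.auto()
    Twisted = enum.auto()
    CherryPy = enum.auto()
    WsgiRef = enum.auto()


# (server, *importable module names), in bottle's resolution order.
AUTO_SERVER_IMPORTS = [
    (AutoServer.Waitress, "waitress"),
    (AutoServer.Paste, "paste"),
    (AutoServer.Twisted, "twisted.web"),
    (AutoServer.CherryPy, "cheroot.wsgi", "cherrypy.wsgiserver"),
    (AutoServer.WsgiRef, "wsgiref"),
]


def _can_import(*names: str) -> bool:
    # True if any one of the given modules is importable.
    for name in names:
        try:
            importlib.import_module(name)
            return True
        except ImportError:
            pass
    return False


def guess_auto_server() -> AutoServer:
    return next(s for s, *imports in AUTO_SERVER_IMPORTS if _can_import(*imports))
```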

@mplanchard (Author):

For typing, I prefer having a defined enumeration of options than a general str, since the variants of a string are unlimited. Returning an enum allows callers to potentially verify statically that they've covered all the cases. I think the code itself is fine enough. The tests are a bit awkward, but nothing too crazy.

The asserts can't be replaced by a regular == because the enumeration names don't exactly match the ServerAdapter class names, by virtue of wanting to decouple them.

CMD=("run" "-p" "${PYPISERVER_PORT:-$PORT}")
# Use the gunicorn server by default, since it's more performant than
# bottle's default server
CMD=("run" "-p" "${PYPISERVER_PORT:-$PORT}" "--server" "gunicorn")
@elfjes (Contributor):

I know it's not technically part of this PR, but should we do a check here to see if $PORT is set, and give a deprecation warning if it is, before using it? (I guess we'd then also need to not set it in the Dockerfile.)

@mplanchard (Author):

I struggled a bit with this, because what if they have the env var $PORT set for some other reason or as part of something else they've got running on a derived docker container?

@elfjes (Contributor):

Hmm, I see. What about not showing that warning when a $PYPISERVER_DISABLE_DEPRECATION_WARNINGS variable is set? or is that taking things too far 😅 ?

@mplanchard (Author):

Hahaha, I think that's probably good actually, although then I'd have to go update the deprecation warning I'm showing for the old arg form as well

@elfjes (Contributor):

Yeah exactly. One would expect that to not only hide docker-specific deprecations, but also those coming from the python source

@mplanchard (Author):

I would like to think a bit more about this and potentially do it in a separate PR, but I do think it's a good idea

# When bottle runs gunicorn, gunicorn tries to pull its arguments from
# sys.argv. Because pypiserver's arguments don't match gunicorn's,
# this leads to errors.
sys.argv = ["gunicorn"]
@elfjes (Contributor):

How can we pass additional configuration to gunicorn? only through a gunicorn.conf.py?

And just out of curiosity: how is the gunicorn.conf.py passed as a config file in the docker container? Or does gunicorn pick it up automatically from the user's $HOME?

@mplanchard (Author):

Yeah, this is another thing I struggled with. There are a couple of ways, as it stands, that folks could configure gunicorn: one is setting the GUNICORN_CMD_ARGS env var, another is mounting a new gunicorn config file, and another would be setting certain env vars directly. But I couldn't figure out any sensible way to have them passed via the command line.

The gunicorn.conf.py is picked up automatically when it's in the current working directory when you're running gunicorn. We might consider adding something like a --gunicorn-conf option to allow people to specify some config file rather than using gunicorn's default search mechanisms, but since they can already do this with GUNICORN_CMD_ARGS="-c my_config_file.py" it seemed like it wasn't strictly necessary
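
For example, a sketch of passing settings through gunicorn's GUNICORN_CMD_ARGS env var when running the container (the image tag, paths, and settings here are illustrative):

```sh
# Pass a custom config file and worker count to gunicorn without
# touching pypiserver's own CLI arguments.
docker run -p 8080:8080 \
    -e GUNICORN_CMD_ARGS="-c /data/gunicorn.conf.py --workers 2" \
    -v "$(pwd)/gunicorn.conf.py:/data/gunicorn.conf.py" \
    pypiserver/pypiserver:latest
```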

@elfjes (Contributor):

Ah right, it's automatically picked up from the working dir. I had a look at the gunicorn docs but couldn't find it there. Thanks :)

I'd say adding a gunicorn specific command line argument is not really the way to go. Maybe in the future there could be a way to pass arbitrary options to arbitrary servers, but I guess that's for another time :)

@mplanchard (Author):

I threw a few extra comments in around this in the most recent commits. It'll also be something we want to document as we work on getting the docs up to snuff for v2

        self.cache_manager.invalidate_root_cache(self.roots[0])

    def remove_package(self, pkg: PkgFile) -> None:
        if pkg.fn is not None:
@elfjes (Contributor):

is this ever called with None?

@mplanchard (Author):

Honestly, I copied these straight out of the SimpleFileBackend and added the cache manager calls without thinking too much about it. I should probably update them to call super().add_package(...) and then update the ones there to address these concerns.
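
A sketch of what that refactor might look like (class and method names from the surrounding discussion; the cache-manager API and signatures are assumptions):

```python
class CachingFileBackend(SimpleFileBackend):
    def add_package(self, filename, stream):
        # Reuse the plain file handling, then bust the listing cache so
        # watchers and other workers see the new package immediately.
        super().add_package(filename, stream)
        self.cache_manager.invalidate_root_cache(self.roots[0])

    def remove_package(self, pkg):
        super().remove_package(pkg)
        self.cache_manager.invalidate_root_cache(self.roots[0])
```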

@mplanchard (Author):

I don't know if this ever gets called with pkg.fn being None. It's allowed to be None according to the type definition of PkgFile, but it does seem like we'd need something like this. I'll dig in a bit and see if this check is really necessary, or if we can maybe change the type on the PkgFile class.

@elfjes (Contributor):

Haha, then it must've been my doing 🙄. I guess I did the None-check to make the typechecker happier.

I'm currently working on simplifying PkgFile, so I'll have a look at it as well.


def remove_package(self, pkg: PkgFile) -> None:
    if pkg.fn is not None:
        os.remove(pkg.fn)
@elfjes (Contributor):

I think we need to catch FileNotFoundError here to be safe. (Or maybe even OSError, although that might be too broad a catch)

@mplanchard (Author):

So frustrating that Python's stdlib documentation doesn't include which exceptions can be raised from a function, honestly.

@elfjes (Contributor):

Yeah I know, but I guess that'd be documentation that's very hard to maintain if it needs to list all possible exceptions.

Also: I meant IOError earlier, not OSError. It's still probably too broad an exception group to (silently) handle.

@mplanchard (Author):

I added in a catch of FileNotFoundError with a log message and a pass, and a catch of OSError, which logs an exception and then re-raises 🤷
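
Roughly, the resulting handling (a sketch; the log messages and surrounding names are illustrative):

```python
def remove_package(self, pkg: PkgFile) -> None:
    if pkg.fn is None:
        return
    try:
        os.remove(pkg.fn)
    except FileNotFoundError:
        # Nothing to do if the file is already gone.
        log.warning("Tried to remove nonexistent package file: %s", pkg.fn)
    except OSError:
        # Anything else (permissions, IO errors) is unexpected: record
        # it and let it propagate.
        log.exception("Unexpected error removing package file: %s", pkg.fn)
        raise
```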

@mplanchard (Author):

> nice :) I like the idea of getting gunicorn to work through the normal --server arg. However, I am a bit concerned about patching sys.argv. It seems like it's a big hammer that may produce unwanted side effects. Although, I don't have any other idea on how to do it. And we're running an app, not a library. So it's probably fine. 😅

I don't love it either. IMO the gunicorn programmatic interface shouldn't be trying to parse argv at all! If we were doing anything other than running bottle.run() here, I'd consider saving the original value of argv and patching it back, but since we need to update it for the run() call, and we exit immediately after the run() call, it didn't seem worth it.
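
(For reference, the save-and-restore alternative might look something like this sketch; the bottle.run arguments are assumed from the diff above:)

```python
original_argv = sys.argv
sys.argv = ["gunicorn"]
try:
    bottle.run(app=app, server=config.server_method, **extra_kwargs)
finally:
    # Restore argv for any code that might run after the server stops.
    sys.argv = original_argv
```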

> do you think it's possible to get a test to verify that we're actually running gunicorn?

Yeah, I think this shouldn't be too hard. I definitely verified it myself manually (we get different logs), but it's worth a quick test.

> furthermore, do we need some tests to check new behaviour of main() with regards to the server/extra_kwargs selection?

yep, can do

@mplanchard (Author):

Also just noting re: the argv thing that if someone is using pypiserver itself as a lib, they should be using one of the app constructors (app or app_from_config) in __init__.py and then passing that WSGI application to some server themselves, so they won't hit the code that patches argv
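
For instance, library-style usage might look something like this (constructor name from the comment above; the `root` argument and choice of waitress are illustrative assumptions):

```python
import pypiserver
from waitress import serve

# Build the WSGI app directly and hand it to a server of your choice;
# this path never touches sys.argv.
app = pypiserver.app(root="/srv/packages")
serve(app, host="0.0.0.0", port=8080)
```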

@elfjes (Contributor) commented Feb 7, 2021:

> > nice :) I like the idea of getting gunicorn to work through the normal --server arg. However, I am a bit concerned about patching sys.argv. It seems like it's a big hammer that may produce unwanted side effects. Although, I don't have any other idea on how to do it. And we're running an app, not a library. So it's probably fine. 😅
>
> I don't love it either. IMO the gunicorn programmatic interface shouldn't be trying to parse argv at all! If we were doing anything other than running bottle.run() here, I'd consider saving the original value of argv and patching it back, but since we need to update it for the run() call, and we exit immediately after the run() call, it didn't seem worth it.

Right, yeah, that limits the usefulness of restoring the patched values.

> > do you think it's possible to get a test to verify that we're actually running gunicorn?
>
> Yeah, I think this shouldn't be too hard. I definitely verified it myself manually (we get different logs), but it's worth a quick test.

> > furthermore, do we need some tests to check new behaviour of main() with regards to the server/extra_kwargs selection?
>
> yep, can do

cool 👍

@mplanchard (Author):

I went ahead and, in the most recent commits, also included waitress as a dependency of the Docker container. I suspect that, regardless of how well we document things, folks who override args may not remember to set --server gunicorn as part of their override, and I think we probably want that to fall back to something other than wsgiref.
