Log to files instead of docker logs #201
This PR changes handling of logs by lms, cms, mysql and nginx so that they log to files instead of docker logs.
It includes an opinionated refactoring that should reduce code duplication. If that part needs further discussion, I can revert it and submit it as a separate PR.
On second thought, I think we may have an issue here. I understand what you are trying to achieve here, and in some cases it's a considerable improvement: it helps debug issues, makes logs persistent, etc. But in some other cases, there is the possibility that it degrades the user experience. I'm thinking in particular about Kubernetes: I heard there are solutions that automatically collect logs from stdout and aggregate them nicely, per container/pod/node, make them searchable, etc.
Also, with the chosen approach the developer is responsible for selecting destination file names, deciding how these files will be rotated, etc.
These reasons are examples that back up the twelve-factor app principle, according to which apps should not attempt to manage their own logs: https://12factor.net/logs
So, a perfect solution would consist of a nice log collector that would centralize all the logs, distribute them in files, take care of file rotation, deletion, etc. I think it would be great to have such a tool, but it's probably very much out of the scope of this PR...
Can we make a compromise? You are probably most interested in tracking logs, right? I suggest we only redirect tracking logs to files: they would be both stored in files and output to stdout, with two different handlers. Maybe use a TimedRotatingFileHandler?
What do you think?
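The suggested two-handler setup could be sketched roughly as follows (the logger name, file path, and rotation settings are illustrative, not taken from the PR or from the Open edX settings):

```python
import logging
import logging.handlers

# Hypothetical setup: the tracking logger gets two handlers, so events
# reach both `docker logs` (via the stream handler) and a rotated file.
tracking = logging.getLogger("tracking")
tracking.setLevel(logging.INFO)

# Handler 1: stream handler, picked up by docker's logging driver.
tracking.addHandler(logging.StreamHandler())

# Handler 2: daily-rotated file, keeping 30 days of history.
# "tracking.log" is a placeholder path.
tracking.addHandler(logging.handlers.TimedRotatingFileHandler(
    "tracking.log", when="midnight", backupCount=30
))

tracking.info('{"event_type": "example"}')
```

With this arrangement, removing either handler restores single-destination behaviour, so the compromise costs nothing to users who only want stdout.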
Thanks for the pointer to https://12factor.net: I was not aware of those guidelines.
The part that I find less convincing is that every app should have a single event stream.
I don't agree.
I think it's perfectly OK for a component to have more than one logical log stream.
A good example from this PR is the tracking logs: it's better not to mix them with Python tracebacks.
I agree that ideally there should be a Logplex, Fluentd, Logstash, Graylog or similar to collect all logs.
My personal favourite is the Filebeat approach: apps still write to log files, and the Filebeat daemon tails those files and ships them to a central collector.
I propose to have tutor configured in a similar fashion to the nginx image: the daemons log to files that are symlinked to stdout/stderr by default.
I'm still of the opinion that tutor would better serve its users by defaulting to logging to files.
I'll amend this PR to make sure all logs (except the tracking logs) are sent to stdout/stderr.
Yes, I think it's ok, too, on the condition that one of those log streams is stdout.
Well... tracking logs are helpful in debugging, too, right?
This approach makes sense -- again, as long as there is both a file handler and a console handler, so that we don't break the default `docker logs` behaviour.
Also, you may want to use a file-rotating handler; otherwise, logs are going to use more and more space until the containers are restarted.
However, I don't quite understand the comparison with the nginx image. AFAIU, the nginx daemon logs only to stdout/stderr, and there is no way to configure nginx to log to two different destinations at the same time (say: stdout and a file).
I was a bit too terse wrt nginx. Let me elaborate.
The daemon is instructed to log to /var/log/nginx/access.log and /var/log/nginx/error.log.
These two files are symlinked in the image to /dev/stdout and /dev/stderr, respectively.
The effect is that, as a user of that image, I can get the default behaviour (logs go to docker) by leaving it as is.
But I also have the option to mount a volume over /var/log/nginx, in which case the symlinks are replaced and the logs end up in regular files on the host.
My proposal is to follow the same pattern in tutor.
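The pattern can be demonstrated in a few lines of shell (the file names are the ones the official nginx image uses; a scratch directory stands in for /var/log/nginx):

```shell
# Recreate the nginx image's trick in a scratch directory:
# the daemon writes to ordinary-looking log paths...
LOGDIR="$(mktemp -d)"
ln -sf /dev/stdout "$LOGDIR/access.log"
ln -sf /dev/stderr "$LOGDIR/error.log"

# ...but by default those paths are symlinks into the container's
# stdout/stderr, so everything lands in `docker logs`.
readlink "$LOGDIR/access.log"   # /dev/stdout
readlink "$LOGDIR/error.log"    # /dev/stderr
```

Mounting a volume over the log directory replaces the symlinks with real files, which is exactly the opt-in behaviour described above.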
Do you have an idea how you would go about doing that? I suspect it would involve additional configuration options, right?
Let's take a step back. Currently, the docker containers use the default json-file logging driver.
Now, let's say I'm a regular Tutor user, and I'm interested in collecting logs. Maybe I'm interested both in nginx and tracking logs; maybe I want to investigate mysql logs for performance reasons; maybe I want to store my logs in an existing ELK cluster, or in Splunk, Loggly, Datadog or my data lake. The following solution addresses all those needs at once: run a log collector outside of tutor that reads the containers' stdout/stderr and forwards the logs wherever they need to go.
With this approach, not only do we address the needs from all users (including yours, right?), but we do so without touching the tutor code.
I agree that the proposed approach requires some engineering, and maybe we can provide some documentation on the best way to tackle this. But we address all problems at once.
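For illustration, one concrete shape such a collector could take is a side-car service in a compose file (this uses Filebeat, mentioned earlier in the thread; the image tag, mounts, and file names are assumptions, not part of this PR):

```yaml
# Hypothetical side-car service: tails the json-file logs docker already
# writes for every container, and ships them wherever filebeat.yml says.
filebeat:
  image: docker.elastic.co/beats/filebeat:7.17.0
  user: root
  volumes:
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
```

Because it only reads what docker already writes, a service like this can be added or removed without touching the application containers.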
On the other hand, if the docker containers are in charge of logging to dedicated files, then the developer has to decide on file names and rotation policies, and users who rely on stdout-based log collection (on Kubernetes, for instance) lose the default behaviour.
I'll add an exception to this for the tracking logs: often, users don't realise they will need access to the tracking logs until, well, they do. So it makes sense to keep an archive for them, and thus configure the tracking log handler to write to a persistent file.
Maybe we could find an agreement if you described your exact needs? I guess you want to better debug production instances?
EDIT: I tried to bring more people into the conversation.
I'll take a step back to better describe the issue I'm having.
Currently tutor can be used to deploy a production Open edX instance using docker.
When using the vanilla docker and the vanilla tutor configurations, logs will be entirely managed by docker, and live in /var/lib/docker/containers/<container-id>/ on the host.
When an update is made (for instance because of a code change), a new image is pulled, and a new container is created. The new container logs and the old container logs now live in two different locations, and the old ones disappear as soon as the old container is removed.
Some people (like myself) might prefer having log files stored outside the containers in the same format the service/application produces.
But you can also mount a host directory into the containers to achieve that.
I understand the proposal to have an always-running log collector handle this instead.
It feels to me like the change I propose would not make any difference at all to users who are fine with the default `docker logs` behaviour.
But users who want their log files living a regular logfile life on the filesystem, outside of containers, would get that out of the box.
Now that your concerns are clearer to me, I'll open a different PR with the refactoring of common logic.
On top of that, I'll work on a separate PR to change logging as I described above: use intermediate files that are symlinked to stdout/stderr by default, following the nginx pattern.
@silviot your use case is 100% valid. It makes sense to persist app logs on disk in production. I'll try to think of an approach that makes us both happy. I suspect it's possible to achieve the best of both worlds with a custom docker logging driver: https://docs.docker.com/config/containers/logging/configure/
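Short of a fully custom driver, docker's built-in json-file driver already supports rotation; a daemon-wide /etc/docker/daemon.json along these lines (the sizes here are arbitrary examples) keeps disk usage bounded without touching tutor:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  }
}
```

This addresses the disk-growth concern, though the logs still live under /var/lib/docker rather than in app-managed files.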
@regisb I agree that changing the docker configuration is one possible way to get logs in files outside the container, but what I'm proposing is much simpler, and would only require changes "inside" tutor, not "outside" of it. IMHO the simplest approach is to have the daemons write to files directly, without the additional step of a log-collection process, which brings complexity along with the power it provides.
What if I told you that you could store logs for all your apps simply by running:
The future will have plugins, so that could be as simple as running: