Docker logs crashes every couple of hours #46699
Comments
In the other issue I was asked whether there is anything that could be interfering with the log files. Up until this error we kept the environment itself very clean: we use AWS Beanstalk to deploy the instances. After the crashes started to occur, we made some customizations to the environment with the most crashes, such as configuring a CloudWatch agent. We only did this after the issues occurred with the environments in question, so I have to assume any conflicting tooling is an AWS default.
I took a quick look, and Amazon is applying some patches to Docker that change the log behavior. In particular, their `Limit-logger-errors-logged-into-daemon-logs.patch` looks relevant. /cc @stewartsmith

(Collapsed details: "How to see patches in Amazon's SRPM".)
I've executed the same commands on my instance, and indeed the `Limit-logger-errors-logged-into-daemon-logs.patch` is there (see below). Regarding the 2GB log message, my gut feeling is that it is the result of many access log messages being concatenated after a serialization error. This one, to be precise:
I am seeing a similar problem, and my VM is not in AWS: I'm using the `local` logging driver with log rotation configured. In the container directory I have several files with the rotated logs, but running `docker logs` on the affected containers produces a huge output that suddenly ends with the error message, and the logs are truncated (the latest logs are lost).

Version:

Error:

Container logs dir:

Docker logging conf:
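For context, a `local`-driver rotation setup of this kind usually lives in `/etc/docker/daemon.json`; the values below are an illustrative sketch, not the exact (collapsed) config used here:

```json
{
  "log-driver": "local",
  "log-opts": {
    "max-size": "20m",
    "max-file": "5",
    "compress": "true"
  }
}
```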
@lbergesio have you checked
I don't have
@wouthoekstra but I do have
Similar issue here. The issue only started occurring after upgrading the Beanstalk image. I've tried disabling colorized logs, as I could see some unexpected Unicode characters. Relevant configs:

Working instance:

Update
To sum up, two factors are making this issue so confusing.

But the problem here is why we would get
@PeterCai7 would it help if we removed the patch from our most problematic instance? On that instance, `docker compose logs` crashes within a couple of hours most of the time. If it does not crash within a day after removing the patch, we have a stronger case for blaming the patch. I would need some instructions for removing the patch if you want to go this route, either through this issue or the case I created with AWS support.
@wouthoekstra
Good point. I was not aware of how much work this was going to be, and I'd rather not do all of that on our production environment. I'll just await your progress.
After taking a further look, my theory is that this issue could be caused by a mismatched encode/decode mechanism between the logging components involved. Could you try the following steps as a workaround?

If the issue is gone after this workaround, that would support my hypothesis, and we would have a clearer picture of the next step.
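The concrete steps are collapsed above, but judging by the follow-up below, the workaround amounted to switching the container's log driver. A minimal sketch of what such a switch can look like at the Compose level (service name, image, and rotation values are hypothetical):

```yaml
services:
  app:
    image: example/app:latest   # hypothetical image
    logging:
      driver: local             # switch away from the previous driver
      options:
        max-size: "20m"
        max-file: "5"
```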
@PeterCai7 I switched the driver as suggested. I've enabled log streaming for the instance, and I've set the required permissions to do so. The only logs that make it into CloudWatch currently are:

I am missing my access logs and error logs. That's no issue for a short period, and I guess it is not a problem for debugging this issue either? I did SSH into the machine just now and manually ran `docker logs`, and the logs are all there. I will report back tomorrow and let you know if the instance remained healthy overnight.
About twenty hours later, `docker compose logs` is still running and the instance is still healthy.
@PeterCai7 in my case above I was already using the `local` log driver, and my VM was not running on AWS.
@wouthoekstra According to the documentation, there should be a CloudWatch log group named
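A quick way to check which log groups actually exist, using the standard AWS CLI (the region value is a placeholder):

```
aws logs describe-log-groups --region eu-west-1 \
  --query 'logGroups[].logGroupName' --output table
```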
@lbergesio Yes, your case could be different. I see you mentioned there was no
I don't have access to the VM anymore :(. But it is an Ubuntu 20.04-based image running on IBM Cloud. This is my

Also, although I think it is unrelated, I am filtering out some Docker logs with this in
@PeterCai7 Is there any progress on this issue? I can understand if the issue is hard to solve, but if there is no solution in sight I will discuss with my team about moving several of our servers away from the `awslogs` driver for now.
Can someone provide a sample of the logs in question? Log files are in

I am trying to find a way to reproduce the issue in the meantime.
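For anyone else gathering samples: with the `local` driver (and the local cache that `docker logs` reads when a remote driver is configured), the files normally sit under the container's directory. A sketch, assuming the default data root and a placeholder container name:

```
# Resolve the full container ID from a name (name is a placeholder)
CID=$(sudo docker inspect --format '{{.Id}}' my-container)

# The local-driver log files usually live here under the default
# data root; adjust if dockerd runs with a custom --data-root
sudo ls -lh /var/lib/docker/containers/$CID/local-logs/
```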
@cpuguy83 I can send you log files. I've just downloaded some recent logs, and they do look weird; `less` considers them a binary file. I see you have an email address in your GitHub profile. Should I send them to you there?
Yes, the Beanstalk team did not develop the
@cpuguy83 in the (possibly related) IoT Edge issue, we provided some logs.

EDIT: also tagging Azure/iotedge#7177 as possibly related
Thanks @jlian, but I was referring to the container's log files. Sure, you can send them there.
I've been away for a couple of days, and I just wanted to make a fresh export of logs for @cpuguy83 when I noticed all my environments at AWS are reported as healthy. Has there been an update?
Sadly no, I don't think there have been any changes to this logging code.
That is odd. Nothing changed on our side as far as I am aware of. And we had several environments with completely different tech stacks that were impacted by this issue, all of them currently reporting as healthy. Nonetheless, I've sent you the Docker container logs from one of the environments from when we did have issues.
@wouthoekstra
@PeterCai7 Yes, that is correct. We did not update our tech stacks, nor did we change our logging mechanism. We do have automatic updates on our AWS servers, and our environments are all on "Docker running on 64bit Amazon Linux 2/3.7.1" now.
The "log message too large" seems like its related to corruption in the log files. Basically the (local) logs are encoded like so:
In the case of corrupted logs, I believe dockerd thinks it is reading the size header when it is actually reading bytes from the middle of a message. Unfortunately, we don't have any additional framing to determine the beginning or end of a message (and what we do have is not suitable for this purpose), so the log reader is unable to recover from this situation, e.g. by skipping ahead to the next message.

If this is happening on >= moby v24 with only newly created logs, that would show there is still a problem.
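To make the failure mode concrete, here is a minimal sketch in Go. It assumes the size-prefixed framing described above (a 4-byte big-endian length before and after each protobuf-encoded record); it is illustrative, not the actual moby reader:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const maxMsgLen = 1_000_000 // matches the 1000000 cap in the errors above

// readFrame reads one size-framed record. A reader that starts at a
// valid frame boundary works fine; one that lands mid-record trusts
// whatever 4 bytes it finds as the next record's length.
func readFrame(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	size := binary.BigEndian.Uint32(hdr[:])
	if size > maxMsgLen {
		// No secondary framing exists to resynchronize on, so the
		// reader cannot skip ahead to the next valid record.
		return nil, fmt.Errorf("log message is too large (%d > %d)", size, maxMsgLen)
	}
	msg := make([]byte, size)
	if _, err := io.ReadFull(r, msg); err != nil {
		return nil, err
	}
	// Discard the trailing size copy (used for reading backwards).
	if _, err := io.CopyN(io.Discard, r, 4); err != nil {
		return nil, err
	}
	return msg, nil
}

func main() {
	// A reader whose offset has drifted into the middle of a record
	// treats ordinary message bytes as a length header.
	corrupted := bytes.NewReader([]byte("stdout: GET /health 200 ..."))
	if _, err := readFrame(corrupted); err != nil {
		fmt.Println(err) // log message is too large (1937007727 > 1000000)
	}
}
```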
Description
Every couple of hours, `docker compose logs` crashes on my production server. In the system logs (`/var/log/docker`) I find something like this:
error unmarshalling log entry (size=108554): proto: LogEntry: illegal tag 0 (wire type 6)
This error is then followed by several errors stating that the log message is too large:
Error streaming logs: log message is too large (1937007727 > 1000000)
Note that the too large log message in this case is almost 2GB!
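As a side note (an inference from the number itself, not something stated in the issue): 1937007727 is 0x7374646F, i.e. the ASCII bytes "stdo" (the start of "stdout") read as a big-endian 32-bit length, consistent with the reader landing on stream-name text inside a record instead of on a real size header:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// ASCII "stdo" as a big-endian uint32:
	// 0x73<<24 | 0x74<<16 | 0x64<<8 | 0x6F = 1937007727
	fmt.Println(binary.BigEndian.Uint32([]byte("stdo")))
}
```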
Manually running something like this also triggers the "log message is too large" error:
sudo docker logs abc123 --since "2023-10-16T18:49:10" --until "2023-10-16T18:50:00"
The environment
I am encountering this issue on two different production environments, both running on AWS EC2 instances:
Both seem to fail on the access logs. The Kong gateway crashes every couple of hours, while the environment running our Node.js application crashes about once a week. We use the Docker `awslogs` driver to send the logs to CloudWatch. CloudWatch seems to receive all logs, while manually running `docker logs` misses up to multiple hours of logs each time after the error occurs.
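For reference, an `awslogs` setup like the one described is typically configured in `/etc/docker/daemon.json`; a hypothetical sketch (group and region values are made up):

```json
{
  "log-driver": "awslogs",
  "log-opts": {
    "awslogs-region": "eu-west-1",
    "awslogs-group": "my-app-logs",
    "awslogs-create-group": "true"
  }
}
```

Note that since Docker 20.10, `docker logs` works even with remote drivers like `awslogs` by reading a local "dual logging" cache, which uses the `local` driver's on-disk format; that is presumably why CloudWatch can have all the logs while `docker logs` trips over a corrupted local file.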
What I have tried
Reproduce
I have tried to get a reproducible situation, but I am unable to do so. I cannot pinpoint the log that crashes the system, nor can I import cached logs into another container. But this is my situation:
Expected behavior
`docker logs` should not crash
docker version
docker info
Additional Info
I am aware I do not have much info to go on, since I am not able to provide reproduction steps. I can, however, access the production environment and run debugging steps if that helps.
Note: I was sent here after creating an issue in the Docker CLI repo (docker/cli#4617). The commenter over there suggested this version of Docker might be maintained by AWS. I did indeed open a ticket with them; their (business-tier) support, however, remains unresponsive.