Build status not updated, but build is complete #2090
Comments
duplicate of #2065
@bradrydzewski Oh, I see that you are aware of the issue. Thank you.
@zaa yes, I plan to resolve this in the upcoming 0.8 release (no planned release date). I am currently evaluating grpc, which would give us heartbeat, reconnect, retry with backoff and more. I first need to understand the implications of such a decision, since grpc uses http2 and could complicate installation (nginx doesn't support http2, for example).
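For readers unfamiliar with the term, "retry with backoff" can be sketched in Go roughly as follows. This is a generic illustration only, not Drone's or grpc's actual implementation; `retryWithBackoff`, `errTransient`, and all parameters are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical error standing in for a dropped connection.
var errTransient = errors.New("transient failure")

// retryWithBackoff calls fn until it succeeds or maxAttempts is reached.
// The delay starts at base and doubles after every failure, capped at maxDelay.
// It returns the number of attempts made and the last error, if any.
func retryWithBackoff(maxAttempts int, base, maxDelay time.Duration, fn func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	delay := base
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return attempt, nil
		}
		if attempt == maxAttempts {
			break // no point sleeping after the final failure
		}
		time.Sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	attempts, err := retryWithBackoff(5, 10*time.Millisecond, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // fail twice, then succeed
		}
		return nil
	})
	fmt.Println(attempts, err)
}
```

Bounding the number of attempts matters: a retry loop with no limit will spin forever when the error is permanent rather than transient.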
@bradrydzewski would you consider adding a "build reset" feature? That might help mitigate network-related and similar issues.
@mrueg I would rather see the time and effort go into fixing the root cause. The existing code could be improved by implementing websocket ping/pong to keep the connection alive, and then slightly tweaking the retry logic here: |
Copy of the gitter conversation in which @zaa traces the possible root cause to the docker logs endpoint freezing.
Note that you can identify this issue if you see the following in 0.8
But do not see a subsequent log message indicating uploading the logs is complete:
Note that at this time I still have not been able to reproduce this, but I am running docker 17.05 (perhaps that is why). If someone can reproduce this issue with docker 17.05 (and drone version 0.8, which uses grpc) then we can re-open.
FWIW, the latest major release of Kubernetes 1.7 that just came out has this certification statement:
While it is possible to run newer or older versions of Docker than those, the above are the only officially supported versions for Kubernetes 1.7.
I think one option would be to run Unfortunately most of the streaming code sits inside the docker library, as opposed to drone code, which limits our options for addressing the issue. (assuming we can verify this is a docker issue) |
@gtaylor I experienced this issue running older version of Send me a message and I can give you a preview of what we're doing. I hope to open source our Drone on GKE setup when it's more battle tested. |
Yep, we resolved the issue by running drone-agent and docker-in-docker in the same Kubernetes pod. But as was mentioned earlier, Docker 1.12.x is the version recommended by Kubernetes, and most production-like installations of Kubernetes are using it. Maybe it makes sense to add a warning or recommendation to the documentation about the minimum Docker version and/or a suggestion to use dind in Kubernetes.
@Punitag I'll see if I can put something together for this week.
@Punitag we run drone-agent and dind as two containers in the same kubernetes pod. The agent connects to the docker daemon via tcp (tcp://127.0.0.1:2375).
Here is an early preview of what I'm putting together for a PR to the official docs on running this on GKE. This is how we currently run it: https://gist.github.com/tonglil/4108f5c74bf4e382511f4c1b633d2d9a A few things missing:
I wanted to provide a quick update, since it looks like there were multiple root causes to this issue and we now have at least two solutions. The comment below was copied from discourse; you can visit the original thread here.
I just merged a pull request that fixes an issue where large log output causes the upload to return an error because it exceeds the maximum grpc payload size. The agent retries the upload indefinitely because the error is always the same, which causes the build to get stuck. Thanks to @tboerger for pinpointing the exact error:
This fix limits the size of the logs (per step) to ensure they do not exceed the grpc limits. A more permanent solution will be to implement grpc streaming, which in the long term is definitely how this should be implemented anyway. So, in conclusion, I believe we have discovered at least two different root causes for builds getting stuck:
I therefore believe that both upgrading docker and getting the |
Nice find! Just so I am clear on behavior, we'll see a truncated build log (at the limit boundary) if our build goes over the limit?
Yes, each step in the pipeline will truncate its logs at 2 MB. Note that the aggregate of the logs for all steps can exceed 2 MB, so this is just a per-step limit.
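To illustrate the per-step truncation described above, here is a minimal Go sketch of a writer that forwards output up to a limit and drops the rest. It is not the code from the merged pull request; `limitedWriter`, `truncateStepLog`, and `maxStepLogBytes` are hypothetical names, and treating "2 MB" as 2 × 1024 × 1024 bytes is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// maxStepLogBytes is an assumed reading of the 2 MB per-step limit.
const maxStepLogBytes = 2 * 1024 * 1024

// limitedWriter forwards at most `remaining` bytes to w and drops the rest.
type limitedWriter struct {
	w         io.Writer
	remaining int
	truncated bool
}

func newLimitedWriter(w io.Writer, limit int) *limitedWriter {
	return &limitedWriter{w: w, remaining: limit}
}

// Write reports len(p) as written even when bytes are dropped, so the
// producer (the build step) never fails just because its log is too large.
func (l *limitedWriter) Write(p []byte) (int, error) {
	n := len(p)
	if len(p) > l.remaining {
		p = p[:l.remaining]
		l.truncated = true
	}
	if len(p) > 0 {
		written, err := l.w.Write(p)
		l.remaining -= written
		if err != nil {
			return written, err
		}
	}
	return n, nil
}

// truncateStepLog feeds the chunks of a single step through a limitedWriter
// and returns the retained bytes plus a flag saying whether output was cut.
func truncateStepLog(chunks [][]byte, limit int) ([]byte, bool) {
	var buf bytes.Buffer
	lw := newLimitedWriter(&buf, limit)
	for _, c := range chunks {
		lw.Write(c) // writes to a bytes.Buffer do not return an error
	}
	return buf.Bytes(), lw.truncated
}

func main() {
	chunk := bytes.Repeat([]byte("x"), 1024*1024) // 1 MiB of log output
	out, cut := truncateStepLog([][]byte{chunk, chunk, chunk}, maxStepLogBytes)
	fmt.Println(len(out), cut)
}
```

Because each step gets its own writer, the limit applies per step, and the combined logs of a build can still exceed it.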
Thanks, it's a fair short-term workaround for allowing builds to complete; one can reduce log output in the meantime until grpc streaming is implemented.
Hello,
I've got a fun issue with drone 0.7. All jobs of a build completed successfully, but the build was shown as still running. I clicked the "Cancel" button on the build page, and the build info page showed that the build was killed (the information about the build was displayed in red).
However, in the list of builds on the left and in the output of "drone build info" I see that the build is still running.
And no new builds can start (they are waiting in the pending state).
I see the following logs in the agent:
I tried restarting the drone server and agent. The pending build then started running, but the build that was stuck is still shown as "running".
Thank you.