ci : fix Docker workflow #3628

Open
ggerganov opened this issue Oct 15, 2023 · 16 comments
Labels
build (Compilation issues), help wanted (Extra attention is needed)

Comments

@ggerganov
Owner

I just disabled the workflow since it has been crashing for a while now:

image

@ggerganov added the "help wanted" and "build" labels Oct 15, 2023
@KerfuffleV2
Collaborator

KerfuffleV2 commented Oct 15, 2023

System.IO.IOException: No space left on device : '/home/runner/runners/2.310.2/_diag/Worker_20231014-105256-utc.log'

It's just that the disk is full, right? Or maybe the runner's disk just isn't large enough for the current tasks. (At least all the failures I saw were "No space left on device" errors.)
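
To make the "disk is full" hypothesis checkable, here is a minimal sketch of a pre-flight check that a job could run before a large build. It uses only Python's standard `shutil.disk_usage`. The function names and the 10 GiB threshold are assumptions made for illustration and are not part of the actual workflow.

```python
import shutil


def free_gib(path="/"):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


def has_headroom(path="/", required_gib=10.0):
    """Return True if at least `required_gib` GiB are free (threshold is hypothetical)."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    print(f"free space on /: {free_gib():.1f} GiB")
    if not has_headroom():
        print("warning: low disk space, a large image build may fail")
```

Logging this number at the start of a job would show whether a runner was already short on space before the build began.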

@extradosages

extradosages commented Oct 16, 2023

This might be relevant: docker/build-push-action#968.
Here is a suggested solution: actions/runner#1807 (comment).

@samm81
Contributor

samm81 commented Nov 20, 2023

I thought I'd give this a shot, but couldn't manage to replicate the issue. after forking, the "Publish Docker image" workflow runs fine in my repo. I tried reverting back to 11bff29 since that was the commit of the most recent failure before the workflow was manually disabled, but even that succeeded 🤷

I went ahead and added the jlumbroso/free-disk-space action to the workflow since it seems straightforward. test runs in my repo indicate that it frees up 34 GB on the GitHub-hosted runners (turns out that, of their 80 GB total disk, only 24 GB is free by default) and that it doesn't interfere with building and publishing the image.
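
A quick back-of-the-envelope check of those numbers, as a sketch. It assumes the 34 GB freed by the action adds directly to the 24 GB that is free by default, which the comment does not state explicitly. The variable and function names are illustrative only.

```python
# Figures quoted in the comment above, in GB.
TOTAL_GB = 80
FREE_BY_DEFAULT_GB = 24
FREED_BY_ACTION_GB = 34


def free_after_cleanup(free_gb, freed_gb):
    """Free space after cleanup, assuming the freed space is purely additive."""
    return free_gb + freed_gb


free_after = free_after_cleanup(FREE_BY_DEFAULT_GB, FREED_BY_ACTION_GB)
print(f"{free_after} GB free of {TOTAL_GB} GB ({free_after / TOTAL_GB:.1%})")
# prints "58 GB free of 80 GB (72.5%)"
```

Under that assumption the cleanup step more than doubles the space available to the build.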

pr #4150

@rhuddleston
Contributor

@ggerganov thoughts on that PR?

@ggerganov
Owner Author

Should I try to merge #4150 and re-enable the docker action?

@rhuddleston
Contributor

Yea let's try it

@ggerganov
Owner Author

@rhuddleston
Contributor

nice looks like it's worked!

@rhuddleston
Contributor

On another note, I see there are full-cuda-afefa319f1f59b002dfa0d1ef407a2c74bd9770b and full-cuda tags, but there should also be a full-cuda-b1678 tag
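
The tag scheme being asked for can be sketched as follows. This only illustrates the naming pattern visible in the tags above (plain flavor, flavor plus commit SHA, flavor plus `b<build number>`). The helper name and the `example/image` repository are hypothetical, and this is not how the workflow actually computes its tags.

```python
def image_tags(image, flavor, commit_sha, build_number=None):
    """Build the list of tags for one image flavor.

    `image` is a placeholder repository name; `build_number` is optional so the
    function also covers the case where no build-number tag is published.
    """
    tags = [
        f"{image}:{flavor}",               # moving tag, e.g. full-cuda
        f"{image}:{flavor}-{commit_sha}",  # pinned to a commit
    ]
    if build_number is not None:
        tags.append(f"{image}:{flavor}-b{build_number}")  # pinned to a build number
    return tags


if __name__ == "__main__":
    sha = "afefa319f1f59b002dfa0d1ef407a2c74bd9770b"
    for tag in image_tags("example/image", "full-cuda", sha, 1678):
        print(tag)
```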

@rhuddleston
Contributor

#4584 here's my attempt at this

@ggerganov
Owner Author

How to fix this:

image

@samm81
Contributor

samm81 commented Dec 22, 2023

the timing of this coincides with f31b984

it's weird that the error is reported on line 97, but YAML is weird sometimes 🤷 I think this is actually an issue with the tags entry on L101

I have what I think is a fix in #4603 (CI is still running)

@samm81
Contributor

samm81 commented Dec 22, 2023

@ggerganov
Owner Author

Not sure, will have to disable it again because my notifications are being spammed with Docker failures. Hopefully we can figure out the issue eventually

@ggerganov
Owner Author

Just saw #4603 - let's see if this resolves it

@samm81
Contributor

samm81 commented Dec 24, 2023

it looks like the problems in this issue have both been solved - seems there's some intermittent issue with the pip install step of the docker build, but that's probably best to open as a separate issue?

@github-actions bot added and then removed the "stale" label Mar 19, 2024