Summary
The ubuntu-latest x86_64 CI jobs that build with --features embeddings-all (which pulls in candle-core / candle-transformers / tokenizers / hf-hub / image / reqwest) intermittently die with:
System.IO.IOException: No space left on device : '/home/runner/actions-runner/cached/.../Worker_...log'
The "Run test" step never completes (no failed step is recorded; the job log truncates right after the cache-restore step). The cause is only visible via the check-run annotations:
gh api repos/mosuka/laurus/check-runs/<job_id>/annotations
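To read only the messages rather than the raw JSON, the same call can be filtered with gh's built-in --jq flag. This is a sketch: the function name annotation_messages is made up for illustration, and the job ID still has to be supplied by hand.

```shell
# Print "<level>: <message>" for every annotation on one check run.
# $1 is the job (check-run) ID, i.e. the <job_id> used in the command above.
# annotation_messages is a hypothetical helper name, not part of gh.
annotation_messages() {
  gh api "repos/mosuka/laurus/check-runs/$1/annotations" \
    --jq '.[] | "\(.annotation_level): \(.message)"'
}

# Usage (replace <job_id> with the ID of the failed job):
#   annotation_messages <job_id>
```

On an affected job, the output should include the "No space left on device" message quoted above.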
Observed on PRs #772 and #774 (the Test laurus (ubuntu-latest, x86_64-unknown-linux-gnu, stable) job failed at ~4m25s twice in a row before a later re-run happened to land on a runner with enough free space). The ubuntu-24.04-arm job is unaffected because it sets skip_embedding_features: true; Windows is unaffected (different disk layout).
This is an environment/disk-margin problem, not a code or test problem. The restored Rust build cache (~790 MB compressed, much larger expanded) plus the embeddings-all build artifacts push the runner's root disk over its limit. It is intermittent because the margin is razor-thin.
Root cause
GitHub-hosted ubuntu-latest runners ship with several large preinstalled toolchains the build does not use (.NET, Android SDK, GHC/Haskell, CodeQL bundle, cached Docker images). None of the workflows free any of it, so the heavy embeddings-all compile occasionally runs out of space.
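One way to check how much space each of these toolchains occupies on a given runner is to run du over their install directories. The sketch below assumes the paths named under the proposed fix. report_sizes and its root-prefix argument are hypothetical helpers for illustration, and on a real runner du may need sudo to read every subdirectory.

```shell
# Report the on-disk size of each preinstalled toolchain directory.
# $1 is an optional root prefix (empty on a real runner; a scratch
# directory when trying this out locally). Read-only: nothing is deleted.
report_sizes() {
  root="${1:-}"
  for p in /usr/share/dotnet /usr/local/lib/android /opt/ghc \
           /usr/local/.ghcup /opt/hostedtoolcache/CodeQL; do
    if [ -e "$root$p" ]; then
      # du -sh prints one human-readable total per directory.
      du -sh "$root$p" 2>/dev/null || true
    else
      echo "absent $root$p"
    fi
  done
}

# Usage on a runner:
#   report_sizes ""
```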
Proposed fix
Add a reusable composite action, .github/actions/free-disk-space, that removes the unused preinstalled toolchains (reclaiming ~25-30 GB) and prints df -h / before and after. Reference it (gated with if: runner.os == 'Linux') right after checkout in the heavy embeddings-all jobs:
regression.yml (PR-blocking) — clippy, test-laurus, test-server, test-mcp, test-cli, test-python, test-nodejs, test-ruby, test-php
periodic.yml and release.yml — the equivalent clippy / test-* jobs
A composite action keeps the removal list in one place instead of pasting it into ~25 jobs. Because pull_request runs use the PR's own workflow definitions, the change self-validates on its own CI run.
The removal targets only safe, unused paths (/usr/share/dotnet, /usr/local/lib/android, /opt/ghc, /usr/local/.ghcup, /opt/hostedtoolcache/CodeQL, dangling Docker images) — it does NOT touch the Rust / Python / Node toolchains the jobs rely on.
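The script that the composite action's run step would execute could look roughly like the following. It is a sketch under stated assumptions, not the final action: the function names are hypothetical, "dangling Docker images" is assumed to mean docker image prune, and passwordless sudo is assumed to be available on the hosted runner. The guard at the end means the script only deletes anything when it runs on a GitHub Actions Linux runner.

```shell
# Paths named in this proposal; all are preinstalled toolchains the build does not use.
UNUSED_PATHS="/usr/share/dotnet /usr/local/lib/android /opt/ghc /usr/local/.ghcup /opt/hostedtoolcache/CodeQL"

# Remove every listed path that exists.
# $1: root prefix (empty on a real runner, a scratch dir for local trials).
# $2: optional command prefix such as "sudo".
remove_unused_paths() {
  root="$1"
  prefix="${2:-}"
  for p in $UNUSED_PATHS; do
    if [ -e "$root$p" ]; then
      echo "removing $root$p"
      $prefix rm -rf -- "$root$p"
    fi
  done
}

# Print free space, remove the unused toolchains, print free space again.
free_disk_space() {
  df -h /
  remove_unused_paths "" sudo
  # Assumption: dangling images are cleared with `docker image prune`.
  if command -v docker >/dev/null 2>&1; then
    docker image prune --force || true
  fi
  df -h /
}

# Only act on a GitHub Actions Linux runner; elsewhere this just defines the functions.
if [ "${GITHUB_ACTIONS:-}" = "true" ] && [ "${RUNNER_OS:-}" = "Linux" ]; then
  free_disk_space
fi
```

Keeping the path list in a single variable mirrors the reason for using a composite action in the first place: the removal targets are defined once, and nothing outside that list (Rust, Python, Node) is touched.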
Acceptance criteria
.github/actions/free-disk-space/action.yml composite action added.
Action referenced from the embeddings-all jobs of regression.yml, periodic.yml, release.yml.
actionlint (if available) reports no errors on the edited workflows.
Out of scope (possible follow-up)
release.yml build-* / publish-* artifact jobs (release-gated, not currently failing).
References