
Conversation

huydhn (Contributor) commented Aug 5, 2025

The GitHub Actions container feature doesn't work with our multi-tenant rootless Docker in Docker setup (see https://github.com/pytorch/pytorch-integration-testing/actions/runs/16742012333/job/47392166734); maybe we want to look closer into this to understand why.

I'm rewriting the workflow to call Docker directly instead. Credit to Claude Code.
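
For context, here is a minimal sketch of what calling Docker directly from a workflow step can look like. The image name, mount paths, and benchmark command below are placeholders, not necessarily what this PR ends up doing:

# Pull and start the image explicitly instead of relying on the GitHub Actions
# `container:` feature, so the job also works under rootless Docker in Docker.
docker pull "$DOCKER_IMAGE"
container_id=$(docker run -d --gpus all -v "$GITHUB_WORKSPACE:/workspace" -w /workspace "$DOCKER_IMAGE" tail -f /dev/null)
# Do the actual work with docker exec, then clean up the container.
docker exec "$container_id" python benchmarks/benchmark_attn.py
docker rm -f "$container_id"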

Signed-off-by: Huy Do <huydhn@gmail.com>
meta-cla bot added the cla signed label Aug 5, 2025
huydhn added 4 commits August 5, 2025 01:46
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Credit to Claude code

Signed-off-by: Huy Do <huydhn@gmail.com>
huydhn changed the title from "Call setup-node before checkout" to "Rewrite flash attention workflow to avoid using GH container" Aug 5, 2025
Signed-off-by: Huy Do <huydhn@gmail.com>
huydhn requested review from jduprat and seemethere August 5, 2025 18:43
huydhn marked this pull request as ready for review August 5, 2025 18:46
huydhn (Contributor, Author) commented Aug 5, 2025

Some notes from debugging the issue:

  • The volume seems to be mounted in the right place (https://github.com/pytorch/pytorch-integration-testing/actions/runs/16742012333/job/47392166734#step:2:312): -v "/home/alice/externals":"/__e":ro. That's where the node20 bundle is found on the runner.
  • However, when docker exec was called later on, node20 wasn't there (maybe I should ls the mounted volume and check for a permission issue; see the sketch after this list).
  • The Initialize container step runs outside of the multi-tenant-gpu container, I think, because it refers to the alice user directly there; that's the confusing part.
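
A minimal sketch of the check from the second bullet (container_id here is just a placeholder for whichever container the job started; /__e is the mount target from the log above):

# List the mounted externals directory from inside the container to see whether
# the node20 bundle is actually visible and readable there.
docker exec "$container_id" ls -la /__e
docker exec "$container_id" ls -la /__e/node20
# Print the UID/GID the container runs as; rootless Docker remaps users,
# which could explain files that exist on the host but are unreadable inside.
docker exec "$container_id" id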

jduprat (Contributor) left a comment

Minor nit regarding the output of nvidia-smi.
I can fix in post...

export PYTHONPATH=$(pwd)
python benchmarks/benchmark_attn.py >> $GITHUB_STEP_SUMMARY
echo '<h1>B200 1000W</h1>' >> /tmp/workspace/fa4_output.txt
nvidia-smi >> /tmp/workspace/fa4_output.txt

The output of nvidia-smi makes the benchmark output hard to read. I'd prefer that we still run it (so I can check the logs and confirm we are on the right class of machine) but without piping it into /tmp/workspace/fa4_output.txt.

huydhn (Contributor, Author) commented

Oh, I can remove this quickly
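
Something along these lines would match the request, keeping nvidia-smi in the job log but dropping the redirect (a sketch, not necessarily the exact follow-up change):

export PYTHONPATH=$(pwd)
python benchmarks/benchmark_attn.py >> $GITHUB_STEP_SUMMARY
echo '<h1>B200 1000W</h1>' >> /tmp/workspace/fa4_output.txt
# Print nvidia-smi to the job log only, so the machine class can be confirmed
# without cluttering /tmp/workspace/fa4_output.txt.
nvidia-smi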

jduprat merged commit 120745b into main Aug 5, 2025
3 checks passed