
[bazel] build doesn't use sccache #79348

Closed
vors opened this issue Jun 11, 2022 · 2 comments
Labels
module: bazel, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@vors
Contributor

vors commented Jun 11, 2022

🐛 Describe the bug

A build with sccache used to take 15 minutes, according to @malfet.
Right now it takes 65–70 minutes. This probably indicates that sccache is not being used.
However, it could also be because the GPU build was previously not enabled and now it is.

This needs a quick investigation.
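One quick way to check is to reset sccache's counters, run the build, and inspect the stats afterwards. A sketch of that check is below; the bazel target is a placeholder, not the actual target used in CI:

```shell
# Sketch: verify whether a build actually goes through sccache.
# //:placeholder_target is hypothetical; substitute a real target.
check_sccache_usage() {
  sccache --zero-stats     # reset the counters
  bazel build //:placeholder_target
  sccache --show-stats     # "Compile requests" staying at 0 means
                           # the build never invoked sccache
}
```

If the compile-request and cache-hit counters stay at zero after the build, the compiler invocations are bypassing the sccache wrapper entirely.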

Versions

master

@mikaylagawarecki mikaylagawarecki added the module: bazel and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels Jun 15, 2022
@vors
Contributor Author

vors commented Dec 28, 2022

@jjsjann123 I think your experience with bazel tells us that sccache is in fact used, right?
It would still be good to understand why we don't get the time-savings benefit (mostly for CI).

@jjsjann123
Collaborator

> @jjsjann123 I think your experience with bazel tells us that sccache is in fact used, right? It would still be good to understand why we don't get the time-savings benefit (mostly for CI).

Yes. IIRC, when I pulled the CI docker image to build pytorch with bazel, there was something funny with sccache; I never got it to work, so I switched to our development container to work around (WAR) it.

It would be great if we actually got the CI container to be usable for community contributors. 👀

jhavukainen pushed a commit to kulinseth/pytorch that referenced this issue Mar 15, 2024
Fixes pytorch#79348

This change is mostly focused on enabling nvcc+sccache in the PyTorch CI.

Along the way we had to make a couple of tweaks:
1. Split rules_cc from the rules_cuda that embedded it before. This is needed in order to apply a different patch to rules_cc than the one rules_cuda applies by default. This in turn is needed because we have to work around an nvcc behavior where it doesn't forward `-iquote xxx` to the host compiler, but it does forward `-isystem xxx`. So we work around the problem by (ab)using `-isystem` instead. Without it we get errors like `xxx` is not found.

2. Work around a bug in bazel, bazelbuild/bazel#10167, that prevents us from using a straightforward and honest `nvcc` sccache wrapper. Instead we generate an ad-hoc, bazel-specific nvcc wrapper that has internal knowledge of the relative bazel paths to local_cuda. This allows us to work around the issue with CUDA symlinks. Without it we get `undeclared inclusion(s) in rule` errors all over the place for CUDA headers.
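In essence, such a wrapper just routes the nvcc invocation through sccache. A minimal sketch follows; the `REAL_NVCC` path is an assumption, and the actual generated wrapper additionally rewrites bazel-relative local_cuda paths, which is omitted here:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an nvcc wrapper that routes compilations
# through sccache. The real bazel-specific wrapper described above
# also rewrites bazel's relative paths to local_cuda to dodge
# bazelbuild/bazel#10167; that part is not shown.

nvcc_via_sccache() {
  # Assumed nvcc location; override with REAL_NVCC for your toolchain.
  local real_nvcc="${REAL_NVCC:-/usr/local/cuda/bin/nvcc}"
  sccache "$real_nvcc" "$@"
}

# A standalone wrapper script would instead end with:
#   exec sccache "$REAL_NVCC" "$@"
```

With a wrapper like this registered as the CUDA compiler, every nvcc call becomes a cacheable sccache request, which is what makes the "CUDA" rows appear in the stats output below.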

## Test plan

Green CI build https://github.com/pytorch/pytorch/actions/runs/4267147180/jobs/7428431740

Note that the sccache output now includes "CUDA" lines:

```
+ sccache --show-stats
Compile requests                    9784
Compile requests executed           6726
Cache hits                          6200
Cache hits (C/C++)                  6131
Cache hits (CUDA)                     69
Cache misses                         519
Cache misses (C/C++)                 201
Cache misses (CUDA)                  318
Cache timeouts                         0
Cache read errors                      0
Forced recaches                        0
Cache write errors                     0
Compilation failures                   0
Cache errors                           7
Cache errors (C/C++)                   7
Non-cacheable compilations             0
Non-cacheable calls                 2893
Non-compilation calls                165
Unsupported compiler calls             0
Average cache write                0.116 s
Average cache read miss           23.722 s
Average cache read hit             0.057 s
Failed distributed compilations        0
```
Pull Request resolved: pytorch#95528
Approved by: https://github.com/huydhn