Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLD Use outer ccache wrapper #1805

Merged
merged 27 commits into from
Sep 11, 2021
Merged

BLD Use outer ccache wrapper #1805

merged 27 commits into from
Sep 11, 2021

Conversation

rth
Copy link
Member

@rth rth commented Aug 25, 2021

This should be a better way to use ccache with emsdk according to emscripten-core/emscripten#13498

Investigating into whether this would fix #1007

@rth
Copy link
Member Author

rth commented Aug 26, 2021

It's still not working. Either I'm missing something or there is an issue with the ccache setup in emdk. Opened emscripten-core/emsdk#876

@rth rth marked this pull request as ready for review August 26, 2021 15:07
@rth
Copy link
Member Author

rth commented Aug 30, 2021

The status on this is that ccache is correctly being called now during compilation. However for some reason it doesn't recognize the files to compile as being identical between different CI runs and so we consistently get cache misses. Need to investigate that next.

@rth rth added this to the 0.19.0 milestone Sep 10, 2021
@rth rth marked this pull request as draft September 10, 2021 11:45
Copy link
Member Author

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's finally working:

Build time (min) build emsdk+cpython+core build packages
Without ccache 20 80 ± 10
ccache (main) 17 ± 2 80 ± 10
ccache (this PR) 14 ± 2 44

@@ -8,6 +8,9 @@ defaults: &defaults
- EMSDK_NUM_CORES: 3
EMCC_CORES: 3
PYODIDE_JOBS: 3
# Make sure the ccache dir is consistent between core and package builds
# (it's not the case otherwise)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something to do with emsdk activate, by it in any case we were previously caching the wrong path for packages

- ~/.ccache
key: -{{ checksum "Makefile.envs" }}-v20201205-{{ .BuildNum }}
- /root/.ccache
key: -core-{{ checksum "Makefile.envs" }}-v20210910-
Copy link
Member Author

@rth rth Sep 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer store the cache for each build, as then we are not sure which one gets used, and it tends to grows with each build if there are cache misses

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the assumption here that the ccache will only be updated if Makefile.envs changes? Maybe it would be better to key also on Makefile and cpython/Makefile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a balance minimizing cache resets and doing that if something major changes. Currently Makefile.envs includes all the global flags.
If I look at git history,

  • Makefile most of the changes there are either about js packaging, linting or something other about build that doesn't affect compilation
  • cpython/Makefile: most changes also also rather local (even libffi addition). Also I'd like to move most of those under the main build system as standalone packages eventually if possible.

The cost of rebuilding without cache is roughly 40min, and the issue is that the cache for each fork is independent as far as I can tell, so it's multiplied by the number of contributors. So I think even if a few compilation options change for a subset of files in a package (including CPython), and we get some cache misses overall it's probably still fine to keep the same cache. With the assumption that we update emsdk periodically which resets the cache anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes for the core build we can certainly include cpython/Makefile in the hash since the core/package caches are separate.

source pyodide_env.sh

# Set mtime for EM_CONFIG to avoid ccache cache misses
touch -m -d '1 Jan 2021 12:00' emsdk/emsdk/.emscripten
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source: https://github.com/juj/ccache/blob/d69e1d05d68b06a256f00a86a38de644ff23b66d/src/emcc.cpp#L627

This can probably be switched to hashing the file, but mtime+size is the default ccache strategy for compilers apparently (see CCACHE_COMPILERCHECK)

@rth rth marked this pull request as ready for review September 10, 2021 17:36
@rth rth requested a review from hoodmane September 10, 2021 17:36
@@ -41,20 +44,29 @@ jobs:

- restore_cache:
keys:
- -{{ checksum "Makefile.envs" }}-v20201205-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to keep the core and packages ccache separate? The packages ccache should include caches from building various packages? Perhaps we could use the same cache for both but only save_cache on build-packages? Perhaps keeping them separate is simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The packages ccache should include caches from building various packages?

Yes.

Perhaps we could use the same cache for both but only save_cache on build-packages? Perhaps keeping them separate is simpler.

I think would I rather prefer to have them separate,

  • first, there are surprisingly many things that could go wrong with ccache (particularly with this little documented emsdk fork we are using), and than you have to debug by, setting,
    export CCACHE_DEBUG=1
    export CCACHE_DEBUGDIR="/root/ccache-debug/"
    
    storing that folder as artifacts, than comparing *.ccache-input-text and *.ccache-log files for subsequent rebuilds to see why there are cache misses. It's easier to do that if there are no inter-dependencies for cache between jobs.
  • for instance on main, ccache is broken for packages because cache loaded contains only files from core. It's hard to detect that because you see "33MB of ccache loaded" and so you think well maybe that's what's supposed to happen. If caches were seprate you would see "0MB of ccache loaded" and also 0 files in ccache -s reports making it more obvious that something is wrong.
  • And more generally, it practical to be able to manually reset only the core cache (which is fast to rebuild) and keep the packages cache unchanged if needed. Which also provides a solution for BLD Use outer ccache wrapper #1805 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, given that build-core and build-packages don't overlap at all in what they build, there isn't even any benefit to sharing the caches.

- ~/.ccache
key: -{{ checksum "Makefile.envs" }}-v20201205-{{ .BuildNum }}
- /root/.ccache
key: -core-{{ checksum "Makefile.envs" }}-v20210910-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the assumption here that the ccache will only be updated if Makefile.envs changes? Maybe it would be better to key also on Makefile and cpython/Makefile?

@rth rth merged commit aa1ce3a into pyodide:main Sep 11, 2021
@rth rth deleted the fix-ccache branch September 11, 2021 12:24
jerinphilip added a commit to jerinphilip/bergamot-translator that referenced this pull request Feb 2, 2022
Enables ccache for emscripten. The configuration uses pyiodide for a
reference (pyodide/pyodide#1805).

Two workflows to run on macOS and Ubuntu, reduced to one on Ubuntu. As
emscripten and the target is cross-platform, also macOS runners being
limited - it makes sense to have this removed.

Upload artefact enabled in preparation for a release action to be
scheduled which will upload the bergamot*.wasm and bergamot*.js for
consumption.
jerinphilip added a commit to browsermt/bergamot-translator that referenced this pull request Feb 2, 2022
Enables ccache for emscripten. The configuration uses pyiodide for a
reference (pyodide/pyodide#1805).

Two workflows to run on macOS and Ubuntu, reduced to one on Ubuntu. As
emscripten and the target is cross-platform, also macOS runners being
limited - it makes sense to have this removed.

Upload artefact enabled in preparation for a release action to be
scheduled which will upload the bergamot*.wasm and bergamot*.js for
consumption.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CPython build does not use ccache
2 participants