-
-
Notifications
You must be signed in to change notification settings - Fork 815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BLD Use outer ccache wrapper #1805
Conversation
It's still not working. Either I'm missing something or there is an issue with the ccache setup in emdk. Opened emscripten-core/emsdk#876 |
The status on this is that ccache is correctly being called now during compilation. However for some reason it doesn't recognize the files to compile as being identical between different CI runs and so we consistently get cache misses. Need to investigate that next. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's finally working:
Build time (min) | build emsdk+cpython+core | build packages |
---|---|---|
Without ccache | 20 | 80 ± 10 |
ccache (main) | 17 ± 2 | 80 ± 10 |
ccache (this PR) | 14 ± 2 | 44 |
@@ -8,6 +8,9 @@ defaults: &defaults | |||
- EMSDK_NUM_CORES: 3 | |||
EMCC_CORES: 3 | |||
PYODIDE_JOBS: 3 | |||
# Make sure the ccache dir is consistent between core and package builds | |||
# (it's not the case otherwise) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something to do with emsdk activate, by it in any case we were previously caching the wrong path for packages
.circleci/config.yml
Outdated
- ~/.ccache | ||
key: -{{ checksum "Makefile.envs" }}-v20201205-{{ .BuildNum }} | ||
- /root/.ccache | ||
key: -core-{{ checksum "Makefile.envs" }}-v20210910- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer store the cache for each build, as then we are not sure which one gets used, and it tends to grows with each build if there are cache misses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the assumption here that the ccache will only be updated if Makefile.envs
changes? Maybe it would be better to key also on Makefile
and cpython/Makefile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a balance minimizing cache resets and doing that if something major changes. Currently Makefile.envs
includes all the global flags.
If I look at git history,
Makefile
most of the changes there are either about js packaging, linting or something other about build that doesn't affect compilationcpython/Makefile
: most changes also also rather local (even libffi addition). Also I'd like to move most of those under the main build system as standalone packages eventually if possible.
The cost of rebuilding without cache is roughly 40min, and the issue is that the cache for each fork is independent as far as I can tell, so it's multiplied by the number of contributors. So I think even if a few compilation options change for a subset of files in a package (including CPython), and we get some cache misses overall it's probably still fine to keep the same cache. With the assumption that we update emsdk periodically which resets the cache anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, yes for the core build we can certainly include cpython/Makefile
in the hash since the core/package caches are separate.
source pyodide_env.sh | ||
|
||
# Set mtime for EM_CONFIG to avoid ccache cache misses | ||
touch -m -d '1 Jan 2021 12:00' emsdk/emsdk/.emscripten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Source: https://github.com/juj/ccache/blob/d69e1d05d68b06a256f00a86a38de644ff23b66d/src/emcc.cpp#L627
This can probably be switched to hashing the file, but mtime+size is the default ccache strategy for compilers apparently (see CCACHE_COMPILERCHECK
)
@@ -41,20 +44,29 @@ jobs: | |||
|
|||
- restore_cache: | |||
keys: | |||
- -{{ checksum "Makefile.envs" }}-v20201205- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to keep the core
and packages
ccache separate? The packages
ccache should include caches from building various packages? Perhaps we could use the same cache for both but only save_cache
on build-packages
? Perhaps keeping them separate is simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The packages ccache should include caches from building various packages?
Yes.
Perhaps we could use the same cache for both but only save_cache on build-packages? Perhaps keeping them separate is simpler.
I think would I rather prefer to have them separate,
- first, there are surprisingly many things that could go wrong with ccache (particularly with this little documented emsdk fork we are using), and than you have to debug by, setting,
storing that folder as artifacts, than comparing
export CCACHE_DEBUG=1 export CCACHE_DEBUGDIR="/root/ccache-debug/"
*.ccache-input-text
and*.ccache-log
files for subsequent rebuilds to see why there are cache misses. It's easier to do that if there are no inter-dependencies for cache between jobs. - for instance on main, ccache is broken for packages because cache loaded contains only files from core. It's hard to detect that because you see "33MB of ccache loaded" and so you think well maybe that's what's supposed to happen. If caches were seprate you would see "0MB of ccache loaded" and also 0 files in
ccache -s
reports making it more obvious that something is wrong. - And more generally, it practical to be able to manually reset only the core cache (which is fast to rebuild) and keep the packages cache unchanged if needed. Which also provides a solution for BLD Use outer ccache wrapper #1805 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, given that build-core
and build-packages
don't overlap at all in what they build, there isn't even any benefit to sharing the caches.
.circleci/config.yml
Outdated
- ~/.ccache | ||
key: -{{ checksum "Makefile.envs" }}-v20201205-{{ .BuildNum }} | ||
- /root/.ccache | ||
key: -core-{{ checksum "Makefile.envs" }}-v20210910- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the assumption here that the ccache will only be updated if Makefile.envs
changes? Maybe it would be better to key also on Makefile
and cpython/Makefile
?
Enables ccache for emscripten. The configuration uses pyiodide for a reference (pyodide/pyodide#1805). Two workflows to run on macOS and Ubuntu, reduced to one on Ubuntu. As emscripten and the target is cross-platform, also macOS runners being limited - it makes sense to have this removed. Upload artefact enabled in preparation for a release action to be scheduled which will upload the bergamot*.wasm and bergamot*.js for consumption.
Enables ccache for emscripten. The configuration uses pyiodide for a reference (pyodide/pyodide#1805). Two workflows to run on macOS and Ubuntu, reduced to one on Ubuntu. As emscripten and the target is cross-platform, also macOS runners being limited - it makes sense to have this removed. Upload artefact enabled in preparation for a release action to be scheduled which will upload the bergamot*.wasm and bergamot*.js for consumption.
This should be a better way to use ccache with emsdk according to emscripten-core/emscripten#13498
Investigating into whether this would fix #1007