Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use cache to reduce CI build time #15028

Open
ArkadiuszMichalski opened this issue Apr 22, 2024 · 10 comments
Open

Use cache to reduce CI build time #15028

ArkadiuszMichalski opened this issue Apr 22, 2024 · 10 comments
Assignees

Comments

@ArkadiuszMichalski
Copy link
Contributor

GitHub Actions offers us a 10 GB cache that can be used for some optimizations. For example we can remember frequently downloaded/generated files to eliminate performing constantly repetitive work.

I checked some of our bottlenecks and here are my conclusions:

  • Downloading and installing Python libraries takes about 20-25 seconds. This is still not much and there is little that can be improved, We have to remember that restoring from the cache also takes some time, so in this case the profit would be small.
  • Updating Msys2 and gcc/make takes about 45-55 seconds. It's a bit more, but it's hard to improve. The problem is that when there are new versions of gcc/make, they can take up a lot of space after unpacking (for example, 60MB becomes 400MB). Cache for man/pacman managers for downloaded packages (Msys2\var\cache folder) does not improve much, because restoring from the cache is comparable to downloading itself, and then we still need to install the packages. Caching the entire Msys2 folder after update (and packing/extracting it to the D drive for better performance) does not improve much, because such a package takes up 180MB (800MB after unpacking), which takes about 40 seconds (the problem here is the long recovery from the cache, not unpacking which takes only 7 seconds).
    We can still try to cache only the files that have changed after update, but a few seconds one way or another won't change much.

For the above reasons, I decided to skip both cases at this time because they do not improve much. Analyzing the topic further, I started looking for files that do not change often. The Scintilla/Lexilla libraries are an ideal candidate for this purpose. What if, after compiling them, we remember the result and restore it when nothing significant has changed? We will no longer have to constantly compile the same thing, which will certainly reduce the build time of all cases. A few solutions that are on my mind:

  • Only the branch (like master) will be able to update the cache, so PRs will only be able to load the cache (this will prevent unnecessary garbage entries in the cache storage).
  • If any significant factor changes, the cache will be skipped. For example, there will be a new version of Scintilla/Lexilla, a new compiler version, or changes in configuration files that affect compilation. In all such cases, the entire compilation will be performed. Here some branch (mainly master) will create a new cache for this specific case that future PRs can use (if they meet the same factors).
  • If we change any Scintilla/Lexilla file, we can consider what to do, whether we should cancel the entire cache or simply compile only the changed files. In practice, no one in PR changes these files, so it's still a marginal situation.
  • We can add to the commit title a [force nocache] flag to skip the cache if necessary (e.g. if we want to see logs and warnings for Scintilla/Lexilla compiling).

@donho What do you think about the cache for Scintilla/Lexilla? Should I try to implement this or would you prefer to always compile these files?

@donho
Copy link
Member

donho commented Apr 22, 2024

@ArkadiuszMichalski

What do you think about the cache for Scintilla/Lexilla? Should I try to implement this or would you prefer to always compile these files?

Sure, we can always try to optimize this part.
In the case the result is not what we expect, we can always revert it.

@donho
Copy link
Member

donho commented Apr 23, 2024

@ArkadiuszMichalski
Rethinking about that, I don't see how it could be possible to not compile Scintilla/Lexilla, without the project files being modified.
For me, the criteria is that vsproj, makefile, CMakeList.txt... etc. should not be modified. Only the GitHub CI conf files can be changed.
Are we on the same page?

@ArkadiuszMichalski
Copy link
Contributor Author

@donho
To achieve this I'm not going to change anything except CI_build.yml. For certain specific factors (which can be calculate in this file) we remember the current compilation result and if the next commits use the same factors, they will simply restore the cache. If the factors are not met, the cache is skipped and work continues as before (i.e. it does the full job). The state of the factors will be remembered in the cache entry name itself.

I just need to determine exactly which factors to take into account for each job. For testing I made a cache for installed Python libraries (because XML checking itself takes only 3 seconds and downloading/installing packages 20-30 seconds), which I personally cannot accept. In this case the factors are the OS name, Python versions and the libraries versions being installed, so the cache entry looks like this:
Windows-python_3.9.13-requests_2.31.0-rfc3987_1.3.8-pywin32_306-lxml_5.2.1
If a future commit has a different value for these factors, it simply does not restore the cache and does the full work.

When cache not exist:
https://github.com/ArkadiuszMichalski/notepad-plus-plus/actions/runs/8814826384/job/24195528314
image

When cache exist:
https://github.com/ArkadiuszMichalski/notepad-plus-plus/actions/runs/8814855995/job/24195617290
image
image

Something similar could be done for Scintilla and Lexilla files. For testing purposes I will try the same on Msys2 build because I have a better understanding of the individual stages of compilation here. We'll see what the actual time savings would be by using the cache.

@ArkadiuszMichalski
Copy link
Contributor Author

ArkadiuszMichalski commented Apr 25, 2024

@donho
Rethinking about that, I don't see how it could be possible to not compile Scintilla/Lexilla, without the project files being modified. For me, the criteria is that vsproj, makefile, CMakeList.txt... etc. should not be modified. Only the GitHub CI conf files can be changed. Are we on the same page?

In fact, when it comes to compilation files, it is not that simple. Make (I don't know how other tools work) relies on timestamp to detect file changes. The problem is that Git does not store and restore this data, so the checkout files always have the current time, and this means that from the MAKE point of view such files have always changed relative to the saved cache and it always performs the entire compilation. I completely missed this detail.

I started digging deeper into the topic and found out that it can be solved in two main ways:

  • Write the makefile in such a way that it generates a hash for the file's content and then compares it. This is probably out of the question because our makefile is complicated anyway and I don't intend to modify it too much.
  • Use existing tools that could do everything for us, and that would work with our makefile. Usually all recommended CCACHE, SCCACHE or SCONS.

All this tools are available in Msys2 as package and can be installed quickly. I started doing tests with CCACHE. This only required a slight modification to the build call:

$Env:CCACHE_DIR = "D:\ccache" 
mingw32-make -f PowerEditor\gcc\makefile CXX="ccache g++"

Additionally I changed where the cache for this program should be kept because I don't like the default location.

The problem is that our makefile calls other makefiles (for Scintilla and Lexilla) and all the data ends up in this one folder. So currently I did a test remembering the entire cache and these are the results:
https://github.com/ArkadiuszMichalski/notepad-plus-plus/actions/runs/8836425509/job/24262979568
image
image

I didn't modify anything, so the entire construction took only 12 seconds. It takes much longer to update Msys2.

Here I modified the Common.h file. Lexilla and Scintilla were completely restored from the cache, and only those files that required were compiled for NPP.
https://github.com/ArkadiuszMichalski/notepad-plus-plus/actions/runs/8836619889/job/24263643043
image

As I already mentioned, the entire cache for all makefiles is located in the same folder. I am unable to determine which entries are only for Scintilla/Lexilla. Is there anyone here who uses/has used CCACHE and knows how to easily make that Scintilla/Lexilla have its own separate cache folder and Notepad++ will use other?

Maybe it's possible to dynamically pass a different CCACHE_DIR value when calling the Scintilla/Lexilla makefile, which would require a slight modification to our makefile. I will try to do more tests because even locally it would be convenient to have such a tool to use.

One more thing, I don't know if CCACHE will properly track all the dependencies we have here and react to changes. If anyone has any comments why it won't work for building Notepad++, please let me know. It's a waste of time if the tool certainly won't work in our case.

@ArkadiuszMichalski
Copy link
Contributor Author

Okay, I found a way to cache only Scintilla/Lexilla, we need to slightly modify our makefile on the fly (in the same way as we change resource.h file).

  $content = Get-Content -Path 'PowerEditor\gcc\makefile'
  $newContent = $content -replace 'DIR_O', 'CXX="ccache g++" CCACHE_DIR="D:\.cache\ccache" DIR_O'
  $newContent | Set-Content -Path 'PowerEditor\gcc\makefile'

In this way we can limit caching to the mentioned libraries (don't cache everything).

Result without cache (compilation takes 4m 15s):
https://github.com/ArkadiuszMichalski/notepad-plus-plus/actions/runs/8859089857/job/24328507609
image

Result witch cache (compilation takes 2m 35s):
https://github.com/ArkadiuszMichalski/notepad-plus-plus/actions/runs/8859141894/job/24328608354
image

A very nice reduction, considering that practically no one touches the Scintilla/Lexilla files on PR (only we update these libraries to newer versions). From what I've seen (ccache/ccache#978) ccache can also be used together with our MSBuild and cmake jobs but at the moment I don't know how to achieve this (I don't use these tools).

@nyamatongwe
Copy link

nyamatongwe commented Apr 27, 2024

GitHub Windows runners have 4 cores or threads so can run make with multiple concurrent jobs like make --jobs=4. The actual number may need tweaking since there may already be action level concurrency between debug and release builds.

https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners

@ArkadiuszMichalski
Copy link
Contributor Author

@nyamatongwe We set this automatically based on the NUMBER_OF_PROCESSORS system variable.

SUBMAKEFLAGS += $(if $(NUMBER_OF_PROCESSORS),-j$(NUMBER_OF_PROCESSORS),)

But the actual profit probably depends on the current load of the entire system/machine.

@ArkadiuszMichalski
Copy link
Contributor Author

ArkadiuszMichalski commented May 2, 2024

@donho

Though makefile is not used for the releases, there are still some people need it for compiling Notepad++. The aim of CI is to verify the state of code so IMO makefile should not be modified, even on the fly. However, I didn't see this part in the PR. So what has been cached exactly? Only the python module?

This change does not affect the compilation of the files, it just sets different variables for Scintilla/Lexilla, because I wanted to cache only them, and this cannot be achieved from outside because all these makefiles use the same variables (names).

          $content = Get-Content -Path 'PowerEditor\gcc\makefile'
          $newContent = $content -replace 'DIR_O', 'CXX="ccache g++" CCACHE_DIR="D:\.cache\ccache\scilex" DIR_O'
          $newContent | Set-Content -Path 'PowerEditor\gcc\makefile'

But I have these observations when we use Ccache:

  • requires downloading it (because it is not available directly), after downloading I also cached it, so it is a one-time operation
  • in the case of Msys2 build the compilation time reduction is nice (as I pointed above )
  • in the case of Cmake build it's worse, we need turn off batch rules, and finally compilation time of Lexill is shortened by half, but with Scintilla it varies (generally the reduction is small, sometimes even longer).

For the above reason I decide to not use Ccache (at least for Msys2 and Cmake jobs). I returned to the concept of caching Scintilla/Lexilla compilation results. Assuming that changing some previously determined factors will cancel the entire cache (so full compilation will run), we can do this:

  • when cache not exist save necessary files after compilation
  • if the established factors have not changed, restore the cache
  • update timestamp for restored files to actual time (this will prevent makefile from doing unnecessary compilation Scintilla/Lexilla).
  • invoke makefile to finish job

I implemented this for Msys2 build and here are the results:
Without cache https://github.com/ArkadiuszMichalski/notepad-plus-plus/actions/runs/8923086771/job/24506598977
With cache https://github.com/ArkadiuszMichalski/notepad-plus-plus/actions/runs/8923288364/job/24507241026

Btw, for Msys2 job I still had to modify the makefile because we do not expose this condition $(BUILD_DIRECTORY), and I cannot invoke this action before restoring the cache. After restoring the cache, the build folder will be automatically created and then our makefile will not recreate the entire empty structure of this folder, so I have to invoke it manually.

Add-Content -Path 'PowerEditor\gcc\makefile' -Value 'BUILD_DIRECTORY : $(BUILD_DIRECTORY)'
mingw32-make -f PowerEditor\gcc\makefile BUILD_DIRECTORY

This only adds one target at the end that can be called externally, it does not affect the work of the makefile in any way. Of course this could be added permanently to the makefile, but I don't think there's a need to do it just because we need it in GitHub Action.

Something similar can be done for Cmake job. I'll try to do it today because I'm wondering the result compared to Ccache.

Edit: We can invoke build directory rule without modifying the makefile:

mingw32-make -f PowerEditor\gcc\makefile bin.${{ matrix.build_platform }}${{ matrix.build_configuration == 'Debug' && '-debug' || '' }}.build

@ArkadiuszMichalski
Copy link
Contributor Author

ArkadiuszMichalski commented May 3, 2024

Some results for cmake build:

Finally the reduction is noticeable.

Any advice why setting DIR_O variable doesn't work for the included nppSpecifics.mak in scintilla.mak? Both files use the same DIR_O variable, but when I set it from outside, the included one still uses default DIR_O=. value.

I tried the following things:

$Env:DIR_O = 'some path relative or absolute'
nmake -E -f scintilla.mak

or

nmake -f scintilla.mak DIR_O='some path relative or absolute'

or

$Env:DIR_O = 'some path relative or absolute'
nmake -E -f scintilla.mak DIR_O='some path relative or absolute'

I wanted to place the resulting files in a separate folder because by default they mix with the source files in the same folder. Anyone uses nmake and knows what the problem is?

https://learn.microsoft.com/en-us/cpp/build/reference/running-nmake?view=msvc-170

@nyamatongwe
Copy link

setting DIR_O variable doesn't work for the included nppSpecifics.mak

There is no use of $(DIR_O) to place the object in the compile lines of nppSpecifics.mak.

$(DIR_O)\UTF8DocumentIterator.obj:: ../../boostregex/UTF8DocumentIterator.cxx
	$(CC) $(CXXFLAGS) -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -c ../../boostregex/UTF8DocumentIterator.cxx	
# should include -Fo$(DIR_O)\ to be something like
$(DIR_O)\UTF8DocumentIterator.obj:: ../../boostregex/UTF8DocumentIterator.cxx
	$(CC) $(CXXFLAGS) -D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS -c -Fo$(DIR_O)\ ../../boostregex/UTF8DocumentIterator.cxx	

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants