Feat(#56) blog about caching #58
Changes from 6 commits
b46d9c1
805d79c
ca11fee
b8789f7
198cb96
96e9f05
4d30d65
3cdae01
ed510bc
5c2fa3e
ad6e8eb
9e6a736
daab2ce
25396af
5c065a5
8f27368
f25314b
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,241 @@ | ||
--- | ||
layout: post | ||
date: 2024-02-06 | ||
title: "Build cache in EO and other build systems" | ||
author: Alekseeva Yana | ||
--- | ||
|
||
|
||
## Introduction | ||
Long build times are a common problem in software development. Whenever a programmer starts a | ||
build, they lose focus on the task and spend valuable working time waiting. Build systems use several techniques | ||
to build a project faster, such as caching, task parallelization, and distributed builds. | ||
The subject of this article is caching, because caching the results of completed tasks avoids spending resources on them again. | ||
"The subject of this article is caching." The other is obvious:
|
||
In [EO](https://github.com/objectionary/eo), caching is used to speed up the build. | ||
@Yanich96 Caching speeds up a "build time" or "program execution", not "programs work". |
||
While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin` | ||
@Yanich96 Do you have particular links to these issues?
@volodya-lombrozo do you mean that I should to attach a link to the issue where the error occurred? |
||
for EO version `0.34.0`. The errors occurred because checking a file name and comparing the | ||
@Yanich96 It's hard to grasp without a context:
@volodya-lombrozo Do you have an example of context? Should it be code or diagram? |
||
compilation time and caching time is not a reliable way to validate a cache entry. Unit tests were written to show that | ||
the cache does not work correctly. In addition, a file had to be read to get the program name, | ||
"Unit tests were written to demonstrate that the cache does not function correctly. Additionally, reading a file was required to obtain a program name, which slowed down the assembly process." By the way, what is the "assembly proccess"? A reader might not be familiar with this term. |
||
which slowed down the build. | ||
So we came to the conclusion that we need caching with reliable validation that does not require reading a file | ||
|
||
from disk. The cache should also save enough time when building a project. | ||
|
||
The goal of this article is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`) | ||
@Yanich96 This sentence might be connected with the previous one: "The subject of this article is caching." |
||
and to create effective caching in [EO](https://github.com/objectionary/eo). | ||
@Yanich96 "create" -> "implement" |
||
|
||
<!--more--> | ||
@Yanich96 "More"? |
||
|
||
## Build caching of existing build systems | ||
@Yanich96 What about "Caching in Build Systems" ? or " Caching in Other Build Systems".
@volodya-lombrozo I will choose " Caching in Other Build Systems" |
||
|
||
### ccache/sccache | ||
@Yanich96 Is it a build system or what? Where is the link? Short description? |
||
In compiled programming languages, building a project takes a long time. | ||
@Yanich96 "long time" ? How much is it? I build all my projects relatively fast. |
||
Compilation is slow because time is spent on preparing, checking, and optimizing the code. | ||
To speed up builds in compiled languages, `ccache` and `sccache` are used. | ||
Let's look at the compilation scheme using C++ as an example, | ||
to imagine the build process in compiled languages: | ||
@Yanich96 Maybe we can we change "Imagine" to "Visualize"? What do you think? |
||
|
||
<p align="center"> | ||
<img src="/images/ccache.svg"> | ||
</p> | ||
|
||
1) First, the preprocessor gets the input files: source files and header files. | ||
The preprocessor removes comments from the code, expands macros, | ||
and executes other directives starting with the “#” symbol | ||
@Yanich96 No need to describe how exactly a preprocessor works. It's important that we get at the end of this phase. |
||
(such as `#include`, `#define`, and `#pragma`). | ||
The result is a single edited file with human-readable code that can be submitted to the compiler. | ||
|
||
@Yanich96 Maybe we need to write a short summary 1-2 sentences about this type of caching?
@volodya-lombrozo The principle of caching in
@Yanich96 I mean ccache and sccache altogether. What is the difference with other types of caching? Why did you choose these tools?
@volodya-lombrozo I wrote above that I looked at well-known used build systems. Isn't this enough? |
||
|
||
2) The compiler receives the finished code file and converts it into machine code, presented in an object file. | ||
@Yanich96 "finished" code? What does it mean? |
||
At the compilation stage, parsing occurs, which checks whether the code matches | ||
||
the rules of a specific programming language. Next, the code is translated into machine code according to those rules. | ||
At the end of its work, the compiler optimizes the resulting machine code and produces an object file. | ||
@Yanich96 "At the end of its work" -> "At the end" |
||
To speed up compilation, different files of the same project are compiled in parallel, | ||
that is, we receive several object files at once. | ||
@Yanich96 This is redundant:
|
||
|
||
3) After that, all the resulting object files are passed to the linker. | ||
@Yanich96 What does it mean:
Is it "After all, received project object files are passed to the linker." Maybe it's better just use "Then, object files are passed to the linker.", or better: |
||
The linker is a program that combines object files, produced from assembly language or a high-level programming language, | ||
@Yanich96 I though that Linker combines object files? |
||
into an executable file or library. The result of the linker is an executable file (for example, an `.exe` file on Windows). | ||
|
||
|
||
As a result, in compiled languages, multiple files are simultaneously and independently converted into machine code at the compilation stage. | ||
This machine code is then combined into one executable file. | ||
|
||
|
||
`ccache` has two main caching modes: | ||
1) `Direct mode` - hashcode is generated based on the source code. | ||
@Yanich96 Which "hashcode" do you mean? You gave the definition below, that paragraph positioning confuses a lot. I have to skip this part and then return to it after. |
||
2) `Preprocessor mode` - hashcode is generated based on the result of preprocessor. | ||
|
||
The hashcode is computed from the following information: file contents, directory, compiler information, compilation time, and extensions | ||
used by the compiler. A compressed machine code file is placed in the cache under the resulting key. | ||
|
||
`Direct mode` compiles the program faster, since the preprocessor step is skipped. | ||
@Yanich96 You explains two modes by using this template:
Looks strange, maybe it's better to explain one mode and the move to the another?
|
||
However, header files are not checked for changes, so an outdated result may be built. | ||
`Preprocessor mode` is slower than `direct mode`, but the correct result is always built. | ||
|
||
`sccache`, unlike `ccache`, can store the cache not only locally but also in the cloud, | ||
and it also fixes some issues (for example, it checks header files, which makes direct mode more accurate). | ||
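
The difference between the two `ccache` modes described above comes down to what is hashed. The following is a minimal sketch in Java, not how `ccache` is actually implemented: the class `CacheKey`, the `compilerId` string, and the toy "preprocessor" that only strips `//` comments are all hypothetical and exist only for illustration. It shows why a key built from preprocessed text survives a comment-only change, while a key built from the raw source does not.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CacheKey {
    // "Direct mode" (illustrative): the key is derived from the raw source text.
    static String direct(final String compilerId, final String source) {
        return sha256(compilerId, source);
    }

    // "Preprocessor mode" (illustrative): the key is derived from preprocessed text.
    static String preprocessed(final String compilerId, final String source) {
        return sha256(compilerId, preprocess(source));
    }

    // Toy stand-in for a real preprocessor: it only removes "//" line comments.
    static String preprocess(final String source) {
        return source.replaceAll("(?m)//.*$", "");
    }

    // Hashes the compiler identifier and the text into a hex-encoded SHA-256 digest.
    private static String sha256(final String compilerId, final String text) {
        try {
            final MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(compilerId.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between the two hashed parts
            md.update(text.getBytes(StandardCharsets.UTF_8));
            final StringBuilder hex = new StringBuilder();
            for (final byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 is not available", ex);
        }
    }
}
```

With this sketch, `CacheKey.direct("cc-1", "int x = 1; // one\n")` and `CacheKey.direct("cc-1", "int x = 1; // uno\n")` produce different keys, while `CacheKey.preprocessed` returns the same key for both inputs.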
|
||
|
||
### Maven | ||
`Maven` automates and manages the build of Java projects. A `Maven` build is organized into three built-in | ||
[lifecycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html), | ||
which consist of `phases`. `Phases` in turn consist of sets of `goals`. | ||
|
||
`Maven` has default `phases` and `goals` that are used to build any project: | ||
|
||
<p align="center"> | ||
<img src="/images/defaultPhaseMaven.svg"> | ||
</p> | ||
|
||
In `Maven` all phases and goals are executed strictly in order, linearly. | ||
@Yanich96 So, Maven doesn't use caching at all?
@volodya-lombrozo As far as I understand, that Maven can use added extensions from Gradle for caching. Or Maven can rebuild only changed project modules.
@Yanich96 Maven has
@Yanich96 It's not about Gradle, I guess: https://maven.apache.org/extensions/maven-build-cache-extension/ |
||
But core `Maven` has no built-in build cache as such. | ||
To speed up the build process, `Maven` suggests rebuilding only the changed project modules. | ||
|
||
### Gradle | ||
`Gradle`, like `Maven`, builds a project according to a | ||
[build lifecycle](https://docs.gradle.org/current/userguide/build_lifecycle.html), which consists of phases. | ||
But unlike `Maven`, `Gradle` builds projects using a task graph, a | ||
[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph), | ||
@Yanich96 Maybe it's better to give a link to a "Gradle task graph" instead? Why do I need to read about DAGs? |
||
in which some tasks can be executed in parallel. | ||
To speed up project builds, `Gradle` uses | ||
[incremental builds](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work). | ||
For an incremental build to work, the tasks that are used to build the project must | ||
have their input and output files specified. | ||
``` | ||
task myTask { | ||
inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory | ||
@Yanich96
@volodya-lombrozo I have fixed this example:
@Yanich96 good |
||
outputs.dir 'build/classes/java/main/MyTask.somebody' // Specify the output directory | ||
|
||
doLast { | ||
// Task actions go here | ||
// This code will only be executed if the inputs or outputs have changed | ||
} | ||
} | ||
``` | ||
Before executing a task, `Gradle` takes a fingerprint of the paths | ||
and contents of the input files and saves it. | ||
If the task completes successfully, `Gradle` also takes a fingerprint of the output files. | ||
To avoid re-fingerprinting unchanged input files, `Gradle` checks their last modification time and size | ||
before rebuilding. Thus, when the project is rebuilt, some or all of the tasks may be | ||
skipped, and the results already obtained are reused. | ||
`Gradle` also stores fingerprints of previous builds (the `Build Cache`) so that projects can be built quickly, | ||
for example when switching from one branch to another. | ||
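
The fingerprinting idea above can be sketched in Java as follows. This is only an illustration under stated assumptions, not `Gradle`'s actual implementation: the class `Fingerprints`, its `of` method, and the `hashed` counter are hypothetical names. The sketch re-reads a file's contents only when its last modification time or size differs from what was recorded earlier.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class Fingerprints {
    // What was recorded about a file the last time its contents were hashed.
    private static final class Entry {
        final long mtime;
        final long size;
        final String hash;

        Entry(final long mtime, final long size, final String hash) {
            this.mtime = mtime;
            this.size = size;
            this.hash = hash;
        }
    }

    private final Map<Path, Entry> known = new HashMap<>();

    // Counts how many times file contents were actually read and hashed.
    int hashed;

    // Returns the content hash, reusing the stored one if mtime and size are unchanged.
    String of(final Path file) throws IOException {
        final long mtime = Files.getLastModifiedTime(file).toMillis();
        final long size = Files.size(file);
        final Entry old = this.known.get(file);
        if (old != null && old.mtime == mtime && old.size == size) {
            return old.hash; // cheap path: only file metadata was consulted
        }
        final String hash = sha256(Files.readAllBytes(file));
        this.hashed++;
        this.known.put(file, new Entry(mtime, size, hash));
        return hash;
    }

    private static String sha256(final byte[] data) {
        try {
            final StringBuilder hex = new StringBuilder();
            for (final byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 is not available", ex);
        }
    }
}
```

A task runner built on this sketch would compare `of(file)` for every declared input with the value saved after the previous successful run and skip the task when all of them match. The shortcut trades a little accuracy for speed: a change that keeps both the size and the modification time intact would go unnoticed.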
|
||
|
||
|
||
|
||
### EO build cache | ||
|
||
EO code is compiled using the `Maven` build system. | ||
For this purpose, the `eo-maven-plugin` plugin was written, | ||
which contains the goals necessary for working with EO code. | ||
As described above, `Maven` builds projects in a fixed order of phases. | ||
The diagram shows the main phases and their goals for the EO version of the compiler (specify version): | ||
|
||
<p align="center"> | ||
<img src="/images/EO.svg"> | ||
</p> | ||
|
||
In [Picture 3](/images/EO.svg) the goals from the `eo-maven-plugin` | ||
are highlighted in green. | ||
|
||
@Yanich96 What is the conclusion? Why did you mention Maven? Does this caching similar to Grade? to ccache? What is the difference? |
||
|
||
But the actual work with EO code takes place in `AssembleMojo`. | ||
`AssembleMojo` is a goal that consists of other goals working with the EO file | ||
([Picture 4](/images/AssembleMojo.svg)). | ||
|
||
|
||
<p align="center"> | ||
<img src="/images/AssembleMojo.svg"> | ||
</p> | ||
|
||
Each goal in `AssembleMojo` is a specific compilation step for EO code, and we need to use | ||
caching at each step to speed up the assembly of the EO program. | ||
|
||
In EO version `0.34.0`, | ||
caching for different `Mojo` classes was done using two unrelated interfaces, `Footprint` and `Optimization`, | ||
which mostly used the same methods. | ||
The difference between the interfaces is that `Footprint` checks the EO version of the compiler, | ||
while the rest of the checks are exactly the same. | ||
|
||
|
||
Currently, the goals in which caching can be applied, `ParseMojo`, `OptimizeMojo`, and `ShakeMojo`, | ||
each have a results directory and a cache directory. | ||
|
||
|
||
The disadvantages of the initial caching in EO: | ||
* The compilation time and the time of saving to the cache must be equal. | ||
The problem with this check is that it holds only if the moment of compilation and the moment of saving to the cache coincide exactly. | ||
* Verification data is read from a file on disk, which is a slow and expensive operation. | ||
* Each goal uses its own classes and interfaces for caching data. | ||
This makes the code hard to extend and read. | ||
|
||
|
||
Therefore, our aim is to create a single class responsible for saving data to the cache | ||
and loading the necessary data from it, which can be used by any `Mojo` in the `eo-maven-plugin`. | ||
|
||
|
||
How we want to fix these disadvantages: | ||
1) Create a new class `Cache` that will be responsible for data verification, saving to cache and loading from cache. | ||
|
||
``` | ||
public class Cache { | ||
|
||
private List<CacheValidation> validations; | ||
|
||
public Cache(final List<CacheValidation> cv) { | ||
this.validations = cv; | ||
} | ||
|
||
public Optional<XML> load(final Path source, final Path cache) {...}; | ||
@Yanich96 Maybe it's better to make |
||
|
||
public void save(final Path cache, final Scalar<String> program, final Path relative) {...}; | ||
} | ||
``` | ||
|
||
|
||
`List<CacheValidation>` is a list of validations that implement the `CacheValidation` interface. | ||
Different validations can be applied for different `Mojo`. | ||
|
||
|
||
``` | ||
public interface CacheValidation { | ||
@Yanich96 I didn't grasp the idea why we might need this class and why it has exactly this implementation. |
||
boolean validate(final Path source, final Path cache) throws IOException; | ||
} | ||
``` | ||
|
||
2) To avoid reading file contents from disk, we will work with file paths (`Path`). | ||
The classes `Path` and `Files` have methods to obtain the necessary information. | ||
|
||
|
||
3) The cached data will be considered up to date if the last modification time | ||
of the source file is earlier than or equal to the one saved in the cache. | ||
|
||
These solutions will speed up compilation in the `Maven` build system. | ||
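
The modification-time check from point 3 could look roughly like the sketch below. It is an assumption-laden illustration rather than the code of `eo-maven-plugin`: the `CacheValidation` interface is repeated from the listing above only to keep the example self-contained, the class name `NotNewerThanCache` is hypothetical, and the sketch assumes that "the time saved in the cache" is the last modification time of the cached file itself. Only file metadata is consulted, so the file contents are never read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Repeated from the article so that the example compiles on its own.
interface CacheValidation {
    boolean validate(Path source, Path cache) throws IOException;
}

// Hypothetical validation: the cached file is usable if the source
// was not modified after the cached file was written.
final class NotNewerThanCache implements CacheValidation {
    @Override
    public boolean validate(final Path source, final Path cache) throws IOException {
        if (!Files.exists(cache)) {
            return false; // nothing was cached yet
        }
        final FileTime src = Files.getLastModifiedTime(source);
        final FileTime cached = Files.getLastModifiedTime(cache);
        return src.compareTo(cached) <= 0; // earlier than or equal to
    }
}
```

A `Cache` built from a list of such validations could accept a cached file only when every validation in the list returns `true`, which lets each `Mojo` combine the checks it needs.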
|
||
|
||
### Conclusion | ||
@Yanich96 I don't think we need such a conclusion in a blog post. It isn't a scientific article. Moreover it doesn't provide any useful information. Kinda "water". |
||
Suppose there is an EO program `program.eo` that is compiled for the first time. | ||
At each `Mojo` stage, the execution results are saved to the cache of the current `Mojo`. | ||
If the program is compiled again, these `Mojo` classes receive data from the cache, | ||
without wasting time and computer resources on recompilation. | ||
If we change something in the `program.eo` file, the program has to be recompiled, | ||
since the last modification time of the source file is now later than the one stored in the cache. | ||
As a result of the `Mojo` work, the cache is overwritten. | ||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
@Yanich96
"Empty words". We can remove them without losing any meaning.
@Yanich96
Why do I need this information?
@volodya-lombrozo I written it to start this blog. I will delete these suggestions if they are not necessary.