-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler: Improve deterministic builds #5965
Conversation
CT Test Results 33 files 824 suites 6h 45m 26s ⏱️ For more details on these failures, see this check. Results for commit da001d9. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts// Erlang/OTP Github Action Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! It looks good aside from a few nits. I'll add it to our nightly builds as soon as they're fixed, but unfortunately we've already released the last release candidate for 25 so the changes won't make it in until OTP 25.1 or OTP 26.
@jhogberg thanks for the feedback, I've resolved those formatting issues. Please let me know if you need anything else 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'd like to see some more additions to the documentation of the various tools that are updated by this.
Regarding making this the default, we would have to make sure that tools such as Erlang-LS or meck do not rely on any information stripped. If they work with deterministic files (I don't know if they are affected or not) then I am for making this the default.
I think I've resolved all the failures related to this change |
Yes, I agree. My hope is that we can merge this, then try it out by explicitly opt-ing in with the new flag, which should let us flush out any issues with other tools. |
Just looking at one last unit test failure which I suspect is due to the test not being set up in a way that will execute correctly in the GitHub CI. |
Makes the EPP module use basenames rather than absolute paths for all -file attributes / ?FILE macros when +deterministic is set. Previously, EPP did not respect +deterministic in all scenarios, so build output could still contain absolute paths.
Makes asn1ct_gen filter out potentially non-deterministic attributes from generated .erl files when +deterministic is set.
Makes generated leex scanners only use basenames in generated -file attributes, rather than absolute paths when +deterministic is set.
Makes generated yecc parsers use only basenames in generated -file attributes rather than absolute paths when +deterministic is set.
Makes compile pass along the +deterministic flag for epp to utilise.
Makes test_lib avoid a crash if +deterministic is enabled to tests. +deterministic strips the compilation options entirely, which test_lib wasn't able to handle. Now, an empty list is returned in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the target branch to maint
so that we can include this in OTP 25.1.
I have two more nit picks. When you have fixed them, I will add this PR to our daily builds.
Thanks for your feedback, @bjorng. I am not sure precisely the formatting convention to use, but hopefully it's better now. |
Regarding CI: The test failures seem unrelated and transient: each time I rerun, a different thing seems to fail and I don't think it's related to my change. |
Yes, there unfortunately are some unstable test cases. |
What I meant about the indentation being off is that I have pushed to your branch a suggestion for a different way to handle non-existing option lists. I have now added this PR (along with updates to the primary bootstrap and the preloaded modules) to our daily builds. |
The test cases |
If I understand correctly, then I think them failing is unavoidable, since the installed Erlang will not have the appropriate determinism features? |
Ah, it has been explained to me that you build the current OTP version, but run |
Add a --enable-deterministic-build to the configure script, which sets ERL_DETERMINISTIC=yes throughout the relevant Makefiles, which then invoke the relevant build stages with the +deterministic option. This addresses absolute paths being included in generated .erl files and compiled .beam files that resulted in builds from different source directories generating different artefacts (which is a component of the issue in #4482). I think it would make sense to make this the default at some stage, but I've put the change behind a flag for now to decouple making deterministic OTP builds possible from making them the default. Having +deterministic set results in compiler options being removed from the module info for modules where this options was used. This may have other implications for users of OTP. For tests themselves, +determinism is not set, since many test cases depend on accessing the test module's compilation options, or other features not available in deterministic mode, in order to configure themselves. For tests of the determinism feature specifically, +deterministic must be explicitly passed to the compiler within the relevant test cases.
Thanks, added to our daily builds. |
Thanks for your pull request. |
Implements a new `dialyzer --incremental` mode. Incremental mode primarily differs from the previous, "classic", ways of running Dialyzer, in that its model is optimised around the common usecase of regularly analysing a single codebase, tweaking the code, analysing it again, and so on, without explicit reference to the building and checking of a PLT. In this mode the PLT file acts much more like a true cache, where users provide a codebase and a set of files they care about, and Dialyzer does the legwork in terms of deciding how to most efficiently report all of the relevant warnings given the cached results it may already have in the PLT (and if a PLT doesn't exist, incremental mode will make one). This change also includes additional supporting options to aid with debugging, for example, for dumping the dependency graph as incrementality sees it (which modules depend on which other modules via function calls, type definitions, etc.), and which modules Dialyzer thinks changed, and hence which modules it decided to (re-)analyse. Example usage scenarios: - Analyse two apps and report warnings on both: `dialyzer --incremental --apps dialyzer erts` - Analyse two apps, and report warnings on only one: `dialyzer --incremental --apps dialyzer erts --warning_apps dialyzer` - Analyse the apps given in config*: `dialyzer --incremental` - Debug an analysis by dumping the dependency graph Dialyzer computed: `dialyzer --incremental --apps dialyzer erts --dump_full_dependencies_graph ~/deps.dot` *Config defaults to standard location such as `~/.config/erlang/dialyzer.config`, but can be overridden via the `DIALYZER_CONFIG` environment variable. An example config: ``` {incremental, {default_apps, [stdlib, erts, kernel, compiler, dialyzer]}}. ``` Further design details ====================== Key considerations: - "Pay for what you use" - Correctness - Portability - Migratability "Pay for what you use" ---------------------- One key principle of the new incremental mode was that users should be able to invoke Dialyzer and have the tool itself work out precisely which parts of the analysis can be read straight back from the cache, and which need to be (re-)analysed, due to any changes in the code since the PLT was last updated. Relative to the typical cost of the Dialyzer analysis itself, the administrative overhead of incrementality is trivial in practice. There are some extra things kept in the PLT (such as a record of the warnings previously computed) which need to be read/written, and some changes to how modules are hashed when checking for changes, but these costs aren't big, and some other optimisations, such as indexing jobs more efficiently, mean that I've not seen an incremental Dialyzer analysis that is any slower than the corresponding building/checking of a PLT in the classical way. The gains from using incremental mode depend significantly on the changes made to the code between two incremental analyses, both in terms of the number of modules directly modified by the change, but also the size of the transitive closure of the modules which depend on the modified code. For usage scenarios with large changes, or changes to modules upon which many other modules transitively depend, the benefits of incremental mode will be more modest, because fewer of the cached results will still be unaffected by the change. For usage scenarios where the changes are at the "leaf" of the module dependency graph, the benefits of incremental mode will be more significant. Re-running an incremental analysis without making code changes should be very fast indeed, since the result of the analysis can be read entirely from the cached results in the PLT. Correctness ----------- The earlier PR to erlang/OTP erlang#5498 improved Dialyzer's tracking of dependencies in preparation for this change. The result should be that running Dialyzer in incremental mode for a set of modules should result in it "doing the right thing", giving the same results as an analysis from scratch, but leveraging an existing PLT to potentially avoid doing some unnecessary work. I've tested incremental mode in production for a large codebase for many months, comparing the issues it reports with those reported by building/checking a PLT in the classical way, and I have seen no discrepancy between the two in practice. I've noticed a practical improvement in correctness for some users who would attempt to build, merge and check PLTs for subsets of their code manually, in order to approximate something like incrementality. Unfortunately, doing this manually risked giving incorrect results, because it was easy to end up not analysing a module that was transitively affected by a change to the codebase. For simplicity and correctness reasons, incremental mode does not support merging PLTs as it stands. Portability ----------- Incremental mode can provide significant performance wins if most of the analysis can be cached. This benefit can be leveraged by sharing a PLT file between checkouts or even machines, allowing, for example, a previous run on a CI/CD server to generate a PLT file that a user can then base their local analyses on. In order to make this sort of PLT sharing portable, Dialyzer needed to stop using absolute paths for indexing modules in the PLT and internally, and switch to indexing on module name (which must already be unique within a single codebase). A corollary of this is that warning messages from incremental mode refer to only the module name, rather than the full path. To offer some mitigation for this, incremental mode can be configured to write out the mapping from module name to absolute path separately to the PLT file, for use cases which may need this information (for example, for reporting in IDE or CI/CD). When building code from a different checkout or on a different machine, build determinism becomes quite important. Previously, building code from a different absolute path would result in different build artefacts (see, for example, erlang/OTP issue erlang#4482), which can affect Dialyzer's perception of whether the code has changed. To mitigate this, Dialyzer's module hashing logic now ignores some non-semantic features, such as -file attribute paths, but running on top of a deterministic build will still give the best results. erlang/OTP PR erlang#5965 makes this much easier. Migratability ------------- Switching to the incremental mode from building/checking a PLT means skipping any `--build_plt` step, and passing `--incremental` rather than `--check_plt`, and providing those files, directories or apps to analyse, either as command line arguments or via config, rather than than relying on a list of files that were already stored in the PLT. This change introduces a distinction between classic PLT (CPLT) and the new incremental PLT (IPLT) files. The two formats contain slightly different information, and aren't compatible. This change adds new warnings which detect and report if an IPLT is used when a CPLT file was expected, or vice versa. The hope is that, by having a clear distinction between the two, mistakes during migration to incremental mode will be picked up quickly. The separation of the `dialyzer_plt` module into `dialyzer_cplt` and `dialyzer_iplt` for the serialisation-specific logic with some intentional duplication between the two should allow the two to naturally evolve independently (there's no technical reason why they must operate similarly), and give a route to deprecating the classic mechanism in time. Incremental mode offers the ability to analyse a set of modules, but only report the warnings on a subset of them. One usecase for this is where the codebase is made up of first-party code, for which you want warnings, and some third-party dependencies, which you don't want to see warnings for, but are needed as "context" for the analysis of the first-party code. This was implicitly possible in classic Dialyzer modes by virtue of the arguments used when building and checking a PLT file.
Improves build determinism for non-native code by:
+deterministic
option (for all code, OTP or otherwise)--enable-deterministic-build
flag toconfigure
that controls OTP's build determinism by setting+deterministic
appropriately in its MakefilesThis addresses absolute paths being included in generated
.erl
files and compiled.beam
files that resulted in builds from different source directories generating different artefacts (which is a component of the issue in #4482, but is also useful in its own right, for example around caching builds and for the upcoming Dialyzer incremental mode).OTP itself can now be built in deterministic mode by giving the
--enable-deterministic-build
flag toconfigure
. I think it would make sense to make this the default at some stage, but I've put the change behind an option for now in an attempt to decouple making deterministic OTP builds possible from making them the default.Having
+deterministic
set also results in the compiler options being removed from themodule_info
for modules where this options was used. This may have other implications for users of OTP.For tests themselves,
+deterministic
is disabled, since many test cases depend on accessing the test module's compilation options or other features not available in deterministic mode, in order to configure themselves. For tests of the determinism feature in particular,+deterministic
is explicitly passed to the compiler within the relevant test cases.To run a deterministic build, you can execute a script such as the following:
This PR is a reformulated version of #5863, which splits the change up into more manageable commits that can be reviewed separately.