-
Notifications
You must be signed in to change notification settings - Fork 2.4k
directives: lazy execution for package metadata #51881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
176ca08 to
3ce6b09
Compare
|
As a possible next step, not included in this PR, also Edit: submitted #51884 nonetheless, because it simplifies the when stack to |
3ce6b09 to
d973407
Compare
d359abf to
ed2a9aa
Compare
|
Looks like it doesn't affect hashes in spack/spack-packages#3139; combined with "lazy |
This refactors the directive system to execute directives lazily. Directives are now queued during class definition and only executed when the corresponding dictionary on the package class is accessed (e.g., `pkg.dependencies`, `pkg.versions`). This reduces overhead during cache population, and in non-forking build sub-processes where `spec.package` is used and metadata isn't needed. - `DirectiveDictDescriptor` is a singleton descriptor shared across package classes. It handles the lazy initialization of package attributes. - For each dictionary, the set of directives affecting it is computed automatically. For example, accessing `pkg.extendees` automatically triggers the execution of both `extends` and `depends_on` directives. - The actual dictionary is now stored at `pkg._dependencies` and so on. It is initially set to `None` to indicate the directives have not run yet. - `_execute_depends_on` now applies nested patches immediately via `_execute_patch` instead of going through `patch(...)` internally. This prevents state corruption in the global directive queue when `depends_on` is evaluated lazily. - `maintainers` directives remain eager to ensure backward compatibility with packages that define maintainers as a simple class-level list. - `PackageBase._patches_dependencies = True/False` is set without evaluation of `depends_on` and `extends` directives. This allows us to optimize patch indexing by avoiding evaluation of these directives if none of them produce patches. Signed-off-by: Harmen Stoppels <me@harmenstoppels.nl>
Signed-off-by: Harmen Stoppels <me@harmenstoppels.nl>
ed2a9aa to
dd4bb2c
Compare
alalazo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The case that seems most complicated is the one of patched dependencies, but we are still placing information in the same place as before, so ✔️
I agree with that. I'd say I would not see as breaking a change that can be fixed in a way that is also backward compatible. |
|
Waiting on @tgamblin to give ✔️ before merging |
|
FWIW, I can follow up with an audit that explicitly executes directives, like: for pkg in repo.all_package_classes():
for attr in directive_dicts:
getattr(pkg, attr) |
|
I'm ok with this (enthusiastic, even) if we can add an audit like the one @haampie mentions. I think we need that soon though -- otherwise I think we're going to start getting a lot of packages with invalid constraints. |
lib/spack/spack/package_base.py
Outdated
| #: Set to true when this package has directives that specify patches on dependencies. Can be | ||
| #: used as an optimization to avoid execution of :func:`~spack.directives.depends_on` and | ||
| #: :func:`~spack.directives.extends` directives when looking for all patches defined by this | ||
| #: package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment should be clear that the directive does this automatically, so that users (or future developers) don't think that packagers need to set this. I had to dig around to verify this.
lib/spack/spack/directives_meta.py
Outdated
| _descriptor_cache: Dict[str, "DirectiveDictDescriptor"] = {} | ||
| #: Set of all known directive dictionary names from `@directive(dicts=...)` | ||
| _directive_dict_names: Set[str] = set() | ||
| #: List of directives to be executed at class initialization time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe what the str key is in this dictionary.
lib/spack/spack/directives.py
Outdated
| def _execute_depends_on( | ||
| pkg: PackageType, | ||
| spec: spack.spec.Spec, | ||
| spec: Union[SpecType, spack.spec.Spec], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is SpecType set to str? If so, we should just use str. I would expect SpecType to be Union[str, spack.spec.Spec].
tgamblin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor stuff - otherwise LGTM.
lib/spack/spack/directives_meta.py
Outdated
| #: Whether the package being defined patches dependencies | ||
| _patches_dependencies: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say who sets this.
lib/spack/spack/directives_meta.py
Outdated
| directives.remove(directive) # iterations ends, so mutation is fine | ||
| def _remove_kwarg_value_directives_from_queue(value) -> None: | ||
| """Remove directives found in a kwarg value from the execution queue.""" | ||
| # Certain keyword argument values of directives may themselve be (lists of) directives. An |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Certain keyword argument values of directives may themselve be (lists of) directives. An | |
| # Certain keyword argument values of directives may themselves be (lists of) directives. An |
Signed-off-by: Harmen Stoppels <me@harmenstoppels.nl>
|
Thanks, incorporated the feedback including an audit. Also avoided the duplication of |
This refactors the directive system to execute directives lazily. Directives
are now queued during class definition and only executed when the corresponding
dictionary on the package class is accessed (e.g.,
pkg.dependencies,pkg.versions). This reduces overhead during cache population, and in non-forkingbuild sub-processes where
spec.packageis used and metadata isn't needed.DirectiveDictDescriptoris a singleton descriptor shared across packageclasses. It handles the lazy initialization of package attributes.
automatically. For example, accessing
pkg.extendeesautomatically triggersthe execution of both
extendsanddepends_ondirectives.pkg._dependenciesand so on. It isinitially set to
Noneto indicate the directives have not run yet._execute_depends_onnow applies nested patches immediately via_execute_patchinstead of going throughpatch(...)internally. This preventsstate corruption in the global directive queue when
depends_onis evaluatedlazily.
maintainersdirectives remain eager to ensure backward compatibility withpackages that define maintainers as a simple class-level list.
PackageBase._patches_dependencies = True/Falseis set without evaluation ofdepends_onandextendsdirectives. This allows us to optimize patch indexing byavoiding evaluation of these directives if none of them produce patches.
In the current form, the PR has two "breaking" changes:
patches=...is kwarg-only in thedepends_onandextendsdirectives.Previously this was a positional argument.
patcheskwarg ofdepends_onand
extends. Previously all args and kwarg values where searched for nesteddirectives.
In principle (1) is a breaking change to the package API, but pragmatically speaking,
I would argue (a) it was an oversight from us to keep
patchesas a positional argument,and (b) it's not breaking in practice for all packages in
spack/spack-packages. If itbreaks an internal repo, the change of adding
patches=[...]is trivial.The descriptor indirection does not seem to impact the "setup" phase of the
solver.
The first use of Spack is significantly faster. Below are results from a best of 3.
Before (855e07a):
After (5ee3706):
The
list(spack.repo.PATH.all_package_classes())benchmark is now useless. Witha few other PRs combined (#51875, #51836, #51879, #48771), the first-use time reduces
further to under 5 seconds, or 3x faster than reference.