feature: sharing a spack instance #11871
These changes add the ability for spack to operate in a "shared" mode where multiple users can use the same instance of spack without directly affecting other users. Previously, a similar setup was only possible by each user configuring their own local settings by hand.
When shared mode is inactive spack behaves like a normal spack instance. This would allow system admins to configure repos, mirrors, environments, etc. These settings are shared by all users of this instance of spack.
When shared mode is enabled, spack treats the traditional installation locations as an upstream instance of spack, and the typical install/stage/cache/etc. locations are redirected to a directory that the user can specify by setting an environment variable.
Users could still make their own local configuration settings in their per-user configuration scope.
One additional change that is introduced in this feature is that attempting to uninstall from an upstream instance of spack now creates an error rather than uninstalling the package.
$ spack share activate
$ spack share status
==> Shared mode enabled/disabled
$ spack share deactivate
Some aspects of this are still a work in progress. Currently I have not implemented a good way to activate this version of spack. If a system-wide installation of spack, running the
My issue here is that this looks a lot like other features that have been proposed or integrated. I'd like to better understand how it is similar to or different from them, and whether this is the direction we want to go with Spack.
First, to better understand this PR:
I strongly encourage you to do this before writing more code on this feature.
What is shared
The main spack directory is shared between users. Everyone uses the same main installation and program files to run spack. As for packages and modules, some are shared and some are not. Right now, when shared mode is enabled a few things happen. First, spack changes the location where new packages are installed to a directory specified by the user via an environment variable. Packages (and their respective modulefiles) installed this way are not shared between users.
Second, spack adds the typical install directory (the instance's traditional install location) as an upstream, so packages already installed there remain visible to every user.
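For reference, chaining between instances is normally configured through an upstreams.yaml file; a rough sketch of what shared mode would effectively set up for the traditional install location could look like the following (the instance name and paths are placeholders, not what the PR actually writes):

    # upstreams.yaml (sketch): register the shared instance's traditional install tree
    # as an upstream so its packages and modulefiles stay visible to every user
    upstreams:
      shared-instance:
        install_tree: /path/to/shared/spack/opt/spack
        modules:
          tcl: /path/to/shared/spack/share/spack/modules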
Typical Use Case
The use case I envisioned is a multi-user system that provides its environment through spack. The system-wide environment would be managed through spack. Environments installed at the system level would be visible to all users, and changes could be made by modifying the environment or creating different versions that co-exist. At the same time, if a user needed a package or install configuration that wasn't shipped with the system-wide environment, they could use the same instance of spack to install packages locally that only they are using.
This would also allow for testing of new environment configurations without disrupting the existing environment on a system.
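As a purely illustrative sketch of such a system-provided environment (the specs and view path below are invented, not part of this PR), the admin-managed environment could be captured in a spack.yaml along these lines:

    # spack.yaml (hypothetical system environment; specs and paths are examples only)
    spack:
      specs:
      - openmpi
      - hdf5 +mpi
      - cmake
      view: /opt/site-env/view   # where users would see the installed software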
This is not so much an alternative to chaining spacks, but this feature is built on top of it. Rather than each user having to configure their own unique instance of spack to point at an upstream instance, this would be handled by a single instance of spack that is treating itself as an upstream. This doesn't prevent other upstreams from being added if a user wanted to include other packages from another instance of spack.
My shared spack is actually fairly similar to spack server in what it allows users to do, but the architecture of the feature is different. Rather than having a client/server relationship, there is still only a single instance of spack that manages everything. This means that builds are not shared between users (though more on that later). Also, because users interfacing with spack are installing locally, multiple users can interact with the same instance of spack more easily than if communication had to pass through a server. This also avoids having to juggle multiple permissions and install locations for packages.
While thinking about how to make packages be truly shared between users as well, I had the idea of adding each user as an upstream as well. I have yet to try this, and I am not sure that the permissions between users would allow this, but when a user first interfaces with a shared instance of spack, they could be registered as an upstream so that their packages become visible to other users.
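To illustrate that idea (untested, as noted above; the user names and paths are made up), registering each user's local install tree as an additional upstream could look roughly like:

    # upstreams.yaml (sketch of the per-user-upstream idea; assumes each user's install
    # tree is readable by others, which is exactly the permissions question raised above)
    upstreams:
      user-alice:
        install_tree: /home/alice/spack-user/opt/spack
      user-bob:
        install_tree: /home/bob/spack-user/opt/spack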
This feature seems ideal for the situation where a team of developers is working on a project with many dependencies. A single spack instance can install all of the dependencies in "unshared" mode and developers can then use that same spack instance in "shared" mode during their development workflow. E.g., use spack in shared mode and
Thanks for explaining further. I am *really* not convinced. The core reason why is simple: if user A has built a package with a particular hash, and users A and B work together on a shared filesystem, then there is NO REASON for B to ever rebuild that same package with the same hash. Every user who builds the same package with the same hash will get the same result. That is a fundamental feature of how Spack works. Therefore, *not* sharing builds between users is never beneficial, IMHO.

It makes sense to me that Spack Environments, modules, and other stuff might not be shared. But for the core builds themselves (i.e., the stuff that takes the most CPU time), it makes no sense NOT to share.

A properly set-up "Spack Server" system would replicate / automate how software was traditionally installed on HPC systems: I want a package, I make a request to the sysadmins, they install it and provide me a module.

I agree there can be issues of queuing / executing build requests. Yes, more people can be building at once if they all build separately than if the requests are channeled through a server; unless Spack gets a little smarter about building multiple things in parallel. But I don't think that's a key concern here at this point, because builds happen so rarely. My computers, for example, might spend 3-4 hours *per year* building. I spend more time talking about Spack on GitHub than I do actually using Spack. I just don't think queue contention will be a big issue, even if we never think more deeply about parallelization.
> but install in the user's workspace, without having to worry about installing dependencies, clobbering others' work, setting up additional repos, configurations, etc.

In theory you won't clobber other peoples' work because of Spack's use of hashes. I suppose this breaks down with `@master` and `@develop` versions. Argh.
I'm wondering if the following may serve (or if you point out issues with this suggestion for your use case it would help me understand spack share better):

Say we have a system installation of Spack: spack-system. If we wanted a separate downstream Spack instance but did not want the user to have to replicate the config, we could place the desired configuration in a separate directory and, when running either instance, point to that config with -C like

    /path/to/downstream/spack -C /path/to/upstream/config ...
    /path/to/upstream/spack -C /path/to/upstream/config ...

(Normally git clone would be sufficient for this, but in that case the user would have to remember to git pull any changes that have occurred.)

This would still require each user to make at least one configuration change themselves: to add the upstream Spack to their upstreams.yaml. I think that could also be resolved by creating an additional config directory which contains the upstream config:

    # /upstream-pointer-cfg-dir just contains an upstreams.yaml file that points to the upstream instance
    /path/to/downstream/spack -C /path/to/upstream-pointer-cfg-dir/ -C /path/to/upstream/config ...
    /path/to/upstream/spack -C /path/to/upstream/config ...

With that, the work to be done for each user would be something like configuring an alias so that users don't have to type -C all the time.
I use aliases for my spack commands…
@citibeth:

> Every user who builds the same package with the same hash will get the same result.

There are some rather painful ways this is not completely true at the moment:

1. See #3206
2. Build dependencies are not (yet) included in the DAG hash (can result in subtle differences).
3. The hash of the package.py file is not included in the DAG hash.
4. To support platforms like Cray, we only blacklist certain environment variables to clean the build environment, which is not as thorough as starting from scratch and constructing the build environment.
5. We don't (yet) support building everything down to libc.
6. master and develop versions (as you mention) at the moment.

Regardless, I think this point is valid -- we don't *intend* for things with the same hash to be different.

> Therefore, *not* sharing builds between users is never beneficial, IMHO.

The use cases this addresses are:

1. I would consider it extremely beneficial not to share a build if, for example, I built something that was export-controlled. That's a very common use case for us. I would want to keep that private.
2. This PR is meant to make it possible to have a central system installation of Spack that is shared among unprivileged users *and* the facility. The facility may not want to support all the things users want (e.g., ours does not). They also want users to rely on a common core of shared dependencies. This increases sharing by basically giving you something that is chained out of the box.
3. Enabling this is critical for making Spack installable via PyPI. Currently, Spack requires write access to its own prefix. We need a version that does not require that to make it fit nicely into Python's provisioning model.
4. This is further out, but we *do* support relocation. If enough users install something and the facility decides to support it, users can "push" a local installation to the central Spack installation. Or they could make a binary package of it and share it with the facility, and other users could consolidate to have their Spack instances remove local duplicate installations and re-RPATH to a newly provisioned central install. These are things we'd like to have eventually.

> A properly set-up "Spack Server" system would replicate / automate how software was traditionally installed on HPC systems: I want a package, I make a request to the sysadmins, they install it and provide me a module.

We're aiming to support this type of thing through build farms and binary packages (see #11612). While that PR does not yet have a REST API as a front-end, you could imagine adding one -- that might be interesting at some point, but I can't say it's on the near-term priority list.

> But I don't think that's a key concern here at this point, because builds happen so rarely.

Builds happen every day, all the time at LLNL and other DOE sites. It's not a static environment.

> even if we never think more deeply about parallelization.

Parallelization (via locking) is on the roadmap for this fall.
tgamblin left a comment
@carsonwoods: This is a good start! Thanks for taking it on. I have some change requests for you.
The main gist of what I'm requesting is that instead of having two modes (one for admins and one for users), there should still be only one mode, and the user (admin or otherwise) should be able to pick where to install packages. I think you should be able to do this in a general way by still leaning heavily on the chaining functionality we've already implemented, but a Spack instance will have:
Instead of using the
This will allow admins to log in and install packages into the shared location.
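As a minimal sketch of that single-mode idea (assuming the existing install_tree and build_stage settings in config.yaml are the mechanism used to pick the location; the final design may use something else entirely):

    # config.yaml in a hypothetical per-user scope -- a sketch, not this PR's implementation
    config:
      install_tree: ~/spack-installs          # user-writable install location
      build_stage:
      - $tempdir/$user/spack-stage            # keep build stages out of the shared prefix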
There is one thing I don't see addressed at all in this PR, and that is permissions. See the package permissions docs for how we currently handle per-package permission settings. The shared install location should support these types of permission settings and we should ensure that the group and world bits are set properly on installations. If you want to see a model of how this can work in practice, look at how
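For reference, per-package permissions are expressed in packages.yaml; a shared install tree could carry settings along these lines (the group name is a placeholder):

    # packages.yaml (sketch): group-writable, world-readable installs for the shared location
    packages:
      all:
        permissions:
          read: world
          write: group
          group: spack-users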
Modules, environments, caches, etc.
Spack currently writes a bunch of other stuff into its prefix (modules, environments, caches) and I'd ideally like to see those moved to the home directory as well. I think we should think about how they factor into chaining -- environments in particular would be useful to have both in upstreams (sharable by anyone downstream) and locally in the home directory. Modules are already handled by chaining.
How does that sound? We should probably set up a telcon to discuss this, or cover it on a Spack weekly telcon.
This is an excellent discussion; I believe we should consider putting these points in the docs.
@tgamblin @citibeth @carsonwoods - I am following this issue with interest as it looks like it will be used for development of a code my group develops that has dozens of dependencies that are themselves being developed. The team's devops person will be the admin and developers the users. This makes #11919 all the more important - on some of my accounts at SNL, I have limited space and cannot install packages to my home directory.
I'll also echo what @tgamblin said earlier: it is not unusual for me to have different jobs compiling on many different machines most hours of the day.