
feature: sharing a spack instance #11871

Open · wants to merge 151 commits into base: develop
Conversation

carsonwoods (Contributor) commented Jun 27, 2019

Shared Spack

These changes add the ability for spack to operate in a "shared" mode where multiple users can use the same instance of spack without directly affecting one another. Previously, a similar result was possible by having each user adjust their local ~/.spack configuration; however, doing so didn't stop users from accidentally affecting other users' packages/specs.

When shared mode is inactive, spack behaves like a normal spack instance. This allows system admins to configure repos, mirrors, environments, etc. These settings are shared by all users of this instance of spack.

When shared mode is enabled, spack treats the traditional installation locations as an upstream instance of spack, and the typical install/stage/cache/etc. locations are redirected to a directory that the user specifies by setting $SPACK_PATH=/some/directory/ in their environment.

Users can still keep their own local configuration settings in ~/.spack.

One additional change introduced in this feature is that attempting to uninstall a package from an upstream instance of spack now produces an error rather than uninstalling the package.

Commands Introduced

$ spack share activate
$ spack share status
==> Shared mode enabled/disabled
$ spack share deactivate
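
For illustration, a shared-mode session might look like this (a sketch: the SPACK_PATH value is an invented example, and the exact status output is assumed from the command list above):

$ export SPACK_PATH=/home/alice/spack-local
$ spack share activate
$ spack share status
==> Shared mode enabled
$ spack install zlib          # installs under $SPACK_PATH; the system tree acts as an upstream
$ spack share deactivate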

WIP

Some aspects of this are still a work in progress. Currently I have not implemented a good way to activate this version of spack. With a system-wide installation of spack, the setup script (. $spack/share/spack/setup-env.sh) could be hard for users to find. I experimented with creating a module file that runs that setup script, and while that did work, it needs more work to be a viable way to load a shared spack.
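
One possible way to make a system-wide instance easy to activate (a sketch, not something implemented in this PR; the install path is an invented example) would be a profile.d snippet that sources the setup script for every login shell:

# /etc/profile.d/spack.sh (hypothetical location of the shared Spack)
export SPACK_ROOT=/opt/shared/spack
. $SPACK_ROOT/share/spack/setup-env.sh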

citibeth (Member) commented Jun 27, 2019

I think we'd be better served by a shared spack that just lets everyone install stuff into the same place. That will produce the most efficient sharing of installations between users.

carsonwoods (Contributor, Author) commented Jun 27, 2019

OK, I can make that change. My only concern is that some users might not have permission to install packages if the shared instance were placed at a system level.

citibeth (Member) commented Jun 27, 2019

My issue here is that this looks a lot like other features that have been proposed or integrated. I'd like to better understand how it is similar to and different from them, and whether this is the way we want to go with Spack.

First, to better understand this PR:

  • Which of these are shared / not shared between users?

    • main spack directory?
    • tree of installed packages?
    • directory of generated modules?
  • Can you describe what problem this PR solves, and give a typical use case?

  • Please look at Spack Chain (merged) and Spack Server (not implemented; #3156), compare/contrast this PR with those, and evaluate the merits of one approach vs. the other.

I strongly encourage you to do this before writing more code on this feature.

carsonwoods (Contributor, Author) commented Jun 28, 2019

What is shared

The main spack directory is shared between users. Everyone uses the same main installation and program files to run spack. As for packages and modules, some are shared and some are not. Right now, when shared mode is enabled, a few things happen. First, spack changes where new packages are installed to the location the user specifies via an environment variable. Packages (and their respective module files) installed this way are not shared between users.

Second, spack adds the typical install directory ($spack/opt/spack) as an upstream (basically, it treats itself as an upstream/chained instance of spack). That means any packages installed while shared mode was disabled become visible as upstream packages when shared mode is re-enabled. Because they are included in an upstream, those packages also have their module files made visible to users.
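
In chaining terms, the effect is roughly what each user would get from an upstreams.yaml like the following (a sketch: the instance name is invented; the format is the one used by Spack chains):

upstreams:
  system-spack:
    install_tree: $spack/opt/spack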

Typical Use Case

The use case I envisioned is a multi-user system that provides its environment through spack. The system-wide environment would be managed through spack. Environments installed at the system level would be visible to all users, and changes could be made by modifying the environment or by creating different versions that co-exist. At the same time, if a user needed a package or install configuration that wasn't shipped with the system-wide environment, they could use the same instance of spack to install packages locally that only they use.

This would also allow for testing of new environment configurations without disrupting the existing environment on a system.

Differences/Similarities

Spack Chain

This is not so much an alternative to chaining spacks; the feature is built on top of it. Rather than each user having to configure their own unique instance of spack to point at an upstream instance, this is handled by a single instance of spack that treats itself as an upstream. It doesn't prevent other upstreams from being added if a user wants to include packages from another instance of spack.

Spack Server

My shared spack is actually fairly similar to Spack Server in what it allows users to do, but the architecture of the feature is different. Rather than a client/server relationship, there is still only a single instance of spack that manages everything. This means that builds are not shared between users (more on that later). Also, because users install locally when they interface with spack, multiple users can work with the same instance more easily than if communication had to pass through a server. It also avoids juggling multiple permissions and install locations for packages.

While thinking about how to make packages truly shared between users, I had the idea of also adding each user as an upstream. I have yet to try this, and I am not sure that permissions between users would allow it, but when a user first interfaces with a shared instance of spack, they could be registered as an upstream so that their packages become visible to other users.

tgamblin added the permissions label Jul 6, 2019
tgamblin requested a review from becker33 Jul 6, 2019
tgamblin (Member) commented Jul 6, 2019

@becker33: can you please review this one? It looks like cool stuff is happening over at SNL 👍

tjfulle (Contributor) commented Jul 8, 2019

This feature seems ideal for the situation where a team of developers is working on a project with many dependencies. A single spack instance can install all of the dependencies in "unshared" mode, and developers can then use that same spack instance in "shared" mode during their development workflow. E.g., use spack in shared mode and run spack setup myproject: myproject would use dependencies from the shared location but install in the user's workspace, without having to worry about installing dependencies, clobbering others' work, setting up additional repos, configurations, etc.
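
A sketch of that workflow, using the hypothetical myproject package named above:

$ spack share activate
$ spack setup myproject@develop    # dependencies come from the shared upstream; the build is configured in the user's workspace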


scheibelp assigned scheibelp and unassigned becker33 Jul 15, 2019
scheibelp requested review from scheibelp and removed request for becker33 Jul 15, 2019
scheibelp (Member) commented Jul 16, 2019

I'm wondering if the following might serve (or, if you point out issues with this suggestion for your use case, it would help me understand spack share better):

Say we have a system installation of Spack: spack-system. If we wanted a separate downstream Spack instance but did not want the user to have to replicate the config, we could place the desired configuration in a separate directory, and when running either instance we could point to that config with -C, like:

/path/to/downstream/spack -C /path/to/upstream/config ...
/path/to/upstream/spack -C  /path/to/upstream/config ...

(Normally git clone would be sufficient for this, but in that case the user would have to remember to git pull any changes that have occurred.)

This would still require each user to make at least one configuration change themselves: adding the upstream Spack to their upstreams.yaml. I think that could also be resolved by creating an additional config directory which contains the upstream config:

# /upstream-pointer-cfg-dir just contains an upstreams.yaml file that points to the upstream instance
/path/to/downstream/spack -C /path/to/upstream-pointer-cfg-dir/ -C /path/to/upstream/config ...
/path/to/upstream/spack -C  /path/to/upstream/config ...

With that, the remaining work for each user would be something like configuring an alias so that they don't have to type -C all the time.
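
For instance, a hypothetical alias using the paths from the example above:

alias spack='/path/to/downstream/spack -C /path/to/upstream-pointer-cfg-dir/ -C /path/to/upstream/config'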


tgamblin (Member) commented Jul 16, 2019

@citibeth:

> Every user who builds the same package with the same hash will get the same result.

There are some rather painful ways this is not completely true at the moment:

  1. See #3206
  2. Build dependencies are not (yet) included in the DAG hash (can result in subtle differences)
  3. The hash of the package.py file is not included in the DAG hash.
  4. To support platforms like Cray, we only blacklist certain environment variables to clean the build environment, which is not as thorough as starting from scratch and constructing the build environment.
  5. We don't (yet) support building everything down to libc.
  6. master and develop versions can point to different commits over time (as you mention).

Regardless, I think this point is valid -- we don't intend for things with the same hash to be different.

> Therefore, not sharing builds between users is never beneficial, IMHO.

The use cases this addresses are:

  1. I would consider it extremely beneficial not to share a build if, for example, I built something that was export-controlled. That's a very common use case for us. I would want to keep that private.
  2. This PR is meant to make it possible to have a central system installation of Spack that is shared among unprivileged users and the facility. The facility may not want to support all the things users want (e.g., ours does not). They also want users to rely on a common core of shared dependencies. This increases sharing by basically giving you something that is chained out of the box.
  3. Enabling this is critical for making Spack installable via PyPI. Currently, Spack requires write access to its own prefix. We need a version that does not require that to make it fit nicely into Python's provisioning model.
  4. This is further out, but we do support relocation. If enough users install something and the facility decides to support it, users can "push" a local installation to the central Spack installation. Or they could make a binary package of it and share it with the facility, and other users could consolidate to have their Spack instances remove local duplicate installations and re-RPATH to a newly provisioned central install. These are things we'd like to have eventually.

> A properly set-up "Spack Server" system would replicate / automate how software was traditionally installed on HPC systems: I want a package, I make a request to the sysadmins, they install it and provide me a module.

We're aiming to support this type of thing through build farms and binary packages (see #11612). While that PR does not yet have a REST API as a front-end, you could imagine adding one -- that might be interesting at some point, but I can't say it's on the near-term priority list.

> But I don't think that's a key concern here at this point, because builds happen so rarely.

Builds happen every day, all the time at LLNL and other DOE sites. It's not a static environment.

> even if we never think more deeply about parallelization.

Parallelization (via locking) is on the roadmap for this fall.

tgamblin (Member) left a review comment

@carsonwoods: This is a good start! Thanks for taking it on. I have some change requests for you.

No modes

The main gist of what I'm requesting is that instead of having two modes (one for admins and one for users), there should still be only one mode, and the user (admin or otherwise) should be able to pick where to install packages. I think you should be able to do this in a general way by still leaning heavily on the chaining functionality we've already implemented, but a Spack instance will have:

  1. A default install location of ~/.spack, as you've proposed here
  2. An "upstream" (ala Spack Chains) configured by default within the Spack prefix.

Instead of using the spack share command to switch modes, I think this should add an argument to spack install to say where to install. See my note on this below; I think you could start by just adding support for spack install --global (to install to location 2 above) instead of doing the fully general thing where you could install to arbitrary upstreams.

This will allow admins to log in and spack install --global <pkg>, or even to activate an environment and just run spack install --global to get everything installed in the shared location. This also allows any user to first get a build working in their home directory, then install to the global location once they've debugged stuff. I think that will be a common use case for us -- it can take a while to get a build debugged.
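
A sketch of that workflow (--global is the flag proposed here, not an existing option; zlib is just an example package):

$ spack install zlib              # installs to the user's default location, e.g. ~/.spack
$ spack install --global zlib     # proposed: installs to the shared tree in the Spack prefix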

Permissions

There is one thing I don't see addressed at all in this PR, and that is permissions. See the package permissions docs for how we currently handle per-package permission settings. The shared install location should support these types of permission settings and we should ensure that the group and world bits are set properly on installations. If you want to see a model of how this can work in practice, look at how git repository sharing works in the filesystem. I think the model for Spack can be similar.
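
For reference, the per-package permission settings mentioned above are configured in packages.yaml; a minimal sketch (the group name is an invented example):

packages:
  all:
    permissions:
      read: world
      write: group
      group: spack-users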

Modules, environments, caches, etc.

Spack currently writes a bunch of other stuff into its prefix (modules, environments, caches), and I'd ideally like to see those moved to the home directory as well. We should think about how they factor into chaining; environments in particular would be useful to have both in upstreams (sharable by anyone downstream) and locally in the home directory. Modules are already handled by chaining. /var/spack/cache (the download cache) should probably be moved as well, though it probably makes sense to allow that to be cached globally somehow (perhaps similarly to the permissions model on the install directory).
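
A sketch of how those locations might be pointed at the home directory via config.yaml (source_cache, misc_cache, and build_stage are existing config.yaml keys; the specific paths are assumptions, not what this PR implements):

config:
  source_cache: ~/.spack/var/cache    # download cache, instead of $spack/var/spack/cache
  misc_cache: ~/.spack/cache
  build_stage:
    - ~/.spack/stage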

How does that sound? We should probably set up a telcon to discuss this, or cover it on a Spack weekly telcon.

Review threads: lib/spack/docs/shared.rst (outdated, 2), lib/spack/spack/cmd/install.py (outdated), lib/spack/spack/stage.py, lib/spack/spack/store.py (outdated, 2)

carsonwoods (Contributor, Author) commented Jul 16, 2019

@tgamblin
Thanks for the review! I think that these suggestions make a lot of sense. I can start implementing some of these changes on my end and I'd be happy to have a telcon to discuss all this further.

tgamblin (Member) commented Jul 16, 2019

@carsonwoods: sounds good -- let us know on Slack or here if you've got questions, or we can set up a call sometime.

tgamblin (Member) commented Jul 16, 2019

@carsonwoods: Just FYI: if you can rebase on develop instead of doing a lot of merges, it may be easier to preserve your commits when this is finally merged.

tjfulle (Contributor) commented Jul 16, 2019

@tgamblin @citibeth @carsonwoods - I am following this issue with interest as it looks like it will be used for development of a code my group develops that has dozens of dependencies that are themselves being developed. The team's devops person will be the admin and developers the users. This makes #11919 all the more important - on some of my accounts at SNL, I have limited space and cannot install packages to my home directory.

I'll also echo what @tgamblin said earlier: it is not unusual for me to have different jobs compiling on many different machines most hours of the day.

carsonwoods added 3 commits Sep 20, 2019, to resolve merge conflicts that had arisen since work on this feature was completed.
tgamblin added this to In progress in Spack v0.14.0 release Oct 19, 2019