build_env: allowlist rather than blocklist #30015
base: develop
Stop removing environment variables to fix problems and instead nuke them all from orbit, it's the only way to be sure. Less flippantly, this adds a helper function to the env utility to restore an environment variable to its current state after it's been cleared. Then build_environment clears all environment variables, and restores only those variables that we explicitly allow to persist. This solves much of the fragility of building packages that take poorly named common environment variables silently as arguments, such as OpenSSL, but will likely cause breakage for cases where packages happened to accidentally work due to re-using external state that will no longer be visible.
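To make the clear-then-restore idea concrete, here is a minimal sketch of allowlist filtering over a plain dictionary. It is not the code in this PR: `ALLOWED_PATTERNS` and `filter_env` are hypothetical names, the variable values are made up, and the sketch uses `fullmatch` so that a pattern such as `PATH` cannot also admit `PATHEXT` or `LD_LIBRARY_PATH`. With unanchored patterns, `re.match` would accept `PATHEXT` and `re.search` would accept both.

```python
import re

# Hypothetical allowlist, loosely modeled on the patterns in this PR's diff.
ALLOWED_PATTERNS = [re.compile(p) for p in (r"SPACK_.*", r"PATH", r"EDITOR")]


def filter_env(env, patterns=ALLOWED_PATTERNS):
    """Return a copy of env holding only names that fully match a pattern."""
    return {
        name: value
        for name, value in env.items()
        if any(p.fullmatch(name) for p in patterns)
    }


# Illustrative input; none of these values come from a real system.
outer = {
    "SPACK_ROOT": "/opt/spack",
    "PATH": "/usr/bin:/bin",
    "PATHEXT": ".EXE",
    "LD_LIBRARY_PATH": "/usr/local/lib",
    "OPENSSL_CONF": "/tmp/openssl.cnf",
}
kept = filter_env(outer)
```

Only `SPACK_ROOT` and `PATH` survive here; everything else, including a stray `OPENSSL_CONF`, never reaches the build.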
…ly to env vars that are otherwise cleared
lib/spack/spack/build_environment.py
env_name_whitelist = list(map(re.compile, (
    r'SPACK',
    r'^SPACK_.*',  # keep SPACK_ROOT and company
    r'EDITOR',
    r'PATH',
    r'PATHEXT',
)))
Not clear to me that we should keep any of these except SPACK_*. Is there some reason not to start PATH out empty or with some very small explicit set of paths?
At the time, we relied so heavily on leaking that it seemed necessary; now I'm not so sure. It would definitely be worth testing.
I think we should wall the whole env off and be purely constructive. Things can still leak in through externals, but that's mostly unavoidable.
I agree, but we are missing tons of dependencies on basic system tools like sed, awk, etc. It might be a big lift if /usr/bin and /bin are gone. Maybe we can include those for now and start tracking how many things like that are left?
I think it's worth running a pipeline to see what happens... I agree we don't test everything but I wonder if we can handle a lot of these at the build tool level (e.g. for autotools packages). I think we have to get them updated at some point.
It occurred to me we could test a version of path clearing with the spackos container. I was pleasantly surprised with the minimal set in path, but the minimal set is missing a couple of things to actually go from nothing. This is the set I found required to run spack and build basically anything:
If any of this stuff is missing, by which I mean it is installed by spack but not in path, things don't build. For example, zlib doesn't build because it doesn't have an explicit dep on tar to add it to path. Similar issues happen for many of the others. The good thing is this isn't all that different from the normal requirements list; maybe it's worth just adjusting that and then making these things available in a separate bin as was suggested? In the end it's not all that different from how the spackos container works by default: all of that stuff (plus some bootstrap deps and an editor) gets shoved into an env that isn't loaded but has its view in path.
Some changes here appear to have been semantically superseded, like those in
but that logic has since changed the mechanism of determining
Honestly I'm not sure, depends on what the third way is. That went in because of the loss of the USER environment variable causing getuser to fail. If the third is fine and doesn't cause issues then it can probably go.
I've synced this with develop but it
- seems risky: all I want to do is clear PATH, I don't know the effects of clearing all env variables and would probably want to consider the effects one by one. Moreover, if users want to set a single environment variable themselves, which is then passed to the build, then they have to perform dirty installs, so overall I'm not sure about the whitelist approach here
- I was mainly drawn here with hopes of sanitizing path, which is entirely TBD (although there is an excellent list of all the needed utilities here)
- I'm not clear on what the expectations are WRT restore
        name: name of the environment variable to be restored
    """
    if name in os.environ:
        self.set(name, os.environ[name])
@trws nothing actually handles this request, does it? In that sense it seems incomplete
The enclosing class is our list of environment modifications to be made after clearing the environment, so this re-uses the normal set method to add the contents of the current environment to the list to be restored. If there's something wrong with it I'm not sure what.
Thanks, that makes sense, e.g. for env.restore("CRAY_LD_LIBRARY_PATH") you are adding the original value as a new set operation after unsetting it, the important thing being that the env modifications haven't been executed on the current environment so we are recording the unset followed by the set. Is there a reason to just not unset them in the first place (e.g. include them in the whitelist)?
Exactly. I had to dig back through it a bit to try to remember. The end effect is very similar to adding it to the whitelist, I think it was meant to allow packages to explicitly do this in things like setup_build_environment or similar without having to pass around the whitelist. I can't remember why that particular one is done that way though.
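The "record an unset, then record a set with the old value" behavior discussed above can be sketched with a toy recorder. This is an assumption-laden illustration, not Spack's actual modifications class: `EnvMods` is a hypothetical name, `restore` takes the current environment as an explicit argument (the diff reads `os.environ` directly), and the path value is invented.

```python
class EnvMods:
    """Toy recorder of environment operations, applied later in order."""

    def __init__(self):
        self.ops = []  # list of (operation, name, value) tuples

    def set(self, name, value):
        self.ops.append(("set", name, value))

    def unset(self, name):
        self.ops.append(("unset", name, None))

    def restore(self, name, current):
        # Capture the value now, while it is still visible, and queue a set.
        if name in current:
            self.set(name, current[name])

    def apply(self, env):
        # Replay the recorded operations against a mapping, in order.
        for op, name, value in self.ops:
            if op == "set":
                env[name] = value
            else:
                env.pop(name, None)
        return env


# Illustrative values only.
current = {"CRAY_LD_LIBRARY_PATH": "/opt/cray/lib", "OPENSSL_CONF": "/tmp/x"}
mods = EnvMods()
for name in current:
    mods.unset(name)                                # clear everything
mods.restore("CRAY_LD_LIBRARY_PATH", current)       # queued after the unset
mods.restore("NOT_SET", current)                    # absent, so a no-op
result = mods.apply(dict(current))
```

Because nothing is executed until `apply`, the recorded unset followed by the recorded set leaves the variable with its original value, which is why the end effect resembles an allowlist entry.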
    r"SPACK",
    r"^SPACK_.*",  # keep SPACK_ROOT and company
    r"EDITOR",
    r"PATH",
As observed in #30015 (comment), we'd have to construct a minimal PATH with a set of utilities in order to clean it (i.e. point to some new PATH dir with the minimal set of executables, and start with PATH set to this one entry to achieve the best isolation we can).
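One way the single-directory PATH idea could look, as a rough sketch only: symlink each required tool into a dedicated directory and report anything that cannot be found. `build_minimal_path` is a hypothetical helper, not something in this PR, and the demo uses a fake executable in a throwaway directory rather than the real utility list. It also assumes a platform where `os.symlink` is available to the current user.

```python
import os
import shutil
import tempfile


def build_minimal_path(tools, dest, search_path=None):
    """Symlink each tool found on search_path into dest; return missing names."""
    os.makedirs(dest, exist_ok=True)
    missing = []
    for tool in tools:
        found = shutil.which(tool, path=search_path)
        if found is None:
            missing.append(tool)
            continue
        link = os.path.join(dest, tool)
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(found), link)
    return missing


# Demo with a fake tool so the sketch does not depend on the host system.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.makedirs(src)
    fake = os.path.join(src, "faketool")
    with open(fake, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(fake, 0o755)

    dest = os.path.join(tmp, "bin")
    missing = build_minimal_path(["faketool", "no-such-tool"], dest, search_path=src)
    linked = sorted(os.listdir(dest))
```

A build would then start with PATH set to just `dest`, and the returned `missing` list gives a way to track which utilities still lack a provider.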
FWIW, I started tilting at this particular windmill because I had an environment variable I use all the time called something like
Stop removing environment variables to fix problems and instead completely annihilate the entire environment, it's the only way to be sure.
Less flippantly, this adds a helper function to the env utility to
restore an environment variable to its current state after it's been
cleared. Then build_environment clears all environment variables, and
restores only those variables that we explicitly allow to persist.
This solves much of the fragility of building packages that take poorly
named common environment variables silently as arguments, such as
OpenSSL, but will likely cause breakage for cases where packages
happened to accidentally work due to re-using external state that will
no longer be visible.