Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[discussion] POSIX Environment Variables #822

Closed
ianlewis opened this issue Sep 7, 2022 · 12 comments
Closed

[discussion] POSIX Environment Variables #822

ianlewis opened this issue Sep 7, 2022 · 12 comments
Assignees
Labels
area:library Issue with the base reusable Go library type:discussion A point of discussion

Comments

@ianlewis
Copy link
Member

ianlewis commented Sep 7, 2022

Background:

For reproducibility, build steps are included in provenance buildConfig by some of our workflows. These build steps include information to reproduce the command, such as the working directory and environment variables.

Our build steps currently look like this:

"steps": [
    {
        "command": ["bash", "-c", "echo $STEP2"],
        "env": [
            "STEP2=fuga",
            "PWD=/home/user"
        ],
        "workingDir": "/home/user"
    },
    ...
]

POSIX defines some environment variables which mirror values set elsewhere. These include the PWD, USER, and HOME environment variables. Some programs may use the environment variables as a way to determine the current working directory current user for example. For that reason, we want our workflows to set some of these variables so that more builders, custom scripts, etc. will run properly.

Some values, such as the current working directory, are set in a separate field from the environment variables and thus the information in the environment variables would be redundant.

Question

Should these environment variables be included in the build steps in the provenance? If so, which ones?

@ianlewis ianlewis added type:discussion A point of discussion area:library Issue with the base reusable Go library labels Sep 7, 2022
@ianlewis
Copy link
Member Author

ianlewis commented Sep 7, 2022

Currently we are setting the PWD environment variable but are leaning towards removing it since it's redundant info and we can say that our buildConfig implicitly indicates that it should be set when reproducing the build step.

@ianlewis ianlewis changed the title [discussion] Include Environment Variables [discussion] POSIX Environment Variables Sep 7, 2022
@laurentsimon
Copy link
Collaborator

@mlieberman85
Copy link
Member

+1 to removing it. Does https://github.com/slsa-framework/slsa-github-generator/generic@v1 have a defined spec yet? Just making that clear in the spec would be useful.

@laurentsimon
Copy link
Collaborator

laurentsimon commented Sep 8, 2022

We don't have a "spec", just examples https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md#provenance-format

For go builder we have some explanation in https://github.com/slsa-framework/slsa-github-generator/tree/main/internal/builders/go#buildconfig-format, but it's probably not enough to explain what is recorded, what is not, and maybe we should take the view of a re-builder?

Great idea to have a spec, thanks! All the builders will have the same output, so it should be easy.

What would you like to see captured in the specs?

@shaunmlowry
Copy link

I think it's dangerous to start arbitrarily removing environment variables from provenance just because they're redundant on a specific platform (or set of platforms). What about build steps that run on Windows where PWD isn't a thing, and might validly be used for some user-defined purpose by such a build step?

@laurentsimon
Copy link
Collaborator

laurentsimon commented Sep 8, 2022

Note that we have not released our builders with PWD set, this is just a change we made a few days ago, but it's not released, which is why we're thinking of removing it.

I think you also need more context: we only record env variables the user asked us to set, and we don't record those that exist "by default" (HOME, LOGNAME, SHELL, etc). The user never asked us to set PWD in this case.

Recording all env variables is something we could do. In general we would not want to capture secrets, but in our case we don't have any.

So the bigger question probably is: what should ideally be recorded? Everything, or those explicitly set by the user for the build?

@shaunmlowry
Copy link

I think anything that could potentially affect the output of the build in any way needs to be recorded, e.g. PATH, LANG, LC_* could all have subtle effects on the output artifact

@ianlewis
Copy link
Member Author

ianlewis commented Sep 8, 2022

I tend to agree with @shaunmlowry. I think we should set PWD. The question is really whether we need to record it in the provenance. Though we do need to consider how much noise is added to the provenance, if it's set I think it should probably be included to make it more explicit. In practice, I don't think there are really a lot of these that we will need to set (probably PWD, HOME, and maybe USER); the rest usually cause sensible default behavior when not set.

Obviously we separately endeavor to not include secrets in the recorded environment variables but these variables are those set when the command is executed (any set by the build command itself aren't visible to provenance generation. You'd have to include secrets in your environment set in your goreleaser config or something like that for it to be potentially included here. However, we could sanitize obvious ones like AWS_SECRET_ACCESS_KEY etc. so it's obvious they were set, but their values are obscured. It's a bit hard to know what's a secret more generally though.

@mlieberman85
Copy link
Member

@shaunmlowry brings up a good point. I'm ok with some redundancy, especially in cases where someone might be doing something strange to normally read-only env vars.

@ianlewis I'm not super familiar with how the reusable workflows work, but could someone configure it with a list of environment variables to not include in the output? Give it to the user what should/shouldn't be included in provenance.

@laurentsimon
Copy link
Collaborator

laurentsimon commented Sep 9, 2022

I would like to challenge the assumption that "logging everything" is better, and that "logging anything that affect the build" is better. Below are thought experiments with use cases of provenance:

Re-builder

In order to re-build a project, a re-builder only needs the necessary steps the user provided. Default variables set by the OS, bash, etc are not needed. In fact, they are a source of noise: which env variables shold the builder need to set? Which are necessary? A complete list of env variables complicates a re-builder's job. Why the need to set HOME, USER when simply running go build would work?

Reproducible builds

This is a special case of the above, where a re-builder wants to verify a reproducible builds. A build pipeline whose output differs based on arbitrary variables set by the OS can never be reproducible. So in this scenario, the full list of env variables does not provide obvious benefits and seems like noise. There are various "types" of reproducibility, but I don't know any of them account for arbitrary variables set by the OS. I'm not sure we should care either. I know some toolchains (cmake) let users tweak their build steps based on the machine they are built on, but arguably this is something that needs to be tackled by keeping the "machine image" around, like proposed in #23, and using the exact same machine to re-build.

Incidence response

This is the only scenario where more seems better. However, I doubt that catching a compromised machine thru its env variables is reliable. In practice, it requires list of files and their hashes, a snapshot of running process, etc. So I would argue that the benefit of reporting all env variables in this scenario is marginal, and not needed. I think a better way to tackle this problem is to keep a copy of the machine (tarball, container image) for ever, like SLSA prescribes for source requirement. Maybe this is something that should be added as a level 4 requirement for builders? A complimentary solution is to keep a log of the build slsa-framework/slsa#401, which could also come as a level 4 build requirement.

So in general, I feel like reporting all env variables has only marginal benefits, and comes with additional complications for re-builders. There may be more nuances to what I describes above: maybe it depends on ecosystem / programming language, etc. We could even think of having the best of both worlds by having 2 types of env variables we report: provided by user, all on the machine.

Let's keep this conversation going. I think it's important and how to report building steps is not documented anywhere.

@ianlewis
Copy link
Member Author

I agree that, while I argued for it's inclusion, PWD is probably not really important to include and that PWD would probably just be noise. Most likely any others that have an analogous field in the buildConfig making it redundant would just be noise too. I'm not sure that I would feel comfortable saying that for every variable we might set though. We might need to consider them on a case-by-case basis and document them in some way. In practice though, I hope that few users will really need to care.

@ianlewis I'm not super familiar with how the reusable workflows work, but could someone configure it with a list of environment variables to not include in the output? Give it to the user what should/shouldn't be included in provenance.

This is possible. Though the inputs to a workflow aren't structured (only strings, integers, etc.) maybe it could be supported via a config file that is included in the repo. I'm not sure we're at the point of considering this option yet though.

@ianlewis
Copy link
Member Author

This is effectively addressed by #1321. Please see that PR for a description of the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:library Issue with the base reusable Go library type:discussion A point of discussion
Projects
None yet
Development

No branches or pull requests

4 participants