Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrubbing env vars from logs #1315

Merged
merged 2 commits into from
Mar 8, 2022
Merged

Scrubbing env vars from logs #1315

merged 2 commits into from
Mar 8, 2022

Conversation

helsaawy
Copy link
Contributor

@helsaawy helsaawy commented Mar 7, 2022

Scrubbing environment variables from logs of container configuration to prevent leaking sensitive information.

Added config parameter, ScrubLogs, to enable scrubbing.

Will have to be re-vendored into CRI fork to be deployed properly.

Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com

Added code to remove environment variables from code path.
Need to add toggle mechanism

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
@helsaawy helsaawy requested a review from a team as a code owner March 7, 2022 17:57
@jterry75
Copy link
Contributor

jterry75 commented Mar 7, 2022

While this works, it doesnt prevent containerd from logging the env's right? ttrpc trace exporter right? have we considered using tagging on the task proto and trying to work with containerd to support this? It seems like something along the lines of

type test struct {
   a string `scrub:always`
   b string `scrub:when-disabled`
   c string `scrub:never`
}

And then the policy of the scrub can do something more generic when writing out the struct to text in a trace message?

@helsaawy
Copy link
Contributor Author

helsaawy commented Mar 7, 2022

I have another PR on our fork of containerd to scrub the environment variables from its logs.
I hadnt considered tagging the structs, but I'll look into that.
Since most of the logs here used already json-serialized strings that I had to unmarshal, scrub, and then re-marshal, I was hesitant to create a generic solution since that would involve repeated json parsing, which seemed excessive:

v = &A{}
if err := json.Unmarshal(b, &pp); err == nil {
    //scrub
    // ...
    return json.Marshal(b, v)
}
v = &B{}
if err := json.Unmarshal(b, &pp); err == nil {
    // ...
}

@jterry75
Copy link
Contributor

jterry75 commented Mar 7, 2022

I have another PR on our fork of containerd to scrub the environment variables from its logs. I hadnt considered tagging the structs, but I'll look into that. Since most of the logs here used already json-serialized strings that I had to unmarshal, scrub, and then re-marshal, I was hesitant to create a generic solution since that would involve repeated json parsing, which seemed excessive:

v = &A{}
if err := json.Unmarshal(b, &pp); err == nil {
    //scrub
    // ...
    return json.Marshal(b, v)
}
v = &B{}
if err := json.Unmarshal(b, &pp); err == nil {
    // ...
}

Ugh.. :). Logging right...

@helsaawy
Copy link
Contributor Author

helsaawy commented Mar 7, 2022

Ugh.. :). Logging right...

Yeah ... unfortunately the function we log receive serialized strings (or structs with serialized strings as fields) rather than structs as parameters, so the whole pars-scrub-reparse song and dance is unavoidable, and I am really hesitant to incur a serious performance hit :(

@@ -0,0 +1,174 @@
// This package scrubs objects of customer information to pass to logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd almost say maybe this makes sense in /internal/log instead to keep them close together (in a scrub.go file)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I guess the scenario I'm picturing in my mind is you'd set an option on the logger itself and it would handle scrubbing internally.

Copy link
Contributor Author

@helsaawy helsaawy Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating adding a flag in the context, but that, along with setting a flag on the logger, wouldnt work because a lot of places create a new logger from scratch rather than inherit it from parent functions.
I do like the idea of moving it to internal/log though

internal/scrub/scrub.go Outdated Show resolved Hide resolved
internal/scrub/scrub.go Outdated Show resolved Hide resolved
internal/scrub/scrub.go Outdated Show resolved Hide resolved
@dcantah
Copy link
Contributor

dcantah commented Mar 8, 2022

While this works, it doesnt prevent containerd from logging the env's right? ttrpc trace exporter right? have we considered using tagging on the task proto and trying to work with containerd to support this? It seems like something along the lines of

type test struct {
   a string `scrub:always`
   b string `scrub:when-disabled`
   c string `scrub:never`
}

And then the policy of the scrub can do something more generic when writing out the struct to text in a trace message?

This seems like a great idea, I'd tried something similar for marking fields as deprecated (for a different project, not here).


// scrubbing utilities to remove user information from arbitrary objects

type genMap = map[string]interface{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the same as type genMap map[string]interface{}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, this is a type alias, so genMap is equivalent to a map[string]interface{} and I dont have to convert between the two.
This is just to save keystrokes and make functions look cleaner.

internal/scrub/scrub.go Outdated Show resolved Hide resolved
@helsaawy
Copy link
Contributor Author

helsaawy commented Mar 8, 2022

And then the policy of the scrub can do something more generic when writing out the struct to text in a trace message?

This seems like a great idea, I'd tried something similar for marking fields as deprecated (for a different project, not here).

So I am trying to implement this: hcsschema.ProcessParameters is auto-generated (via Swagger), and I'm nervous that if it is regenerated and the scrub:always tag isn't manually added back in, then scrubbing comes crashing down.

I do think that we should revisit this with a more robust solution after closing up these leaks.

@dcantah
Copy link
Contributor

dcantah commented Mar 8, 2022

And then the policy of the scrub can do something more generic when writing out the struct to text in a trace message?

This seems like a great idea, I'd tried something similar for marking fields as deprecated (for a different project, not here).

So I am trying to implement this: hcsschema.ProcessParameters is auto-generated (via Swagger), and I'm nervous that if it is regenerated and the scrub:always tag isn't manually added back in, then scrubbing comes crashing down.

I do think that we should revisit this with a more robust solution after closing up these leaks.

Ahh, I was replying about on the containerd end for the protos. Yea, not sure how we'd get away with our swagger files..

Moved scrubbing under `internal/log`.
Added proper godocs to exported functions

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
@helsaawy helsaawy merged commit d512c70 into microsoft:master Mar 8, 2022
@@ -365,6 +367,7 @@ func (brdg *bridge) recvLoop() error {
func (brdg *bridge) sendLoop() {
var buf bytes.Buffer
enc := json.NewEncoder(&buf)
enc.SetEscapeHTML(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we add this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default behavior is to escape characters for HTML, and since the replacement is <scrubbed>, that gets encoded as \u003cscrubbed\u0033, which i felt would cause issues for pattern matching down the line

type genMap = map[string]interface{}
type scrubberFunc func(genMap) error

const ScrubbedReplacement = "<scrubbed>"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be exported right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, ill also unexport it

func ScrubProcessParameters(s string) (string, error) {
// todo: deal with v1 ProcessConfig
b := []byte(s)
if !IsScrubbingEnabled() || !hasKeywords(b) || !json.Valid(b) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be returning an error if the string isn't a valid json?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be a go-formatted string, or some other parameter that is passed in erroneously, in which case I felt it best to just return it as is

}
pp.Environment = map[string]string{ScrubbedReplacement: ScrubbedReplacement}

buf := bytes.NewBuffer(b[:0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we need to use a NewBuffer type here but in scrubBytes we don't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we control the byte slice here (from converting the string to a byte string), we can reuse the storage
for scrubBytes, the byte slice is passed in from the caller and we dont own it, so using it here would overwrite the contents, which, for the bridge, is the payload being sent.

Comment on lines +150 to +155
func isRequestBase(m genMap) bool {
// neither of these are (currently) `omitempty`
_, a := m["ActivityId"]
_, c := m["ContainerId"]
return a && c
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of parsing all of the things in this file in string format, this seems like an unreliable way to determine fields. The alternative was marshalling and unmarshalling right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but that required exporting the internal bridge protocol structs, and I didnt know if that was an issue

princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
Added code to remove environment variables from code path.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants