Support data redaction, so that sensitive data in protos doesn't leak into places like log files #1160

jhump · 2016-01-20T21:04:54Z

In every server language, it is trivial for someone to include a proto in an unstructured log line:
logger.info("something happened: %s", proto);

It is very common for such log output to be ingested into various systems (like exception tracking / alerting, full-text indexing for search), which suddenly makes it possible to accidentally leak sensitive or secret data into these systems (PII, account numbers and PANs, passwords, etc). Scrubbing all such sensitive data after it is accidentally leaked can be a huge pain, depending on the tools used. And identifying what should be redacted in the path of ingestion is error-prone. Pattern matching could identify things that look like social security numbers and credit card numbers, but phone numbers can get tricky (especially if supporting international phone number formats) and addresses even more so. Passwords? Forget about it...
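To illustrate why scrubbing in the ingestion path is error-prone, here is a minimal sketch of a pattern-based scrubber. The class name `PatternScrubber`, its `scrub` method, and the two regular expressions are hypothetical and chosen only for illustration; they are not part of any protobuf or logging library.

```java
import java.util.regex.Pattern;

/** Hypothetical log scrubber: masks values that merely look sensitive. */
final class PatternScrubber {
    // US social security numbers in the common ddd-dd-dddd layout.
    private static final Pattern SSN =
        Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    // 16-digit card numbers, optionally grouped in fours by a space or dash.
    private static final Pattern CARD =
        Pattern.compile("\\b\\d{4}(?:[ -]?\\d{4}){3}\\b");

    /** Replaces anything matching the patterns above with a placeholder. */
    static String scrub(String logLine) {
        String out = SSN.matcher(logLine).replaceAll("<redacted>");
        return CARD.matcher(out).replaceAll("<redacted>");
    }
}
```

With these patterns, `scrub("ssn: 123-45-6789")` yields `ssn: <redacted>`, but `scrub("password: hunter2")` and `scrub("phone: +44 20 7946 0958")` come back unchanged, because nothing about those values looks like the patterns. That gap is the reason for wanting redaction driven by field metadata instead.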

So having a way to redact fields from operations that convert messages into strings (Object#toString in Java, for example) is extremely useful to prevent accidental logging of sensitive data. Even if this functionality doesn't belong in the core protobuf runtime libraries, I think there should at least be a hook point in the library so that redaction can be "plugged in" for a given deployment.

This problem is simpler when all logs are structured rather than unstructured text output: log processors can then use custom message and field options to filter or modify the logs to prevent leaking sensitive data. But even when using structured logging for some things, there is likely always a need for unstructured log output from a server, even if just for debugging.

Strawman solution (using Java runtime as an example, but pattern can be applied to other runtimes/languages):

1. Add redacted fields to FieldOptions and MessageOptions in descriptor.proto.
2. TextFormat.Parser.Builder gets a new setting: setRedacted(boolean) (defaults to false, existing behavior). When true, redacted messages result in "{ <redacted> }" and redacted fields, if present, are shown as fieldname: <redacted>.
3. AbstractMessage#toString() is updated to use the redaction feature of TextFormat.

If adding redacted options into descriptor.proto is going too far, then TextFormat.Parser.Builder#setRedacted could instead take a GeneratedExtension<FieldOptions, Boolean> -- a custom field option that is queried to determine if field values should be skipped. This would require a way to configure a default option that can then be used from AbstractMessage#toString.
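The output format proposed in the strawman can be sketched with a toy model. This is not the protobuf API: `ToyMessage`, `Field`, and `toText` are hypothetical names, and the boolean flags simply stand in for the proposed redacted options on messages and fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a message; NOT the protobuf API. */
final class ToyMessage {
    /** One field plus a flag playing the role of the proposed field option. */
    static final class Field {
        final String name;
        final String value;
        final boolean redacted;

        Field(String name, String value, boolean redacted) {
            this.name = name;
            this.value = value;
            this.redacted = redacted;
        }
    }

    private final List<Field> fields = new ArrayList<>();
    // Plays the role of the proposed message-level option.
    private final boolean messageRedacted;

    ToyMessage(boolean messageRedacted) {
        this.messageRedacted = messageRedacted;
    }

    ToyMessage add(String name, String value, boolean redacted) {
        fields.add(new Field(name, value, redacted));
        return this;
    }

    /** Text rendering; redact=false keeps the existing, unredacted output. */
    String toText(boolean redact) {
        if (redact && messageRedacted) {
            return "{ <redacted> }";
        }
        StringBuilder sb = new StringBuilder();
        for (Field f : fields) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(f.name).append(": ");
            sb.append(redact && f.redacted ? "<redacted>" : "\"" + f.value + "\"");
        }
        return sb.toString();
    }
}
```

For a message built with `new ToyMessage(false).add("user", "alice", false).add("ssn", "123-45-6789", true)`, `toText(true)` gives `user: "alice" ssn: <redacted>`, while `toText(false)` prints both values as before.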

jhump · 2016-01-20T21:07:30Z

@lukaszx0 @cconroy @zellyn

toddlipcon · 2016-12-22T15:31:01Z

We've just implemented essentially the same idea for Apache Kudu (https://gerrit.cloudera.org/#/c/5553/) and it was relatively painful to do without modifying the Protobuf library.

Another idea that could work (at least in the C++ library implementation) would be to change Message::ShortDebugString and Message::DebugString from constructing a new Printer each time to each using a singleton Printer instance. I believe Printer is itself thread-safe once configured (i.e., it can be used to print concurrently from multiple threads), and thus having a singleton printer would allow users to configure a custom field printer to handle their own redaction needs.

I haven't looked at the Java library yet to see if the same is doable.

ghost · 2019-06-11T17:52:01Z

@acozzette This seems like a recurring feature request

elharo · 2021-10-01T20:25:53Z

Can use custom options to do this: https://developers.google.com/protocol-buffers/docs/proto3#customoptions

toddlipcon · 2021-10-01T21:21:19Z

I don't think that's a satisfactory workaround - as I mentioned above in #1160 (comment) one can do it, but you have to change all callsites away from using the normal DebugString functions to your own (in Kudu we wrote one called SecureDebugString). That was quite a pain and it's easy for people to forget.

cconroy · 2021-10-01T21:27:48Z

Chiming in, I can confirm we still have this as a hard requirement at Square more than 5 years later and still have to maintain a fork of protoc to get this functionality.

fowles · 2021-10-01T21:42:59Z

Proto cannot be in the business of enforcing corporate policy in this way. There will always be ways to extract data from them. Any company building on it will require their own linting and technical solutions to ensure something end-to-end.

runlilong · 2021-11-30T07:39:40Z

> Chiming in, I can confirm we still have this as a hard requirement at Square more than 5 years later and still have to maintain a fork of protoc to get this functionality.

https://go.dev/blog/protobuf-apiv2
Doesn't this blog post solve the problem? I wrote the same code, though, and didn't get what I wanted.
@cconroy

jhump · 2021-11-30T15:41:04Z

@runlilong, that blog is about the new reflection and descriptor support. The issue is that the latest protoc plugin still generates a String() string method on all generated structs that provides no hooks for redaction capabilities (and the same is true of other language runtimes and generated code, e.g. toString() methods in generated Java code). So this latest API provides zero support for the kind of redaction functionality requested.

> Proto cannot be in the business of enforcing corporate policy in this way.

@fowles, that is fine. But it sure would be nice if the runtime libraries had a level of pluggability/configurability when it comes to the generated "stringify" functions, so that corporations could actually enforce their own policies. I think @toddlipcon's suggestion could be applied to other language runtimes, allowing code to set an alternate "printer" during program initialization that could handle custom policies.
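The kind of hook being asked for here can be sketched as follows. This is a toy illustration only: `StringifyHook`, `Account`, `install`, and `debugText` are hypothetical names, and nothing like this exists in the protobuf Java runtime as far as this thread indicates. The idea is that the generated "stringify" method delegates to a process-wide printer that a deployment can replace during program initialization.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical process-wide hook that a generated toString() delegates to. */
final class StringifyHook {
    // Default printer keeps today's behavior: every field is printed.
    private static final Function<Account, String> DEFAULT = Account::debugText;

    private static final AtomicReference<Function<Account, String>> PRINTER =
        new AtomicReference<>(DEFAULT);

    /** Called once at startup by a deployment that wants its own policy. */
    static void install(Function<Account, String> printer) {
        PRINTER.set(printer);
    }

    static String print(Account account) {
        return PRINTER.get().apply(account);
    }
}

/** Toy stand-in for a generated message class. */
final class Account {
    final String owner;
    final String pan;

    Account(String owner, String pan) {
        this.owner = owner;
        this.pan = pan;
    }

    /** Unredacted rendering, analogous to the current generated output. */
    String debugText() {
        return "owner: \"" + owner + "\" pan: \"" + pan + "\"";
    }

    @Override
    public String toString() {
        // Existing call sites (logging, string formatting) pick up the hook.
        return StringifyHook.print(this);
    }
}
```

After a call such as `StringifyHook.install(a -> "owner: \"" + a.owner + "\" pan: <redacted>")`, an unchanged call site like `String.format("something happened: %s", account)` no longer prints the PAN. The point of the design is that call sites do not have to be rewritten to use a separate "secure" function.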

elharo · 2021-11-30T18:31:45Z

This feels similar in spirit to #9114 though the details are quite different.

Bottom line: the protobuf libraries do not support custom input or output formats, nor is this a goal. The proto libraries are designed to consume and produce very specific, documented formats as used by gRPC and many other systems. Any other format (protobuf, JSON, or something else) can be created or consumed by writing code to filter the standard inputs and outputs as needed by a specific application. Indeed this is such a common thing to do that it's become a meme about SWE careers at Google. However, this code is in a separate layer and does not need to be committed to the core libraries in this repo.

jhump · 2021-11-30T20:10:19Z

@elharo, I don't agree that's the request. I think it would be fine for the implementation of this redacted format to be external to the protobuf runtime. The real thing we're talking about is providing some sort of override or hook into the automatic "stringification" of messages -- e.g. the Message::ShortDebugString method in C++, Message.String() method in Go, AbstractMessage.toString() method in Java, etc.

zellyn · 2021-11-30T20:10:42Z

@elharo with respect, I'm not sure you're understanding the request. The problem is that if you have an SSN or credit card number in a protobuf field in a language-specific proto object in Java/Go/whatever, and you accidentally print it, you see the sensitive information in the output/logs.

We're all fully capable of writing code on top of protobufs to do any manner of filtering and conversion — not only are several of us ex-Googlers, but Square also worked with Google on gRPC before it was released.

We've chatted with the protobuf team about the idea of a "sensitive" field option, and they said it made sense. Indeed, they mentioned that the credit-card-processing parts inside Google have a similar mechanism. Unfortunately, they said it was unlikely to get merged: mainly for organizational/process reasons.

I've pondered squirreling away and dropping fully-complete PRs for protoc, the C proto library, the Java proto library, and the Go proto library, and seeing if it would shift things, but that's a lot of speculative work.

A more incremental approach would be to first create a "blessed" field annotation (sensitive seems reasonable) — by "blessed" I mean comparable to deprecated — and teach protoc to understand it. Then adding support to language libraries could be done one at a time. But that would require a vote of confidence for the approach.

@jhump and I initially wrote up a proposal here, but further feedback is welcome.

fowles · 2021-11-30T20:23:14Z

The need and desire here is real. Unfortunately, we are pretty far away from being able to prioritize the level of undertaking this would require. It is a pretty large chunk of design and implementation across many languages and bindings.

zellyn · 2021-11-30T21:57:02Z

> The need and desire here is real. Unfortunately, we are pretty far away from being able to prioritize the level of undertaking this would require. It is a pretty large chunk of design and implementation across many languages and bindings.

The design is pretty simple: a "sensitive" field option. The implementation can be contributed by the community, language by language. It only affects the text serialization, which, IIRC from my time at Google, is plastered with "not stable; don't depend on this" warnings. All we'd need is a go-ahead, and the initial addition of "sensitive" to protoc and the descriptor proto.

runlilong · 2021-12-01T03:16:14Z

> that blog is about the new reflection and descriptor support. The issue is that the latest protoc plugin still generates a String() string method on all generated structs that provides no hooks for redaction capabilities (and the same is true of other language runtimes and generated code, e.g. toString() methods in generated Java code). So this latest API provides zero support for the kind of redaction functionality requested.

@jhump A hook for customizing String() would be best. Unfortunately, there is none; for redaction in logs, maybe we can do our own custom redaction with the new reflection feature during log processing. I got some help from this blog and the repository.

jhump · 2023-01-03T18:18:30Z

@fowles, does this commit suggest that this is now being looked at? 9238c48
If so, could we re-open this issue and then later re-close it as resolved once the arc of work is complete?

zellyn · 2023-01-03T19:05:42Z

Oh, nice catch Josh!

fowles · 2023-01-03T19:25:06Z

We are doing some internal work in this space. I don't know how much we will have to build on in the near term though.

KomarDL · 2024-04-30T12:39:00Z

@fowles Hi! Do we have an update on this issue? Thank you

fowles · 2024-04-30T23:20:51Z

There are a bunch of things that enable one to turn this on for C++ right now. There is work ongoing for Java.

protobuf/src/google/protobuf/text_format.h

Line 50 in e1559c8

PROTOBUF_EXPORT bool ShouldRedactField(const FieldDescriptor* field);

protobuf/src/google/protobuf/text_format.cc

Lines 3030 to 3031 in e1559c8

    
    if (field->options().debug_redact()) return true;
    return false;

https://github.com/protocolbuffers/protobuf/blob/e1559c8efdcfc2d66edfd10b02022ecd737534b9/src/google/protobuf/text_format.cc#L98C35-L98C66

KomarDL · 2024-05-02T07:10:24Z

Great work! I'm not a protobuf expert. Where can I find docs or an example of how to use it? As I understand it, I should set enable_debug_string_safe_format to true and... what else should I do?

fowles · 2024-05-02T14:41:20Z

We don't have public docs on this yet because we are focused on building out the Java implementation. Right now, you need to set those flags to true and then mark fields with debug_redact = true in the options for them.
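As a sketch of the second half of that answer, marking fields with the debug_redact option in a .proto file could look like the fragment below. The package, message, and field names are hypothetical; only the debug_redact field option itself comes from the comment above, and how the flags are turned on is not shown because the thread does not spell it out.

```proto
syntax = "proto3";

package example;  // hypothetical package name

// Hypothetical message, used only to show where the option goes.
message Account {
  string owner = 1;

  // Fields carrying sensitive data opt in to redaction one by one.
  string pan = 2 [debug_redact = true];
  string password = 3 [debug_redact = true];
}
```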

KomarDL · 2024-05-03T10:13:23Z

Okay, thanks. Can you give an approximate time frame for implementing this for Java?

fowles · 2024-05-03T17:50:36Z

Implementation is ongoing, but I would not expect to see it released until next year.

xfxyjwf added the enhancement label Jan 21, 2016

xfxyjwf added the P3 label Jun 8, 2018

mergeconflict mentioned this issue Aug 26, 2019

Leak of private key inline_string when specifying multiple tls certificates envoyproxy/envoy#4757

Closed

andrewhowdencom mentioned this issue Feb 17, 2020

API definitions littlemanco/the-golden-path.net#2

Open

elharo closed this as completed Oct 1, 2021

This was referenced Aug 16, 2023

Can be deleted #13575

Closed

Enable debug_redact for Java Logging #13576

Open

Falco20019 mentioned this issue Mar 11, 2024

Redaction of sensitive information serilog/serilog#2023

Closed
