Support data redaction, so that sensitive data in protos doesn't leak into places like log files #1160
Comments
We've just implemented essentially the same idea for Apache Kudu (https://gerrit.cloudera.org/#/c/5553/), and it was relatively painful to do without modifying the Protobuf library. Another idea that could work (at least in the C++ library implementation) would be to change Message::ShortDebugString and Message::DebugString so that, instead of constructing a new Printer on every call, each uses a shared singleton Printer instance. I believe Printer is itself thread-safe once configured (i.e., it can be used to print concurrently from multiple threads), so a singleton printer would let users configure a custom field printer to handle their own redaction needs. I haven't looked at the Java library yet to see if the same is doable.
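A minimal Java model of the singleton-printer idea, assuming hypothetical types (FieldDef, Msg, FieldValuePrinter, Printer, DebugStrings) that are stand-ins and not the protobuf API. The printer is configured once with per-field overrides and never mutated afterward, so one shared instance can serve every thread. In the C++ runtime the analogous per-field hook is TextFormat::Printer::RegisterFieldValuePrinter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a field descriptor (not the protobuf API).
record FieldDef(String name) {}

// Hypothetical stand-in for a message: field -> value, in declaration order.
record Msg(Map<FieldDef, Object> fields) {}

// Formats one field value; a registered override can redact it.
interface FieldValuePrinter {
  String print(FieldDef field, Object value);
}

// Immutable after construction, so one instance can be shared by all threads.
final class Printer {
  private static final FieldValuePrinter DEFAULT = (field, value) -> String.valueOf(value);
  private final Map<FieldDef, FieldValuePrinter> overrides;

  Printer(Map<FieldDef, FieldValuePrinter> overrides) {
    this.overrides = Map.copyOf(overrides); // unmodifiable defensive copy
  }

  String shortDebugString(Msg msg) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<FieldDef, Object> e : msg.fields().entrySet()) {
      if (sb.length() > 0) sb.append(' ');
      FieldValuePrinter p = overrides.getOrDefault(e.getKey(), DEFAULT);
      sb.append(e.getKey().name()).append(": ").append(p.print(e.getKey(), e.getValue()));
    }
    return sb.toString();
  }
}

// Models ShortDebugString() delegating to one shared Printer instead of
// building a fresh Printer on every call.
final class DebugStrings {
  private static volatile Printer shared = new Printer(Map.of());

  // Meant to be called once during startup, before other threads print.
  static void setSharedPrinter(Printer printer) {
    shared = printer;
  }

  static String shortDebugString(Msg msg) {
    return shared.shortDebugString(msg);
  }
}
```

Every call site keeps calling the normal debug-string function; only the startup code decides which fields are redacted.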
@acozzette This seems like a recurring feature request.
You can use custom options to do this: https://developers.google.com/protocol-buffers/docs/proto3#customoptions
I don't think that's a satisfactory workaround. As I mentioned above in #1160 (comment), one can do it, but you have to change all call sites away from the normal DebugString functions to your own (in Kudu we wrote one called SecureDebugString). That was quite a pain, and it's easy for people to forget.
Chiming in: I can confirm we still have this as a hard requirement at Square more than 5 years later, and we still have to maintain a fork of protoc to get this functionality.
Proto cannot be in the business of enforcing corporate policy in this way. There will always be ways to extract data from them. Any company building on it will require their own linting and technical solutions to ensure something end-to-end. |
https://go.dev/blog/protobuf-apiv2
@runlilong, that blog is about the new reflection and descriptor support. The issue is that the latest protoc plugin still generates a
@fowles, that is fine. But it sure would be nice if the runtime libraries had a level of pluggability/configurability when it comes to the generated "stringify" functions, so that corporations could actually enforce their own policies. I think @toddlipcon's suggestion could be applied to other language runtimes, allowing code to set an alternate "printer" during program initialization that could handle custom policies.
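A rough sketch of such an initialization-time hook, assuming a hypothetical BaseMessage class whose toString() delegates to a process-wide function that a deployment installs once at startup. BaseMessage, Account, and RedactionPolicy are illustrative names only; no protobuf runtime exposes this API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical base class standing in for generated message code.
abstract class BaseMessage {
  // Process-wide stringify hook; the default prints every field.
  private static final AtomicReference<Function<BaseMessage, String>> STRINGIFIER =
      new AtomicReference<>(BaseMessage::printAllFields);
  private static final AtomicBoolean INSTALLED = new AtomicBoolean(false);

  // Called once during program initialization; a second call fails loudly so
  // a library cannot silently replace the deployment's policy.
  static void installStringifier(Function<BaseMessage, String> stringifier) {
    if (!INSTALLED.compareAndSet(false, true)) {
      throw new IllegalStateException("stringifier already installed");
    }
    STRINGIFIER.set(stringifier);
  }

  // Field names to values, in declaration order.
  abstract Map<String, Object> fields();

  static String printAllFields(BaseMessage m) {
    StringBuilder sb = new StringBuilder("{");
    m.fields().forEach((k, v) -> sb.append(' ').append(k).append(": ").append(v));
    return sb.append(" }").toString();
  }

  @Override
  public final String toString() {
    return STRINGIFIER.get().apply(this);
  }
}

// Hypothetical generated message.
final class Account extends BaseMessage {
  private final String owner;
  private final String pan;

  Account(String owner, String pan) {
    this.owner = owner;
    this.pan = pan;
  }

  @Override
  Map<String, Object> fields() {
    Map<String, Object> m = new LinkedHashMap<>();
    m.put("owner", owner);
    m.put("pan", pan);
    return m;
  }
}

// Hypothetical deployment-specific policy plugged in at startup.
final class RedactionPolicy {
  static Function<BaseMessage, String> redacting(Set<String> sensitiveFields) {
    return m -> {
      StringBuilder sb = new StringBuilder("{");
      m.fields().forEach((k, v) -> sb.append(' ').append(k).append(": ")
          .append(sensitiveFields.contains(k) ? "<redacted>" : v));
      return sb.append(" }").toString();
    };
  }
}
```

The formatting policy lives outside the runtime; the runtime only has to route toString() through the installed function.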
This feels similar in spirit to #9114, though the details are quite different. Bottom line: the protobuf libraries do not support custom input or output formats, nor is this a goal. The proto libraries are designed to consume and produce very specific, documented formats as used by gRPC and many other systems. Any other format (protobuf, JSON, or something else) can be created or consumed by writing code to filter the standard inputs and outputs as needed by a specific application. Indeed, this is such a common thing to do that it's become a meme about SWE careers at Google. However, this code lives in a separate layer and does not need to be committed to the core libraries in this repo.
@elharo, I don't think that's the request. I think it would be fine for the implementation of this redacted format to be external to the protobuf runtime. What we're really talking about is providing some sort of override or hook into the automatic "stringification" of messages, e.g. the
@elharo, with respect, I'm not sure you understand the request. The problem is that if you have an SSN or credit card number in a protobuf field in a language-specific proto object in Java/Go/whatever, and you accidentally We're all fully capable of writing code on top of protobufs to do any manner of filtering and conversion; not only are several of us ex-Googlers, but Square also worked with Google on gRPC before it was released. We've chatted with the protobuf team about the idea of a "sensitive" field option, and they said it made sense. Indeed, they mentioned that the credit-card-processing parts inside Google have a similar mechanism. Unfortunately, they said it was unlikely to get merged, mainly for organizational/process reasons. I've pondered squirreling away and dropping fully complete PRs for A more incremental approach would be to first create a "blessed" field annotation ( @jhump and I initially wrote up a proposal here, but further feedback is welcome.
The need and desire here is real. Unfortunately, we are pretty far away from being able to prioritize the level of undertaking this would require. It is a pretty large chunk of design and implementation across many languages and bindings.
The design is pretty simple: a "sensitive" field option. The implementation can be contributed by the community, language by language. It only affects the text serialization, which, IIRC from my time at Google, is plastered with "not stable; don't depend on this" warnings. All we'd need is a go-ahead, and the initial addition of "sensitive" to
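To illustrate the claim that such an option would only touch text output, here is a sketch in which a hypothetical sensitive flag on a string field changes the text rendering but leaves the binary encoding untouched. StringField, FieldValue, and Codec are illustrative names. The binary path writes the standard protobuf wire format for length-delimited fields (tag = field_number << 3 | 2, then a varint length, then the UTF-8 bytes); the text rendering is simplified and skips the string escaping a real text-format printer performs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical descriptor for a string field; `sensitive` models the proposed option.
record StringField(int number, String name, boolean sensitive) {}

// Hypothetical field/value pair, in declaration order.
record FieldValue(StringField field, String value) {}

final class Codec {
  // Binary encoding (wire type 2) deliberately ignores the sensitive option.
  static byte[] toBinary(List<FieldValue> values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (FieldValue fv : values) {
      byte[] utf8 = fv.value().getBytes(StandardCharsets.UTF_8);
      writeVarint(out, ((long) fv.field().number() << 3) | 2);
      writeVarint(out, utf8.length);
      out.write(utf8, 0, utf8.length);
    }
    return out.toByteArray();
  }

  // Text output is the only place the option takes effect (no escaping here).
  static String toText(List<FieldValue> values) {
    StringBuilder sb = new StringBuilder();
    for (FieldValue fv : values) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(fv.field().name()).append(": ");
      sb.append(fv.field().sensitive() ? "<redacted>" : "\"" + fv.value() + "\"");
    }
    return sb.toString();
  }

  // Base-128 varint: 7 bits per byte, high bit set on all but the last byte.
  private static void writeVarint(ByteArrayOutputStream out, long v) {
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
  }
}
```

Because the wire bytes are unchanged, services exchanging these messages over gRPC would see no difference; only human-readable output changes.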
@jhump If there were a hook to customize String(), that would be best. Unfortunately, there isn't one; for redaction in logs, maybe we can apply our own redaction with the new reflection API during logging. I got some help from this blog and the repository.
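A minimal sketch of that reflection-based approach, done entirely in the logging layer rather than in the runtime. It assumes a hypothetical ReflectiveMessage view (real code would back it with something like Java's Message.getAllFields() or Go's protoreflect Message.Range) and a hypothetical per-field sensitive flag that would come from a custom field option. It recurses into nested messages and repeated fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical field metadata; `sensitive` would come from a custom field option.
record FieldInfo(String name, boolean sensitive) {}

// Hypothetical reflective view of a message.
interface ReflectiveMessage {
  String typeName();
  Map<FieldInfo, Object> allFields(); // set fields only, in declaration order
}

// Simple in-memory implementation used for the demo.
record SimpleMessage(String typeName, Map<FieldInfo, Object> allFields)
    implements ReflectiveMessage {}

// Lives in the logging layer: callers log LogRedactor.format(msg) instead of msg.
final class LogRedactor {
  static String format(ReflectiveMessage msg) {
    StringBuilder sb = new StringBuilder(msg.typeName()).append(" {");
    for (Map.Entry<FieldInfo, Object> e : msg.allFields().entrySet()) {
      sb.append(' ').append(e.getKey().name()).append(": ");
      sb.append(e.getKey().sensitive() ? "<redacted>" : formatValue(e.getValue()));
    }
    return sb.append(" }").toString();
  }

  // Recurse so sensitive fields are hidden at any nesting depth.
  private static String formatValue(Object value) {
    if (value instanceof ReflectiveMessage nested) {
      return format(nested);
    }
    if (value instanceof List<?> list) {
      StringJoiner joiner = new StringJoiner(", ", "[", "]");
      for (Object element : list) {
        joiner.add(formatValue(element));
      }
      return joiner.toString();
    }
    return String.valueOf(value);
  }
}
```

The drawback raised earlier in the thread still applies: every logging call site has to remember to go through the redactor.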
Oh, nice catch, Josh!
We are doing some internal work in this space. I don't know how much we will have to build on in the near term, though.
@fowles Hi! Do we have an update on this issue? Thank you!
There are a bunch of things that enable one to turn this on for C++ right now. Work is ongoing for Java.
protobuf/src/google/protobuf/text_format.cc Lines 3030 to 3031 in e1559c8
https://github.com/protocolbuffers/protobuf/blob/e1559c8efdcfc2d66edfd10b02022ecd737534b9/src/google/protobuf/text_format.cc#L98C35-L98C66
Great work! I'm not a protobuf professional. Where can I find docs or an example of how to use it? As I understand it, I should set
We don't have public docs on this yet because we are focused on building out the Java implementation. Right now, you need to set those flags to true and then mark fields with debug_redact = true in the options for them.
Okay, thanks. Can you give an approximate time frame for implementing this for Java?
Implementation is ongoing, but I would not expect to see it released until next year.
In every server language, it is trivial for someone to include a proto in an unstructured log line:
logger.info("something happened: %s", proto);
It is very common for such log output to be ingested into various systems (like exception tracking and alerting, or full-text indexing for search), which suddenly makes it possible to accidentally leak sensitive or secret data (PII, account numbers and PANs, passwords, etc.) into these systems. Scrubbing all such sensitive data after it has been accidentally leaked can be a huge pain, depending on the tools used. And identifying what should be redacted in the ingestion path is error-prone. Pattern matching could identify things that look like Social Security numbers and credit card numbers, but phone numbers can get tricky (especially if supporting international phone number formats), and addresses even more so. Passwords? Forget about it...
So having a way to redact fields from operations that convert messages into strings (Object#toString in Java, for example) is extremely useful to prevent accidental logging of sensitive data. Even if this functionality doesn't belong in the core protobuf runtime libraries, I think there should at least be a hook point in the library so that redaction can be "plugged in" for a given deployment.

This problem is simpler when all logs are structured (i.e., not unstructured text output), since log processors can then use custom message and field options to filter or modify the logs to prevent leaking sensitive data. But even when using structured logging for some things, there is likely always a need for unstructured log output from a server, even if just for debugging.
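For comparison, the ingestion-side pattern matching described above might look like the following sketch. The regular expressions and the Luhn checksum filter are illustrative assumptions, not a recommended scrubber. Note that the password passes through untouched, which is exactly the weakness described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative ingestion-side scrubber; the patterns are assumptions, not a standard.
final class LogScrubber {
  // US SSN shape: 3-2-4 digits separated by dashes.
  private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
  // Card-number candidates: 13 to 19 consecutive digits.
  private static final Pattern PAN = Pattern.compile("\\b\\d{13,19}\\b");

  static String scrub(String line) {
    String withoutSsns = SSN.matcher(line).replaceAll("<redacted>");
    Matcher m = PAN.matcher(withoutSsns);
    StringBuilder sb = new StringBuilder();
    while (m.find()) {
      // Only redact digit runs that pass the Luhn checksum, to cut false positives.
      String digits = m.group();
      m.appendReplacement(sb, luhnValid(digits) ? "<redacted>" : digits);
    }
    m.appendTail(sb);
    return sb.toString();
  }

  // Luhn checksum: double every second digit from the right, reduce two-digit
  // results by 9, and require the total to be divisible by 10.
  static boolean luhnValid(String digits) {
    int sum = 0;
    boolean doubleIt = false;
    for (int i = digits.length() - 1; i >= 0; i--) {
      int d = digits.charAt(i) - '0';
      if (doubleIt) {
        d *= 2;
        if (d > 9) d -= 9;
      }
      sum += d;
      doubleIt = !doubleIt;
    }
    return sum % 10 == 0;
  }
}
```

Phone numbers, addresses, and secrets like passwords have no reliable shape to match, which is why redaction at the point of stringification is more robust than scrubbing after the fact.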
Strawman solution (using the Java runtime as an example, but the pattern can be applied to other runtimes/languages):
1. Add a redacted field to FieldOptions and to MessageOptions in descriptor.proto.
2. TextFormat.Parser.Builder gets a new setting: setRedacted(boolean) (defaults to false, the existing behavior). When true, redacted messages result in "{ <redacted> }", and redacted fields, if present, are shown as fieldname: <redacted>.
3. AbstractMessage#toString() is updated to use the redaction feature of TextFormat.

If adding redacted options to descriptor.proto is going too far, then TextFormat.Parser.Builder#setRedacted could instead take a GeneratedExtension<FieldOptions, Boolean>: a custom field option that is queried to determine whether field values should be skipped. This would require a way to configure a default option that can then be used from AbstractMessage#toString.
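A minimal model of the strawman, assuming hypothetical descriptor types in which options are represented as a set of option names that are set to true. In the real Java runtime, output formatting lives on TextFormat.Printer rather than on TextFormat.Parser.Builder, so this sketch puts the setting on a hypothetical printer builder instead; setRedactionOption is a hypothetical stand-in for the alternative of querying a custom GeneratedExtension<FieldOptions, Boolean> option.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical descriptors: `options` holds the names of boolean options set to true.
record FieldDesc(String name, Set<String> options) {}
record MsgDesc(String name, Set<String> options) {}
record MsgValue(MsgDesc type, Map<FieldDesc, Object> fields) {}

final class RedactingTextPrinter {
  private final boolean redacted;
  private final String redactionOption;

  private RedactingTextPrinter(boolean redacted, String redactionOption) {
    this.redacted = redacted;
    this.redactionOption = redactionOption;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  static final class Builder {
    private boolean redacted = false;            // default keeps existing behavior
    private String redactionOption = "redacted"; // built-in option from the strawman

    Builder setRedacted(boolean redacted) {
      this.redacted = redacted;
      return this;
    }

    // Stand-in for the alternative: query a custom option instead.
    Builder setRedactionOption(String optionName) {
      this.redactionOption = optionName;
      return this;
    }

    RedactingTextPrinter build() {
      return new RedactingTextPrinter(redacted, redactionOption);
    }
  }

  String print(MsgValue msg) {
    // A redacted message type hides all of its contents.
    if (redacted && msg.type().options().contains(redactionOption)) {
      return "{ <redacted> }";
    }
    StringBuilder sb = new StringBuilder("{");
    for (Map.Entry<FieldDesc, Object> e : msg.fields().entrySet()) {
      sb.append(' ').append(e.getKey().name()).append(": ");
      Object value = e.getValue();
      if (redacted && e.getKey().options().contains(redactionOption)) {
        sb.append("<redacted>"); // field presence is visible, its value is not
      } else if (value instanceof MsgValue nested) {
        sb.append(print(nested));
      } else {
        sb.append(value);
      }
    }
    return sb.append(" }").toString();
  }
}
```

With setRedacted(true), a User message with a redacted password field and a nested message of a redacted Credentials type prints as { id: 42 password: <redacted> credentials: { <redacted> } }.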