Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide client command arguments #11747

Open
wants to merge 4 commits into
base: unstable
Choose a base branch
from
Open

Conversation

naglera
Copy link

@naglera naglera commented Jan 23, 2023

In the event of an assertion failure, hide command arguments from the operator.

In some cases, private client information can be voluntarily exposed when a redis instance crashes due to an assertion failure.
This commit prevent וnintentional client info exposure.
Operators can still access the hidden data, but they must actively request it.
Any of the client info commands remains the unchanged.

@naglera naglera marked this pull request as ready for review January 23, 2023 12:34
@madolson
Copy link
Contributor

madolson commented Jan 24, 2023

This is from AWS, and I'll add some more context about the intention. We wanted to make sure that user data wasn't getting stored into log files, which has use cases around data sovereignty and security. The idea is that we should be able to produce a clean log file that can be stored and moved and only contain server information.

Some thoughts about this:

  1. I think crashes should have this information by default. Crashes are a pretty bad end state, and the end user is voluntarily choosing to send this information to us. We could add a disclaimer that their may be PII, but I don't want users omitting information that could help us debug it.
  2. I think it's generally assumed that logs may contain PII, other databases provide a parser ontop that will filter out data. We could provide some type of symbol that indicates that something might contain PII that could be parsed.
  3. There are other places in the logs where PII might get exposed, so I'm not sure why we are just targeting this place.

@yossigo
Copy link
Member

yossigo commented Jan 29, 2023

There are other places in the logs where PII might get exposed, so I'm not sure why we are just targeting this place.

I agree. We should start by coming up with a list of those places and what could fall under this definition (e.g. client names, file names, redis.log() Lua output?).

@madolson
Copy link
Contributor

I agree. We should start by coming up with a list of those places and what could fall under this definition (e.g. client names, file names, redis.log() Lua output?).

Just adding on ACL users and any command arguments or partial query information (we dump that in a couple of places) since we can't know their content.

@oranagra Do you have any thoughts about providing some way to sanitize user data out of log files?

src/networking.c Outdated
@@ -2762,7 +2769,7 @@ sds catClientInfoString(sds s, client *client) {
getClientPeerId(client),
getClientSockname(client),
connGetInfo(client->conn, conninfo, sizeof(conninfo)),
client->name ? (char*)client->name->ptr : "",
(client->name && !shouldHideClientInfo()) ? (char*)client->name->ptr : "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we hide the client name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client name can hold sensitive information about the client purpose/ what the client does

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding client-list, I hide clients names because it is printed inside crash reports.
The client names will not be hidden when calling client-list directly.

ok, now i understand that change better (had difficulties wrapping my head around the code).

i think the solution of using isBugReportStart and shouldHideClientInfo is ugly, and instead, it would be much nicer to just add a hide_user_info argument to catClientInfoString and getAllClientsInfoString.
the problem with that is that catClientInfoString is widely used, and we don't wanna modify all the calls.
the solution to that is to add an alias function, i.e. catClientInfoStringGeneric will take the argument and the existing catClientInfoString will just be a wrapper.

src/config.c Outdated Show resolved Hide resolved
@ranshid
Copy link
Collaborator

ranshid commented Jan 30, 2023

I agree regarding the extended effort needed in this case. I think we should provide some sort of mask-pii mode for redis so it will avoid reporting internal data (key names, values, client names, addresses etc... as they might hold some info regarding customer application) but we will also need to provide the customer way to get that data in case he needs it (like turning off the pii masking per connection?)

@naglera
Copy link
Author

naglera commented Jan 30, 2023

but we will also need to provide the customer way to get that data in case he needs it

Customer will be able to watch those details by disabling hide-client-info (or in case it is disabled by default, they will not have to do anything)

Independent parser to run over logs and omit client parts is a complex alternative. It gets strings as input while in the config case we have the full logical context.

@ranshid
Copy link
Collaborator

ranshid commented Jan 30, 2023

but we will also need to provide the customer way to get that data in case he needs it

Customer will be able to watch those details by disabling hide-client-info (or in case it is disabled by default, they will not have to do anything)

Independent parser to run over logs and omit client parts is a complex alternative. It gets strings as input while in the config case we have the full logical context.

you mean that admin would be able to do multi; config set hide-client-info no; client list; config set hide-client-info yes; exec?
so yes.. it would be possible but I suspect it might have some strange issues, like keyspace notification data and other potential data which can propagate to other clients. I agree that we can offer this at first stage, but I think this will require some thoughts

@oranagra
Copy link
Member

Not a fan of the idea, but It could be an acl user flag.
This way an existing monitoring application will not require modifications (like the above multi)

But I still don't understand why we want to filter client list. Unlike the crash log, it's just a command, like KEYS or GET, and can be blocked by acl

@ranshid
Copy link
Collaborator

ranshid commented Jan 31, 2023

Not a fan of the idea, but It could be an acl user flag. This way an existing monitoring application will not require modifications (like the above multi)

But I still don't understand why we want to filter client list. Unlike the crash log, it's just a command, like KEYS or GET, and can be blocked by acl

It is true that it can be blocked, but in cases were Redis is provided as a service sometimes clients would like to prevent operators from accessing private data while the operators would still need some tools in order to debug/manage the service .commands like slowlog, info client list etc... can be used by both operator and clients while having this flag can help reduce the potential pii leakage

src/debug.c Outdated Show resolved Hide resolved
src/debug.c Outdated Show resolved Hide resolved
src/networking.c Outdated Show resolved Hide resolved
src/networking.c Outdated Show resolved Hide resolved
@yossigo
Copy link
Member

yossigo commented Jan 31, 2023

@ranshid I think we need to consider the use case you describe as a separate feature. Preventing PII from getting into log files is one thing. Removing PII from the output of administrative commands is another thing - it takes us to multi-level admin privileges, which is a deeper rabbit hole.

@ranshid
Copy link
Collaborator

ranshid commented Jan 31, 2023

@ranshid I think we need to consider the use case you describe as a separate feature. Preventing PII from getting into log files is one thing. Removing PII from the output of administrative commands is another thing - it takes us to multi-level admin privileges, which is a deeper rabbit hole.

I agree that some deeper thought should be placed to this, I was only pointing out some use cases which we encountered in AWS. As said it is possible to use some external tool to remove PII from logs and command outputs, but in such cases customers cannot realy validate the efficiency of such tool so having a redis built-in mechanism can help build trust with external users.

@naglera
Copy link
Author

naglera commented Jan 31, 2023

@oranagra & @ranshid - regarding client-list, I hide clients names because it is printed inside crash reports.
The client names will not be hidden when calling client-list directly.
Thus the multi you suggested isn't necessary

src/config.c Outdated
@@ -3061,6 +3061,7 @@ standardConfig static_configs[] = {
createBoolConfig("latency-tracking", NULL, MODIFIABLE_CONFIG, server.latency_tracking_enabled, 1, NULL, NULL),
createBoolConfig("aof-disable-auto-gc", NULL, MODIFIABLE_CONFIG, server.aof_disable_auto_gc, 0, NULL, updateAofAutoGCEnabled),
createBoolConfig("replica-ignore-disk-write-errors", NULL, MODIFIABLE_CONFIG, server.repl_ignore_disk_write_error, 0, NULL, NULL),
createBoolConfig("hide-client-info", NULL, MODIFIABLE_CONFIG, server.hide_client_info, 0, NULL, NULL),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the name of this config is not clear enough.
if it keeps it's current purpose, it should say "log".
alternatively we need to design where this is going before making any changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. documentation in redis.conf is missing.

src/debug.c Outdated
@@ -1022,6 +1022,11 @@ void _serverAssert(const char *estr, const char *file, int line) {
bugReportEnd(0, 0);
}

/* Returns the amount of client's command arguments we allow logging */
int clientArgsToShow(const client *c) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename "Show" to "Log"

src/networking.c Outdated
@@ -2712,7 +2712,7 @@ char *getClientSockname(client *c) {

/* Concatenate a string representing the state of a client in a human
* readable format, into the sds string 's'. */
sds catClientInfoString(sds s, client *client) {
sds catClientInfoStringGeneric(sds s, client *client, bool hide_user_info) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hide_user_info is misleading, it could mean that we hide the ACL user name.

actually, i'll argue that again that i don't see why CLIENT SETNAME is PII, and if it is, then the ACL user name could be too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, client name will not be redacted

src/networking.c Outdated
@@ -2762,7 +2762,7 @@ sds catClientInfoString(sds s, client *client) {
getClientPeerId(client),
getClientSockname(client),
connGetInfo(client->conn, conninfo, sizeof(conninfo)),
client->name ? (char*)client->name->ptr : "",
(client->name && !hide_user_info) ? (char*)client->name->ptr : "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it should say --redacted-- instead of remain empty?

src/debug.c Outdated
serverLog(LL_WARNING|LL_RAW,"%s\n", client);
sdsfree(client);
for (j = 0; j < cc->argc; j++) {
for (j = 0; j < clientArgsToShow(cc); j++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we hide the arguments, let's at least print the argc instead, and maybe a bold "redacted" message.

@oranagra
Copy link
Member

oranagra commented Feb 8, 2023

we discussed this in a core-team meeting and concluded that we would like to proceed only after we prepare some detailed list of everything we could want to redact (maybe host names and other things).

@naglera
Copy link
Author

naglera commented Feb 8, 2023

Ack, I will create a draft list for discussion

@naglera
Copy link
Author

naglera commented Feb 12, 2023

Those are the points I found which we might spill to logs some PII.
If we choose to include scripts and files names, the list my expand.
Also if there are places where we spill lua outputs let me know (I am not aware of such)

File Method Comments
debug.c _serverAssertPrintClientInfo command argumants
  logCurrentClient command arguments
replication.c showLatestBacklog  
  replicationFeedStreamFromMasterStream in if(0)
slowlog.c slowlogCommand Not sure if its counted as log or command

Please let me know if I missed something

@oranagra
Copy link
Member

I don't think SLOWLOG is part of it, if it did then MONITOR and CLIENT LIST are too.
what about logStackContent?

@naglera
Copy link
Author

naglera commented Feb 13, 2023

I agree regarding slowLogCommand lets leave it out of scope.
Isn't logStackContent logs the stack trace? If so in what cases does stack trace contains PII?

@oranagra
Copy link
Member

it doesn't log the stack "trace". it logs the stack contents (could contain variables)

@madolson
Copy link
Contributor

If we choose to include scripts and files names, the list my expand.

I would be inclined to say function names count and file names do not. Typically file names are administrator, and we can ask them to set it to something not related to user data.

@naglera
Copy link
Author

naglera commented Feb 15, 2023

Updated list

File Method Comments
debug.c _serverAssertPrintClientInfo Command arguments
  logCurrentClient Command arguments
logStackContent Stack frames content
replication.c showLatestBacklog Log latest commands
  replicationFeedStreamFromMasterStream In if(0)

@naglera
Copy link
Author

naglera commented Feb 27, 2023

Hide replication backlog & stack frame and fix comments

@naglera naglera reopened this Feb 27, 2023
@naglera naglera closed this Jul 2, 2023
@naglera naglera force-pushed the unstable branch 2 times, most recently from e848b0c to 2617412 Compare July 2, 2023 08:22
@madolson
Copy link
Contributor

madolson commented Jul 2, 2023

@naglera Are you abandoning this effort?

@naglera
Copy link
Author

naglera commented Jul 3, 2023

No, horrible git accident, I started this commit from unstable, and it closed automatically when I pulled from upstream.

By the way, @madolson, are you also approving the list? #11747 (comment)

@madolson
Copy link
Contributor

madolson commented Jul 3, 2023

@naglera Yeah, that looked good to me :)

@sundb
Copy link
Collaborator

sundb commented Mar 11, 2024

@naglera What about this test code? since replicationFeedStreamFromMasterStream needs to be hidden, i think it should too.

redis/src/t_stream.c

Lines 352 to 360 in 5fdaa53

void streamLogListpackContent(unsigned char *lp) {
unsigned char *p = lpFirst(lp);
while(p) {
unsigned char buf[LP_INTBUF_SIZE];
int64_t v;
unsigned char *ele = lpGet(p,&v,buf);
serverLog(LL_WARNING,"- [%d] '%.*s'", (int)v, (int)v, ele);
p = lpNext(lp,p);
}

src/server.h Outdated Show resolved Hide resolved
src/debug.c Outdated Show resolved Hide resolved
src/replication.c Outdated Show resolved Hide resolved
naglera and others added 3 commits March 11, 2024 10:45
fix spaces

Co-authored-by: debing.sun <debing.sun@redis.com>
fix spaces

Co-authored-by: debing.sun <debing.sun@redis.com>
fix log

Co-authored-by: debing.sun <debing.sun@redis.com>
@naglera
Copy link
Author

naglera commented Mar 11, 2024

Hi @sundb, I'm not sure if we actually use streamLogListpackContent, but if it's being used locally to debug, I don't want to complicate things. I'm also not sure if this PR is still relevant.

Comment on lines +1085 to +1086
if (j >= clientArgsToLog(c)){
serverLog(LL_WARNING|LL_RAW,"client->argv[%d]: *redacted*\n", j);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (j >= clientArgsToLog(c)){
serverLog(LL_WARNING|LL_RAW,"client->argv[%d]: *redacted*\n", j);
if (j >= clientArgsToLog(c)) {
serverLog(LL_WARNING|LL_RAW,"client->argv[%d]: *redacted*", j);

@@ -1917,6 +1930,10 @@ void logCurrentClient(client *cc, const char *title) {
sdsfree(client);
serverLog(LL_WARNING|LL_RAW,"argc: '%d'\n", cc->argc);
for (j = 0; j < cc->argc; j++) {
if (j >= clientArgsToLog(cc)){
serverLog(LL_WARNING|LL_RAW,"argv[%d]: *redacted*\n", j);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
serverLog(LL_WARNING|LL_RAW,"argv[%d]: *redacted*\n", j);
serverLog(LL_WARNING|LL_RAW,"argv[%d]: *redacted*", j);

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

None yet

7 participants