feat(frontend): redact sql option in log #16871

Merged
merged 8 commits into from
May 28, 2024

zwang28
Contributor

@zwang28 zwang28 commented May 21, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR may be reverted once a more general secure store solution is ready.

This PR redacts SQL options when recording CREATE queries in the log.

  • For queries that parse successfully, it redacts the strongly typed SQL options within the parsed Statement.
  • Queries that cannot be parsed, such as those with syntax errors, are not redacted.
  • See the unit tests for examples.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@zwang28 zwang28 requested a review from a team as a code owner May 21, 2024 08:54
@zwang28

This comment was marked as outdated.

v4 = '',
) FORMAT plain ENCODE json (a='1',b='2')
";
assert_eq!(redact_sql(sql), "CREATE SOURCE temp (k BIGINT, v CHARACTER VARYING) WITH (connector = [REDACTED], v1 = [REDACTED], v2 = [REDACTED], v3 = [REDACTED], v4 = [REDACTED]) FORMAT PLAIN ENCODE JSON (a = [REDACTED], b = [REDACTED])");
Contributor

Hmm, it might be too aggressive to me... Can we just redact some sensitive fields? For example, any fields that contain "password"

Contributor Author

Can we just redact some sensitive fields? For example, any fields that contain "password"

By maintaining a blacklist of SQL options in the kernel?

Contributor

@fuyufjh fuyufjh May 23, 2024

Yeah, it's acceptable. As a quick fix, we can do it in some quick & dirty way, such as checking whether field_name.contains("password") or something.
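
For illustration, the keyword check suggested above could look roughly like the sketch below. This is not the code merged in this PR: `SqlOption`, `is_sensitive`, `render_with_clause`, and the keyword list are hypothetical names chosen for the example, and the real parser types differ.

```rust
/// Placeholder written in place of a sensitive option value.
const REDACTED: &str = "[REDACTED]";

/// Hypothetical stand-in for one parsed `name = 'value'` SQL option.
struct SqlOption {
    name: String,
    value: String,
}

/// Returns true if the option name contains any keyword, ignoring case.
fn is_sensitive(name: &str, keywords: &[&str]) -> bool {
    let lower = name.to_lowercase();
    keywords.iter().any(|kw| lower.contains(&kw.to_lowercase()))
}

/// Renders a WITH clause, masking the values of sensitive options only.
fn render_with_clause(options: &[SqlOption], keywords: &[&str]) -> String {
    let rendered: Vec<String> = options
        .iter()
        .map(|opt| {
            if is_sensitive(&opt.name, keywords) {
                format!("{} = {}", opt.name, REDACTED)
            } else {
                format!("{} = '{}'", opt.name, opt.value)
            }
        })
        .collect();
    format!("WITH ({})", rendered.join(", "))
}
```

With the assumed keywords `["password", "token"]`, the options `connector = 'datagen'` and `abctokenabc = '222'` would render as `WITH (connector = 'datagen', abctokenabc = [REDACTED])`. Matching on the option name keeps non-sensitive values readable in the log, at the cost of missing secrets stored under names that contain no keyword.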

Contributor Author

fixed

src/sqlparser/src/ast/mod.rs (outdated)
.into_iter()
.map(|sql| sql.to_redacted_string())
.join(";"),
Err(_) => {
Member

Actually I also find this not that necessary. 😕 We can just leave the statements that failed to parse as they are.

In other words, what if users misspell "with" as "wth"? It appears that we are unable to handle all cases.
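
A minimal sketch of that fallback, assuming a parser that returns either a list of statements or an error. The trait, `FakeStatement`, and `sql_for_log` are hypothetical names for this example; only the method name `to_redacted_string` is taken from the diff above.

```rust
/// Hypothetical trait: a parsed statement that can render itself with
/// sensitive option values masked.
trait ToRedactedString {
    fn to_redacted_string(&self) -> String;
}

/// Toy statement used only to exercise the sketch; it stores its
/// already-redacted rendering.
struct FakeStatement {
    redacted: String,
}

impl ToRedactedString for FakeStatement {
    fn to_redacted_string(&self) -> String {
        self.redacted.clone()
    }
}

/// Produces the SQL text to record in the log.
/// Parsed statements are logged in redacted form; text that fails to parse
/// is logged unchanged, since there is no reliable way to locate options in it.
fn sql_for_log<S, E>(sql: &str, parse: impl Fn(&str) -> Result<Vec<S>, E>) -> String
where
    S: ToRedactedString,
{
    match parse(sql) {
        Ok(stmts) => stmts
            .iter()
            .map(|stmt| stmt.to_redacted_string())
            .collect::<Vec<_>>()
            .join(";"),
        Err(_) => sql.to_owned(),
    }
}
```

The trade-off is that a statement with a syntax error can still leak a secret into the log, which is the case this thread accepts for a quick fix.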

Contributor Author

fixed

@zwang28
Contributor Author

zwang28 commented May 23, 2024

example:

The log below was redacted. The value of abctokenabc was redacted because the option name contains the keyword token.

2024-05-23T16:10:00.243564+08:00 INFO handle_query{mode="simple query" session_id=0 sql=CREATE TABLE tomb (v1 INT) WITH (connector = 'datagen', fields.v1.kind = 'sequence', datagen.rows.per.second = '1', abctokenabc = [REDACTED]) FORMAT PLAIN ENCODE JSON}: pgwire_query_log: status="ok" time=70ms

The log below was not redacted because the statement failed to parse.

2024-05-23T16:10:07.48916+08:00 ERROR handle_query{mode="simple query" session_id=0 sql=CREATE TABLEXXXXX tomb (v1 int)
WITH (
  connector = 'datagen',
  fields.v1.kind = 'sequence',
  datagen.rows.per.second = '1',
  abctokenabc = '222'
)
FORMAT PLAIN
ENCODE JSON;}: pgwire::pg_protocol: error when process message error=Failed to run the query: sql parser error: Expected an object type after CREATE, found: TABLEXXXXX at line:1, column:18

@zwang28 zwang28 requested review from fuyufjh and BugenZhao May 23, 2024 08:16
Member

@BugenZhao BugenZhao left a comment

Generally LGTM as it will be eventually superseded by the SECRET approach.

src/common/src/config.rs (outdated)
@@ -27,6 +27,7 @@ normal = ["workspace-hack"]
[dependencies]
itertools = { workspace = true }
serde = { version = "1.0", features = ["derive"], optional = true }
tokio = { version = "0.2", package = "madsim-tokio" }
Member

nit: Actually tokio::task_local does not rely on any components of the tokio runtime. We may consider extracting it into a separate crate in the future.

Comment on lines +183 to +188
pg_serve(
&listen_addr,
session_mgr,
TlsConfig::new_default(),
Some(redact_sql_option_keywords),
)
Member

This looks a little bit dirty though. 🤣

Contributor Author

@zwang28 zwang28 May 27, 2024

Yes...
We can wrap it with another context object.
Anyway, the PR will be reverted in the near future.
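
One possible shape for such a context object is sketched below. `ServeContext` and its methods are hypothetical and do not exist in the codebase; the actual type of `redact_sql_option_keywords` is not shown in this thread, so a shared set of lowercase strings is assumed here.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical bundle of per-server settings that would otherwise be passed
/// to the serve function as separate positional arguments.
#[derive(Clone, Default)]
struct ServeContext {
    /// Option-name keywords whose values should be masked in query logs.
    /// `None` disables redaction. Keywords are assumed to be lowercase.
    redact_sql_option_keywords: Option<Arc<HashSet<String>>>,
}

impl ServeContext {
    /// Builds a context with redaction enabled for the given keywords.
    fn with_redaction<I, S>(keywords: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        let set: HashSet<String> = keywords.into_iter().map(Into::into).collect();
        Self {
            redact_sql_option_keywords: Some(Arc::new(set)),
        }
    }

    /// Returns true if the option name contains any configured keyword.
    fn should_redact(&self, option_name: &str) -> bool {
        let lower = option_name.to_lowercase();
        self.redact_sql_option_keywords
            .as_ref()
            .map_or(false, |kws| kws.iter().any(|kw| lower.contains(kw.as_str())))
    }
}
```

Grouping the setting this way would let later options be added without changing the serve function's signature again.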

@BugenZhao BugenZhao requested a review from xxchan May 27, 2024 02:24
Collaborator

@hzxa21 hzxa21 left a comment

LGTM as a quick fix.

@zwang28 zwang28 enabled auto-merge May 28, 2024 02:20
@zwang28 zwang28 disabled auto-merge May 28, 2024 02:27
@zwang28 zwang28 enabled auto-merge May 28, 2024 02:35
@zwang28 zwang28 added this pull request to the merge queue May 28, 2024
Merged via the queue into main with commit 6c81beb May 28, 2024
27 of 28 checks passed
@zwang28 zwang28 deleted the wangzheng/redact_log branch May 28, 2024 02:50

4 participants