Add Redis instrumentation query sanitization #1572

Merged
merged 16 commits into from
Feb 4, 2023

tombruijn
Contributor

Description

Add a query sanitizer to the Redis instrumentation. This can be disabled
with the sanitize_query = False config option.

Given the query SET key value, the sanitized query becomes SET ? ?.
Both the keys and values are sanitized, as both can contain PII data.
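A minimal sketch of the masking described above, not the code from this PR: the helper name `sanitize_args` is hypothetical, and it assumes the command arrives as a non-empty tuple whose first element is the Redis command name.

```python
def sanitize_args(args):
    """Hypothetical helper: keep the command name, mask keys and values.

    Assumes args is non-empty and args[0] is the Redis command.
    """
    command = str(args[0])
    # Every argument after the command becomes a "?" placeholder,
    # so neither keys nor values reach the span attribute.
    placeholders = ["?"] * (len(args) - 1)
    return " ".join([command] + placeholders)


print(sanitize_args(("SET", "key", "value")))
```

Masking by position rather than by command type keeps the sketch simple and errs on the side of hiding too much rather than too little.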

The Redis queries are sanitized by default. This changes the default
behavior of this instrumentation. Previously it reported unsanitized
Redis queries.

This was discussed in the earlier implementation of this change,
PR #1571

Closes #1548

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • tox -e test-instrumentation-redis

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated
@avzis
Contributor

avzis commented Jan 10, 2023

this looks great,
will solve #1548

The Redis test that runs with the default options doesn't need to
uninstrument and then re-instrument the instrumentor. This commit
removes the unnecessary setup code, which is already present at
the top of the file.
@tombruijn tombruijn requested a review from avzis January 11, 2023 06:58
@tombruijn
Contributor Author

I fixed the lint issue that failed the build previously. I couldn't get the linter to run locally at first, but I have now confirmed that it passes locally.

@shalevr
Member

shalevr commented Jan 15, 2023

I'm not sure why to change the default behavior,
you need to update the test here https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/tests/opentelemetry-docker-tests/tests/redis/test_redis_functional.py
and other users have to change their code too or get different results than expected

@tombruijn
Contributor Author

> I'm not sure why to change the default behavior,

This was briefly discussed in the previous implementation of this, in PR #1571.

I think sanitizing the query is a good default: users won't send any Personally Identifiable Information (PII) by default, and must explicitly enable full query reporting if they really need it.

> you need to update the test here https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/tests/opentelemetry-docker-tests/tests/redis/test_redis_functional.py
> and other users have to change their code too or get different results than expected

Okay, sounds good. I'm just running into another issue where I can't run this test suite. I'm getting connection issues for something. I'll have a look.

- Update the sanitizer to also account for a max `db.statement`
  attribute value length. No longer than 1000 characters.
- Update the functional tests to assume the queries are sanitized by
  default.
- Add new tests that cover the behavior with sanitization turned off.
  Only for the tests in the first test class. I don't think it's
  necessary to duplicate these tests for the clustered and async setup
  combinations.
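The length cap from the first bullet could look roughly like the sketch below. Only the 1000-character limit comes from the commit message. The helper name `limit_statement` and the trailing `...` marker are assumptions made for illustration.

```python
MAX_STATEMENT_LENGTH = 1000  # limit named in the commit message
TRUNCATION_MARK = "..."      # assumption: marker that signals a cut-off value


def limit_statement(statement, max_length=MAX_STATEMENT_LENGTH):
    """Hypothetical helper: cap the `db.statement` value at max_length chars."""
    if len(statement) <= max_length:
        return statement
    # Reserve room for the marker so the result is never longer than max_length.
    return statement[: max_length - len(TRUNCATION_MARK)] + TRUNCATION_MARK


long_statement = "MSET" + " ?" * 2000
print(len(limit_statement(long_statement)))
```

Reserving space for the marker keeps the final attribute within the limit while still showing that the value was shortened.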
@tombruijn
Contributor Author

@shalevr I've updated the functional tests, added new tests to test sanitize_query=False and made sure the db.statement is not longer than 1000 chars. Let me know what you think.

Change the Redis functional tests so that they test the unsanitized
query by default, and test the sanitized query results in the separate
test functions.

This is a partial revert of the previous commit
8d56c2f
@tombruijn
Contributor Author

tombruijn commented Jan 24, 2023

@srikanthccv with the discussion about query sanitization by default ongoing in the spec repo, shall I change this PR to not sanitize queries by default and instead make sanitization opt-in?
Then we can merge this PR at least.

@srikanthccv
Member

Yes, please do.

Update the Redis instrumentation library to not change the default
behavior for the Redis instrumentation. This can be enabled at a later
time when the spec discussion about this topic has concluded.

open-telemetry/opentelemetry-specification#3104
@tombruijn
Contributor Author

@srikanthccv I've changed the sanitize_query config option to be False by default and resolved the merge conflict with the main branch. Can you retry the build? (I ran the tests, black and flake8 locally, so they should be good.)
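To illustrate the opt-in behavior, here is a self-contained sketch rather than the instrumentation's actual code. The function `format_statement` is hypothetical. Only the flag name `sanitize_query` and its `False` default are taken from the comment above.

```python
def format_statement(args, sanitize_query=False):
    """Hypothetical helper: build the reported statement for a Redis command.

    Assumes args is non-empty. The default of False mirrors the comment
    above: queries are reported unsanitized unless the caller opts in.
    """
    command = str(args[0])
    if sanitize_query:
        rest = ["?"] * (len(args) - 1)
    else:
        rest = [str(arg) for arg in args[1:]]
    return " ".join([command] + rest)


print(format_statement(("SET", "key", "value")))
print(format_statement(("SET", "key", "value"), sanitize_query=True))
```

With this default, existing users see the same statements as before, and sanitization applies only where it is requested.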

Remove else statement.
@tombruijn
Contributor Author

tombruijn commented Jan 25, 2023

@srikanthccv I pushed the fix for the pylint issue.

I really hope it works now. I just cannot get it to fully run all linters locally. Works!

tombruijn and others added 2 commits February 3, 2023 14:58
[ci skip]

Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
Check the length of the args array and return an empty string if there
are no args.

That way it won't cause an IndexError if the args array is empty and it
tries to fetch the first element, which should be the Redis command.
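The guard described in this commit message can be sketched as follows. The helper name `sanitize_args` is hypothetical, and the sketch assumes the arguments arrive as a tuple or list.

```python
def sanitize_args(args):
    """Hypothetical helper: mask arguments, tolerating an empty args sequence."""
    if len(args) == 0:
        # Nothing to report. Returning early avoids the IndexError that
        # args[0] would raise on an empty sequence.
        return ""
    command = str(args[0])
    return " ".join([command] + ["?"] * (len(args) - 1))


print(repr(sanitize_args(())))
print(sanitize_args(("SET", "key", "value")))
```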
Development

Successfully merging this pull request may close these issues.

Implement sensitive data sanitization for redis instrumentation
4 participants