Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add possibility to disable db.statement #1659

Merged

Conversation

pellared
Copy link
Member

@pellared pellared commented Apr 28, 2021

Why

db.statement can expose a lot of sensitive information

Currently db.statement is specified to be required if applicable.

Unfortunately, the SQL db.statement (and others database types most probably as well) are often not properly parametrized and therefore leak data that is inside the databases. Even all examples in the spec are "leaking" some data. Additionally it is not easy to introduce parameterized queries to the legacy systems or it will take a very long time to fix the queries (sometimes it may be even a problem with some data access library). At last, not everything in the query can be parametrized (e.g. table names for MS SQL Server).

What

Mention in the specs that it should be possible to disable adding the db.statement attribute via instrumentation configuration (thanks @yurishkuro).
This change is backward compatible.

@pellared pellared changed the title db.statement MAY be disabled Add possibility to disable db.statement Apr 28, 2021
@pellared pellared marked this pull request as ready for review April 28, 2021 21:59
@pellared pellared requested review from a team as code owners April 28, 2021 21:59
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear what mechanism is supposed to be used for disabling this attribute. Either we need to (1) expose a common API that instrumentation can query to decide whether to provide db.statement attribute to the span or not (a bad option imo), or (2) there can be a configuration for the exporter to strip out db.statement from the spans (a reasonable approach, but not super efficient), or (3) the best option, I think, is to have the instrumentation itself be configurable to include / not include db.statement.

@pellared
Copy link
Member Author

pellared commented Apr 28, 2021

It is not clear what mechanism is supposed to be used for disabling this attribute. [...] (3) the best option, I think, is to have the instrumentation itself be configurable to include / not include db.statement.

@yurishkuro Great idea 👍

I tried to capture it here: 656cee2

@Oberon00
Copy link
Member

Oberon00 commented May 3, 2021

Should parsing the SQL to provide at least something like db.operation=SELECT be recommended if db.statement is disabled? Otherwise you could end up with spans that only have db.system.

@pellared
Copy link
Member Author

pellared commented May 3, 2021

Should parsing the SQL to provide at least something like db.operation=SELECT be recommended if db.statement is disabled? Otherwise you could end up with spans that only have db.system.

  1. As far as I understand, currently the specification discourages parsing the statement.
  2. It may not work correctly for all cases. Parsing may be so easy and it will rather be database specific code. Also in case of a bug it may provide disinformation with is far worse than lack of information.
  3. Parsing may decrease performance. It should be configurable.

I feel that such recommendation could give more harm than benefits. SRE / DBA can always use additional database specific profilers or other DB monitoring tools to analyze DB specific issues.

@Oberon00
Copy link
Member

Oberon00 commented May 3, 2021

I agree that parsing has disadvantages. But I wonder how useful a span without db.statement and db.operation usually is.

@pellared
Copy link
Member Author

pellared commented May 3, 2021

I agree that parsing has disadvantages. But I wonder how useful a span without db.statement and db.operation usually is.

A lot. Still can be used to e.g. identify if DB is a bottleneck for some service's endpoint/operation/method. IMO duration is the most valuable and critical info here

@iNikem
Copy link
Contributor

iNikem commented May 3, 2021

Maybe totally unrelated, but duration can/should be reported as metric.

@carlosalberto
Copy link
Contributor

I will merge this as it's being standing for a few days now. @Oberon00 @iNikem please fill issues for potential improvements if/as you deem them fit.

@carlosalberto carlosalberto merged commit 492718f into open-telemetry:main May 3, 2021
@pellared pellared deleted the add-disable-db.statement-option branch May 3, 2021 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants