Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(db.values): add db.bind.values to yaml #3092

Closed
wants to merge 22 commits into from

Conversation

haddasbronfman
Copy link
Member

@haddasbronfman haddasbronfman commented Jan 10, 2023

Fixes open-telemetry/semantic-conventions#716

Changes

Add db.bind.values to DB span. it contains the values that come with the query (like in pg/mysql and more)

@haddasbronfman haddasbronfman requested a review from a team as a code owner January 12, 2023 07:39
@haddasbronfman
Copy link
Member Author

@joaopgrassi This is the "real" PR. I closed the other one (#3091 )

semantic_conventions/trace/database.yaml Outdated Show resolved Hide resolved
semantic_conventions/trace/database.yaml Outdated Show resolved Hide resolved
semantic_conventions/trace/database.yaml Outdated Show resolved Hide resolved
@joaopgrassi
Copy link
Member

Also, please add a change log entry for this change under the Semantic Conventions section:
https://github.com/open-telemetry/opentelemetry-specification/blob/main/CHANGELOG.md?plain=1#L39

@@ -204,6 +204,15 @@ groups:
The database statement being executed.
note: The value may be sanitized to exclude sensitive information.
examples: ['SELECT * FROM wuser_table', 'SET mykey "WuValue"']
- id: values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using db.bind.values of types string[] and additionally db.bind.keys of type string[]?
Or a bit longer db.bind_params.keys/db.bind_params.values?

If there are no bind keys it should be omitted, if they exist the size of the two arrays must match and key[x] fits to value[x].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss about naming, but I think capturing the keys also makes sense. Then the bind_values requirement level can be Conditionally Required based if the keys are also recorded and they must match in index as you said.

What do you think @haddasbronfman?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this can also be done in a follow-up PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding more attributes in a followup should be fine but I think we should avoid renames in followups. In special the decision to use a namespace (db.bind.XXX) or not not should happen now.

Or is there a specific reason to merge fast? Once merge instrumentations may use it and if we rename this creates unneeded friction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean something like this? db. bind_params.keys = [name, age, city] and db. bind_params.values = [David, 38, NY] ?
Just to make sure I understood you.

I'm not sure if this is necessary because the db.statement attribute already gives us the keys of the query (e.g. "select * from TABLE where name=? and age=? and city=?")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me to have the "db.bind" namespace and store arrays under this namespace. if I understand correctly, the suggestion is to spec "db.bind.values" now and leave the other attributes to follow-up PRs, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw your comment below now. I think the idea was not to introduce a new namespace.. but maybe we should? Not sure now 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the only thing left is to decide whether to use db.bind_values or db.bind.values. @joaopgrassi What do you think after what @blumamir wrote below?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on it.. as long as the namespace doesn't collide later with something "standalone" and has an actual need to be the home of multiple attributes then I think it makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense as it would allow us to introduce db.bind.keys and db.bind.types in future.

@joaopgrassi
Copy link
Member

@haddasbronfman are you aware of instrumentation recording such values already? Would be interesting to know and could help us define things here.

@haddasbronfman
Copy link
Member Author

Hi, yes.
I added it in mysql and it exists in postgres too.

joaopgrassi
joaopgrassi previously approved these changes Feb 1, 2023
CHANGELOG.md Outdated Show resolved Hide resolved
@joaopgrassi
Copy link
Member

I added it in mysql and it exists in postgres too.

@haddasbronfman thanks for the links. I think this looks good. It's still experimental, so if when updating the implementations something comes up we can always tweak it. Also about the topic on recording the "bind keys"

change db.values to db.bind_values

Co-authored-by: Joao Grassi <joao@joaograssi.com>
@blumamir
Copy link
Member

blumamir commented Feb 5, 2023

Another example supporting the db.bind namespace: aws dyanmodb has many places in the API where user needs to specify the "type" of the bind values. For example:

const params = {
  // Specify which items in the results are returned.
  FilterExpression: "Subtitle = :topic AND Season = :s AND Episode = :e",
  // Define the expression attribute value, which are substitutes for the values you want to compare.
  ExpressionAttributeValues: {
    ":topic": {S: "SubTitle2"},
    ":s": {N: 1},
    ":e": {N: 2},
  },
  // Set the projection expression, which are the attributes that you want.
  ProjectionExpression: "Season, Episode, Title, Subtitle",
  TableName: "EPISODES_TABLE",
};

ddb.scan(params, function (err, data) {
  if (err) {
    console.log("Error", err);
  } else {
    console.log("Success", data);
    data.Items.forEach(function (element, index, array) {
      console.log(
          "printing",
          element.Title.S + " (" + element.Subtitle.S + ")"
      );
    });
  }
});

Here, for each bind value we have: "value", "key" (or name) and type. I imagine that storing those arrays into a single semantic attribute namespace could be helpful

@haddasbronfman
Copy link
Member Author

According to the suggestions that came up here, I changed the name to db.bind.values. Please approve @joaopgrassi joaopgrassi, @arminru, @Flarna, @blumamir

type: string[]
requirement_level: optional
brief: >
An array with the values of the SQL statement bind parameters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in this PR, the attribute can also be used for non-SQL statements. Do you think we should update the text accordingly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
An array with the values of the SQL statement bind parameters.
An array with the values of the db statement bind parameters.

or:

Suggested change
An array with the values of the SQL statement bind parameters.
An array with the values of the db operation bind parameters.

?

@haddasbronfman haddasbronfman changed the title feat(db.values): add db.values to yaml feat(db.values): add db.bind.values to yaml Feb 7, 2023
@jsuereth
Copy link
Contributor

jsuereth commented Feb 7, 2023

I have a few concerns with this PR:

  1. It seems to only apply to positional templates. Specifically, something like SELECT * FROM my_table WHERE my_column = ? AND other_column = ?. What about "named" parameters where you have SELECT * FROM my_table WHERE my_column = ?some_key??
  2. Is it implicit that the "template" SQL is the sql statement or do we need to put that SQL statement in the same "bind" namespace?
  3. Should we put caveats about leaked PII or sensitive data in these fields and ask instrumentation others to provide on/off signals for this data? This is likely a more GENERAL concern for instrumentation that need to be handled outside the scope of this PR.

@haddasbronfman
Copy link
Member Author

  1. you're right. this PR covers only the first concern (a query with '?' in it). For the second concern (a query with "named" parameters) there is an issue one of my colleague opened for that: DB sanitization uniform format semantic-conventions#717.
    As far as I know, currently, we don't edit/change/sanitize queries. But for the case that a query comes with accompanying parameters - we want to let the user have the option to see them in the span.

  2. (If I understood you correctly) Yes. the "templated" query is in the db.statement. (e.g db.statement: select...where column = ?...). It won't be in the db.dinb namespace.

  3. As I mentioned in 1 - there is an issue in the specification that tries to answer this concert.

@blumamir
Copy link
Member

  1. you're right. this PR covers only the first concern (a query with '?' in it). For the second concern (a query with "named" parameters) there is an issue one of my colleague opened for that: DB sanitization uniform format #3150.

I support making this explicit in the current proposal (adding the text "An array with the values for a statement positional bind parameters."), or adding the db.bind.keys and describe the behavior for both "positional" and "named" parameters.

2. Is it implicit that the "template" SQL is the sql statement or do we need to put that SQL statement in the same "bind" namespace?

IMO, it might be confusing to have the statement in the db.bind. namespace. It could have been all hosted under the db.statement namespace, e.g. db.statmenet.values but then the db.statement itself cannot be used to store a string value which can be painful to change at this point.

@haddasbronfman
Copy link
Member Author

IMO, it might be confusing to have the statement in the db.bind. namespace. It could have been all hosted under the db.statement namespace, e.g. db.statmenet.values but then the db.statement itself cannot be used to store a string value which can be painful to change at this point.

I think it is better to keep db.statement as is because the overhead of changing its name can be very unexpected (although we change it only in the spec, It still appears in the SDKs, and docs... it would take a long time to rename it).

@haddasbronfman
Copy link
Member Author

@jsuereth Consulting with Amir we thought we should clearly write that the db.bind.values is only for positional templates. What about one of the following options? :

  • "An array with the values of the SQL statement positional bind parameters"
  • "An array with the values of the SQL statement bind parameters. relevant/applies only to positional templates.

@joaopgrassi joaopgrassi dismissed their stale review February 20, 2023 15:16

Given the discussion/direction changed, I'm removing my approval for now

@github-actions
Copy link

github-actions bot commented Mar 7, 2023

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 7, 2023
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add attribute db.values to database semantic conventions
7 participants