Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mysql2 adapter #278

Merged
merged 11 commits into from Jul 1, 2020
Merged

Conversation

ericmustin
Copy link
Contributor

@ericmustin ericmustin commented Jun 16, 2020

Summary

This PR addresses #243 and adds a Mysql2 adapter.

The instrumentation is largely a port from dd-trace-rb, found here, with a few modifications to account for opentelemetry span conventions.

Notes / Open Questions

  • For the actual adapter, I think things are pretty straightforward, but as this is my first adapter, any feedback is appreciated. Please let me know if I've missed any conventions specific to otel, I tried to follow the conventions of the other adapters.
  • It's worth pointing out that, as this is the first instrumentation collecting SQL queries, there's potentially going to be PII that get's extracted here. Either obfuscating at the adapter level, or, making clear how to obfuscate at the collector level (Perhaps documenting where to add in the regex, or provide a sample regex to use), should be considered from a user experience perspective.

I do have some more open ended questions on the test suite though:

  • The tests are relatively straightforward but rely on a containerised environment that also has a dependancy on a separate container running a basic mysql image to test against.
  • This is a new convention in the test suite, as every other integration has been able to rely on full featured mocking libraries like webmock or fakeredis.
  • This also required an additional container for testing the adapter added in the docker-compose.yml, as well as a minor modification to the base app's Dockerfile to add support for mysql.
  • To run the tests for mysql2
# To run tests:
# 1. Build the opentelemetry/opentelemetry-ruby image
# - docker-compose build
# 2. Bundle install
# - docker-compose run ex-adapter-mysql2-test bundle install
# 3. Run test suite
# - docker-compose run ex-adapter-mysql2-test bundle exec rake test
  • I tried to modify the circeci config as well but have almost surely botched it, so I assume the CI will probably fail, as I wasn't able to test circleci version 2.1 locally, and got a bit bogged down trying to test with a comparable 2.0 format. (Update: Yes, i totally botched it, but after upgrading my circleci-cli version, I can at least test v2.1 configs now. reverted the circleci config changes, so the CI test suite is currently not testing the above referenced tests)

  • Overall, switching over to using containers for integrations to test against vs writing mocks may not be worth the pain here, but think has some benefits as well (as mocking out some more obtuse instrumentations could become difficult and time consuming). Happy to move back to mocks here if that's preferred, just let me know

@fbogsany
Copy link
Contributor

Overall, switching over to using containers for integrations to test against vs writing mocks may not be worth the pain here, but think has some benefits as well (as mocking out some more obtuse instrumentations could become difficult and time consuming). Happy to move back to mocks here if that's preferred, just let me know

I am 👍 👍 👍 for use of containers for integration tests rather than mocking.

response = super(sql, options)
end

response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local response doesn’t really buy us anything here. tracer.in_span(...) do ... end will return the result of the block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 makes sense, updated

# https://github.com/open-telemetry/opentelemetry-python/blob/39fa078312e6f41c403aa8cad1868264011f7546/ext/opentelemetry-ext-dbapi/tests/test_dbapi_integration.py#L53
# This would create span names like mysql.default, mysql.replica, postgresql.staging etc etc
database_name ? "mysql.#{database_name}" : 'mysql'
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic conventions for databases states:

Span name should be set to low cardinality value representing the statement executed on the database. It may be stored procedure name (without argument), sql statement without variable arguments, etc. When it's impossible to get any meaningful representation of the span name, it can be populated using the same value as db.instance.

Do we have any way to get a meaningful, low-cardinality representation of the statement at this point? E.g. “stored procedure name (without argument), sql statement without variable arguments”. I think the answer is “no”, in which case what you have seems fine, but I’d love to be proven wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, so i poked around the mysql2 codebase a good amount here, and also looked into how some similar open-telemetry-python instrumentations were doing things, and it seems like the answer is no. That being said I am very far from a SQL expert, so i'd be curious to know how others are handling this. Is it just that everyone is breaking spec? Additionally, how are other's handling the db.statement sql obfuscation, is it all at the collector level? If there's an example of obfuscation of db.statement done at the tracer adapter level, perhaps we could leverage that approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are other's handling the db.statement sql obfuscation

I don’t think anyone is. We (Shopify) dug into this quite a bit a few years ago and concluded we’d have to add a SQL parser to sanitize or obfuscate the query. We redact the db.statement attribute from all spans in the collector — there is no obfuscation support specific to SQL queries there — but I don’t know what others are doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense. From the perspective of where this instrumentation is being ported from (dd-trace-rb), the datadog-agent handles the obfuscation, which makes low cardinality representations of sql possible without adding that work into the tracing client code. In this case, I don't believe it's possible to set span.name to a representation of sql.

Copy link
Member

@mwear mwear Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use the raw sql as a span name, but I join you all in not knowing the best way to deal with this. The spec says:

When it's impossible to get any meaningful representation of the span name, it can be populated using the same value as db.instance.

However, that doesn't make this instrumentation very useful it all it does is create a span with the database instance information. I'd like to see if I can find any other SIGs who have encountered this problem, but haven't yet. I'll look / ask around a bit more.

One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.

Coincidentally, we just made a very similar change to our Shopify-internal tracing gem. It seems like a reasonable path forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i agree that MySQL <INSERT/SELECT/DELETE/etc> seems reasonable enough. I can think of some very inefficient ways to do this but going to look to see if this pattern is being used in other otel repo's so that I don't have to re-invent the wheel here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our code for this identifies specific patterns:

        QUERY_NAMES = [
          "set names",
          "select",
          "insert",
          "update",
          "delete",
          "begin",
          "commit",
          "rollback",
          "savepoint",
          "release savepoint",
          "explain",
          "drop database",
          "drop table",
          "create database",
          "create table",
        ].freeze

        QUERY_NAME_RE = Regexp.new("^(#{QUERY_NAMES.join('|')})", Regexp::IGNORECASE)

...
          QUERY_NAME_RE.match(sql) { |match| match[1].downcase } unless sql.nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's extremely helpful, thank you. I've updated the PR and tests accordingly

@ericmustin
Copy link
Contributor Author

To Update re: testing. I've added to the circleci config a rake-test-appraisal-container-ints command for running integration tests that require a mysql instance, added that command + relevant images/env vars to the ruby 2.5 and 2.6 jobs/executors...It's running fine/tests(including mysql2) are passing in CI but perhaps there's a cleaner way to approach it.

Copy link
Member

@mwear mwear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks really good here @ericmustin. We need to figure out what to do about the span name. Let's discuss this at the SIG meeting tomorrow.

# https://github.com/open-telemetry/opentelemetry-python/blob/39fa078312e6f41c403aa8cad1868264011f7546/ext/opentelemetry-ext-dbapi/tests/test_dbapi_integration.py#L53
# This would create span names like mysql.default, mysql.replica, postgresql.staging etc etc
database_name ? "mysql.#{database_name}" : 'mysql'
end
Copy link
Member

@mwear mwear Jun 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use the raw sql as a span name, but I join you all in not knowing the best way to deal with this. The spec says:

When it's impossible to get any meaningful representation of the span name, it can be populated using the same value as db.instance.

However, that doesn't make this instrumentation very useful it all it does is create a span with the database instance information. I'd like to see if I can find any other SIGs who have encountered this problem, but haven't yet. I'll look / ask around a bit more.

One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.

Copy link
Contributor

@fbogsany fbogsany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks @ericmustin !

@fbogsany fbogsany linked an issue Jun 29, 2020 that may be closed by this pull request
Copy link
Member

@mwear mwear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Thanks @ericmustin!

@fbogsany fbogsany merged commit d90adac into open-telemetry:master Jul 1, 2020
@robertlaurin robertlaurin deleted the add_mysql2_adapter branch May 29, 2023 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mysql2 instrumentation adapter
3 participants