Add mysql2 adapter #278

ericmustin · 2020-06-16T23:08:26Z

Summary

This PR addresses #243 and adds a Mysql2 adapter.

The instrumentation is largely a port from dd-trace-rb, found here, with a few modifications to account for opentelemetry span conventions.

Notes / Open Questions

For the actual adapter, I think things are pretty straightforward, but as this is my first adapter, any feedback is appreciated. Please let me know if I've missed any conventions specific to otel; I tried to follow the conventions of the other adapters.
It's worth pointing out that, as this is the first instrumentation collecting SQL queries, there's potentially going to be PII that gets extracted here. Either obfuscating at the adapter level or making clear how to obfuscate at the collector level (perhaps documenting where to add the regex, or providing a sample regex to use) should be considered from a user experience perspective.
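
To make the adapter-level option concrete, here is a minimal sketch of what a regex-based obfuscation pass over a SQL string might look like. The module name `SqlObfuscationSketch` and both patterns are assumptions for illustration and are not part of this PR. It is not a SQL parser, so it can miss or mangle edge cases such as comments or dialect-specific literals.

```ruby
# Hypothetical sketch: replace literal values in a SQL string with "?".
# This is a naive regex pass, not a SQL parser.
module SqlObfuscationSketch
  # Single- or double-quoted string literals, allowing backslash escapes
  # and doubled quotes inside the literal.
  QUOTED = /'(?:[^'\\]|\\.|'')*'|"(?:[^"\\]|\\.|"")*"/

  # Standalone integer or decimal literals. The word boundaries keep
  # digits inside identifiers (for example "mysql2") untouched.
  NUMERIC = /\b\d+(?:\.\d+)?\b/

  def self.obfuscate(sql)
    sql.gsub(QUOTED, '?').gsub(NUMERIC, '?')
  end
end

puts SqlObfuscationSketch.obfuscate(
  "SELECT * FROM users WHERE email = 'a@b.com' AND id = 42"
)
```

A pass like this would run before the statement is attached to the span. Whether it is safe enough for real PII is an open question.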

I do have some more open ended questions on the test suite though:

The tests are relatively straightforward but rely on a containerised environment that also has a dependency on a separate container running a basic mysql image to test against.
This is a new convention in the test suite, as every other integration has been able to rely on full-featured mocking libraries like webmock or fakeredis.
This also required adding a container for testing the adapter to docker-compose.yml, as well as a minor modification to the base app's Dockerfile to add support for mysql.
To run the tests for mysql2:

# To run tests:
# 1. Build the opentelemetry/opentelemetry-ruby image
# - docker-compose build
# 2. Bundle install
# - docker-compose run ex-adapter-mysql2-test bundle install
# 3. Run test suite
# - docker-compose run ex-adapter-mysql2-test bundle exec rake test

I tried to modify the circleci config as well but have almost surely botched it, so I assume the CI will probably fail, as ~~I wasn't able to test circleci version 2.1 locally, and got a bit bogged down trying to test with a comparable 2.0 format.~~ (Update: Yes, I totally botched it, but after upgrading my circleci-cli version, I can at least test v2.1 configs now. I reverted the circleci config changes, so the CI test suite is currently not running the tests referenced above.)
Overall, switching over to using containers for integrations to test against vs writing mocks may not be worth the pain here, but I think it has some benefits as well (mocking out some of the more obtuse instrumentations could become difficult and time consuming). Happy to move back to mocks here if that's preferred, just let me know.

fbogsany · 2020-06-17T03:15:20Z

Overall, switching over to using containers for integrations to test against vs writing mocks may not be worth the pain here, but I think it has some benefits as well (mocking out some of the more obtuse instrumentations could become difficult and time consuming). Happy to move back to mocks here if that's preferred, just let me know.

I am 👍 👍 👍 for use of containers for integration tests rather than mocking.

fbogsany · 2020-06-17T03:22:22Z

adapters/mysql2/lib/opentelemetry/adapters/mysql2/patches/client.rb

+              response = super(sql, options)
+            end
+
+            response


The local response doesn’t really buy us anything here. tracer.in_span(...) do ... end will return the result of the block.

👍 makes sense, updated
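
The point about the block's return value can be sketched with stand-ins. `SketchTracer`, `SketchClient` and `SketchPatch` below are hypothetical names, not the OpenTelemetry or mysql2 classes. The only behavior assumed is the one stated in the review comment: `in_span` returns the result of its block.

```ruby
# Hypothetical stand-in for a tracer; the real OpenTelemetry tracer is
# not loaded here. in_span returns whatever the block returns.
class SketchTracer
  attr_reader :finished

  def initialize
    @finished = []
  end

  def in_span(name)
    yield
  ensure
    # An ensure clause does not change the method's return value.
    @finished << name
  end
end

# Hypothetical stand-in for a database client.
class SketchClient
  def query(sql, options = {})
    "rows for: #{sql}"
  end
end

# Prepended patch: no local `response` variable is needed, because the
# value of super flows out through the block and through in_span.
module SketchPatch
  def query(sql, options = {})
    tracer.in_span('mysql') { super(sql, options) }
  end

  def tracer
    @tracer ||= SketchTracer.new
  end
end

SketchClient.prepend(SketchPatch)
puts SketchClient.new.query('SELECT 1')
```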

fbogsany · 2020-06-17T03:28:27Z

adapters/mysql2/lib/opentelemetry/adapters/mysql2/patches/client.rb

+            # https://github.com/open-telemetry/opentelemetry-python/blob/39fa078312e6f41c403aa8cad1868264011f7546/ext/opentelemetry-ext-dbapi/tests/test_dbapi_integration.py#L53
+            # This would create span names like mysql.default, mysql.replica, postgresql.staging etc etc
+            database_name ? "mysql.#{database_name}" : 'mysql'
+          end


The semantic conventions for databases state:

Span name should be set to low cardinality value representing the statement executed on the database. It may be stored procedure name (without argument), sql statement without variable arguments, etc. When it's impossible to get any meaningful representation of the span name, it can be populated using the same value as db.instance.

Do we have any way to get a meaningful, low-cardinality representation of the statement at this point? E.g. “stored procedure name (without argument), sql statement without variable arguments”. I think the answer is “no”, in which case what you have seems fine, but I’d love to be proven wrong.

Yeah, so I poked around the mysql2 codebase a good amount here, and also looked into how some similar open-telemetry-python instrumentations were doing things, and it seems like the answer is no. That being said, I am very far from a SQL expert, so I'd be curious to know how others are handling this. Is it just that everyone is breaking spec? Additionally, how are others handling the db.statement sql obfuscation? Is it all at the collector level? If there's an example of obfuscation of db.statement done at the tracer adapter level, perhaps we could leverage that approach.

how are others handling the db.statement sql obfuscation

I don’t think anyone is. We (Shopify) dug into this quite a bit a few years ago and concluded we’d have to add a SQL parser to sanitize or obfuscate the query. We redact the db.statement attribute from all spans in the collector — there is no obfuscation support specific to SQL queries there — but I don’t know what others are doing.

I see, that makes sense. From the perspective of where this instrumentation is being ported from (dd-trace-rb), the datadog-agent handles the obfuscation, which makes low cardinality representations of sql possible without adding that work into the tracing client code. In this case, I don't believe it's possible to set span.name to a representation of sql.

I don't think we can use the raw sql as a span name, but I join you all in not knowing the best way to deal with this. The spec says:

When it's impossible to get any meaningful representation of the span name, it can be populated using the same value as db.instance.

However, that doesn't make this instrumentation very useful if all it does is create a span with the database instance information. I'd like to see if I can find any other SIGs who have encountered this problem, but haven't yet. I'll look / ask around a bit more.

One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.

One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.

Coincidentally, we just made a very similar change to our Shopify-internal tracing gem. It seems like a reasonable path forward.

Sure, I agree that MySQL <INSERT/SELECT/DELETE/etc> seems reasonable enough. I can think of some very inefficient ways to do this, but I'm going to look to see if this pattern is being used in other otel repos so that I don't have to re-invent the wheel here.

Our code for this identifies specific patterns:

QUERY_NAMES = [
  "set names",
  "select",
  "insert",
  "update",
  "delete",
  "begin",
  "commit",
  "rollback",
  "savepoint",
  "release savepoint",
  "explain",
  "drop database",
  "drop table",
  "create database",
  "create table",
].freeze

QUERY_NAME_RE = Regexp.new("^(#{QUERY_NAMES.join('|')})", Regexp::IGNORECASE)

...

QUERY_NAME_RE.match(sql) { |match| match[1].downcase } unless sql.nil?

that's extremely helpful, thank you. I've updated the PR and tests accordingly
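
As a rough sketch of how the statement-type pattern above could be combined with the earlier database-name fallback: the module name `SpanNameSketch`, the `span_name` helper and the exact `"mysql <type>"` name format are assumptions for illustration, not necessarily what the PR ended up with.

```ruby
# Hypothetical sketch: derive a low-cardinality span name from the SQL
# statement type, falling back to the database name when nothing matches.
module SpanNameSketch
  QUERY_NAMES = [
    'set names', 'select', 'insert', 'update', 'delete', 'begin',
    'commit', 'rollback', 'savepoint', 'release savepoint', 'explain',
    'drop database', 'drop table', 'create database', 'create table'
  ].freeze

  QUERY_NAME_RE = Regexp.new("^(#{QUERY_NAMES.join('|')})", Regexp::IGNORECASE)

  def self.span_name(sql, database_name = nil)
    statement_type = QUERY_NAME_RE.match(sql) { |m| m[1].downcase } unless sql.nil?
    return "mysql #{statement_type}" if statement_type

    # Fallback from the earlier revision of the patch: name by database.
    database_name ? "mysql.#{database_name}" : 'mysql'
  end
end

puts SpanNameSketch.span_name('SELECT * FROM users WHERE id = 1', 'default')
puts SpanNameSketch.span_name('SHOW TABLES', 'replica')
```

Only the leading keyword is used, so the variable parts of the query never reach the span name. Statements that start with whitespace or an unlisted keyword fall through to the database-name fallback.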

ericmustin · 2020-06-17T09:45:41Z

An update re: testing. I've added a rake-test-appraisal-container-ints command to the circleci config for running integration tests that require a mysql instance, and added that command plus the relevant images/env vars to the ruby 2.5 and 2.6 jobs/executors. It's running fine and the tests (including mysql2) are passing in CI, but perhaps there's a cleaner way to approach it.

mwear

Everything looks really good here @ericmustin. We need to figure out what to do about the span name. Let's discuss this at the SIG meeting tomorrow.

mwear · 2020-06-24T23:27:16Z

adapters/mysql2/lib/opentelemetry/adapters/mysql2/patches/client.rb

+            # https://github.com/open-telemetry/opentelemetry-python/blob/39fa078312e6f41c403aa8cad1868264011f7546/ext/opentelemetry-ext-dbapi/tests/test_dbapi_integration.py#L53
+            # This would create span names like mysql.default, mysql.replica, postgresql.staging etc etc
+            database_name ? "mysql.#{database_name}" : 'mysql'
+          end


I don't think we can use the raw sql as a span name, but I join you all in not knowing the best way to deal with this. The spec says:

When it's impossible to get any meaningful representation of the span name, it can be populated using the same value as db.instance.

However, that doesn't make this instrumentation very useful if all it does is create a span with the database instance information. I'd like to see if I can find any other SIGs who have encountered this problem, but haven't yet. I'll look / ask around a bit more.

One possible middle ground I can think of, is that we could try to identify the statement type and use it for the span name. That would give us span names such as "MySQL SELECT", "MySQL INSERT", etc. This would be similar to http client instrumentation using the verb when it can't derive a low cardinality name. We should discuss if this idea has any merit before moving forward with it.

fbogsany

LGTM - thanks @ericmustin !

mwear

Nice work! Thanks @ericmustin!

ericmustin added 5 commits June 16, 2020 12:30

[adapters-mysql2]: add initial adapter patching and wireframing of ge…

fbdeb8a

…m and test suite

[adapters-mysql2]: update naming convention

42ad625

[adapters-mysql2]: add test container and specs

05e3c1f

[adapters-mysql2]: add test suite and test container setup

aaa6077

[adapters-mysql2]: try to get circleci running

05036e5

ericmustin requested review from bai, dazuma, elskwid, fbogsany, luvtechno and mwear as code owners June 16, 2020 23:08

[adapters-mysql2]: revert circleci changes

e29e274

fbogsany reviewed Jun 17, 2020

ericmustin added 2 commits June 17, 2020 11:17

[adapters-mysql2]: add circleci container tests for mysql2 and addres…

7f1b94c

…s feedback

[adapters-mysql2]: forgot to add circli changes

bb07629

mwear reviewed Jun 24, 2020

[adapters-mysql2]: add statement type extraction with tests

7741233

fbogsany approved these changes Jun 29, 2020

fbogsany linked an issue Jun 29, 2020 that may be closed by this pull request

mysql2 instrumentation adapter #243

Closed

mwear approved these changes Jun 30, 2020

mwear and others added 2 commits June 30, 2020 13:17

Merge branch 'master' into add_mysql2_adapter

0caadab

Merge branch 'master' into add_mysql2_adapter

d307e92

fbogsany merged commit d90adac into open-telemetry:master Jul 1, 2020

robertlaurin deleted the add_mysql2_adapter branch May 29, 2023 20:20
