-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial elasticsearch-py instrumentation support, alternative implementation #191
Conversation
params = kwargs.pop('params', {}) | ||
cls_name, method_name = method.split('.', 1) | ||
body_pos = (self.body_positions['all'].get(method_name) or | ||
self.body_positions[self.version].get(method_name) or None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continuation line over-indented for visual indent
|
||
import elasticapm | ||
from elasticapm.instrumentation.packages.base import AbstractInstrumentedModule | ||
from elasticapm.utils import compat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'elasticapm.utils.compat' imported but unused
d663b91
to
f094f4f
Compare
One possible solution for the "
But since the transport has a pool of connections, we'd need to be reasonably sure that if one connection is instrumented, all are, and I'm not sure we can make that assumption. |
eb5b6c9
to
fa0bfd2
Compare
fa0bfd2
to
f8d6f34
Compare
Just as an update to the above stated problem: since every |
# user can see it. | ||
if 'q' in params: | ||
# 'q' is already encoded to a byte string at this point | ||
query.append('q=' + params['q'].decode('utf-8')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably very edge case but do query params have to be utf-8 encoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are encoded to utf8 here:
https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/client/utils.py#L34-L38
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if it's not already encoded though - https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/client/utils.py#L29-L31
>>> elasticsearch.client.utils._escape(u"\U0001f3d6")
b'\xf0\x9f\x8f\x96'
>>> elasticsearch.client.utils._escape(u"\U0001f3d6".encode('utf-16'))
b'\xff\xfe<\xd8\xd6\xdf'
>>> elasticsearch.client.utils._escape(u"\U0001f3d6".encode('utf-16')).decode('utf-8')
UnicodeDecodeError
Not sure if this is worth a lot of effort to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, right. I guess we at least can catch the UnicodeDecodeError
if it happens
context['db']['statement'] = '\n\n'.join(query) | ||
if api_method == 'Elasticsearch.update': | ||
if isinstance(body, dict) and 'script' in body: | ||
context['db']['statement'] = json.dumps(body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this grab just the script
here? Seems care is taken to avoid capturing doc
in the partial updated, but as is this would capture scripted upsert docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good catch!
if isinstance(body, dict) and 'query' in body: | ||
query.append(json.dumps(body['query'])) | ||
context['db']['statement'] = '\n\n'.join(query) | ||
if api_method == 'Elasticsearch.update': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elif
?
elif api_method == 'Elasticsearch.update': | ||
if isinstance(body, dict) and 'script' in body: | ||
# only get the `script` field from the body | ||
context['db']['statement'] = json.dumps({'script': body['script']}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiple spaces after operator
This adds instrumentation for an initial set of methods of the elasticsearch client library, as well as matrix tests for version 2, 5 and 6. This is an alternative implementation to elastic#169. It moves most of the work onto a wrapper of `Connection.perform_request`. This has several benefits: * we instrument everything by default, and can do more specific instrumentations for other some API methods (e.g. capture the query in the body for `search`) * we know which specific ES instance we connect to * if connections get retried, they appear as separate spans (more smartness, like storing retry information on the span will come later) There is also one draw back: if we manage to instrument the Elasticsearch object, but not the pooled Connection object, we could cause errors, because the former instrumentation adds stuff into the `params` dict that the latter instrumentation removes again. If the removal doesn't happen, things get messy. I haven't found a workaround for this yet.
This adds instrumentation for an initial set of methods of the elasticsearch client library, as well as matrix tests for version 2, 5 and 6. This is an alternative implementation to #169. It moves most of the work onto a wrapper of `Connection.perform_request`. This has several benefits: * we instrument everything by default, and can do more specific instrumentations for other some API methods (e.g. capture the query in the body for `search`) * we know which specific ES instance we connect to * if connections get retried, they appear as separate spans (more smartness, like storing retry information on the span will come later) closes #191
This adds instrumentation for an initial set of methods of the
elasticsearch client library, as well as matrix tests for version
2, 5 and 6.
This is an alternative implementation to #169. It moves most of the
work onto a wrapper of
Connection.perform_request
. This has severalbenefits:
for other some API methods (e.g. capture the query in the body for
search
)storing retry information on the span will come later)
There is also one draw back: if we manage to instrument the Elasticsearch object,
but not the pooled Connection object, we could cause errors, because the former
instrumentation adds stuff into the
params
dict that the latter instrumentationremoves again. If the removal doesn't happen, things get messy. I haven't found
a workaround for this yet.