Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial elasticsearch-py instrumentation support, alternative implementation #191

Closed
wants to merge 5 commits into from

Conversation

beniwohli
Copy link
Contributor

@beniwohli beniwohli commented Apr 6, 2018

This adds instrumentation for an initial set of methods of the
elasticsearch client library, as well as matrix tests for version
2, 5 and 6.

This is an alternative implementation to #169. It moves most of the
work onto a wrapper of Connection.perform_request. This has several
benefits:

  • we instrument everything by default, and can do more specific instrumentations
    for other some API methods (e.g. capture the query in the body for search)
  • we know which specific ES instance we connect to
  • if connections get retried, they appear as separate spans (more smartness, like
    storing retry information on the span will come later)

There is also one draw back: if we manage to instrument the Elasticsearch object,
but not the pooled Connection object, we could cause errors, because the former
instrumentation adds stuff into the params dict that the latter instrumentation
removes again. If the removal doesn't happen, things get messy. I haven't found
a workaround for this yet.

params = kwargs.pop('params', {})
cls_name, method_name = method.split('.', 1)
body_pos = (self.body_positions['all'].get(method_name) or
self.body_positions[self.version].get(method_name) or None)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continuation line over-indented for visual indent


import elasticapm
from elasticapm.instrumentation.packages.base import AbstractInstrumentedModule
from elasticapm.utils import compat

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'elasticapm.utils.compat' imported but unused

@beniwohli beniwohli changed the title initial elasticsearch-py instrumentation support initial elasticsearch-py instrumentation support, alternative implementation Apr 6, 2018
@beniwohli
Copy link
Contributor Author

One possible solution for the "Elasticsearch object is instrumented, but not the Connection object" would be

hasattr(instance.transport.get_connection(), '__wrapped__')

But since the transport has a pool of connections, we'd need to be reasonably sure that if one connection is instrumented, all are, and I'm not sure we can make that assumption.

@beniwohli
Copy link
Contributor Author

Just as an update to the above stated problem: since every Elasticsearch instance creates its own Connection object, and they aren't shared among Elasticsearch instances, this shouldn't be a problem. Either both of them are instrumented, or neither.

# user can see it.
if 'q' in params:
# 'q' is already encoded to a byte string at this point
query.append('q=' + params['q'].decode('utf-8'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably very edge case but do query params have to be utf-8 encoded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if it's not already encoded though - https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/client/utils.py#L29-L31

>>> elasticsearch.client.utils._escape(u"\U0001f3d6")
b'\xf0\x9f\x8f\x96'

>>> elasticsearch.client.utils._escape(u"\U0001f3d6".encode('utf-16'))
b'\xff\xfe<\xd8\xd6\xdf'


>>> elasticsearch.client.utils._escape(u"\U0001f3d6".encode('utf-16')).decode('utf-8')
UnicodeDecodeError

Not sure if this is worth a lot of effort to handle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, right. I guess we at least can catch the UnicodeDecodeError if it happens

context['db']['statement'] = '\n\n'.join(query)
if api_method == 'Elasticsearch.update':
if isinstance(body, dict) and 'script' in body:
context['db']['statement'] = json.dumps(body)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this grab just the script here? Seems care is taken to avoid capturing doc in the partial updated, but as is this would capture scripted upsert docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good catch!

if isinstance(body, dict) and 'query' in body:
query.append(json.dumps(body['query']))
context['db']['statement'] = '\n\n'.join(query)
if api_method == 'Elasticsearch.update':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif ?

elif api_method == 'Elasticsearch.update':
if isinstance(body, dict) and 'script' in body:
# only get the `script` field from the body
context['db']['statement'] = json.dumps({'script': body['script']})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple spaces after operator

This adds instrumentation for an initial set of methods of the
elasticsearch client library, as well as matrix tests for version
2, 5 and 6.

This is an alternative implementation to elastic#169. It moves most of the
work onto a wrapper of `Connection.perform_request`. This has several
benefits:

 * we instrument everything by default, and can do more specific instrumentations
   for other some API methods (e.g. capture the query in the body for `search`)
 * we know which specific ES instance we connect to
 * if connections get retried, they appear as separate spans (more smartness, like
   storing retry information on the span will come later)

There is also one draw back: if we manage to instrument the Elasticsearch object,
but not the pooled Connection object, we could cause errors, because the former
instrumentation adds stuff into the `params` dict that the latter instrumentation
removes again. If the removal doesn't happen, things get messy. I haven't found
a workaround for this yet.
beniwohli added a commit that referenced this pull request Jun 4, 2018
This adds instrumentation for an initial set of methods of the
elasticsearch client library, as well as matrix tests for version
2, 5 and 6.

This is an alternative implementation to #169. It moves most of the
work onto a wrapper of `Connection.perform_request`. This has several
benefits:

 * we instrument everything by default, and can do more specific instrumentations
   for other some API methods (e.g. capture the query in the body for `search`)
 * we know which specific ES instance we connect to
 * if connections get retried, they appear as separate spans (more smartness, like
   storing retry information on the span will come later)

closes #191
@beniwohli beniwohli deleted the es-support-alt branch June 4, 2018 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants