initial elasticsearch-py instrumentation support, alternative implementation #191

beniwohli · 2018-04-06T15:59:51Z

This adds instrumentation for an initial set of methods of the
elasticsearch client library, as well as matrix tests for versions
2, 5 and 6.

This is an alternative implementation to #169. It moves most of the
work onto a wrapper of `Connection.perform_request`. This has several
benefits:

* we instrument everything by default, and can add more specific instrumentation
  for some API methods (e.g. capture the query in the body for `search`)
* we know which specific ES instance we connect to
* if connections get retried, they appear as separate spans (smarter handling,
  such as storing retry information on the span, will come later)

There is also one drawback: if we manage to instrument the Elasticsearch object,
but not the pooled Connection object, we could cause errors, because the former
instrumentation adds an entry to the `params` dict that the latter instrumentation
removes again. If the removal doesn't happen, the extra entry is left in `params`
and things get messy. I haven't found a workaround for this yet.
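
For readers following along, here is a minimal sketch of the two-layer idea described above. It uses stand-in `Client` and `Connection` classes rather than the real elasticsearch-py ones, and the `MARKER` key name, the `spans` list and both wrapper helpers are hypothetical names chosen for illustration, not the code in this PR. The client-level wrapper puts the API method name into `params`; the connection-level wrapper pops it again before the request goes out.

```python
import functools

MARKER = "_apm_api_method"  # hypothetical key name, not taken from the PR


class Connection:
    """Stand-in for a pooled connection; not the real elasticsearch-py class."""

    def perform_request(self, method, url, params=None, body=None):
        # A real connection would send an HTTP request here.
        return {"method": method, "url": url, "params": dict(params or {})}


class Client:
    """Stand-in for the high-level client object."""

    def __init__(self):
        self.connection = Connection()

    def search(self, index, params=None, body=None):
        return self.connection.perform_request(
            "GET", "/%s/_search" % index, params=params, body=body
        )


def instrument_client_method(func, api_method):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Copy so that a params dict reused by the caller is not mutated.
        params = dict(kwargs.pop("params", None) or {})
        params[MARKER] = api_method
        return func(self, *args, params=params, **kwargs)

    return wrapper


def instrument_connection(func, spans):
    @functools.wraps(func)
    def wrapper(self, method, url, params=None, body=None):
        params = dict(params or {})
        # Remove the marker again so it never reaches the server.
        api_method = params.pop(MARKER, None)
        spans.append({"name": "ES %s %s" % (method, url), "api_method": api_method})
        return func(self, method, url, params=params, body=body)

    return wrapper


spans = []
Client.search = instrument_client_method(Client.search, "Elasticsearch.search")
Connection.perform_request = instrument_connection(Connection.perform_request, spans)

result = Client().search("my-index", params={"q": "foo"})
```

If only the client-level wrapper were installed, `MARKER` would still be in `params` when the stand-in connection builds its request, which is the failure mode the drawback paragraph describes.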

houndci-bot · 2018-04-06T16:00:02Z

elasticapm/instrumentation/packages/elasticsearch.py

+        params = kwargs.pop('params', {})
+        cls_name, method_name = method.split('.', 1)
+        body_pos = (self.body_positions['all'].get(method_name) or
+                           self.body_positions[self.version].get(method_name) or None)


continuation line over-indented for visual indent

houndci-bot · 2018-04-06T16:00:02Z

elasticapm/instrumentation/packages/elasticsearch.py

+
+import elasticapm
+from elasticapm.instrumentation.packages.base import AbstractInstrumentedModule
+from elasticapm.utils import compat


'elasticapm.utils.compat' imported but unused

beniwohli · 2018-04-09T14:49:47Z

One possible solution for the "Elasticsearch object is instrumented, but not the Connection object" problem would be to check:

hasattr(instance.transport.get_connection(), '__wrapped__')

But since the transport has a pool of connections, we'd need to be reasonably sure that if one connection is instrumented, all are, and I'm not sure we can make that assumption.
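
A rough sketch of what checking the whole pool, rather than a single connection, could look like. It assumes the wrapper exposes a `__wrapped__` attribute on `perform_request` (as `functools.wraps` does); `FakeConnection`, `wrap` and `all_connections_instrumented` are hypothetical names, and how the list of connections is obtained from a real transport is left out on purpose.

```python
import functools


class FakeConnection:
    """Stand-in for a pooled connection object."""

    def perform_request(self, method, url):
        return method, url


def wrap(func):
    @functools.wraps(func)  # functools.wraps sets __wrapped__ on the wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


def all_connections_instrumented(connections):
    """Return True only if every connection's perform_request looks wrapped."""
    connections = list(connections)
    return bool(connections) and all(
        hasattr(conn.perform_request, "__wrapped__") for conn in connections
    )


plain = FakeConnection()
patched = FakeConnection()
patched.perform_request = wrap(patched.perform_request)
```

With these stand-ins, a pool that mixes `patched` and `plain` is reported as not fully instrumented, which is the case the comment above is worried about.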

beniwohli · 2018-05-07T11:28:33Z

Just as an update to the problem stated above: since every Elasticsearch instance creates its own Connection objects, and they aren't shared among Elasticsearch instances, this shouldn't be a problem. Either both of them are instrumented, or neither.

graphaelli · 2018-05-07T18:26:26Z

elasticapm/instrumentation/packages/elasticsearch.py

+            # user can see it.
+            if 'q' in params:
+                # 'q' is already encoded to a byte string at this point
+                query.append('q=' + params['q'].decode('utf-8'))


probably very edge case but do query params have to be utf-8 encoded?

Yes, they are encoded to utf8 here:

https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/client/utils.py#L34-L38

Only if it's not already encoded though - https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/client/utils.py#L29-L31

    >>> elasticsearch.client.utils._escape(u"\U0001f3d6")
    b'\xf0\x9f\x8f\x96'
    >>> elasticsearch.client.utils._escape(u"\U0001f3d6".encode('utf-16'))
    b'\xff\xfe<\xd8\xd6\xdf'
    >>> elasticsearch.client.utils._escape(u"\U0001f3d6".encode('utf-16')).decode('utf-8')
    UnicodeDecodeError

Not sure if this is worth a lot of effort to handle.

ugh, right. I guess we can at least catch the UnicodeDecodeError if it happens
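
One possible shape for that, as a sketch only: `decode_query_param` is a hypothetical helper name, and falling back to replacement characters is an assumption made here for illustration, not necessarily what the PR ended up doing.

```python
def decode_query_param(value, encoding="utf-8"):
    """Return a text version of a query parameter without raising on bad bytes."""
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            # The caller passed bytes in some other encoding; keep what we can
            # and substitute U+FFFD for the undecodable bytes.
            return value.decode(encoding, errors="replace")
    return str(value)


utf8_value = decode_query_param(b"\xf0\x9f\x8f\x96")
utf16_value = decode_query_param(b"\xff\xfe<\xd8\xd6\xdf")
```

The second call uses the UTF-16 bytes from the example above; it returns a string containing replacement characters instead of raising, so the span can still be recorded.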

graphaelli · 2018-05-07T18:45:26Z

elasticapm/instrumentation/packages/elasticsearch.py

+            context['db']['statement'] = '\n\n'.join(query)
+        if api_method == 'Elasticsearch.update':
+            if isinstance(body, dict) and 'script' in body:
+                context['db']['statement'] = json.dumps(body)


Why doesn't this grab just the script here? Care seems to be taken to avoid capturing `doc` in partial updates, but as is, this would capture scripted upsert docs?

yes, good catch!
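
To make the concern concrete, here is a small sketch with a made-up request body. `update_statement` is a hypothetical helper name, and the body contents are invented for illustration. Dumping the whole body would include the `upsert` document, while keeping only `script` does not.

```python
import json


def update_statement(body):
    """Build a statement for an update call from the script only."""
    if isinstance(body, dict) and "script" in body:
        # Only the script is kept; document data such as `upsert` is left out.
        return json.dumps({"script": body["script"]})
    return None


body = {
    "script": {"source": "ctx._source.counter += params.count", "params": {"count": 1}},
    "upsert": {"counter": 1, "email": "user@example.com"},
}

full_dump = json.dumps(body)          # would leak the upsert document
script_only = update_statement(body)  # contains the script only
```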

graphaelli · 2018-05-07T18:47:54Z

elasticapm/instrumentation/packages/elasticsearch.py

+            if isinstance(body, dict) and 'query' in body:
+                query.append(json.dumps(body['query']))
+            context['db']['statement'] = '\n\n'.join(query)
+        if api_method == 'Elasticsearch.update':


houndci-bot · 2018-05-08T07:38:42Z

elasticapm/instrumentation/packages/elasticsearch.py

+        elif api_method == 'Elasticsearch.update':
+            if isinstance(body, dict) and 'script' in body:
+                # only get the `script` field from the body
+                context['db']['statement'] =  json.dumps({'script': body['script']})


multiple spaces after operator

houndci-bot reviewed Apr 6, 2018

beniwohli changed the title ~~initial elasticsearch-py instrumentation support~~ initial elasticsearch-py instrumentation support, alternative implementation Apr 6, 2018

beniwohli force-pushed the es-support-alt branch from d663b91 to f094f4f Compare April 9, 2018 08:07

beniwohli added the [zube]: In Progress label Apr 11, 2018

beniwohli force-pushed the es-support-alt branch 3 times, most recently from eb5b6c9 to fa0bfd2 Compare April 13, 2018 09:46

alvarolobato assigned beniwohli Apr 17, 2018

beniwohli force-pushed the es-support-alt branch from fa0bfd2 to f8d6f34 Compare April 26, 2018 13:47

beniwohli force-pushed the es-support-alt branch from f8d6f34 to 14ff426 Compare May 7, 2018 11:05

beniwohli mentioned this pull request May 7, 2018

initial elasticsearch-py instrumentation support #169

Closed

graphaelli reviewed May 7, 2018

houndci-bot reviewed May 8, 2018

beniwohli force-pushed the es-support-alt branch from 42b6cbf to 70d97f2 Compare May 8, 2018 07:39

beniwohli mentioned this pull request May 8, 2018

(very) limited first stab at elasticsearch-py instrumentation [WIP] #131

Closed

alvarolobato added [zube]: In Review and removed [zube]: In Progress labels May 29, 2018

beniwohli added 4 commits June 4, 2018 11:15

allow running of specific parts of the test suite

3626c35

wait for elasticsearch to boot up

523d384

copy params in case the caller reuses it for some reason

c1dc7d0

beniwohli force-pushed the es-support-alt branch from 70d97f2 to c90f9ad Compare June 4, 2018 09:16

fixes from @graphaelli's review

a92f412

beniwohli force-pushed the es-support-alt branch from c90f9ad to a92f412 Compare June 4, 2018 09:57

beniwohli closed this in c5650af Jun 4, 2018

zube bot added [zube]: Done and removed [zube]: In Review labels Jun 4, 2018

beniwohli deleted the es-support-alt branch June 4, 2018 12:25

alvarolobato removed the [zube]: Done label Jun 27, 2018
