Distributed Tracing with APM server; with Python #712

Closed
nerusnayleinad opened this issue Feb 4, 2020 · 11 comments

@nerusnayleinad

I've been doing PoC tests on different tracing technologies (Jaeger, Zipkin, Stackdriver Trace, Istio (still Jaeger or Zipkin underneath, though with different concepts)), and now I'm trying the Elasticsearch APM module, and I see it is more or less the same concept: you start a trace when you start a request, and end it when you get a response.

I've generated some traces and can see them in Kibana, but I see the traces separately, which makes sense: in each service I initialize a new Tracer object, and it gets a new ID.

Now, when I want to see the cascade view of spans from several services, or of several spans in the same service, I need to pass the trace ID to the next service, so that it initializes its tracer with that ID, generates a new span ID, and attaches to the same trace.

I've been reading the Python docs, and the only method that seems to fit is elasticapm.set_context(), but all I found in the docs is this:

def set_context(data, key="custom"):
    """
    Attach contextual data to the current transaction and errors that happen during the current transaction.

    If the transaction is not sampled, this function becomes a no-op.

    :param data: a dictionary, or a callable that returns a dictionary
    :param key: the namespace for this data
    """
    ...

I would like to know if this is the right way of doing this, or if I am completely off track.

@simitt simitt transferred this issue from elastic/apm-server Feb 4, 2020
@basepi
Contributor

basepi commented Feb 4, 2020

The agent actually doesn't do anything with the context that you set -- set_custom_context just allows you to attach your own contextual information to a transaction, for later searching or aggregation, or simply for your own reference when inspecting a transaction in Kibana.
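For illustration, a minimal sketch of what that looks like (the field names here are made up):

import elasticapm

# Hypothetical fields; they show up under the transaction's "custom"
# context in Kibana, searchable but otherwise ignored by the agent.
elasticapm.set_custom_context({"customer_id": 42, "plan": "trial"})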

For distributed tracing, the standard way to combine traces across different services is via headers. Generally, you shouldn't have to worry about this, as we do it automatically. For example, in Python, we instrument all the major request libraries (such as requests and urllib3) to automatically add the correct headers to outbound requests. The assumption is that the receiving service will (hopefully) have the APM agent installed, and use those headers to tie the transaction to the parent transaction.

Additionally, when the Python agent receives a request, it looks for those headers and uses them to tie the transaction to the parent transaction.
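For reference, the propagation uses a header based on the W3C Trace Context format (depending on the agent version it is named traceparent or elasticapm-traceparent). A sketch of its shape, with hypothetical IDs:

# version - 32-hex trace id - 16-hex parent span id - flags
header = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
version, trace_id, parent_span_id, flags = header.split("-")
assert len(trace_id) == 32 and len(parent_span_id) == 16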

Part of the problem may be that it sounds like you're instantiating your own Tracer objects. This isn't the standard way to use our agent. If you're using a supported framework then check out the documentation for that framework to get the agent set up. But even if you aren't using a supported framework, the established way to manually create transactions is documented here.

Keep me posted if you have more questions. And welcome to the community!

@basepi basepi added the question label Feb 4, 2020
@nerusnayleinad
Author

Thank you.

I tried it with frameworks as well, but here I was testing without one.

I ran the test again. It consists of one service that makes GET requests to two different endpoints; those two services each make another GET request to a third service that returns 'hello world'. Something like this:

[diagram: service 1 -> services 2a and 2b -> service 3]

When I check the traces in Kibana, this is what I see:

[screenshot: the traces as listed in Kibana]

Note: I named all services the same, so now the traces appear separately, under transaction type.

If I click on main, which is service 1 in my drawing, this is what I see:

[screenshot: the trace view for main (service 1)]

As you can see, the two requests appear there, but not the final request. To see the final request, I have to select one of the service 2 transactions. If I do, I see the final request that returns hello world.

[screenshot: the trace view for service 2, showing the final request]

So, what's the way to see all the spans under the same trace, under main for example?

@beniwohli
Contributor

@nerusnayleinad can you try using different service names for the three services (or four; I'm not sure what the difference between 2a and 2b is)? Also, I suggest using the same transaction type, request, for all services. Transaction types aren't meant to distinguish between services, but between different kinds of transactions (requests served by a web app, background tasks executed by something like Celery, cron jobs, ...).
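In code, a sketch of what that would look like in each service (the service name and transaction names here are illustrative):

import elasticapm

# The service_name on the Client is what distinguishes the services...
client = elasticapm.Client(service_name='service 2a')
# ...while the transaction type is the same everywhere.
client.begin_transaction('request')
# ... handle the request ...
client.end_transaction('GET /', 'HTTP 2xx')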

@nerusnayleinad
Author

There are no differences between 2a and 2b; they are just mock services that receive a request, add some delay, and make another request.

With different service names, I get all the requests as separate services; the view is the same as before when accessing each service.

With the same service name and the same transaction type for all, the last one overwrites all the traces, so I only see the last request.

@basepi
Contributor

basepi commented Feb 5, 2020

Can you give us code snippets showing how you're instrumenting each service manually? For example, if you didn't call elasticapm.instrument() on services 2a and 2b, then we wouldn't instrument the outgoing request to service 3 and wouldn't add the headers properly.

Additionally, what library is making the call from 2a/2b to 3? We need to make sure it's in the supported list.

It looks like the distributed tracing is working on 1 -> 2a/2b, just not 2a/2b -> 3.

@nerusnayleinad
Author

@basepi sure. These are the scripts for all 4 services:

service 1 (this is the same script you advised me to use, from the examples):

import requests
import time
import elasticapm

def main():
    sess = requests.Session()
    for url in ['http://localhost:8080', 'http://localhost:8000']:
        resp = sess.get(url)
        time.sleep(1)

if __name__ == '__main__':
    client = elasticapm.Client(service_name='service 1')
    elasticapm.instrument()
    client.begin_transaction('main')
    main()
    client.end_transaction('main')

service 2a:

import elasticapm
from flask import Flask
import requests
import time

app = Flask(__name__)

url = "http://localhost:8888/"

@app.route('/')
def svc2a():
    client.begin_transaction('main')
    time.sleep(2)
    result = requests.get(url)
    client.end_transaction('main')
    return result.content

if __name__ == '__main__':
    client = elasticapm.Client(service_name='service 2a')
    elasticapm.instrument()
    app.run(host='127.0.0.1', port=8080)

service 2b:

import elasticapm
from flask import Flask
import requests
import time

app = Flask(__name__)

url = "http://localhost:8888/"

@app.route('/')
def svc2b():
    client.begin_transaction('main')
    time.sleep(2)
    result = requests.get(url)
    client.end_transaction('main')
    return result.content

if __name__ == '__main__':
    client = elasticapm.Client(service_name='service 2b')
    elasticapm.instrument()
    app.run(host='127.0.0.1', port=8000)

service 3:

import elasticapm
from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def hello():
    client.begin_transaction('main')
    time.sleep(1)                       # mocks doing something
    client.end_transaction('main')
    return 'hello world'

if __name__ == '__main__':
    client = elasticapm.Client(service_name='service 3')
    elasticapm.instrument()
    app.run(host='127.0.0.1', port=8888)

I think I have to do something with that session object I create in service 1.

@basepi
Contributor

basepi commented Feb 5, 2020

Alright, I see the disconnect. I have a working example, modified from your example above, in this gist.

The problem was that you didn't quite have the instrumentation for services 2a/2b and 3 correct. When you were looking at the trace, all you were seeing was the transaction and two spans (for the two network calls) from service 1.

This is because our instrumentation for Flask requires us to connect to Flask's signals, which we only do if you set up our Flask integration, as documented here.

Otherwise the Flask routing doesn't get instrumented, which means that while our headers arrive from service 1, the agent doesn't know to look for them. To create a transaction that actually uses the incoming HTTP headers, you have to call begin_transaction with a TraceParent object, like we do here (in the Flask integration code). So, if you ever need to do distributed tracing in an unsupported framework, you'd do something like that.

Luckily, if you use our official integrations, we do all that hard work for you!
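For completeness, the Flask integration setup is only a couple of lines; roughly (the service name is illustrative, and configuration can also come from environment variables such as ELASTIC_APM_SERVICE_NAME):

from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)
# Inline config shown for clarity; this also hooks Flask's signals so
# incoming traceparent headers are picked up automatically.
app.config['ELASTIC_APM'] = {'SERVICE_NAME': 'service 2a'}
apm = ElasticAPM(app)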

This is the waterfall I see when I run the example in my gist:

[screenshot: waterfall view of the combined trace]

Much better!

Please keep me posted if anything I explained wasn't clear. We're here to help!

@nerusnayleinad
Author

Oh. Yes, much better.

So elasticapm.capture_span() is the guy to keep the context up to date.

Thank you very much.

@nerusnayleinad
Author

To make this complete, do you have any examples of how to do this without any framework, with pure Python?

@basepi
Contributor

basepi commented Feb 6, 2020

So elasticapm.capture_span() is the guy to keep the context up to date.

Spans are sub-pieces of transactions. The reason I used capture_span in my example is that there will already be an active transaction, since our Flask integration creates a new transaction for every incoming request.
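So in a view function you'd only wrap the interesting sub-steps; a sketch along these lines, reusing the app, url, and requests setup from your service 2a script:

@app.route('/')
def svc2a():
    # The Flask integration has already started a transaction for this
    # request; capture_span adds a child span to it.
    with elasticapm.capture_span('call-service-3', span_type='external'):
        result = requests.get(url)
    return result.content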

To make this complete, do you have any examples of how to do this without any framework, with pure Python?

It would look similar to your original example, except that you need to create a TraceParent object. I linked to the Flask code and it's going to look similar to that:

# We need to continue the parent trace, by creating a TraceParent object
# to pass into our new transaction.
from elasticapm.utils.disttracing import TraceParent

# request.headers stands for whatever mapping of incoming HTTP headers
# your server hands you.
trace_parent = TraceParent.from_headers(request.headers)
client.begin_transaction("main", trace_parent=trace_parent)

Note that there are a few different helpers on the TraceParent class that help with building these objects. For example, if you were using a message bus and didn't have the concept of HTTP headers, you could use to_string() to convert the TraceParent to a string, add it to your message, and use TraceParent.from_string() on the other end to reconstruct the parent trace.
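A sketch of that message bus pattern (the producer side assumes begin_transaction returns the transaction object and that it exposes its TraceParent as trace_parent; agent internals may differ between versions):

from elasticapm.utils.disttracing import TraceParent

# Producer: serialize the active trace context into the message payload.
transaction = client.begin_transaction('main')
message = {'body': 'do-work', 'traceparent': transaction.trace_parent.to_string()}

# Consumer: rebuild the TraceParent and start a child transaction under it.
incoming = TraceParent.from_string(message['traceparent'])
client.begin_transaction('main', trace_parent=incoming)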

@beniwohli
Contributor

It looks like all questions have been addressed. I'll close this for now :)
