# Traces with AWS X-Ray

Logs and metrics are powerful tools for understanding the behavior of a distributed system (a set of one or more services).

Traces are a third type of observability/telemetry data that your application can generate. 

Generally speaking, traces are used for 3 things:

1. **Location:** identify *where* in your distributed system issues happen. This could be between two services, or within a single service.
2. **Profiling:** understand *how long* each part of your system takes to execute. Again, this could be between two services, or within a single service.
3. **Log correlation:** trace IDs can help you look up logs generated by ALL services that are involved in a single request.

If these statements do not make sense yet, don't worry! This is an area where you just need to see code and pictures.

A common theme in these statements: whereas logs and metrics help you understand how a *single application* is performing, traces give you a bird's-eye, system-wide view of *multiple interdependent applications*.
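To make the log-correlation idea concrete, here is a minimal sketch that uses only the standard library. The `make_trace_id` helper, the `ListHandler` class, and the two "services" are hypothetical names made up for illustration. The ID layout follows the X-Ray trace ID format (version `1`, 8 hex digits of epoch seconds, 24 random hex digits).

```python
import logging
import os
import time


def make_trace_id() -> str:
    """Build an ID shaped like an X-Ray trace ID: 1-<8 hex epoch seconds>-<24 hex random>."""
    return f"1-{int(time.time()):08x}-{os.urandom(12).hex()}"


class ListHandler(logging.Handler):
    """Collect formatted log lines in memory so they can be searched afterwards."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(name)s trace_id=%(trace_id)s %(message)s"))


def get_service_logger(name: str) -> logging.Logger:
    """Each (pretend) service gets its own logger, all writing to the same sink."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


frontend = get_service_logger("frontend")
backend = get_service_logger("backend")


def handle_request(trace_id: str) -> None:
    # every service that touches the request logs the same trace ID
    frontend.info("received request", extra={"trace_id": trace_id})
    backend.info("queried database", extra={"trace_id": trace_id})


trace_a, trace_b = make_trace_id(), make_trace_id()
handle_request(trace_a)
handle_request(trace_b)

# look up every log line, from ALL services, that belongs to request A
lines_for_a = [line for line in handler.lines if trace_a in line]
print("\n".join(lines_for_a))
```

Filtering on one trace ID returns the log lines from both services for that request only, which is the lookup a real log backend would do for you.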

## Constants

In [1]:
AWS_PROFILE = "cloud-course"
AWS_REGION = "us-west-2"
PRINT_SEGMENTS = True  # enables: prints traces to stdout
SEND_TRACES_TO_XRAY = True  # enables: use boto3 to actually send segments to x-ray
DEBUG_PUT_TRACE_DOCUMENTS_CALLS = False  # enables: printing the boto3 put_trace_segments response for debugging

import os

os.environ["AWS_PROFILE"] = AWS_PROFILE
os.environ["AWS_REGION"] = AWS_REGION

## Imports

In [2]:
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core.models.segment import Segment
from aws_xray_sdk.core.models.entity import Entity
import boto3
from rich import print
import json
import IPython


## Setup - Readers: feel free to skip

Usually the `aws-xray-sdk` sends trace data (segments and subsegments) to a daemon running locally. 

To avoid the overhead of running a daemon, we will override the "Emitter" class used by the SDK to

1. print out the segment and subsegment data as JSON
2. send them directly to X-Ray
3. print a URL to see the trace segments in the console

In [3]:
class DirectXrayEmitter:
    def send_entity(self, entity):
        if SEND_TRACES_TO_XRAY:
            response = send_trace_to_xray(trace_entity=entity)

            # this output is helpful for troubleshooting trace upload failures
            if DEBUG_PUT_TRACE_DOCUMENTS_CALLS: print(response) 
            
            trace_url = make_trace_url_in_aws_console(region=AWS_REGION, trace_id=entity.trace_id)
            print(f"Trace URL in AWS console: {trace_url}")
            
        # also print the segments/subsegments data to the console
        if PRINT_SEGMENTS: print("Segment:", entity.to_dict())

def send_trace_to_xray(trace_entity: Entity):
    xray = boto3.client('xray')
    return xray.put_trace_segments(TraceSegmentDocuments=[trace_entity.serialize()])

def make_trace_url_in_aws_console(region: str, trace_id: str):
    """Uploaded traces will be visible here in the AWS console."""
    return f"https://{region}.console.aws.amazon.com/cloudwatch/home?region={region}#xray:traces/{trace_id}"

# Use the custom emitter
xray_recorder.configure(emitter=DirectXrayEmitter(), sampling=False)

## Create a new segment and capture a subsegment

Note:

- A **segment** is a set of **subsegments** plus some metadata
- Do not create many segments. For example, use one segment per
    - request
    - background job
    - runtime, e.g. execution of a Python script or AWS Lambda function
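As a rough picture of "a set of subsegments plus some metadata", the sketch below builds a segment as a plain dictionary, shaped like the JSON segment documents that X-Ray accepts. The helper names `new_entity_id` and `make_subsegment` are hypothetical, the timings are made up, and a document produced by the SDK contains more fields than shown here.

```python
import json
import os
import time


def new_entity_id() -> str:
    """Segment and subsegment IDs are 16 hexadecimal digits (64 bits)."""
    return os.urandom(8).hex()


def make_subsegment(name: str, start: float, end: float) -> dict:
    """A subsegment is a timed, named unit of work nested inside a segment."""
    return {"id": new_entity_id(), "name": name, "start_time": start, "end_time": end}


now = time.time()
segment_document = {
    "name": "my_first_segment",
    "id": new_entity_id(),
    "trace_id": f"1-{int(now):08x}-{os.urandom(12).hex()}",
    "start_time": now,
    "end_time": now + 0.3,
    "annotations": {"key": "value"},
    # the subsegments live inside their parent segment
    "subsegments": [
        make_subsegment("my_first_subsegment", now, now + 0.1),
        make_subsegment("my_second_subsegment", now + 0.1, now + 0.3),
    ],
}

print(json.dumps(segment_document, indent=2))
```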

In [4]:
segment: Segment = xray_recorder.begin_segment('my_first_segment')
segment.put_annotation('key', 'value')

# add a subsegment, note that we do not need a reference to the parent segment to do this!
subsegment = xray_recorder.begin_subsegment('my_first_subsegment')
subsegment.put_metadata('key', 'value1')
xray_recorder.end_subsegment()

# add another subsegment
subsegment = xray_recorder.begin_subsegment('my_second_subsegment')
subsegment.put_metadata('key', 'value2')
xray_recorder.end_subsegment()

# end the segment
xray_recorder.end_segment()

## Visualizing a segment in the AWS console

The segment created above shows up in the AWS console as pictured in the screenshot below.

The segment shows up in the "trace map" as a node. Nodes typically represent services, e.g. a FastAPI app, the S3 service, etc. 

Generally, for a given request or "transaction" through a system, each node should produce exactly one segment. Every segment shows up as a node in the trace map.
The name of the node is the name of the segment.

![Segment](./assets/segment-no-ctx-manager.png)

## Cleaner code: use context managers

The following code is equivalent to the block above, but there is no need to manually close the segments and subsegments. 

The context manager approach (using a `with` statement) takes care of that for us.

In [5]:
with xray_recorder.in_segment('my_first_segment') as segment:
    segment.put_annotation('key', 'value')
    with xray_recorder.in_subsegment('my_first_subsegment') as subsegment:
        subsegment.put_metadata('key', 'value1')
    with xray_recorder.in_subsegment('my_second_subsegment') as subsegment:
        subsegment.put_metadata('key', 'value2')
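Why does the `with` form take care of closing things for us? The sketch below is a hypothetical stand-in called `in_span`, not the SDK's actual implementation. It shows the general pattern: the "end" step sits in a `finally` block, so it runs even when the body raises.

```python
from contextlib import contextmanager

events = []  # records (name, "begin" / "end") so the order can be inspected


@contextmanager
def in_span(name: str):
    """Hypothetical stand-in for in_segment / in_subsegment."""
    events.append((name, "begin"))
    try:
        yield name
    finally:
        # runs on normal exit AND when the body raises an exception
        events.append((name, "end"))


with in_span("segment"):
    with in_span("subsegment_ok"):
        pass
    try:
        with in_span("subsegment_fails"):
            raise ValueError("boom")
    except ValueError:
        pass

for event in events:
    print(event)
```

With manual begin/end calls, an exception between the two could leave a span open unless you add your own `try`/`finally`.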

## Metadata

Segments and subsegments can contain metadata! (see [full docs on storable metadata here](https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-http))

Within a segment or subsegment, you can store

- **SQL queries**, e.g. if the subsegment represents a database query
- **Exceptions / tracebacks**, e.g. if an error was thrown during the subsegment
- **HTTP request/response data**, e.g. if the subsegment represents an HTTP request you could document
    - the HTTP method (`GET`, `POST`, etc.)
    - the URL
    - the response status code
- **arbitrary key-value pairs**
  - **annotations:** key-value pairs that are indexed for searching traces
  - **metadata:** key-value pairs that are stored with the trace, but not indexed. Supports more complex value types like objects and lists.
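The annotation vs. metadata distinction can be sketched as a small routing helper. This assumes the documented constraints on annotations (keys made of letters, digits, and underscores; values that are strings, numbers, or booleans). The function names `is_valid_annotation` and `split_fields` and the example fields are hypothetical and not part of the SDK.

```python
import re

ANNOTATION_KEY = re.compile(r"[A-Za-z0-9_]+")
ANNOTATION_TYPES = (str, int, float, bool)


def is_valid_annotation(key: str, value) -> bool:
    """True if the pair is simple enough to be stored as an indexed annotation."""
    return bool(ANNOTATION_KEY.fullmatch(key)) and isinstance(value, ANNOTATION_TYPES)


def split_fields(fields: dict) -> tuple[dict, dict]:
    """Route simple pairs to annotations and everything else to metadata."""
    annotations, metadata = {}, {}
    for key, value in fields.items():
        target = annotations if is_valid_annotation(key, value) else metadata
        target[key] = value
    return annotations, metadata


annotations, metadata = split_fields({
    "customer_tier": "premium",                 # searchable string
    "retry_count": 2,                           # searchable number
    "prompt-template": "poem",                  # hyphen is not valid in an annotation key
    "llm_response": {"tokens": 120, "choices": ["..."]},  # nested object
})

print("annotations:", annotations)
print("metadata:", metadata)
```

In practice you would decide per field whether you need to search on it; this sketch only illustrates which values are eligible.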

In [6]:
from aws_xray_sdk.core.utils.stacktrace import get_stacktrace


with xray_recorder.in_segment('my_application') as segment:
    segment: Segment
    segment.put_annotation(key='key', value='value')

    # simulate making a call to an LLM service that fails
    with xray_recorder.in_subsegment('failed_http_request_to_llm_service') as subsegment:
        subsegment.put_metadata(key='prompt', value='write me a poem about a llama')
        # the http_meta key names must come from this list: https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-http
        subsegment.put_http_meta(key='url', value='http://cool-llm.com')
        subsegment.put_http_meta(key='status', value='500')
        subsegment.put_http_meta(key='method', value='GET')
        subsegment.put_http_meta(key='user_agent', value='python-requests/2.25.1')
        subsegment.put_http_meta(key='content_length', value='1234')

    # show a subsegment with an error
    with xray_recorder.in_subsegment('failed_math') as subsegment:
        # record the exception and its stack trace on the subsegment
        try:
            1 / 0
        except ZeroDivisionError as err:
            subsegment.add_exception(err, stack=get_stacktrace(), remote=True)

Any of the tabs not shown in the screenshots below are empty.

### Segment: `my_application`

![](./assets/segment-metadata-example.png)

### Subsegment `failed_http_request_to_llm_service`

![](./assets/failed-llm-http-request-subsegment.png)

### Subsegment `failed_math`

![](./assets/failed-math-subsegment.png)

## Context Propagation

## Autoinstrumentation!

**"Instrumenting"** your code means adding statements to emit telemetry data. This could be adding logging statements, emitting metrics, or emitting trace segments.

It is common for telemetry SDKs like AWS X-Ray to provide **autoinstrumentation**. 

Basically, the SDK can inject instrumentation code into popular libraries at runtime by monkey patching `with xray_recorder.in_subsegment(...)` statements into the library's code.
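Monkey patching is easier to understand with a toy example. The sketch below does not use the X-Ray SDK at all: `fake_http_lib` is a hypothetical stand-in for a third-party library, and `patch` is a made-up helper that swaps one of its functions for a wrapper that records a "subsegment" around each call.

```python
import functools
import time
import types

# A stand-in for a third-party library we do not control (hypothetical).
fake_http_lib = types.SimpleNamespace()


def _get(url: str) -> int:
    """Pretend to make an HTTP request and return a status code."""
    return 500 if url.endswith("/500") else 200


fake_http_lib.get = _get

recorded_subsegments = []


def patch(library, func_name: str) -> None:
    """Replace library.<func_name> with a wrapper that records each call."""
    original = getattr(library, func_name)

    @functools.wraps(original)
    def wrapper(url, *args, **kwargs):
        start = time.perf_counter()
        status = None
        try:
            status = original(url, *args, **kwargs)
            return status
        finally:
            recorded_subsegments.append({
                "name": url,
                "status": status,
                "duration": time.perf_counter() - start,
            })

    setattr(library, func_name, wrapper)


patch(fake_http_lib, "get")

# The calling code is unchanged, yet every call is now recorded.
fake_http_lib.get("https://example.com/status/500")
fake_http_lib.get("https://example.com/status/200")

for subsegment in recorded_subsegments:
    print(subsegment["name"], subsegment["status"])
```

The key point is that the caller never imports or mentions the instrumentation; replacing the attribute on the library object is enough.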

From the [AWS X-Ray docs](https://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-python-patching.html):

> Supported Libraries
> 
> `botocore`, `boto3` – Instrument AWS SDK for Python (Boto) clients.
> 
> `pynamodb` – Instrument PynamoDB's version of the Amazon DynamoDB client.
> 
> `aiobotocore`, `aioboto3` – Instrument asyncio-integrated versions of SDK for Python clients.
> 
> `requests`, `aiohttp` – Instrument high-level HTTP clients.
> 
> `httplib`, `http.client` – Instrument low-level HTTP clients and the higher level libraries that use them.
> 
> `sqlite3` – Instrument SQLite clients.
> 
> `mysql-connector-python` – Instrument MySQL clients.
> 
> `pg8000` – Instrument Pure-Python PostgreSQL interface.
> 
> `psycopg2` – Instrument PostgreSQL database adapter.
> 
> `pymongo` – Instrument MongoDB clients.
> 
> `pymysql` – Instrument PyMySQL based clients for MySQL and MariaDB.
> 
> When you use a patched library, the X-Ray SDK for Python creates a subsegment for the call and records information from the request and response. A segment must be available for the SDK to create the subsegment, either from the SDK middleware or from AWS Lambda.

In [7]:
from aws_xray_sdk.core import patch_all, xray_recorder
import requests

# Automatically instrument all supported libraries (e.g., requests, boto3)
patch_all()

with xray_recorder.in_segment('my_segment_3') as segment:
    requests.get("https://httpbin.org/status/500")
    requests.get("https://httpbin.org/status/200")

![](./assets/autoinstrumentation-http-bin.png)

Notes:

- The `patch_all()` function caused a new subsegment to be created for each HTTP request made with `requests.get(...)`
  - It even automatically added all the metadata like status code, URL, etc. 🎉
- AWS X-Ray detected that these subsegments exist, and on AWS' side, X-Ray generated two additional segments (not subsegments), one for each HTTP request made.
  - The result is that a new node appears in the trace map! For this trace, we can see that 50% of our requests failed.
  - This is consistent with our code! We have one intentionally successful request, and one intentionally failed request. 🎉

Note that the `parent_id` value of each of the `httpbin_org` segments is actually the ID of the corresponding subsegment created by the autoinstrumented `requests.get(...)` call. So they line up!
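The way `parent_id` values link entities together can be sketched with plain dictionaries. The records below are hypothetical and heavily simplified (made-up IDs and names, far fewer fields than real segment documents); the point is only that following `parent_id` reconstructs the tree that the console draws.

```python
from collections import defaultdict

# Hypothetical, simplified records: real segment documents carry more fields.
entities = [
    {"id": "seg-app", "parent_id": None, "name": "my_segment_3"},
    {"id": "sub-1", "parent_id": "seg-app", "name": "requests.get #1"},
    {"id": "sub-2", "parent_id": "seg-app", "name": "requests.get #2"},
    {"id": "seg-r1", "parent_id": "sub-1", "name": "remote httpbin segment"},
    {"id": "seg-r2", "parent_id": "sub-2", "name": "remote httpbin segment"},
]

# index the entities by the ID of their parent
children = defaultdict(list)
for entity in entities:
    children[entity["parent_id"]].append(entity)


def render(parent_id=None, depth=0) -> list[str]:
    """Walk the parent/child links depth-first and indent each level."""
    lines = []
    for entity in children[parent_id]:
        lines.append("  " * depth + entity["name"])
        lines.extend(render(entity["id"], depth + 1))
    return lines


tree = render()
print("\n".join(tree))
```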

## Profiling

Now that we know about autoinstrumentation, let's use traces to profile our code. We will use the resulting traces to find opportunities to optimize performance.

In [15]:
PRINT_SEGMENTS = False

In [16]:
# Patch all supported libraries for tracing
patch_all()

# Sequential calls using boto3
with xray_recorder.in_segment('my_application'):
    sts_client = boto3.client('sts')
    sts_client.get_caller_identity()
    sts_client.get_caller_identity()
    sts_client.get_caller_identity()
    sts_client.get_caller_identity()
    sts_client.get_caller_identity()
    sts_client.get_caller_identity()
    sts_client.get_caller_identity()
    sts_client.get_caller_identity()
    sts_client.get_caller_identity()

![](./assets/sequential-boto.png)

In [18]:
from aws_xray_sdk.core.async_context import AsyncContext
from aws_xray_sdk.core.context import Context
xray_recorder.context = AsyncContext()

import aioboto3
import asyncio

patch_all()

# concurrent calls using aioboto3 (an async wrapper around boto3)
session = aioboto3.Session()
async with session.client('sts') as sts_client_async:
    with xray_recorder.in_segment('my_application'):
        tasks = [
            sts_client_async.get_caller_identity(),
            sts_client_async.get_caller_identity(),
            sts_client_async.get_caller_identity(),
            sts_client_async.get_caller_identity(),
            sts_client_async.get_caller_identity(),
            sts_client_async.get_caller_identity(),
            sts_client_async.get_caller_identity(),
            sts_client_async.get_caller_identity(),
            sts_client_async.get_caller_identity(),
        ]
        responses = await asyncio.gather(*tasks)

# reset the xray_recorder
xray_recorder.context = Context()

![](./assets/parallelized-boto.png)
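The difference between the two traces can be reproduced without AWS. The sketch below replaces the STS call with a hypothetical `fake_api_call` that just sleeps for an assumed latency of 50 ms, then times nine calls made one after another against nine calls made with `asyncio.gather`.

```python
import asyncio
import time

CALL_SECONDS = 0.05  # assumed latency of one simulated network call
NUM_CALLS = 9


async def fake_api_call() -> str:
    """Hypothetical stand-in for a network call such as get_caller_identity()."""
    await asyncio.sleep(CALL_SECONDS)
    return "ok"


async def run_sequential() -> float:
    start = time.perf_counter()
    for _ in range(NUM_CALLS):
        await fake_api_call()  # each call waits for the previous one to finish
    return time.perf_counter() - start


async def run_concurrent() -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_api_call() for _ in range(NUM_CALLS)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    return await run_sequential(), await run_concurrent()


sequential_seconds, concurrent_seconds = asyncio.run(main())
print(f"sequential: {sequential_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

Inside a notebook, an event loop is already running, so replace `asyncio.run(main())` with `await main()`. The sequential total grows with the number of calls, while the concurrent total stays close to the duration of a single call, which is the same shape the two trace timelines show.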