```{contents}
```
## Tracing 

**Tracing** is the process of **recording every step of an LLM workflow**—inputs, outputs, timings, errors, retries, and tool calls—so you can **observe, debug, and optimize** executions.

In practice, tracing answers:

* *What happened?*
* *Why did it happen?*
* *Where is time/cost spent?*

Tracing is natively supported in LangChain and commonly visualized using LangSmith.

```
User Input
  ↓
Retriever → Prompt → LLM → Tool → LLM
  ↓
Structured Trace (timeline + metadata)
```

---

### Why Tracing Is Critical

* Debug incorrect answers
* Diagnose latency bottlenecks
* Track retries/fallbacks
* Measure token usage and cost
* Audit tool usage
* Compare model versions

Without tracing, LLM pipelines are a **black box**.

---

### What Gets Traced

A trace typically includes:

* Chain / runnable start & end
* Inputs and outputs (optionally redacted)
* LLM calls (model, tokens, latency)
* Tool calls and results
* Errors, retries, fallbacks
* Hierarchical parent–child relationships
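The items above can be pictured with a small standard-library sketch. This is not LangChain's tracer: `SpanRecorder`, its field names, and the `hypothetical-model` label are assumptions made for illustration. It only shows how start/end timing, metadata, errors, and parent-child links fit into one record per step.

```python
import time
from contextlib import contextmanager


class SpanRecorder:
    """Toy recorder: the same idea as a tracer, in miniature."""

    def __init__(self):
        self.spans = []    # every span, in the order it was started
        self._stack = []   # currently open spans, innermost last

    @contextmanager
    def span(self, name, **metadata):
        record = {
            "name": name,
            "parent": self._stack[-1]["name"] if self._stack else None,
            "metadata": metadata,
            "error": None,
        }
        self.spans.append(record)
        self._stack.append(record)
        start = time.perf_counter()
        try:
            yield record
        except Exception as exc:
            record["error"] = repr(exc)  # keep the failure on the span
            raise
        finally:
            record["ms"] = (time.perf_counter() - start) * 1000
            self._stack.pop()


recorder = SpanRecorder()
with recorder.span("chain"):
    with recorder.span("prompt"):
        text = "Explain tracing in one sentence."
    with recorder.span("llm", model="hypothetical-model") as llm_span:
        llm_span["output"] = "stubbed answer"  # stand-in for a real model call

for s in recorder.spans:
    print(f'{s["name"]:<7} parent={s["parent"]} ms={s["ms"]:.3f}')
```

Each nested `with` block becomes a child of the span that was open when it started, which is the hierarchy a trace viewer renders as a tree.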

---

### Architecture View

![Image](https://mintcdn.com/langchain-5e9cc07a/rqYqeBEA_2oeiw17/langsmith/images/cloud-arch-light.png?auto=format\&fit=max\&n=rqYqeBEA_2oeiw17\&q=85\&s=0790cbdf4fe131c74d1e60bb120834e3)

![Image](https://media.licdn.com/dms/image/v2/D4E22AQHa-BlQogFLPA/feedshare-shrink_800/B4EZemSmlkHgAg-/0/1750841585548?e=2147483647\&t=ENr_EwR-JgI4UTvt_YWFco9c-4dG15fTxtrDfURh3aM\&v=beta)

![Image](https://blog.langchain.com/content/images/size/w1200/2024/01/Screenshot-2024-01-28-at-5.03.55-PM.png)


---

### Enable Tracing (Minimal Setup)

Set environment variables (once per environment):

```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your_langsmith_api_key
export LANGCHAIN_PROJECT=demo-tracing
```

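With these set, every LangChain runnable invoked in that process is traced to the `demo-tracing` project, with no code changes. Newer LangSmith documentation uses `LANGSMITH_`-prefixed names for the same settings.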
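In a notebook, where exporting shell variables is awkward, the same settings can be applied from Python. This is a minimal sketch that reuses the variable names from the shell example. The values are placeholders, and running it at the top of the notebook, before any chain is invoked, is an assumption about the safest ordering.

```python
import os

# Same variable names as the shell example; the values are placeholders.
# setdefault leaves any value already exported in the shell untouched.
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
os.environ.setdefault("LANGCHAIN_API_KEY", "your_langsmith_api_key")
os.environ.setdefault("LANGCHAIN_PROJECT", "demo-tracing")

tracing_enabled = os.environ["LANGCHAIN_TRACING_V2"].lower() == "true"
print("tracing enabled:", tracing_enabled, "| project:", os.environ["LANGCHAIN_PROJECT"])
```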

---

### Trace a Simple Chain



```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Prompt with a single input variable, `topic`
prompt = ChatPromptTemplate.from_template(
    "Explain {topic} in one sentence."
)

llm = ChatOpenAI()

# Compose prompt and model into a runnable sequence
chain = prompt | llm

chain.invoke({"topic": "tracing"})
```

Output:

```
AIMessage(content='Tracing is the act of following or tracking the path or movement of something, typically in order to understand its location, development, or progress.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 29, 'prompt_tokens': 14, 'total_tokens': 43, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-Cq0BNDE7PdNKpOCsmX7VVVd7kRO6Z', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019b4c30-4c9a-7cb3-a477-fd82e93265b5-0', usage_metadata={'input_tokens': 14, 'output_tokens': 29, 'total_tokens': 43, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
```

**What you get**

* A trace showing:

  * Prompt rendering
  * LLM call
  * Tokens used
  * Latency

---

### Tracing a RunnableSequence (Step-by-Step)



```python
from langchain_core.runnables import RunnableLambda

chain = (
    RunnableLambda(lambda x: x.strip())
    | RunnableLambda(lambda x: {"topic": x})
    | prompt
    | llm
)

chain.invoke("  tracing in langchain  ")
```

Output:

```
AIMessage(content='Tracing in langchain is the process of analyzing and recording the execution of code to understand how different parts of the program interact and behave.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 17, 'total_tokens': 45, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-Cq0BkOOg98wZHFNw8mE8MR2CWSFrh', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019b4c30-a706-7a53-a300-e1022b586d1e-0', usage_metadata={'input_tokens': 17, 'output_tokens': 28, 'total_tokens': 45, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
```



Trace shows:

* Each lambda step
* Data shape changes
* Final LLM output
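The "data shape changes" are easy to see by replaying the two lambda steps as plain functions, without any model call. This is only an illustration of what each trace node would record as input and output. The step labels `strip` and `to_prompt_input` are made up for the example.

```python
# The same two lambdas as in the chain, run as plain functions.
steps = [
    ("strip", lambda x: x.strip()),
    ("to_prompt_input", lambda x: {"topic": x}),
]

value = "  tracing in langchain  "
history = [("input", value)]
for name, fn in steps:
    value = fn(value)          # output of one step is the input of the next
    history.append((name, value))

for name, v in history:
    print(f"{name:<16} {type(v).__name__:<5} {v!r}")
```

The string becomes a stripped string and then a dict with a `topic` key, which is the shape the prompt template expects.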

---

### Tracing with Retrieval (RAG)



```python
from langchain_core.runnables import RunnableLambda
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Create a sample retriever for demonstration
vectorstore = FAISS.from_texts(
    ["Tracing records every step of an LLM workflow including inputs, outputs, timings, and errors."],
    embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

# Create LLM instance
llm = ChatOpenAI()
```

```python
# The input string is passed through as the question and also sent to the retriever
chain = (
    {
        "question": RunnableLambda(lambda x: x),
        "context": retriever
    }
    | ChatPromptTemplate.from_template(
        "Answer using context.\nQ: {question}\nC: {context}"
    )
    | llm
)

chain.invoke("What is tracing?")
```

Output:

```
AIMessage(content='Tracing is the process of recording every step of a workflow, including inputs, outputs, timings, and errors.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 71, 'total_tokens': 94, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-Cq0D0aRXJEIsAfZ6IZjQrtomxgyRz', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019b4c31-d777-7c40-b2f0-ec20229bbc6a-0', usage_metadata={'input_tokens': 71, 'output_tokens': 23, 'total_tokens': 94, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
```



Trace highlights:

* Retriever latency
* Documents returned
* Prompt size growth
* LLM generation time

---

### Tracing Retries and Fallbacks



```python
primary = ChatOpenAI(model="gpt-4").with_retry(stop_after_attempt=2)
backup = ChatOpenAI(model="gpt-3.5-turbo")

llm = primary.with_fallbacks([backup])

llm.invoke("Explain tracing briefly")
```

Output (truncated):

```
AIMessage(content='Tracing is a process used to monitor or investigate the behavior and data flow of an application or system. It involves the collection of information about the operation of the program, such as function or method calls, events, or messages. This data then can be utilized for debugging, performance tuning, or understanding complex systems. Tracing is a critical component in software engineering, to identify and solve issues that could affect the functionality or performance of an application.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 89, 'prompt_tokens': 11, 'total_tokens': 100, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4-0613', 'system_fingerprint': None, 'id': 'chatcmpl-Cq0DQjjrOiB8Ed6Zj5wc8doRD9t9O', 'serv
```



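When the primary model fails, the trace shows:

* Failed attempts
* Retry count
* Fallback model used

In the run above the primary model answered (`model_name` is `gpt-4-0613`), so the fallback was not used.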
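To see what those trace entries correspond to, here is a standard-library sketch of the retry-then-fallback pattern that records one event per attempt. It does not reproduce how `with_retry` or `with_fallbacks` are implemented. `call_with_retry_and_fallback`, `flaky_primary`, and `steady_backup` are hypothetical names, and the primary is forced to fail so that the fallback path is exercised.

```python
def call_with_retry_and_fallback(primary, backup, prompt, max_attempts=2):
    """Try `primary` up to max_attempts times, then `backup` once.

    Returns (result, events); events mirrors what a trace would list.
    """
    events = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = primary(prompt)
            events.append({"runner": "primary", "attempt": attempt, "status": "ok"})
            return result, events
        except Exception as exc:
            events.append({"runner": "primary", "attempt": attempt,
                           "status": "error", "error": repr(exc)})
    result = backup(prompt)  # errors from the backup propagate to the caller
    events.append({"runner": "backup", "attempt": 1, "status": "ok"})
    return result, events


# Hypothetical stand-ins for the two models
def flaky_primary(prompt):
    raise TimeoutError("primary model timed out")


def steady_backup(prompt):
    return f"backup answer to: {prompt}"


result, events = call_with_retry_and_fallback(
    flaky_primary, steady_backup, "Explain tracing briefly"
)
for e in events:
    print(e)
```

Two failed primary attempts followed by one successful backup call is the sequence a trace would display for a run where the fallback was needed.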

---

### Tracing with Callbacks (Custom Metadata)



```python
from langchain_core.callbacks import BaseCallbackHandler

class TraceMeta(BaseCallbackHandler):
    """Print the inputs of every chain-type run as it starts."""

    def on_chain_start(self, serialized, inputs, **kwargs):
        print("Tracing chain with inputs:", inputs)

# `chain` was last bound to the RAG pipeline, which expects a plain string,
# so build the simple prompt -> model chain again for a dict input.
topic_chain = prompt | ChatOpenAI()

topic_chain.invoke(
    {"topic": "observability"},
    config={"callbacks": [TraceMeta()]}
)
```

`on_chain_start` fires for the sequence itself and again for nested chain-type steps, so the message is printed more than once per invocation.

Callbacks run alongside the tracer, so you can emit custom logs at the same lifecycle points that the trace records.

---

### Async + Streaming Tracing

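```python
import asyncio

async def stream_once():
    async for chunk in llm.astream("Explain tracing with streaming"):
        print(chunk.content, end="", flush=True)

# In a notebook, run `await stream_once()` instead.
asyncio.run(stream_once())
```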

Trace includes:

* Stream start
* Token-by-token timings
* Stream end
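The timing side of a streamed run can be sketched without a model. Below, `fake_stream` is a hypothetical stand-in for `llm.astream` that yields one word at a time, and `timed_stream` records when each chunk arrives relative to the start of the stream. The names and the 10 ms delay are assumptions chosen for illustration.

```python
import asyncio
import time


async def fake_stream(text, delay=0.01):
    """Hypothetical stand-in for llm.astream: yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(delay)
        yield word


async def timed_stream(stream):
    """Consume a stream and record each chunk's arrival time in ms."""
    start = time.perf_counter()
    chunks, arrivals_ms = [], []
    async for chunk in stream:
        chunks.append(chunk)
        arrivals_ms.append((time.perf_counter() - start) * 1000)
    return chunks, arrivals_ms


# In a notebook, use `await timed_stream(...)` instead of asyncio.run(...).
chunks, arrivals_ms = asyncio.run(
    timed_stream(fake_stream("tracing records every step"))
)
print("time to first chunk (ms):", round(arrivals_ms[0], 1))
print("total stream time (ms):", round(arrivals_ms[-1], 1))
```

Time to first chunk and total stream time are the two numbers that usually matter when diagnosing a slow streaming response.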

---

### What a Trace Looks Like (Conceptually)

```
Run
 ├─ RunnableSequence
 │   ├─ RunnableLambda (5 ms)
 │   ├─ PromptTemplate (1 ms)
 │   └─ ChatOpenAI (620 ms, 412 tokens)
 └─ Output
```

In the LangSmith UI, each node can be opened to inspect its inputs, outputs, and timing.

---

### Tracing vs Logging

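| Aspect          | Tracing                        | Logging                    |
| --------------- | ------------------------------ | -------------------------- |
| Structure       | Hierarchical (parent-child)    | Flat lines                 |
| Context         | Full execution                 | Whatever each line records |
| Latency insight | Per step, automatic            | Only if logged manually    |
| Token usage     | Captured per LLM call          | Only if logged manually    |
| Tool visibility | Calls and results              | Only if logged manually    |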

Tracing ≠ logging. Tracing is **execution-aware**.

---

### Best Practices

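* Enable tracing in **dev/staging** from the start
* Redact PII and secrets before they reach the trace
* Use a separate project per environment
* Trace before optimizing, so changes target measured bottlenecks
* Sample traces in production if full tracing is too costly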
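Redaction and sampling can both be applied before a run is recorded. The sketch below uses only the standard library. `redact`, `should_trace`, the `SECRET_KEYS` field names, and the 10% rate are hypothetical choices for illustration, not LangSmith settings, and the email pattern is deliberately simple rather than exhaustive.

```python
import random
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET_KEYS = {"api_key", "password", "token"}  # hypothetical field names


def redact(value):
    """Return a copy of `value` with emails and secret fields masked."""
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k.lower() in SECRET_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def should_trace(sample_rate, rng=random):
    """Keep roughly `sample_rate` of runs (0.0 = none, 1.0 = all)."""
    return rng.random() < sample_rate


inputs = {"question": "Email jane@example.com the report", "api_key": "sk-123"}
print(redact(inputs))

rng = random.Random(0)  # seeded so the sample count is reproducible
kept = sum(should_trace(0.1, rng) for _ in range(1000))
print("runs traced out of 1000:", kept)
```

Redacting a copy, rather than mutating the inputs, keeps the application's own data intact while the trace stores only the masked version.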

---

### Mental Model

Tracing is a **flight recorder** for LLM pipelines.

```
Something went wrong → open trace → see exactly where and why
```

---

### Key Takeaways

* Tracing provides **full visibility** into LLM workflows
* Tracing is automatic in LangChain once the environment variables are set
* Essential for debugging, cost control, and reliability
* Foundation for production-grade LLM systems

