# Import LangSmith Traces into TraceMem

This notebook fetches conversation traces from LangSmith and imports them
into TraceMem's knowledge graph for retrieval, then uses retrieved memories
as context for an LLM call.

## Prerequisites

- Neo4j running locally (`docker compose up -d neo4j`)
- `LANGSMITH_API_KEY` in `.env`
- `OPENAI_API_KEY` in `.env` (for embeddings and LLM)

## 1. Setup

In [5]:
import json
import tempfile
from pathlib import Path
from typing import Any

from dotenv import load_dotenv
from langsmith import Client

from tracemem_core import RetrievalConfig, TraceMem, TraceMemConfig
from tracemem_core.messages import Message, ToolCall

load_dotenv()


class TradingResourceExtractor:
    """Resource extractor for trading agent tools."""

    def extract(self, tool_name: str, args: dict[str, Any]) -> str | None:
        if tool_name == "get_ticker_details":
            symbol = args.get("symbol")
            if symbol and isinstance(symbol, str):
                return f"ticker://{symbol.upper()}"
        return None


ls_client = Client()

_tmpdir = tempfile.mkdtemp(prefix="tracemem_langsmith_")
config = TraceMemConfig(
    mode="global",
    lancedb_path=Path(_tmpdir) / "lancedb",
)

tm = TraceMem(config=config, resource_extractor=TradingResourceExtractor())
await tm.__aenter__()

print(f"Connected. LanceDB at: {_tmpdir}")

Connected. LanceDB at: /var/folders/lf/j9dpx4lx3bl0x2tgdkr0pn8c0000gn/T/tracemem_langsmith_3pl_fddx


## 2. Fetch Traces from LangSmith

Get up to 100 root traces (the `limit` below) for a specific user from the
`stonki_prd` project, grouped by `thread_id` (conversation thread).

In [6]:
PROJECT = "stonki_prd"
USER_ID = "user_33nESbTFBKeYhcXBFV2S9Mx193o"

root_traces = list(ls_client.list_runs(
    project_name=PROJECT,
    is_root=True,
    filter=f'has(metadata, \'{{"user_id": "{USER_ID}"}}\') ',
    limit=100,
))

print(f"Fetched {len(root_traces)} root traces")

# Group by thread_id
threads: dict[str, list] = {}
for t in root_traces:
    tid = (t.metadata or {}).get("thread_id", "unknown")
    threads.setdefault(tid, []).append(t)

# Sort each thread chronologically
for tid in threads:
    threads[tid].sort(key=lambda r: r.start_time)

print(f"Across {len(threads)} threads")
for tid, runs in sorted(threads.items(), key=lambda x: x[1][0].start_time):
    first_msg = (runs[0].inputs or {}).get("messages", [{}])
    first_content = first_msg[0].get("content", "")[:80] if first_msg else ""
    print(f"  {tid[:12]}... ({len(runs)} turns) - {first_content}")

Fetched 100 root traces
Across 21 threads
  dd7097a2-970... (8 turns) - 🔔 **Trigger Alert**

message: Review all portfolio positions, check P&L changes,
  5bb94f2d-246... (37 turns) - analyze SKYT using SEPA criteria (mark mineervini) which phase is it in and wher
  dd27c455-e98... (17 turns) - 🔔 **Trigger Alert**

message: Perform daily portfolio review: Check all current 
  59ba5197-fc5... (2 turns) - 🔔 **Trigger Alert**

message: Review CVNA ultra-deep OTM position: 10 contracts 
  291c97d7-8c7... (1 turns) - Hi
  90819e3a-5b7... (1 turns) - I closed my short put at 50% profit and the stock has gone up even more. Can you
  1ac1ebd7-df2... (1 turns) - how come qcom is lagging behind the ai chip frenzy?
  e8bc2ed5-33d... (1 turns) - Analyze WOLF and especially the option activity
  b4dbc6d3-a68... (5 turns) - Create a recipe to track these stocks:
  4a0af4a5-4c3... (1 turns) - Analyze nxxt for a swing trade
  b644f647-75a... (1 turns) - what are people saying about RIME on re

## 3. Convert LangSmith Traces to TraceMem Messages

LangSmith root trace outputs contain the full conversation as a list of
messages, each with a `type` field (`human`, `ai`, or `tool`). AI messages may
carry content blocks (`thinking`, `text`, `tool_use`) as well as a separate
`tool_calls` field.
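
As a rough illustration of those shapes, the sketch below builds one message of each type by hand. The field names mirror what the conversion code in this notebook reads; the values, the `call_1` id, and the inner key of the `thinking` block are invented for the example and are not taken from a real trace. The two helpers `visible_text` and `tool_call_ids` are hypothetical names used only here.

```python
# Hand-written sample messages (values are made up for illustration).
human_msg = {"type": "human", "content": "Analyze WOLF"}
ai_msg = {
    "type": "ai",
    "content": [
        {"type": "thinking", "thinking": "..."},
        {"type": "text", "text": "Let me look that up."},
        {"type": "tool_use", "id": "call_1", "name": "get_ticker_details",
         "input": {"symbol": "WOLF"}},
    ],
    # The same call may also appear in the LangChain-style tool_calls field.
    "tool_calls": [
        {"id": "call_1", "name": "get_ticker_details", "args": {"symbol": "WOLF"}},
    ],
}
tool_msg = {"type": "tool", "tool_call_id": "call_1", "content": {"active": True}}


def visible_text(content) -> str:
    """Return plain text from either a string or a list of content blocks."""
    if isinstance(content, str):
        return content
    return "\n".join(
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    )


def tool_call_ids(msg: dict) -> list[str]:
    """Collect tool call ids from both places, without duplicates."""
    ids = [tc["id"] for tc in msg.get("tool_calls", [])]
    content = msg.get("content")
    if isinstance(content, list):
        for block in content:
            is_tool_use = isinstance(block, dict) and block.get("type") == "tool_use"
            if is_tool_use and block.get("id") not in ids:
                ids.append(block["id"])
    return ids


ai_text = visible_text(ai_msg["content"])
ai_ids = tool_call_ids(ai_msg)
print(repr(ai_text), ai_ids)
```

The same tool call can show up both as a `tool_use` block and in `tool_calls`, which is why the ids are deduplicated.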

In [7]:
def extract_text_content(content) -> str:
    """Extract text from LangSmith message content (string or block list)."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        texts = []
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                texts.append(block.get("text", ""))
        return "\n".join(texts)
    return str(content) if content else ""


def extract_tool_calls(msg: dict) -> list[ToolCall]:
    """Extract tool calls from a LangSmith AI message."""
    tool_calls = []
    # tool_calls field (LangChain format)
    for tc in msg.get("tool_calls", []):
        tool_calls.append(ToolCall(
            id=tc.get("id", ""),
            name=tc.get("name", ""),
            args=tc.get("args", {}),
        ))
    # Also check content blocks for tool_use
    content = msg.get("content", "")
    if isinstance(content, list):
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                tc_id = block.get("id", "")
                # Skip if already captured from tool_calls field
                if not any(t.id == tc_id for t in tool_calls):
                    tool_calls.append(ToolCall(
                        id=tc_id,
                        name=block.get("name", ""),
                        args=block.get("input", {}),
                    ))
    return tool_calls


def langsmith_messages_to_tracemem(messages: list[dict]) -> list[Message]:
    """Convert LangSmith output messages to TraceMem Messages."""
    result = []
    for msg in messages:
        msg_type = msg.get("type", "")
        content = extract_text_content(msg.get("content", ""))

        if msg_type == "human":
            # Skip system reminders injected as human messages
            if content.startswith("<system_reminder>"):
                continue
            result.append(Message(role="user", content=content))

        elif msg_type == "ai":
            tool_calls = extract_tool_calls(msg)
            result.append(Message(
                role="assistant",
                content=content,
                tool_calls=tool_calls,
            ))

        elif msg_type == "tool":
            tool_content = msg.get("content", "")
            if isinstance(tool_content, (dict, list)):
                tool_content = json.dumps(tool_content)[:2000]
            result.append(Message(
                role="tool",
                content=str(tool_content)[:2000],
                tool_call_id=msg.get("tool_call_id", ""),
            ))

    return result


# Test with one trace
sample_thread_id = list(threads.keys())[0]
sample_runs = threads[sample_thread_id]
last_run = sample_runs[-1]
out_msgs = (last_run.outputs or {}).get("messages", [])

converted = langsmith_messages_to_tracemem(out_msgs)
print(f"Thread {sample_thread_id[:12]}... last trace has {len(out_msgs)} raw msgs -> {len(converted)} TraceMem msgs")
for m in converted[:6]:
    tc_str = f" tool_calls={[t.name for t in m.tool_calls]}" if m.tool_calls else ""
    print(f"  [{m.role:9s}] {m.content[:80]}{tc_str}")
if len(converted) > 6:
    print(f"  ... ({len(converted) - 6} more)")

Thread e56ebdaa-1c4... last trace has 38 raw msgs -> 38 TraceMem msgs
  [user     ] ASST bitcoin holdings are worth more than its market cap how come?
  [assistant]  tool_calls=['get_ticker_details']
  [tool     ] {
  "active": true,
  "address": {
    "address1": "100 CRESCENT CT",
    "addre
  [assistant] Good catch—that's actually a classic Bitcoin treasury company arbitrage situatio
  [user     ] use web search
  [assistant]  tool_calls=['transfer_to_websearch_agent']
  ... (32 more)


## 4. Import All Threads into TraceMem

For each thread, we take the **last trace's output messages** (which
contain the full conversation history) and import them as a single
conversation.
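
This relies on each later trace repeating everything that came before it. A minimal sketch of how that assumption could be checked is shown below, using plain lists of message dicts in place of real `run.outputs["messages"]` values. The helper names `human_texts` and `last_covers_earlier` are hypothetical, and comparing only human message strings is a simplification. Threads whose history was summarized or compacted along the way would presumably fail this check.

```python
def human_texts(messages: list[dict]) -> list[str]:
    """Collect the string contents of human messages, in order."""
    return [
        m["content"]
        for m in messages
        if m.get("type") == "human" and isinstance(m.get("content"), str)
    ]


def last_covers_earlier(run_outputs: list[list[dict]]) -> bool:
    """True if every earlier run's human messages are a prefix of the last run's."""
    if not run_outputs:
        return True
    final = human_texts(run_outputs[-1])
    earlier = (human_texts(out) for out in run_outputs[:-1])
    return all(final[: len(texts)] == texts for texts in earlier)


# Made-up thread where the second turn extends the first.
turn1 = [{"type": "human", "content": "a"}, {"type": "ai", "content": "reply a"}]
turn2 = turn1 + [{"type": "human", "content": "b"}, {"type": "ai", "content": "reply b"}]
full_history_ok = last_covers_earlier([turn1, turn2])

# Made-up thread where the earlier history was replaced by a summary.
compacted = [{"type": "human", "content": "[summary]"}, {"type": "human", "content": "b"}]
compacted_ok = last_covers_earlier([turn1, compacted])

print(full_history_ok, compacted_ok)
```

For threads that fail such a check, importing only the last trace may drop earlier turns, so they might need to be imported trace by trace instead.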

In [8]:
imported = {}

for thread_id, runs in sorted(threads.items(), key=lambda x: x[1][0].start_time):
    # Use the last trace's output; it has the full conversation
    last_run = runs[-1]
    out_msgs = (last_run.outputs or {}).get("messages", [])

    if not out_msgs:
        print(f"  SKIP {thread_id[:12]}... (no output messages)")
        continue

    messages = langsmith_messages_to_tracemem(out_msgs)
    if not messages:
        print(f"  SKIP {thread_id[:12]}... (no convertible messages)")
        continue

    result = await tm.import_trace(thread_id, messages)
    user_count = sum(1 for m in messages if m.role == "user")
    assistant_count = sum(1 for m in messages if m.role == "assistant")
    imported[thread_id] = result
    print(f"  {thread_id[:12]}... {user_count} user + {assistant_count} assistant msgs -> {len(result)} nodes")

print(f"\nImported {len(imported)} threads")

  dd7097a2-970... 5 user + 8 assistant msgs -> 2 nodes
  5bb94f2d-246... 8 user + 19 assistant msgs -> 2 nodes
  dd27c455-e98... 7 user + 18 assistant msgs -> 2 nodes
  59ba5197-fc5... 2 user + 2 assistant msgs -> 2 nodes
  291c97d7-8c7... 1 user + 1 assistant msgs -> 2 nodes
  90819e3a-5b7... 5 user + 19 assistant msgs -> 4 nodes
  1ac1ebd7-df2... 1 user + 5 assistant msgs -> 2 nodes
  e8bc2ed5-33d... 1 user + 9 assistant msgs -> 4 nodes
  b4dbc6d3-a68... 5 user + 12 assistant msgs -> 2 nodes
  4a0af4a5-4c3... 1 user + 5 assistant msgs -> 4 nodes
  b644f647-75a... 1 user + 4 assistant msgs -> 2 nodes
  f26d8fad-1f3... 10 user + 36 assistant msgs -> 2 nodes
  c02c6d63-205... 1 user + 1 assistant msgs -> 2 nodes
  8599bf57-3a2... 1 user + 2 assistant msgs -> 2 nodes
  8c08714b-8d3... 1 user + 1 assistant msgs -> 2 nodes
  b74ae644-3f5... 1 user + 2 assistant msgs -> 2 nodes
  f7943f09-9ed... 1 user + 4 assistant msgs -> 2 nodes
  7aa09c2c-593... 1 user + 11 assistant msgs -> 2 nodes
  8

## 5. Search Imported Conversations

In [9]:
results = await tm.search("bitcoin holdings market cap", config=RetrievalConfig(limit=5))

for r in results:
    print(r)
    print()

Result(f95f63bd, score=0.033, conv=e56ebdaa-1c4b-4679-905d-27e663c7e09c, text='ASST bitcoin holdings are worth more than its market cap how...', context=yes)

Result(2f761ddf, score=0.032, conv=e56ebdaa-1c4b-4679-905d-27e663c7e09c, text='current market cap is 738M can you check recent news that sa...', context=yes)

Result(2626394f, score=0.031, conv=e56ebdaa-1c4b-4679-905d-27e663c7e09c, text='can you compare to MSTR and other bitcoin treasuries', context=yes)

Result(0d60b4bc, score=0.031, conv=b4dbc6d3-a680-4956-8b42-eb34de76a60e, text='Document my thesis on each stock. Wolf 250 dte moonshot call...', context=yes)

Result(23993b57, score=0.030, conv=dd27c455-e988-435b-b1a8-18a394eac8c9, text='**[Summary of conversation which got stale or too long]**\n\nT...', context=yes)



## 6. Expand a Result to Full Trajectory

In [10]:
if results:
    trajectory = await tm.get_trajectory(results[0].node_id)
    print(trajectory)
else:
    print("No results to expand")

Trajectory(4 steps):
  Step(f95f63bd UserText: 'ASST bitcoin holdings are worth more than its market cap how...')
  Step(0d98dcf4 AgentText: '' tools=[get_ticker_details])
  Step(46979700 AgentText: "Good catch—that's actually a classic Bitcoin treasury compan...")
  Step(04abc4d1 UserText: 'use web search')


## 7. LLM Call with Retrieved Memories

Use TraceMem to retrieve relevant past conversations, then pass them as
context to an LLM call via the OpenAI API.

In [11]:
import openai

USER_QUERY = "What do you know about my bitcoin and crypto holdings?"

# Retrieve relevant memories
memories = await tm.search(USER_QUERY, config=RetrievalConfig(limit=5))

# Build context from retrieved memories
context_parts = []
for i, mem in enumerate(memories, 1):
    part = f"Memory {i} (score={mem.score:.3f}, conversation={mem.conversation_id[:12]}...):\n"
    part += f"  User: {mem.text}\n"
    if mem.context and mem.context.agent_text:
        part += f"  Agent: {mem.context.agent_text.text[:500]}\n"
    if mem.context and mem.context.tool_uses:
        tools = ", ".join(str(t) for t in mem.context.tool_uses)
        part += f"  Tools used: {tools}\n"
    context_parts.append(part)

context_block = "\n".join(context_parts)
print("Retrieved context:\n")
print(context_block)
print("\n---\n")

# LLM call with memories as context
client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. Use the following memories from past "
                "conversations to provide an informed answer. If the memories don't "
                "contain relevant information, say so.\n\n"
                f"## Past Conversation Memories\n\n{context_block}"
            ),
        },
        {"role": "user", "content": USER_QUERY},
    ],
)

print("LLM Response:\n")
print(response.choices[0].message.content)

Retrieved context:

Memory 1 (score=0.031, conversation=e56ebdaa-1c4...):
  User: ASST bitcoin holdings are worth more than its market cap how come?
  Agent: 
  Tools used: GET_TICKER_DETAILS(ticker://ASST rv=950caab2 res=62514a0b)

Memory 2 (score=0.031, conversation=b4dbc6d3-a68...):
  User: Document my thesis on each stock. Wolf 250 dte moonshot calls 35 strike, high risk high reward if they navigate their financial situation correctly. Axti double top and currently holding gap , high volatility small mcap but very important for chip industry (search for reddit posts on this). Skyt a us based foundry as a service still small 1.5B cap could 10x from here but a bit over bought. Deploy 5K over 2 weeks. Finally qcom severly behind ai peers but can rally when ai moves to edge - accumulating leaps 200 strike
  Agent: Let me search for Reddit discussion on AXTI to add context to your thesis, then I'll create the recipe documenting your positions.

Memory 3 (score=0.031, conversation=e56ebd

## 8. Cleanup

In [12]:
import shutil

await tm.__aexit__(None, None, None)
shutil.rmtree(_tmpdir, ignore_errors=True)
print("Cleaned up")

Cleaned up
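
The notebook calls `__aenter__` and `__aexit__` by hand because setup and teardown live in different cells. In a single script, the same pairing could be expressed with `async with` plus `try`/`finally`, so teardown still runs if an import step raises. The sketch below uses a hypothetical `FakeStore` stand-in rather than `TraceMem` itself, so it only demonstrates the pattern.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


class FakeStore:
    """Hypothetical stand-in for a resource with async setup and teardown."""

    def __init__(self) -> None:
        self.open = False

    async def __aenter__(self) -> "FakeStore":
        self.open = True
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs even when the body of the async with block raises.
        self.open = False


async def run_once() -> tuple[bool, bool, bool]:
    tmpdir = Path(tempfile.mkdtemp(prefix="tracemem_sketch_"))
    store = FakeStore()
    was_open = False
    try:
        async with store:
            was_open = store.open
            raise RuntimeError("simulated failure mid-import")
    except RuntimeError:
        pass
    finally:
        # The temp directory is removed whether or not the body failed.
        shutil.rmtree(tmpdir, ignore_errors=True)
    return was_open, store.open, tmpdir.exists()


result = asyncio.run(run_once())
print(result)
```

Inside a notebook, where an event loop is already running, `await run_once()` would replace the `asyncio.run(...)` call.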
