# LangSmith Data Fetching - Document Processing & Query Monitoring

This notebook demonstrates how to fetch the latest 10 records from LangSmith for:
1. Document Processing operations
2. RAG Query executions

## Purpose
Simple data retrieval for the Observability/Monitor page implementation.

## Setup and Authentication

In [1]:
# Import required libraries
import sys
from pathlib import Path

# Add project root to path
project_root = Path.cwd().parent.parent
sys.path.insert(0, str(project_root))

from langsmith import Client
from config.settings import settings
import json

import pandas as pd

print("✓ Libraries imported successfully")


✓ Libraries imported successfully


In [2]:
settings.langsmith_project

'car-buyer-assist-rag'

In [3]:
# Initialize LangSmith client
client = Client(api_key=settings.langsmith_api_key)
print(f"✓ Connected to project: {settings.langsmith_project}")

✓ Connected to project: car-buyer-assist-rag


## 1. Latest 10 Document Processing Runs

In [4]:
client.list_runs(
    project_name=settings.langsmith_project,
    filter='eq(name, "process_document")',
    limit=10,
    order_by="-start_time"  # Most recent first
)

<generator object Client.list_runs at 0x1138d4230>

In [5]:
# Fetch latest 10 document processing runs
doc_runs = list(client.list_runs(
    project_name=settings.langsmith_project,
    filter='eq(name, "process_document")',
    limit=10,
    order_by="-start_time"  # Most recent first
))

print(f"Found {len(doc_runs)} document processing run(s)")

Found 10 document processing run(s)
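
`client.list_runs` returns a generator, as the output of the previous cell shows, which is why the call here is wrapped in `list(...)`. The sketch below illustrates that behavior with a hypothetical stand-in generator (`fake_list_runs` is not part of LangSmith), so it runs without an API key:

```python
from itertools import islice


def fake_list_runs(limit):
    """Hypothetical stand-in for client.list_runs: yields items one at a time."""
    for i in range(limit):
        yield f"run-{i}"


runs_iter = fake_list_runs(limit=10)

# islice pulls only the first three items; nothing else is consumed yet
first_three = list(islice(runs_iter, 3))

# list() drains whatever is left in the generator
remaining = list(runs_iter)

# A generator cannot be replayed, so a second pass yields nothing
again = list(runs_iter)

print(first_three, len(remaining), again)
```

Materializing the result once with `list(...)` avoids accidentally iterating an exhausted generator in later cells.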


In [6]:
doc_runs

[<class 'langsmith.schemas.Run'>(id=019be048-f79d-7f71-83f9-86b81cc42463, name='process_document', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019be048-f442-7cc3-8007-82d47d5e0174, name='process_document', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019be048-ee2f-7383-8468-7367b3b6619d, name='process_document', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019be048-ebda-7a22-abaf-d84fad79548a, name='process_document', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019be048-e97e-7b62-a4e0-a27aaf83e120, name='process_document', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019be048-e6a7-79d3-a190-f6a7a0c51d1d, name='process_document', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019be048-e403-77d2-8298-f2101bcd01e2, name='process_document', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019be048-e13f-79c3-80a6-58d3f84798a2, name='process_document', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019be048-d8a2-77b1-b

In [7]:
doc_runs[0]

<class 'langsmith.schemas.Run'>(id=019be048-f79d-7f71-83f9-86b81cc42463, name='process_document', run_type='chain')

In [8]:
doc_runs[0].inputs

{'collection_name': 'toyota_specs',
 'file': 'UploadedFile(file_id=\'3da2167a-d9e1-4c20-a190-3ef5a3c485a6\', name=\'Toyota_Tacoma_Specifications.pdf\', type=\'application/pdf\', size=75319, _file_urls=file_id: "3da2167a-d9e1-4c20-a190-3ef5a3c485a6"\nupload_url: "/_stcore/upload_file/d5a52209-fe1d-425e-86f7-5194e9ac3443/3da2167a-d9e1-4c20-a190-3ef5a3c485a6"\ndelete_url: "/_stcore/upload_file/d5a52209-fe1d-425e-86f7-5194e9ac3443/3da2167a-d9e1-4c20-a190-3ef5a3c485a6"\n)',
 'progress_callback': '<function DocumentProcessor.process_multiple_documents.<locals>.doc_progress at 0x10f6b9c60>'}

In [9]:
doc_runs[0].outputs

{'output': {'chunks_created': 4,
  'error': None,
  'filename': 'Toyota_Tacoma_Specifications.pdf',
  'metadata': {},
  'model_name': 'Tacoma',
  'processing_time': 1.2200260162353516,
  'status': 'success'}}

In [10]:
# Extract and display document processing data
if doc_runs:
    doc_data = []
    
    for run in doc_runs:
        duration = (run.end_time - run.start_time).total_seconds() if run.end_time and run.start_time else 0
        
        # Extract from nested output structure
        filename = 'Unknown'
        chunks_created = 0
        model_name = 'N/A'
        
        if run.outputs and 'output' in run.outputs:
            output_data = run.outputs['output']
            filename = output_data.get('filename', 'Unknown')
            chunks_created = output_data.get('chunks_created', 0)
            model_name = output_data.get('model_name', 'N/A')
        
        doc_data.append({
            'timestamp': run.start_time.strftime('%Y-%m-%d %H:%M:%S') if run.start_time else 'N/A',
            'filename': filename,
            'model_name': model_name,
            'chunks_created': chunks_created,
            'duration_sec': round(duration, 2),
            'status': run.status,
            'run_id': str(run.id)
        })
    
    # Display as DataFrame
    doc_df = pd.DataFrame(doc_data)
    print("\n📄 Document Processing Runs:")
    print(doc_df.to_string(index=False))
else:
    print("⚠️  No document processing runs found.")
    print("Process some documents first via the Streamlit app.")


📄 Document Processing Runs:
          timestamp                             filename model_name  chunks_created  duration_sec  status                               run_id
2026-01-21 11:20:37     Toyota_Tacoma_Specifications.pdf     Tacoma               4          1.22 success 019be048-f79d-7f71-83f9-86b81cc42463
2026-01-21 11:20:36       Toyota_RAV4_Specifications.pdf       RAV4               4          0.86 success 019be048-f442-7cc3-8007-82d47d5e0174
2026-01-21 11:20:34      Toyota_Prius_Specifications.pdf      Prius               4          1.55 success 019be048-ee2f-7383-8468-7367b3b6619d
2026-01-21 11:20:34 Toyota_Highlander_Specifications.pdf Highlander               4          0.60 success 019be048-ebda-7a22-abaf-d84fad79548a
2026-01-21 11:20:33    Toyota_Corolla_Specifications.pdf    Corolla               4          0.60 success 019be048-e97e-7b62-a4e0-a27aaf83e120
2026-01-21 11:20:32      Toyota_Camry_Specifications.pdf      Camry               4          0.73 success 019b
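
The extraction loop above could be factored into a single helper so the DataFrame and JSON views share one code path. This is a minimal sketch, not the project's implementation: `doc_run_to_row` is a hypothetical name, the `SimpleNamespace` object is a stand-in for a `langsmith` run, and its `end_time` is an illustrative value chosen to match the 1.22 s duration shown above.

```python
from datetime import datetime
from types import SimpleNamespace


def doc_run_to_row(run):
    """Build one display row from a run-like object (same fields as the loop above)."""
    if run.start_time and run.end_time:
        duration = (run.end_time - run.start_time).total_seconds()
    else:
        duration = 0
    # Treat a missing outputs dict or a missing 'output' key as empty
    output = (run.outputs or {}).get('output') or {}
    return {
        'timestamp': run.start_time.strftime('%Y-%m-%d %H:%M:%S') if run.start_time else 'N/A',
        'filename': output.get('filename', 'Unknown'),
        'model_name': output.get('model_name', 'N/A'),
        'chunks_created': output.get('chunks_created', 0),
        'duration_sec': round(duration, 2),
        'status': run.status,
        'run_id': str(run.id),
    }


# Stand-in run; end_time is an assumed value for illustration
fake_run = SimpleNamespace(
    id='019be048-f79d-7f71-83f9-86b81cc42463',
    start_time=datetime(2026, 1, 21, 11, 20, 37),
    end_time=datetime(2026, 1, 21, 11, 20, 38, 220000),
    status='success',
    outputs={'output': {'filename': 'Toyota_Tacoma_Specifications.pdf',
                        'model_name': 'Tacoma',
                        'chunks_created': 4}},
)
row = doc_run_to_row(fake_run)

# A run without outputs falls back to the defaults
empty_run = SimpleNamespace(id='x', start_time=None, end_time=None,
                            status='pending', outputs=None)
empty_row = doc_run_to_row(empty_run)
print(row)
```

With real data the call would be `[doc_run_to_row(r) for r in doc_runs]`.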

In [11]:
# Extract and display document processing data
if doc_runs:
    doc_data = []
    
    for run in doc_runs:
        duration = (run.end_time - run.start_time).total_seconds() if run.end_time and run.start_time else 0
        
        # Extract from nested output structure
        filename = 'Unknown'
        chunks_created = 0
        model_name = 'N/A'
        
        if run.outputs and 'output' in run.outputs:
            output_data = run.outputs['output']
            filename = output_data.get('filename', 'Unknown')
            chunks_created = output_data.get('chunks_created', 0)
            model_name = output_data.get('model_name', 'N/A')
        
        doc_data.append({
            'timestamp': run.start_time.strftime('%Y-%m-%d %H:%M:%S') if run.start_time else 'N/A',
            'filename': filename,
            'model_name': model_name,
            'chunks_created': chunks_created,
            'duration_sec': round(duration, 2),
            'status': run.status,
            'run_id': str(run.id)
        })
    
    # Show JSON format for UI implementation
    print("\n📋 JSON Format (for UI):")
    print(json.dumps(doc_data[:3], indent=2))  # Show first 3 as example
else:
    print("⚠️  No document processing runs found.")
    print("Process some documents first via the Streamlit app.")


📋 JSON Format (for UI):
[
  {
    "timestamp": "2026-01-21 11:20:37",
    "filename": "Toyota_Tacoma_Specifications.pdf",
    "model_name": "Tacoma",
    "chunks_created": 4,
    "duration_sec": 1.22,
    "status": "success",
    "run_id": "019be048-f79d-7f71-83f9-86b81cc42463"
  },
  {
    "timestamp": "2026-01-21 11:20:36",
    "filename": "Toyota_RAV4_Specifications.pdf",
    "model_name": "RAV4",
    "chunks_created": 4,
    "duration_sec": 0.86,
    "status": "success",
    "run_id": "019be048-f442-7cc3-8007-82d47d5e0174"
  },
  {
    "timestamp": "2026-01-21 11:20:34",
    "filename": "Toyota_Prius_Specifications.pdf",
    "model_name": "Prius",
    "chunks_created": 4,
    "duration_sec": 1.55,
    "status": "success",
    "run_id": "019be048-ee2f-7383-8468-7367b3b6619d"
  }
]
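
The cells above convert `run.id` with `str()` and timestamps with `strftime()` before calling `json.dumps`. The sketch below shows why, assuming `run.id` is a `uuid.UUID` and `start_time` a `datetime` (which those conversions suggest): the standard `json` module cannot encode either type natively, and `default=str` is one fallback when records are not pre-converted.

```python
import json
import uuid
from datetime import datetime

# Record with raw (unconverted) values, using the first run shown above
record = {
    'run_id': uuid.UUID('019be048-f79d-7f71-83f9-86b81cc42463'),
    'start_time': datetime(2026, 1, 21, 11, 20, 37),
    'filename': 'Toyota_Tacoma_Specifications.pdf',
}

# Without a fallback, json.dumps raises TypeError on UUID and datetime values
try:
    json.dumps(record)
    raised = False
except TypeError:
    raised = True

# default=str is called for every value json cannot encode on its own
payload = json.dumps(record, default=str, indent=2)
decoded = json.loads(payload)
print(payload)
```

Explicit conversion, as in the notebook, keeps control over the timestamp format; `default=str` is simpler but uses each type's default string form.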


In [12]:
doc_multiple_runs = list(client.list_runs(
    project_name=settings.langsmith_project,
    filter='eq(name, "process_multiple_documents")',
    limit=10,
    order_by="-start_time"  # Most recent first
))

In [13]:
len(doc_multiple_runs)

7

In [14]:
doc_multiple_runs[0]

<class 'langsmith.schemas.Run'>(id=019be048-e139-7110-91eb-54645978339f, name='process_multiple_documents', run_type='chain')

In [15]:
doc_multiple_runs[0].inputs

{'clear_existing': True,
 'collection_name': 'toyota_specs',
 'files': ['UploadedFile(file_id=\'b022956f-f4a6-4495-9187-e7d7ad32b86b\', name=\'Introduction_to_Toyota_Car_Sales.pdf\', type=\'application/pdf\', size=49748, _file_urls=file_id: "b022956f-f4a6-4495-9187-e7d7ad32b86b"\nupload_url: "/_stcore/upload_file/d5a52209-fe1d-425e-86f7-5194e9ac3443/b022956f-f4a6-4495-9187-e7d7ad32b86b"\ndelete_url: "/_stcore/upload_file/d5a52209-fe1d-425e-86f7-5194e9ac3443/b022956f-f4a6-4495-9187-e7d7ad32b86b"\n)',
  'UploadedFile(file_id=\'4ab64b21-1a55-4159-b888-be24c95fcd07\', name=\'Toyota_bZ4X_Specifications.pdf\', type=\'application/pdf\', size=85929, _file_urls=file_id: "4ab64b21-1a55-4159-b888-be24c95fcd07"\nupload_url: "/_stcore/upload_file/d5a52209-fe1d-425e-86f7-5194e9ac3443/4ab64b21-1a55-4159-b888-be24c95fcd07"\ndelete_url: "/_stcore/upload_file/d5a52209-fe1d-425e-86f7-5194e9ac3443/4ab64b21-1a55-4159-b888-be24c95fcd07"\n)',
  'UploadedFile(file_id=\'f371d705-862e-4529-be23-7a3c2af3495a\', 

In [16]:
doc_multiple_runs[0].outputs

{'output': {'cleared_existing': True,
  'collection_name': 'toyota_specs',
  'results': [{'chunks_created': 3,
    'error': None,
    'filename': 'Introduction_to_Toyota_Car_Sales.pdf',
    'metadata': {},
    'model_name': 'Car',
    'processing_time': 0.7076489925384521,
    'status': 'success'},
   {'chunks_created': 5,
    'error': None,
    'filename': 'Toyota_bZ4X_Specifications.pdf',
    'metadata': {},
    'model_name': 'bZ4X',
    'processing_time': 0.6744251251220703,
    'status': 'success'},
   {'chunks_created': 4,
    'error': None,
    'filename': 'Toyota_Camry_Specifications.pdf',
    'metadata': {},
    'model_name': 'Camry',
    'processing_time': 0.725553035736084,
    'status': 'success'},
   {'chunks_created': 4,
    'error': None,
    'filename': 'Toyota_Corolla_Specifications.pdf',
    'metadata': {},
    'model_name': 'Corolla',
    'processing_time': 0.6028642654418945,
    'status': 'success'},
   {'chunks_created': 4,
    'error': None,
    'filename': 'Toyot
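
The `process_multiple_documents` output above nests one entry per file under `results`. A sketch of flattening that structure into rows for the monitor page follows; `flatten_batch_results` is a hypothetical helper, and the sample dict is trimmed to the first two results shown above.

```python
def flatten_batch_results(outputs):
    """Return one row per file from a process_multiple_documents outputs dict."""
    output = (outputs or {}).get('output') or {}
    rows = []
    for result in output.get('results', []):
        rows.append({
            'collection_name': output.get('collection_name', 'N/A'),
            'filename': result.get('filename', 'Unknown'),
            'model_name': result.get('model_name', 'N/A'),
            'chunks_created': result.get('chunks_created', 0),
            'processing_time': round(result.get('processing_time', 0), 2),
            'status': result.get('status', 'unknown'),
        })
    return rows


# Sample trimmed from the output above (first two results only)
sample_outputs = {'output': {
    'cleared_existing': True,
    'collection_name': 'toyota_specs',
    'results': [
        {'chunks_created': 3, 'error': None,
         'filename': 'Introduction_to_Toyota_Car_Sales.pdf', 'metadata': {},
         'model_name': 'Car', 'processing_time': 0.7076489925384521,
         'status': 'success'},
        {'chunks_created': 5, 'error': None,
         'filename': 'Toyota_bZ4X_Specifications.pdf', 'metadata': {},
         'model_name': 'bZ4X', 'processing_time': 0.6744251251220703,
         'status': 'success'},
    ],
}}

batch_rows = flatten_batch_results(sample_outputs)
for batch_row in batch_rows:
    print(batch_row)
```

With real data this would be called as `flatten_batch_results(doc_multiple_runs[0].outputs)`.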

## 2. Latest 10 Query Execution Runs

In [17]:
# Fetch latest 10 RAG query runs
query_runs = list(client.list_runs(
    project_name=settings.langsmith_project,
    filter='eq(name, "rag_query")',
    limit=10,
    order_by="-start_time"  # Most recent first
))

print(f"Found {len(query_runs)} query execution run(s)")

Found 10 query execution run(s)


In [18]:
query_runs

[<class 'langsmith.schemas.Run'>(id=019beaac-466f-7ea1-8455-2fe67ab66089, name='rag_query', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019beaab-9aa4-7702-81f7-a2a526ce3626, name='rag_query', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019bea9e-aa7a-7131-86c5-1ac5a5c82577, name='rag_query', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019bea9e-8591-74a2-b61a-579cbf64bd08, name='rag_query', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019bea9e-5e0e-7640-8b06-7255dc9786f5, name='rag_query', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019bea9e-30c0-74d1-8187-0d2d7c5eb71f, name='rag_query', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019bea9e-149f-7060-bfab-19ac0c9e2717, name='rag_query', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019bea9d-eb95-7e72-9302-000bfb5fe2d8, name='rag_query', run_type='chain'),
 <class 'langsmith.schemas.Run'>(id=019bea9d-8f96-7511-a557-b9c6f38f7d79, name='rag_query', run_type='chain'),
 

In [19]:
query_runs[0].inputs

{'conversation_history': [],
 'question': 'Can you schedule a test drive for me?'}

In [20]:
query_runs[0].outputs

{'output': {'answer': "I don't have that information in the available Toyota specifications.",
  'context_used': '[Context 1 - Source: Toyota_bZ4X_Specifications.pdf, Model: bZ4X, Page: 3]\n12/29/24, 8:56\u202fPMToyota_bZ4X_Specifications.md\nPage 4 of 4http://localhost:61787/48c788d0-f127-4a7b-9d3c-969bd631ad1b/\nCharging Time (Fast)~30 minutes (80%)~30 minutes (80%)\nStarting Price (USD) $42,000 $48,000\n\n[Context 2 - Source: Toyota_bZ4X_Specifications.pdf, Model: bZ4X, Page: 1]\n12/29/24, 8:56\u202fPMToyota_bZ4X_Specifications.md\nPage 2 of 4http://localhost:61787/48c788d0-f127-4a7b-9d3c-969bd631ad1b/\nPre-Collision Syste...',
  'processing_time': 4.713650941848755,
  'retrieved_chunks': 5,
  'sources': ['Toyota_Camry_Specifications.pdf',
   'Toyota_RAV4_Specifications.pdf',
   'Toyota_bZ4X_Specifications.pdf']}}

In [21]:
# Extract and display query execution data
if query_runs:
    query_data = []
    
    for run in query_runs:
        duration = (run.end_time - run.start_time).total_seconds() if run.end_time and run.start_time else 0
        
        # Extract question and answer
        question = run.inputs.get('question', 'Unknown') if run.inputs else 'Unknown'
        answer = run.outputs.get('output', {}).get('answer', '') if run.outputs else ''
        sources = run.outputs.get('output', {}).get('sources', []) if run.outputs else []
        retrieved_chunks = run.outputs.get('output', {}).get('retrieved_chunks', 0) if run.outputs else 0
        
        # Truncate for display
        question_short = (question[:60] + '...') if len(question) > 60 else question
        answer_short = (answer[:100] + '...') if len(answer) > 100 else answer
        
        query_data.append({
            'timestamp': run.start_time.strftime('%Y-%m-%d %H:%M:%S') if run.start_time else 'N/A',
            'question': question_short,
            'answer_preview': answer_short,
            'sources': ', '.join(sources),
            'chunks': retrieved_chunks,
            'duration_sec': round(duration, 2),
            'status': run.status,
            'run_id': str(run.id)
        })
    
    # Display as DataFrame
    query_df = pd.DataFrame(query_data)
    print("\n💬 Query Execution Runs:")
    print(query_df[['timestamp', 'question', 'answer_preview', 'chunks', 'duration_sec', 'status']].to_string(index=False))
else:
    print("⚠️  No query runs found.")
    print("Ask some questions via the Interactive Assistant first.")


💬 Query Execution Runs:
          timestamp                                                        question                                                                                            answer_preview  chunks  duration_sec  status
2026-01-23 11:45:17                           Can you schedule a test drive for me?                                     I don't have that information in the available Toyota specifications.       5          4.71 success
2026-01-23 11:44:33                           Can you schedule a test drive for me?                                     I don't have that information in the available Toyota specifications.       5          4.17 success
2026-01-23 11:30:25                           Can you schedule a test drive for me?   I don't have the functionality to schedule a test drive for you. However, I can suggest that you vis...       5          3.34 success
2026-01-23 11:30:16 I need a fuel-efficient car for city driving, what do you re...   Given 
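
The question and answer columns above are shortened with the same slice-and-ellipsis pattern. A small sketch of that pattern as a reusable function follows; `preview` is a hypothetical name, and treating `None` as an empty string is an added assumption that the original cell does not make.

```python
def preview(text, limit):
    """Return text unchanged if it fits, else its first `limit` characters plus '...'."""
    text = text or ''  # assumption: treat a missing value as empty
    return text[:limit] + '...' if len(text) > limit else text


question = 'I need a fuel-efficient car for city driving, what do you recommend?'
question_short = preview(question, 60)
print(question_short)
```

Note that the result can be up to `limit + 3` characters long, because the ellipsis is appended after slicing.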

In [22]:
# Show JSON format for UI implementation
print("\n📋 JSON Format (for UI):")
# Include full, untruncated data for one record (index 3) as an example
example_run = query_runs[3]
example_output = example_run.outputs.get('output', {}) if example_run.outputs else {}
example_data = [{
    'timestamp': example_run.start_time.strftime('%Y-%m-%d %H:%M:%S') if example_run.start_time else 'N/A',
    'question': example_run.inputs.get('question', 'Unknown') if example_run.inputs else 'Unknown',
    'answer': example_output.get('answer', ''),
    'sources': example_output.get('sources', []),
    'retrieved_chunks': example_output.get('retrieved_chunks', 0),
    # Guard on example_run itself, not on a leftover loop variable from an earlier cell
    'duration_sec': (example_run.end_time - example_run.start_time).total_seconds() if example_run.end_time and example_run.start_time else 0,
    'status': example_run.status,
    'run_id': str(example_run.id)
}]
print(json.dumps(example_data, indent=2))


📋 JSON Format (for UI):
[
  {
    "timestamp": "2026-01-23 11:30:16",
    "question": "I need a fuel-efficient car for city driving, what do you recommend?",
    "answer": "Given your need for a fuel-efficient car for city driving, I recommend the Toyota Corolla or the Toyota Prius.\n\nThe Toyota Corolla with a gasoline engine has an estimated fuel efficiency of 30 MPG (city) / 38 MPG (highway), while the hybrid engine has 53 MPG (city) / 52 MPG (highway) (Toyota_Corolla_Specifications.pdf). The base model starts at $20,000 (Toyota_Corolla_Specifications.pdf).\n\nAlternatively, the Toyota Prius (Hybrid) gets approximately 58 MPG (city) / 53 MPG (highway) and has a base model starting at $28,000 (Toyota_Prius_Specifications.pdf).",
    "sources": [
      "Toyota_Camry_Specifications.pdf",
      "Toyota_Corolla_Specifications.pdf",
      "Toyota_Prius_Specifications.pdf"
    ],
    "retrieved_chunks": 5,
    "duration_sec": 5.664941,
    "status": "success",
    "run_id": "019bea9e-8

## Summary

### Key Points for Monitor Page Implementation:

1. **LangSmith Client Setup:**
   ```python
   from langsmith import Client
   client = Client(api_key=settings.langsmith_api_key)
   ```

2. **Fetch Document Processing:**
   ```python
   client.list_runs(
       project_name=settings.langsmith_project,
       filter='eq(name, "process_document")',
       limit=10,
       order_by="-start_time"
   )
   ```

3. **Fetch Query Executions:**
   ```python
   client.list_runs(
       project_name=settings.langsmith_project,
       filter='eq(name, "rag_query")',
       limit=10,
       order_by="-start_time"
   )
   ```

4. **Important Data Locations:**
   - Document filename: `run.outputs['output']['filename']`
   - Chunks created: `run.outputs['output']['chunks_created']`
   - Model name: `run.outputs['output']['model_name']`
   - Query question: `run.inputs['question']`
   - Query answer: `run.outputs['output']['answer']`
   - Sources: `run.outputs['output']['sources']`
   - Retrieved chunks: `run.outputs['output']['retrieved_chunks']`
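
The direct indexing listed above raises an exception when a level is missing, which may happen for a run that has no outputs yet. A defensive accessor could look like the sketch below; `output_field` is a hypothetical helper, not part of LangSmith.

```python
def output_field(outputs, key, default=None):
    """Read outputs['output'][key], returning default if any level is missing."""
    if not isinstance(outputs, dict):
        return default
    inner = outputs.get('output')
    if not isinstance(inner, dict):
        return default
    return inner.get(key, default)


complete = {'output': {'filename': 'Toyota_Tacoma_Specifications.pdf',
                       'chunks_created': 4}}

found = output_field(complete, 'filename', 'Unknown')
missing_outputs = output_field(None, 'filename', 'Unknown')
missing_inner = output_field({'output': None}, 'sources', [])
print(found, missing_outputs, missing_inner)
```

On the monitor page this would be used as, for example, `output_field(run.outputs, 'answer', '')`.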