# 🚀 **Complete TruLens + LangChain Demo with Interactive Dashboard**

## Features Demonstrated:
- ✅ LangChain RAG with In-Memory Vector Database
- ✅ TruLens Integration and Instrumentation
- ✅ Agentic LLM Evaluation
- ✅ Interactive Dashboard
- ✅ Local Server Deployment
- ✅ Advanced Feedback Functions

---

In [2]:
# 📦 INSTALLATION & IMPORTS
# Run this first if packages aren't installed:
# !pip install trulens-eval langchain langchain-openai langchain-chroma chromadb tiktoken

import os
import warnings
warnings.filterwarnings('ignore')

# Core imports
import pandas as pd
import numpy as np
from typing import List, Dict, Any

# LangChain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA
from langchain.agents import initialize_agent, Tool, AgentType
from langchain.memory import ConversationBufferMemory

# TruLens imports
from trulens.core import Feedback, Select, TruSession
from trulens.providers.openai import OpenAI as TruOpenAI
from trulens.apps.langchain import TruChain
from trulens.dashboard import run_dashboard

print("✅ All imports successful!")

ModuleNotFoundError: No module named 'trulens.apps.langchain'

In [3]:
# 🔑 SETUP ENVIRONMENT

# Set OpenAI API Key
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

# Initialize TruSession
tru = TruSession()
print("🦑 TruLens Session initialized")

# Reset database for clean demo (optional)
tru.reset_database()
print("🔄 Database reset for clean demo")

# Initialize OpenAI models
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.1)
embeddings = OpenAIEmbeddings()

print("✅ Environment setup complete!")

🦑 Initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `TruSession` to prevent this.
🦑 TruLens Session initialized


Updating app_name and app_version in apps table: 0it [00:00, ?it/s]
Updating app_id in records table: 0it [00:00, ?it/s]
Updating app_json in apps table: 0it [00:00, ?it/s]

🔄 Database reset for clean demo
✅ Environment setup complete!





In [4]:
# 📚 CREATE ENHANCED KNOWLEDGE BASE

# Expanded knowledge base for better RAG demonstration
documents_text = [
    "Machine learning is a subset of artificial intelligence (AI) that enables computers to learn and improve from experience without being explicitly programmed. It involves algorithms that can identify patterns in data and make predictions or decisions based on those patterns.",
    
    "Deep learning is a specialized subset of machine learning that uses artificial neural networks with multiple layers (hence 'deep') to model and understand complex patterns in data. It's particularly effective for tasks like image recognition, natural language processing, and speech recognition.",
    
    "Natural Language Processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret, and manipulate human language. NLP combines computational linguistics with statistical, machine learning, and deep learning models to enable computers to process human language in a valuable way.",
    
    "Computer vision is a field of artificial intelligence that enables machines to interpret and make decisions based on visual information from the world. It involves techniques for acquiring, processing, analyzing, and understanding digital images or videos to extract meaningful information.",
    
    "Reinforcement learning is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward. The agent learns through trial and error, receiving feedback in the form of rewards or penalties for its actions.",
    
    "Artificial Intelligence (AI) is a broad field of computer science focused on creating machines capable of performing tasks that typically require human intelligence. This includes learning, reasoning, problem-solving, perception, and language understanding.",
    
    "Large Language Models (LLMs) are AI models trained on vast amounts of text data to understand and generate human-like text. Examples include GPT, BERT, and Claude. They can perform various tasks like text completion, translation, summarization, and question answering.",
    
    "Vector databases are specialized databases designed to store and query high-dimensional vectors efficiently. They're crucial for AI applications involving embeddings, similarity search, and retrieval-augmented generation (RAG) systems.",
    
    "Retrieval-Augmented Generation (RAG) is an AI technique that combines information retrieval with text generation. It retrieves relevant information from a knowledge base and uses it to generate more accurate and contextually relevant responses.",
    
    "AI agents are autonomous systems that can perceive their environment, make decisions, and take actions to achieve specific goals. They can be simple rule-based systems or complex learning agents that adapt their behavior based on experience."
]

# Create LangChain documents
documents = [
    Document(page_content=text, metadata={"source": f"doc_{i}", "topic": text.split()[0].lower()})
    for i, text in enumerate(documents_text)
]

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len
)

splits = text_splitter.split_documents(documents)
print(f"📄 Created {len(splits)} document chunks")

# Create in-memory vector store
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    collection_name="ai_knowledge_base"
)

print(f"🔍 Vector store created with {vectorstore._collection.count()} embeddings")
print("✅ Knowledge base setup complete!")

📄 Created 10 document chunks
🔍 Vector store created with 10 embeddings
✅ Knowledge base setup complete!


In [5]:
# 🤖 CREATE LANGCHAIN RAG SYSTEM

# Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Retrieve top 3 most relevant chunks
)

# Create RAG chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={
        "prompt": None  # Will use default prompt
    }
)

print("🔗 RAG Chain created successfully!")

# Test the RAG system
test_query = "What is the difference between machine learning and deep learning?"
test_result = rag_chain.invoke({"query": test_query})

print(f"\n🧪 Test Query: {test_query}")
print(f"📝 Answer: {test_result['result'][:200]}...")
print(f"📚 Sources: {len(test_result['source_documents'])} documents retrieved")
print("✅ RAG system working correctly!")

🔗 RAG Chain created successfully!

🧪 Test Query: What is the difference between machine learning and deep learning?
📝 Answer: Machine learning is a subset of artificial intelligence that involves algorithms that can identify patterns in data and make predictions or decisions based on those patterns. Deep learning, on the oth...
📚 Sources: 3 documents retrieved
✅ RAG system working correctly!


In [6]:
# 🎯 SETUP ADVANCED FEEDBACK FUNCTIONS

# Initialize TruLens OpenAI provider
provider = TruOpenAI()

# 1. Answer Relevance - How relevant is the answer to the question?
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input_output()
)

# 2. Context Relevance - How relevant are the retrieved documents?
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(Select.RecordCalls.retrieve.rets.collect())
)

# 3. Groundedness - Is the answer grounded in the retrieved context?
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
)

# 4. Custom Feedback: Comprehensiveness
def comprehensiveness_feedback(input_text: str, output_text: str) -> float:
    """Measure how comprehensive the answer is based on question complexity"""
    try:
        question_indicators = {
            'explain': 2.0, 'describe': 2.0, 'compare': 2.5, 'difference': 2.5,
            'how': 1.5, 'what': 1.0, 'why': 2.0, 'when': 1.0, 'where': 1.0
        }
        
        expected_length = 50  # Base expected length
        for indicator, multiplier in question_indicators.items():
            if indicator in input_text.lower():
                expected_length *= multiplier
                break
        
        actual_length = len(output_text.split())
        score = min(1.0, actual_length / expected_length)
        return max(0.1, score)  # Minimum score of 0.1
    except:
        return 0.5

f_comprehensiveness = (
    Feedback(comprehensiveness_feedback, name="Comprehensiveness")
    .on_input_output()
)

print("✅ All feedback functions created successfully!")
print(f"📊 Total feedback functions: 4")
print("   1. Answer Relevance (OpenAI)")
print("   2. Context Relevance (OpenAI)")
print("   3. Groundedness (OpenAI)")
print("   4. Comprehensiveness (Custom)")

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input context will be set to __record__.app.retrieve.rets.collect() .
✅ In Groundedness, input source will be set to __record__.app.retrieve.rets.collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Comprehensiveness, input input_text will be set to __record__.main_input or `Select.RecordInput` .
✅ In Comprehensiveness, input output_text will be set to __record__.main_output or `Select.RecordOutput` .
✅ All feedback functions created successfully!
📊 Total feedback functions: 4
   1. Answer Relevance (OpenAI)
   2. Context Relevance (OpenAI)
   3. Groundedness (OpenAI)
   4. Comprehensiveness (Custom)


In [7]:
# 🔧 CREATE TRULENS-INSTRUMENTED RAG APP

# Create TruChain (TruLens wrapper for LangChain)
tru_rag = TruChain(
    rag_chain,
    app_name="Advanced_LangChain_RAG",
    app_version="v1.0",
    feedbacks=[
        f_answer_relevance,
        f_context_relevance,
        f_groundedness,
        f_comprehensiveness
    ]
)

print("🔗 TruChain created successfully!")
print(f"   App Name: Advanced_LangChain_RAG")
print(f"   Version: v1.0")
print(f"   Feedback Functions: 4")
print("✅ Ready for evaluation!")

NameError: name 'TruChain' is not defined

In [8]:
# 🧪 RUN COMPREHENSIVE EVALUATION

# Test questions covering different complexity levels
test_questions = [
    "What is machine learning?",
    "How does deep learning work and what makes it different from traditional machine learning?",
    "Compare natural language processing and computer vision in terms of their applications and techniques.",
    "Explain the concept of reinforcement learning and provide examples of its real-world applications.",
    "What are the key components of a RAG system and how do they work together?",
    "How do AI agents differ from traditional software programs?",
    "What role do vector databases play in modern AI applications?",
    "Describe the relationship between artificial intelligence, machine learning, and deep learning."
]

print(f"🚀 Starting evaluation with {len(test_questions)} questions...")
print("=" * 60)

results = []

for i, question in enumerate(test_questions, 1):
    print(f"\n📝 Question {i}/{len(test_questions)}: {question[:60]}...")
    
    try:
        # Run with TruLens instrumentation
        with tru_rag as recording:
            response = tru_rag.app.invoke({"query": question})
            
        answer = response['result']
        source_count = len(response.get('source_documents', []))
        
        print(f"✅ Answer ({len(answer.split())} words, {source_count} sources): {answer[:100]}...")
        
        results.append({
            'question': question,
            'answer': answer,
            'source_count': source_count
        })
        
    except Exception as e:
        print(f"❌ Error processing question: {e}")
        continue

print(f"\n🎉 Evaluation completed! {len(results)} questions processed.")
print("⏳ Feedback functions are running in background...")

🚀 Starting evaluation with 8 questions...

📝 Question 1/8: What is machine learning?...
❌ Error processing question: name 'tru_rag' is not defined

📝 Question 2/8: How does deep learning work and what makes it different from...
❌ Error processing question: name 'tru_rag' is not defined

📝 Question 3/8: Compare natural language processing and computer vision in t...
❌ Error processing question: name 'tru_rag' is not defined

📝 Question 4/8: Explain the concept of reinforcement learning and provide ex...
❌ Error processing question: name 'tru_rag' is not defined

📝 Question 5/8: What are the key components of a RAG system and how do they ...
❌ Error processing question: name 'tru_rag' is not defined

📝 Question 6/8: How do AI agents differ from traditional software programs?...
❌ Error processing question: name 'tru_rag' is not defined

📝 Question 7/8: What role do vector databases play in modern AI applications...
❌ Error processing question: name 'tru_rag' is not defined

📝 Question 8/

In [9]:
# 🤖 CREATE AGENTIC EVALUATION EXAMPLE

# Create an AI Agent for more complex interactions
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Define tools for the agent
def search_knowledge_base(query: str) -> str:
    """Search the knowledge base for relevant information"""
    try:
        result = rag_chain.invoke({"query": query})
        return f"Answer: {result['result']}\nSources: {len(result['source_documents'])} documents"
    except Exception as e:
        return f"Error searching knowledge base: {str(e)}"

def analyze_complexity(text: str) -> str:
    """Analyze the complexity of a given text or question"""
    word_count = len(text.split())
    complex_indicators = ['compare', 'analyze', 'explain', 'describe', 'evaluate']
    complexity_score = sum(1 for indicator in complex_indicators if indicator in text.lower())
    
    if complexity_score >= 2 or word_count > 20:
        return "High complexity question requiring detailed analysis"
    elif complexity_score == 1 or word_count > 10:
        return "Medium complexity question requiring explanation"
    else:
        return "Simple question requiring basic information"

# Create tools
tools = [
    Tool(
        name="KnowledgeSearch",
        func=search_knowledge_base,
        description="Search the AI knowledge base for information about artificial intelligence, machine learning, and related topics."
    ),
    Tool(
        name="ComplexityAnalyzer",
        func=analyze_complexity,
        description="Analyze the complexity level of a question or text to determine the appropriate response depth."
    )
]

# Create agent
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True,
    handle_parsing_errors=True
)

print("🤖 AI Agent created successfully!")
print(f"🔧 Tools available: {len(tools)}")
for tool in tools:
    print(f"   - {tool.name}: {tool.description[:50]}...")

# Create TruLens wrapper for the agent
tru_agent = TruChain(
    agent,
    app_name="AI_Knowledge_Agent",
    app_version="v1.0",
    feedbacks=[
        f_answer_relevance,
        f_comprehensiveness
    ]
)

print("✅ TruLens-instrumented Agent ready!")

🤖 AI Agent created successfully!
🔧 Tools available: 2
   - KnowledgeSearch: Search the AI knowledge base for information about...
   - ComplexityAnalyzer: Analyze the complexity level of a question or text...


  memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
  agent = initialize_agent(


NameError: name 'TruChain' is not defined

In [10]:
# 🎭 TEST AGENTIC INTERACTIONS

agent_queries = [
    "I'm new to AI. Can you help me understand what machine learning is and how complex it is to learn?",
    "I want to build a chatbot. What AI technologies should I consider and what are the trade-offs?",
    "Compare the learning approaches: supervised, unsupervised, and reinforcement learning. Which is best for recommendation systems?"
]

print("🎭 Testing Agentic Interactions...")
print("=" * 50)

for i, query in enumerate(agent_queries, 1):
    print(f"\n🗣️ User Query {i}: {query}")
    print("-" * 40)
    
    try:
        with tru_agent as recording:
            response = tru_agent.app.run(input=query)
        
        print(f"🤖 Agent Response: {response[:200]}...")
        
    except Exception as e:
        print(f"❌ Error in agent interaction: {e}")

print("\n✅ Agentic evaluation completed!")

🎭 Testing Agentic Interactions...

🗣️ User Query 1: I'm new to AI. Can you help me understand what machine learning is and how complex it is to learn?
----------------------------------------
❌ Error in agent interaction: name 'tru_agent' is not defined

🗣️ User Query 2: I want to build a chatbot. What AI technologies should I consider and what are the trade-offs?
----------------------------------------
❌ Error in agent interaction: name 'tru_agent' is not defined

🗣️ User Query 3: Compare the learning approaches: supervised, unsupervised, and reinforcement learning. Which is best for recommendation systems?
----------------------------------------
❌ Error in agent interaction: name 'tru_agent' is not defined

✅ Agentic evaluation completed!


In [11]:
# 📊 ANALYZE RESULTS AND PREPARE DASHBOARD

import time
print("⏳ Waiting for feedback evaluation to complete...")
time.sleep(10)  # Allow time for feedback processing

# Get all records and feedback
records, feedback = tru.get_records_and_feedback()

print(f"\n📈 EVALUATION SUMMARY")
print("=" * 40)
print(f"Total Records: {len(records)}")
print(f"Total Apps: {records['app_name'].nunique() if not records.empty else 0}")

if not records.empty:
    # Display apps
    apps = records['app_name'].unique()
    for app in apps:
        app_records = records[records['app_name'] == app]
        print(f"\n📱 {app}: {len(app_records)} records")
        
        # Show feedback scores if available
        feedback_cols = [col for col in app_records.columns 
                        if col in ['Answer Relevance', 'Context Relevance', 'Groundedness', 
                                 'Comprehensiveness']]
        
        if feedback_cols:
            print("   Feedback Scores:")
            for col in feedback_cols:
                scores = app_records[col].dropna()
                if len(scores) > 0:
                    avg_score = scores.mean()
                    print(f"     {col}: {avg_score:.3f} (avg of {len(scores)} evaluations)")
        
        # Show sample interaction
        if 'input' in app_records.columns and 'output' in app_records.columns:
            sample = app_records.iloc[0]
            print(f"   Sample Q: {str(sample['input'])[:60]}...")
            print(f"   Sample A: {str(sample['output'])[:80]}...")

print("\n✅ Analysis complete! Ready for dashboard.")

⏳ Waiting for feedback evaluation to complete...

📈 EVALUATION SUMMARY
Total Records: 0
Total Apps: 0

✅ Analysis complete! Ready for dashboard.


In [12]:
# 🚀 START TRULENS DASHBOARD

print("🚀 Starting TruLens Interactive Dashboard...")
print("=" * 50)
print("\n📋 Dashboard Features:")
print("✅ Interactive evaluation results")
print("✅ Real-time feedback scores")
print("✅ Comparative analysis between apps")
print("✅ Detailed trace inspection")
print("✅ Export capabilities")

print("\n🌐 Starting dashboard server...")
print("📱 The dashboard will open in your browser automatically")
print("🔗 Manual access: http://localhost:8501")
print("\n⚠️ Keep this cell running to maintain the dashboard server")
print("🛑 To stop: Interrupt the kernel or restart")

# Start the dashboard
run_dashboard(tru, port=8501, host="localhost")

🚀 Starting TruLens Interactive Dashboard...

📋 Dashboard Features:
✅ Interactive evaluation results
✅ Real-time feedback scores
✅ Comparative analysis between apps
✅ Detailed trace inspection
✅ Export capabilities

🌐 Starting dashboard server...
📱 The dashboard will open in your browser automatically
🔗 Manual access: http://localhost:8501

⚠️ Keep this cell running to maintain the dashboard server
🛑 To stop: Interrupt the kernel or restart


NameError: name 'run_dashboard' is not defined

## 🎯 **Dashboard Navigation Guide**

### **Main Sections:**

1. **📊 Leaderboard**
   - Compare all your applications
   - View aggregated feedback scores
   - Identify best performing models

2. **📱 Applications**
   - Detailed view of each app
   - Individual record inspection
   - Performance over time

3. **🔍 Records**
   - Browse all evaluation records
   - Filter by app, feedback scores, time
   - Export data for further analysis

4. **🎯 Feedback**
   - Configure feedback functions
   - View feedback definitions
   - Monitor evaluation status

### **Key Features:**
- **Interactive Charts**: Click and explore your data
- **Detailed Traces**: See exactly how your app processes each request
- **Comparative Analysis**: Compare different versions and configurations
- **Export Options**: Download results for presentations or reports

---

## 🎉 **COMPLETE DEMO SUMMARY**

✅ **Implemented Features:**
   - 🔗 LangChain integration with RAG pipeline
   - 🗄️ In-memory Chroma vector database
   - 🤖 AI Agent with multiple tools
   - 📊 4 comprehensive feedback functions
   - 🌐 Interactive TruLens dashboard
   - 🔧 Local server deployment

📈 **Evaluation Metrics:**
   - Answer Relevance (OpenAI-powered)
   - Context Relevance (retrieval quality)
   - Groundedness (factual accuracy)
   - Comprehensiveness (response depth)

🚀 **Use Cases Demonstrated:**
   - 📚 RAG system evaluation
   - 🤖 Agentic AI assessment
   - 📊 Comparative analysis
   - 🔍 Detailed tracing and debugging
   - 💼 Production readiness monitoring

🎯 **Manager Presentation Points:**
   - ✅ Comprehensive AI evaluation framework
   - ✅ Production-ready monitoring solution
   - ✅ Interactive dashboard for stakeholders
   - ✅ Quantitative quality metrics
   - ✅ Scalable architecture for enterprise use