# AutoAI: Expert system for Automated AI design & scaling - powered by knowledge graphs and RAG.

- **Author: Shridhar Kini** ([Profile](https://www.linkedin.com/in/shridhar-kini-79911249?utm_source=share&utm_campaign=share_via&utm_content=profile&utm_medium=android_app))
- **Note**: This notebook is designed to run in a Jupyter environment with access to the AIOS v1 libraries (`aios_instance`, `aios_transformers`) and a running Weaviate vector database.
- **To Run Securely**: run `jupyter notebook password` to set a login password for secure access
- **To Run**: `jupyter notebook --allow-root  --port 9999 --ip=0.0.0.0`
- **To Clear Outputs**: Use `jupyter nbconvert --clear-output --inplace RAG_ScaleLayout_AppLayout_Demo.ipynb`


This notebook demonstrates how **Retrieval-Augmented Generation (RAG)** powers two critical components of our AI vision pipeline system: AppLayout and ScaleLayout.

---

## 🎭 **Introduction to AppLayout and ScaleLayout**

---

### 🏗️ **AppLayout: The Logical Blueprint of ensemble AI** 💡

<div align="center">

```
🎯 VISION → 🧩 DECOMPOSITION → 📐 BLUEPRINT → ⚙️ EXECUTION

```

</div>


**AppLayout** is the **logical blueprint** and **architectural DNA** of an ensemble AI or AI application. It's the mastermind that meticulously deconstructs complex, high-level AI use cases into composable AI workflows of smaller, interconnected, reusable AI blocks. To learn more, see [AppLayout](https://aiosdocs.pages.dev/getting-started/concepts/#app-layout).

#### 🍳 **Think of it as a Master Chef's Recipe:**
- 📝 Each **unit/block** = Individual intelligent or non-intelligent module (AI model, algorithm, policy, code block)
- 🔗 **DAG Structure** = Composition recipe of an ensemble; can be used to create permutations and combinations of AIs
- ⚡ **Flow** = sequence or order of execution - sequential or async or conditional or cyclic or acyclic etc.
- 🎓 **Domain Expertise** = Years of expertise & experience distilled into automation - a miniature AI twin of an AI expert.

> 💫 *"From high-level specifications to automated AI magic!"*
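To make the recipe analogy concrete, here is a minimal sketch of an AppLayout as a DAG of blocks, using Python's standard-library `graphlib` to derive a valid execution order. The block names and the dictionary-based layout format are hypothetical illustrations, not the AIOS AppLayout schema.

```python
from graphlib import TopologicalSorter

# Hypothetical AppLayout: each block lists the blocks whose output it consumes.
# Block names are illustrative only, not taken from the AIOS block registry.
app_layout = {
    "camera_input": set(),
    "vehicle_detector": {"camera_input"},
    "tracker": {"vehicle_detector"},
    "license_plate_reader": {"vehicle_detector"},
    "stationary_policy": {"tracker"},          # non-AI block: a rule/policy
    "alert_sink": {"stationary_policy", "license_plate_reader"},
}

def execution_stages(layout):
    """Group blocks into stages. Blocks in the same stage do not depend on each
    other, so they could run concurrently. Raises graphlib.CycleError if the
    layout is not a DAG."""
    sorter = TopologicalSorter(layout)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages

for i, stage in enumerate(execution_stages(app_layout)):
    print(f"stage {i}: {stage}")
```

Here `tracker` and `license_plate_reader` land in the same stage because neither depends on the other, which is the kind of parallelism a DAG makes explicit.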

---

### 🚀 **ScaleLayout: The Strategic Deployment Commander** ⚖️

<div align="center">

```
🏗️ BLUEPRINT → 🌐 INFRASTRUCTURE → 🎯 OPTIMIZATION → 📈 DEPLOYMENT

```

</div>

**ScaleLayout** is the **intelligent deployment strategist** and **operational mastermind** for AppLayout execution. If AppLayout answers *"what and how"*, ScaleLayout conquers *"where and how efficiently"*. To learn more, see [ScaleLayout](https://aiosdocs.pages.dev/getting-started/concepts/#scale-layout).

#### 🎯 **The Master Deployment Plan:**
- 🗺️ **Strategic Mapping** = Map of processing units to hardware infrastructure
- ⚡ **Performance Goals** = Minimize network traffic, eliminate bottlenecks
- 💰 **Resource Efficiency** = Minimize wasted CPU/GPU cycles and memory
- 🤖 **Smart Scheduling** = Efficient placement and scheduling of workloads

> 🌟 *"Convert AI designs into real-world scale, high-performing reality!"*
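As a rough illustration of strategic mapping, the sketch below places pipeline blocks onto servers with a first-fit-decreasing heuristic on GPU memory. The heuristic, server names, and memory figures are assumptions for illustration only; the actual ScaleLayout planner draws its rules and resource figures from the knowledge base.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    gpu_mem_gb: float                          # total GPU memory on this server
    placed: list = field(default_factory=list)
    used_gb: float = 0.0

    def fits(self, need_gb: float) -> bool:
        return self.used_gb + need_gb <= self.gpu_mem_gb

def place_blocks(blocks: dict, servers: list) -> dict:
    """First-fit decreasing: sort blocks by GPU memory need (largest first) and
    put each on the first server with enough free memory."""
    plan = {}
    for name, need in sorted(blocks.items(), key=lambda kv: kv[1], reverse=True):
        target = next((s for s in servers if s.fits(need)), None)
        if target is None:
            raise RuntimeError(f"no server can host {name} ({need} GB)")
        target.placed.append(name)
        target.used_gb += need
        plan[name] = target.name
    return plan

# Hypothetical per-block GPU memory needs in GB.
blocks = {"vehicle_detector": 6.0, "tracker": 1.5, "plate_reader": 4.0, "face_recognizer": 5.0}
servers = [Server("node-a", 10.0), Server("node-b", 8.0)]
print(place_blocks(blocks, servers))
```

Placing the largest blocks first tends to leave the small ones to fill remaining gaps; a real planner would also weigh CPU, RAM, and network traffic between blocks.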

---

### 🤝 **The Perfect Partnership**

<div align="center">

| 🏗️ **AppLayout** | ⚖️ **ScaleLayout** |
|:---:|:---:|
| 🧠 **WHAT** to process | 🌐 **WHERE** to deploy |
| ⚙️ **HOW** to structure | 📈 **HOW** to optimize |
| 📐 Logical Architecture | 🚀 Physical Execution |
| 🎯 Functional Design | ⚡ Performance Reality |

</div>

---

## 🎯 Scope of AppLayout and ScaleLayout Automation
- Traditionally, building higher-level or ensemble AI meant handcrafting the ensembles and relying on large, highly skilled teams to meticulously plan, operate, and manage deployments: a completely manual process that doesn't scale. AppLayouts and ScaleLayouts automate most of this process, making it scalable and efficient. Automating the creation of the AppLayout and ScaleLayout themselves, 100%, puts us in Skynet territory: an AI that can compose itself and scale and adapt infinitely to the available resources.
- Without AutoAI:
  - A domain expert would first have to architect the application's logical blueprint and then devise an optimal end-to-end deployment strategy: manageable at small scale, but overwhelmingly complex at large to planetary scale.
  - These plans would then be handed off to the AIOS for execution.
- In this demonstration, we unveil a revolutionary shift.
  - We will show how an expert system powered by Retrieval-Augmented Generation (RAG) can completely automate this workflow, dynamically generating and deploying the entire application structure in direct response to a user's query.

### **AppLayout**: DAG Structure Creation for Use Cases
- **Purpose**: Creates Directed Acyclic Graph (DAG) structures for user-queried use cases, drawing on knowledge from our Deployment Knowledge Base.
- **Function**: Automatically designs processing pipelines based on requirements
- **RAG Role**: Retrieves relevant pipeline configurations, block specifications, and architectural patterns

### **ScaleLayout**: Deployment Planning & Hardware Optimization
- **Purpose**: Plans deployment across servers with intelligent hardware allocation
- **Function**: Optimizes resource distribution, handles multi-camera scenarios
- **RAG Role**: Retrieves deployment patterns, resource requirements, and optimization strategies

---

## 🔄 How RAG Enhances Both Systems

When multiple cameras are provided simultaneously, our RAG system enables:
- **Unified Planning**: All use case data available at once for optimal resource allocation
- **Context-Aware Decisions**: Historical deployment patterns inform current planning
- **Intelligent Optimization**: Cross-camera resource sharing and load balancing

---

## 📚 Knowledge Base Structure

Our RAG system indexes various types of documentation:

### AppLayout Specific Knowledge:
- **📋 Model Cards**: AI model specifications and requirements
- **🔧 Policy Cards**: Deployment and resource management policies
- **📊 Parameter Tuning**: Optimization guidelines and best practices
- **🏗️ Pipeline Compositions**: Pre-built pipeline architectures
- **💾 Input/Output Format Cards**: IO Data format specifications
- **🎯 Use Case Cards**: Specific application scenarios and configurations
- **🔗 Chaining Blueprints**: Inter-block communication patterns

### ScaleLayout Specific Knowledge:
- **💻 Hardware specifications and limits**
- **🎯 Resource allocation strategies**
- **📹 Multi-camera deployment patterns**
- **⚡ Performance optimization techniques**
- **📊 Streaming resource estimation**
- **🔧 Resource estimation for each component in pipeline deployment**

---

## Prerequisites

Build the Docker image for the RAG system with all necessary components:
- **Note**: Run the two scripts below outside of the notebook environment to ensure proper execution.

In [None]:
```bash
%%bash
# Install required packages
pip install weaviate-client llama-index llama-index-embeddings-openai llama-index-llms-openai google-generativeai

echo "✅ Packages installed successfully!"
```

```bash
docker build -t rag-scale-layout-app-layout .
```

### 🛠️ Setting Up the RAG Indexer Docker Container

In [None]:
```bash
CUR_DIR=$(dirname "$(realpath "$0")")

dockertransformerimagename="rag-scale-layout-app-layout"
container_name="rag_weaviate_test_container"

# -dit keeps the container running in the background: with --entrypoint /bin/bash,
# a detached container without an interactive TTY would exit immediately.
docker run -dit \
 --network=host \
 -v /home/ubuntu/models:/home/ubuntu/models \
 -v "$CUR_DIR":"$CUR_DIR" \
 --gpus='"device=3"' \
 --env="BLOCK_ID=hello-001" \
 --env="BLOCKS_DB_URI=http://MANAGEMENTMASTER:30100" \
 --name="$container_name" \
 --entrypoint /bin/bash \
  "$dockertransformerimagename"
```

---

## 🛠️ Setting Up the RAG Indexer

Let's start by importing the necessary components and setting up our test environment.
### **Process Flow**:
1. Set up the Weaviate vector database for storing indexed data.
2. Phase-1: Index the knowledge base with all relevant AppLayout and ScaleLayout documentation.
3. Initialize the indexer with the proper settings: embedding model, chunk size, chunk overlap, embedding dimensions, similarity threshold, etc.
4. Start the indexing process to populate the Weaviate database with the knowledge base content.
5. Phase-2: Retrieve the indexed data with RAG to power the AppLayout and ScaleLayout systems, supplying queries as chat messages.
    - First, retrieve relevant pipeline configurations, block specifications, and architectural patterns for AppLayout.
    - Then, using the retrieved AppLayout pipeline, retrieve the ScaleLayout deployment plan (a cameras-vs-use-case matrix), iterating with follow-up queries whenever the LLM response is unsatisfactory.

---

## Block Diagram
![RAG System Block Diagram](RAG_system_overview_s_2.jpg)

---

## ⚙️ Setup Weaviate
We will use Weaviate as our vector database to store and retrieve the indexed knowledge base content. 
- To set up the Weaviate vector database, use the docker-compose file provided in the `weaveateSetup` directory.
- Run `create_folders.sh` to create the directories the Weaviate setup needs.
- Run `run_docker_compose.sh` to start the Weaviate server.
- Run `down_weaveate.sh` to stop the Weaviate server.
- To reset the database, stop the server and clear the `weaveateSetup/weaviate_data` directory; the database can then be recreated from scratch.
- **Note**: Run these scripts outside of the notebook environment to ensure proper execution.
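Before indexing, it can help to confirm the server is actually up. A minimal standard-library sketch, assuming Weaviate listens on `http://localhost:8080` (the URL used later in this notebook) and exposes its standard `/v1/.well-known/ready` readiness endpoint, which answers HTTP 200 once the node is ready. If your deployment puts authentication in front of this endpoint, adapt the request accordingly.

```python
import urllib.error
import urllib.request

def is_weaviate_ready(base_url="http://localhost:8080", timeout=3.0):
    """Return True if Weaviate's readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status (HTTPError is a URLError)
        return False

print("Weaviate ready:", is_weaviate_ready())
```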

---

## 🎯 Phase-1: Indexing

### 🔐 API Keys Setup

To use the RAG system, you'll need API keys for the language models:

#### **🤖 OpenAI API Key** (Required)
- **Purpose**: Used for text embeddings and GPT chat completions
- **Get it from**: [OpenAI Platform](https://platform.openai.com/api-keys)
- **Format**: Starts with `sk-`

#### **💎 Gemini API Key** (Required for the default chat configuration)  
- **Purpose**: LLM for chat completions (the chat interface below defaults to Gemini) and model diversity
- **Get it from**: [Google AI Studio](https://makersuite.google.com/app/apikey)
- **Format**: Usually a long alphanumeric string

#### **⚙️ Setup Methods**
Choose one of the following methods to configure your API keys:

In [None]:
%%bash
#!/bin/bash

# Function to safely prompt for API key
get_api_key() {
    local service_name=$1
    local env_var_name=$2
    
    echo "🔑 Setting up $service_name API key..."
    echo "Please enter your $service_name API key:"
    read -s api_key
    
    if [ -z "$api_key" ]; then
        echo "❌ No API key entered for $service_name"
        return 1
    fi
    
    export $env_var_name="$api_key"
    echo "✅ $service_name API key set successfully"
    return 0
}

# Create persistent environment setup script
cat > setup_env.sh << 'EOF'
#!/bin/bash
# RAG System Environment Setup
# Replace the placeholder values below with your actual API keys

# OpenAI API Key (required for embeddings and chat)
export OPENAI_API_KEY="your-openai-api-key-here"

# Gemini API Key (alternative LLM option)
export GEMINI_API_KEY="your-gemini-api-key-here"

# Verify the keys are set
if [ ! -z "$OPENAI_API_KEY" ] && [ "$OPENAI_API_KEY" != "your-openai-api-key-here" ]; then
    echo "✅ OPENAI_API_KEY: ${OPENAI_API_KEY:0:10}..."
else
    echo "⚠️  Please set your actual OPENAI_API_KEY in setup_env.sh"
fi

if [ ! -z "$GEMINI_API_KEY" ] && [ "$GEMINI_API_KEY" != "your-gemini-api-key-here" ]; then
    echo "✅ GEMINI_API_KEY: ${GEMINI_API_KEY:0:10}..."
else
    echo "⚠️  Please set your actual GEMINI_API_KEY in setup_env.sh"
fi
EOF

chmod +x setup_env.sh

echo "📝 Created setup_env.sh file with template API keys"
echo ""
echo "🔧 SETUP OPTIONS:"
echo "1. Manual setup (recommended inside the notebook):"
echo "   Edit setup_env.sh and replace the placeholder keys with your actual keys"
echo "   Then load them into the notebook kernel with load_env_from_sh() (next cell)"
echo ""
echo "2. Interactive setup (run this script in a terminal with INTERACTIVE_SETUP=true):"
echo "   The keys you enter are written into setup_env.sh"
echo ""
echo "💡 For security, the setup_env.sh file should not be committed to version control"

# Interactive setup needs a terminal: %%bash cells provide no stdin for `read`,
# and variables exported here never reach the Python kernel, so the keys are
# persisted to setup_env.sh instead of being sourced into this subprocess.
if [ "$INTERACTIVE_SETUP" = "true" ]; then
    echo ""
    echo "🚀 Starting interactive API key setup..."

    # Set up OpenAI API Key and write it into setup_env.sh
    if get_api_key "OpenAI" "OPENAI_API_KEY"; then
        sed -i "s|^export OPENAI_API_KEY=.*|export OPENAI_API_KEY=\"$OPENAI_API_KEY\"|" setup_env.sh
    fi

    # Set up Gemini API Key and write it into setup_env.sh
    if get_api_key "Gemini" "GEMINI_API_KEY"; then
        sed -i "s|^export GEMINI_API_KEY=.*|export GEMINI_API_KEY=\"$GEMINI_API_KEY\"|" setup_env.sh
    fi

    echo "🎉 API keys saved to setup_env.sh; load them in the notebook with load_env_from_sh()"
else
    echo "📋 To enable interactive setup, run this script in a terminal with INTERACTIVE_SETUP=true"
fi


#### ⚠️ **IMPORTANT** Don't forget to load the environment variables from the `setup_env.sh` file before running the indexing code or chat retriever code.

In [22]:
# After editing setup_env.sh with your actual API keys, run this to load them:
import os
import re

def load_env_from_sh(filename="setup_env.sh"):
    with open(filename) as f:
        for line in f:
            # Match lines like: export VAR="value"
            match = re.match(r'export (\w+)=(.*)', line)
            if match:
                key, val = match.groups()
                val = val.strip().strip('"').strip("'")
                os.environ[key] = val

load_env_from_sh()
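The regex above reads simple `export VAR="value"` lines. If `setup_env.sh` ever gains inline comments or mixed quoting, a slightly more robust variant can lean on the standard-library `shlex` tokenizer. This is an optional sketch, not part of the original workflow; the names `parse_exports` and `load_env_with_shlex` are hypothetical.

```python
import os
import shlex

def parse_exports(text):
    """Parse `export KEY=value` lines the way a POSIX shell would tokenize them,
    honoring quotes and dropping trailing `# comments`."""
    env = {}
    for line in text.splitlines():
        try:
            tokens = shlex.split(line, comments=True)
        except ValueError:          # e.g. unbalanced quotes on a non-export line
            continue
        if len(tokens) >= 2 and tokens[0] == "export" and "=" in tokens[1]:
            key, _, value = tokens[1].partition("=")
            if key.isidentifier():
                env[key] = value
    return env

def load_env_with_shlex(filename="setup_env.sh"):
    """Load exports from filename into os.environ; return the loaded names."""
    with open(filename) as f:
        env = parse_exports(f.read())
    os.environ.update(env)
    return sorted(env)
```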

Check that the environment variables are accessible in the notebook environment.

In [23]:
# Verify environment variables are loaded in Python
import os

def check_api_keys():
    """Check if API keys are properly set and provide helpful guidance"""
    print("🔍 Checking API Key Configuration...")
    print("=" * 50)
    
    openai_key = os.environ.get("OPENAI_API_KEY")
    gemini_key = os.environ.get("GEMINI_API_KEY")
    
    # Check OpenAI API Key
    if openai_key and not openai_key.startswith(("sk-your-", "your-openai")):
        print(f"✅ OPENAI_API_KEY is properly set: {openai_key[:10]}...{openai_key[-4:]}")
        openai_valid = True
    elif openai_key:
        print("⚠️  OPENAI_API_KEY is set but appears to be a placeholder")
        print("   Please replace with your actual OpenAI API key")
        openai_valid = False
    else:
        print("❌ OPENAI_API_KEY is not set")
        openai_valid = False
    
    # Check Gemini API Key
    if gemini_key and not gemini_key.startswith("your-gemini"):
        print(f"✅ GEMINI_API_KEY is properly set: {gemini_key[:10]}...{gemini_key[-4:]}")
        gemini_valid = True
    elif gemini_key:
        print("⚠️  GEMINI_API_KEY is set but appears to be a placeholder")
        print("   Please replace with your actual Gemini API key")
        gemini_valid = False
    else:
        print("❌ GEMINI_API_KEY is not set")
        gemini_valid = False
    
    print("\n" + "=" * 50)
    
    if openai_valid and gemini_valid:
        print("🎉 All API keys are properly configured!")
        print("✅ RAG system is ready to use both OpenAI and Gemini")
        return True
    elif openai_valid:
        print("⚡ OpenAI API key is ready - RAG system can use OpenAI models")
        print("💡 Gemini is optional but recommended for model diversity")
        return True
    elif gemini_valid:
        print("⚡ Gemini API key is ready - RAG system can use Gemini models")
        print("💡 OpenAI API key is highly recommended for embeddings")
        return True
    else:
        print("🚨 No valid API keys found!")
        print("\n📋 Setup Options:")
        print("1. Edit setup_env.sh with your actual keys, then re-run load_env_from_sh()")
        print("2. Export the variables in your shell before launching Jupyter:")
        print("   export OPENAI_API_KEY='your-key-here'")
        print("   export GEMINI_API_KEY='your-key-here'")
        return False

# Check API keys and provide guidance
api_keys_ready = check_api_keys()

if api_keys_ready:
    print("\n🚀 Ready to proceed with RAG indexing!")
else:
    print("\n⏸️  Please set up API keys before continuing with the RAG system")

🔍 Checking API Key Configuration...
✅ OPENAI_API_KEY is properly set: sk-proj-k9...4eEA
✅ GEMINI_API_KEY is properly set: AIzaSyDZ9j...Ghgg

🎉 All API keys are properly configured!
✅ RAG system is ready to use both OpenAI and Gemini

🚀 Ready to proceed with RAG indexing!


---

### 🚀 Import the Modules Needed for Indexing the Knowledge Base

In [24]:
# Import the RAG indexing components
from block_indexer import IndexDocumentsBlock
from aios_instance import TestContext, BlockTester
import os
import shutil

print("✅ RAG Indexer components imported successfully!")
print("📍 Ready to process ScaleLayout and AppLayout knowledge base")

✅ RAG Indexer components imported successfully!
📍 Ready to process ScaleLayout and AppLayout knowledge base


### 📁 Configuring the Knowledge Base Directory

We'll point our indexer to the knowledge base containing all the documentation for ScaleLayout and AppLayout systems.

In [25]:
# Configure directories for RAG knowledge base
# This directory contains all ScaleLayout and AppLayout documentation
KNOWLEDGE_BASE_DIRECTORY = "knowledge_base"  # Contains model cards, policy cards, use case documentation
os.makedirs(KNOWLEDGE_BASE_DIRECTORY, exist_ok=True)

# Output directory for processed data
OUTPUT_DIR = "tests/output"
PASSAGES_JSON = os.path.join(OUTPUT_DIR, "psgs_w100.jsonl")  # Processed text chunks
INDEX_PATH = os.path.join(OUTPUT_DIR, "psgs_w100.index")     # Vector index for retrieval

# Clean and recreate output directory
shutil.rmtree(OUTPUT_DIR, ignore_errors=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"📂 Knowledge base directory: {KNOWLEDGE_BASE_DIRECTORY}")
print(f"📄 Processed passages will be saved to: {PASSAGES_JSON}")
print(f"🔍 Vector index will be created at: {INDEX_PATH}")

📂 Knowledge base directory: knowledge_base
📄 Processed passages will be saved to: tests/output/psgs_w100.jsonl
🔍 Vector index will be created at: tests/output/psgs_w100.index




### ⚙️ RAG Configuration for ScaleLayout/AppLayout

#### Key Configuration Decisions:

**🧠 Embedding Model**: OpenAI's `text-embedding-3-large` for high-quality semantic understanding
- **Why**: Strong performance on technical documentation and multi-domain knowledge
- **Dimension**: 1024 for rich semantic representation (the model's native output is 3072 dimensions; the OpenAI API's `dimensions` parameter can shorten it)

**📝 Chunking Strategy**: 
- **Chunk Size**: 200 tokens (chosen to suit technical documentation)
- **Overlap**: 50 tokens (ensures context continuity across chunks)
- **Filename Prefix**: Enabled (preserves source document information)

**🎯 Similarity Threshold**: 0.4 (enables broader context retrieval for complex queries)
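Chunk size and overlap interact through the stride: with `chunk_size=200` and `chunk_overlap=50`, each new chunk starts 150 tokens after the previous one. The sketch below shows that sliding window, plus a filename prefix in the spirit of `include_filename_prefix`. It splits on whitespace as a stand-in for a real tokenizer, and the `[filename]` prefix format is an assumption; `block_indexer.py` may tokenize and prefix differently.

```python
def chunk_tokens(tokens, chunk_size=200, chunk_overlap=50):
    """Split a token list into overlapping windows of at most chunk_size tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not tokens:
        return []
    stride = chunk_size - chunk_overlap
    chunks, start = [], 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):   # this window reached the end
            return chunks
        start += stride

def chunk_document(filename, text, chunk_size=200, chunk_overlap=50,
                   include_filename_prefix=True):
    """Whitespace-tokenize a document and return chunk strings, optionally
    prefixed with the source filename (hypothetical prefix format)."""
    prefix = f"[{filename}] " if include_filename_prefix else ""
    windows = chunk_tokens(text.split(), chunk_size, chunk_overlap)
    return [prefix + " ".join(w) for w in windows]

doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_document("UsecaseGroupingRules.md", doc)
print(len(chunks))   # → 3 (windows start at tokens 0, 150, 300)
```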

In [26]:
# RAG system configuration optimized for ScaleLayout/AppLayout
EMBEDDING_DIM = 1024  # High-dimensional embeddings for technical content

# Create AIOS specific test context with optimized settings
context = TestContext()

# Configure the knowledge base and storage
context.block_init_data = {
    "repo_dir": KNOWLEDGE_BASE_DIRECTORY,
    "passages_json": PASSAGES_JSON,
    "index_path": INDEX_PATH,
    
    # Embedding configuration for technical documentation
    "embed_model": "openai/text-embedding-3-large",  # High-quality embeddings
    
    # Vector database configuration (Weaviate)
    "client_type": "weaviate",
    "client_config": {
        "uri": "http://localhost:8080",
        "user": "coguser1",
        "password": "sdf345BJw44HMy",
        "dim": EMBEDDING_DIM
    }
}

# Text processing parameters optimized for technical content
context.block_init_parameters = {
    # Chunking strategy for technical documentation
    "chunk_size": 200,              # Optimal size for technical concepts
    "chunk_overlap": 50,            # Ensures context continuity
    "max_length": EMBEDDING_DIM,    # Match embedding dimensions
    
    # Retrieval optimization
    "similarity_threshold": 0.4,    # Broader context for complex queries
    "include_filename_prefix": True  # Preserve source information
}

print("⚙️ RAG Configuration Summary:")
print(f"   🧠 Embedding Model: {context.block_init_data['embed_model']}")
print(f"   📝 Chunk Size: {context.block_init_parameters['chunk_size']} tokens")
print(f"   🔄 Chunk Overlap: {context.block_init_parameters['chunk_overlap']} tokens")
print(f"   🎯 Similarity Threshold: {context.block_init_parameters['similarity_threshold']}")
print(f"   💾 Vector Database: {context.block_init_data['client_type'].upper()}")

⚙️ RAG Configuration Summary:
   🧠 Embedding Model: openai/text-embedding-3-large
   📝 Chunk Size: 200 tokens
   🔄 Chunk Overlap: 50 tokens
   🎯 Similarity Threshold: 0.4
   💾 Vector Database: WEAVIATE


### 🚀 Initialize RAG Indexer

Now we'll initialize the indexer and process all the ScaleLayout and AppLayout documentation.

In [27]:
# Initialize the RAG indexer with our configuration
tester = BlockTester.init_with_context(IndexDocumentsBlock, context)

# Reset any existing data to ensure clean indexing
print("🧹 Resetting existing data...")
tester.block_instance.management("reset", {})

print("✅ RAG Indexer initialized and ready!")
print("📚 About to process ScaleLayout and AppLayout knowledge base...")

INFO:block_indexer:Initialized with chunk_size=200, chunk_overlap=50


🧹 Resetting existing data...
✅ RAG Indexer initialized and ready!
📚 About to process ScaleLayout and AppLayout knowledge base...


### 🔄 Processing the Knowledge Base: Running the Indexer

This step will (see `block_indexer.py` for details):
1. **📖 Extract text** from all documentation files (PDF, Markdown, JSON, etc.)
2. **✂️ Chunk the content** into overlapping semantic segments
3. **🧠 Generate embeddings** using OpenAI's `text-embedding-3-large` model
4. **🕸️ Build a graph** connecting related concepts (Graph RAG):
    - Each chunk is stored as a node in Weaviate with its text, embedding, chunk ID, and file name.
    - Chunk IDs start at zero (-1 is reserved for the summary node).
    - Sequential edges link chunks from the same file, with weight 1.0.
    - Similarity edges link chunks from different files, weighted by their cosine similarity.
    - The initial graph is constructed with `networkx.Graph`.
    - Edges are stored in Weaviate using the `from_node` and `to_node` properties.
5. **💾 Store everything (graph, nodes, edges)** in Weaviate for fast retrieval
6. **📝 (Optional) Enable the summary node (`chunk_id = -1`)**:  
    - An LLM condenses an entire document into a single, comprehensive summary node.  
    - This gives the LLM full-document context for advanced reasoning.  
    - *Note: This feature is currently commented out in the code to avoid excessive context length.*
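The edge rules in step 4 can be pictured with a small, dependency-free sketch. The notebook's indexer uses `networkx.Graph` and OpenAI embeddings; here plain lists and toy 2-D vectors stand in for both. Linking only *consecutive* chunks of a file, and dropping similarity edges below a threshold (suggested by the `similarity_threshold` parameter), are assumptions rather than confirmed behavior of `block_indexer.py`.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def node_key(chunk):
    return (chunk["file"], chunk["chunk_id"])

def build_chunk_graph(chunks, similarity_threshold=0.4):
    """chunks: dicts with 'file', 'chunk_id' (0-based) and 'embedding'.
    Returns (sequential_edges, similarity_edges) as (from_node, to_node, weight)."""
    sequential, similarity = [], []
    for a, b in combinations(chunks, 2):
        if a["file"] == b["file"]:
            # Same file: link consecutive chunks with a fixed weight of 1.0
            if abs(a["chunk_id"] - b["chunk_id"]) == 1:
                sequential.append((node_key(a), node_key(b), 1.0))
        else:
            # Different files: weight is the cosine similarity (threshold assumed)
            weight = cosine(a["embedding"], b["embedding"])
            if weight >= similarity_threshold:
                similarity.append((node_key(a), node_key(b), round(weight, 3)))
    return sequential, similarity

chunks = [
    {"file": "A.md", "chunk_id": 0, "embedding": [1.0, 0.0]},
    {"file": "A.md", "chunk_id": 1, "embedding": [0.8, 0.6]},
    {"file": "A.md", "chunk_id": 2, "embedding": [0.0, 1.0]},
    {"file": "B.md", "chunk_id": 0, "embedding": [1.0, 0.0]},
    {"file": "B.md", "chunk_id": 1, "embedding": [0.0, 1.0]},
]
seq, sim = build_chunk_graph(chunks)
print(len(seq), "sequential edges,", len(sim), "similarity edges")
```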



In [28]:
# Run the indexing process - this creates our RAG knowledge base
print("🚀 Starting RAG knowledge base creation...")
print("📊 This will process all ScaleLayout and AppLayout documentation")
print("⏳ Please wait while we build the semantic knowledge graph...\n")

# Execute the indexing
results = tester.run({})  # This triggers the on_data method to build the index

print("\n" + "="*60)
print("📈 INDEXING RESULTS")
print("="*60)
print("Indexer output:", results)
print("="*60)

INFO:block_indexer:File: README_input_format_cards.md - Created 3 chunks with 50 token overlap


🚀 Starting RAG knowledge base creation...
📊 This will process all ScaleLayout and AppLayout documentation
⏳ Please wait while we build the semantic knowledge graph...



INFO:block_indexer:File: DOCUMENTATION_INDEX.md - Created 10 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/pod_metrics.xlsx - Created 20 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/NodeAndGPUAssignementRules.md - Created 2 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/UsecaseGroupingRules.md - Created 3 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/pod_gpumemory_and_gpuutility.csv - Created 7 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/pod_gpumemory_and_gpuutility.md - Created 10 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/SetCreationRules.md - Created 3 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/query_vcpu_ram_of_pod.py - Created 3 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/ScaleLayoutRAGQueries.md - Created 2 chunks with 50 token overlap
INFO:block_indexer:File: ScaleLayout/UsecaseVsNumberOfInputFrames.md - Created


📈 INDEXING RESULTS
Indexer output: [{'message': 'Indexed 406 passages with 50 token overlap.\nBuilt graph with 227 sequential edges and 14212 similarity edges.'}]


## 🎯 Phase-2: Retrieval

### **🏗️ AppLayout: DAG Creation Process**

When a user queries for a specific use case (e.g., "Monitor stationary vehicles in Indian road scenario" or "Person detection with face recognition"):

1. **🔍 Query Understanding**: RAG retrieves relevant use case cards and pipeline blueprints
2. **🧩 Component Selection**: Identifies required AI models, preprocessing steps, and post-processing
3. **🔗 DAG Construction**: Creates optimal directed acyclic graph based on retrieved patterns
4. **⚡ Optimization**: Applies best practices from similar deployments
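Because step 3 relies on an LLM, its output is worth checking before use. A minimal sketch, assuming the LLM is asked to return the DAG as JSON with `blocks` and `edges` fields (a hypothetical format, not the AIOS AppLayout schema): it checks that every block exists in the catalog retrieved by RAG and that the edges form an acyclic graph.

```python
import json
from graphlib import CycleError, TopologicalSorter

def validate_dag(llm_json, known_blocks):
    """Return a list of problems in an LLM-proposed DAG (empty list means valid).
    Expected (hypothetical) format: {"blocks": [...], "edges": [[src, dst], ...]}."""
    try:
        layout = json.loads(llm_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(layout, dict):
        return ["expected a JSON object with 'blocks' and 'edges'"]

    blocks = set(layout.get("blocks", []))
    # Every block must come from the catalog retrieved via RAG
    problems = [f"unknown block: {b}" for b in sorted(blocks - set(known_blocks))]

    graph = {b: set() for b in blocks}        # node -> set of predecessors
    for src, dst in layout.get("edges", []):
        if src not in blocks or dst not in blocks:
            problems.append(f"edge uses undeclared block: {src} -> {dst}")
            continue
        graph[dst].add(src)

    try:
        TopologicalSorter(graph).prepare()    # raises CycleError on a cycle
    except CycleError as exc:
        problems.append(f"cycle detected: {exc.args[1]}")
    return problems

catalog = {"camera_input", "vehicle_detector", "tracker", "stationary_policy"}
good = ('{"blocks": ["camera_input", "vehicle_detector", "tracker"], '
        '"edges": [["camera_input", "vehicle_detector"], ["vehicle_detector", "tracker"]]}')
print(validate_dag(good, catalog))   # → []
```

A non-empty result can be fed back to the LLM as a follow-up query, matching the iterative refinement described in the process flow.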

### **⚖️ ScaleLayout: Deployment Planning Process**

When multiple cameras need deployment across servers:

1. **📊 Resource Analysis**: RAG retrieves hardware specifications and capacity limits
2. **🎯 Input Grouping (Camera Grouping)**: Uses policy cards to determine optimal camera grouping based on set-creation rules
3. **💻 Server Assignment**: Allocates resources based on historical performance data
4. **🔧 Load Balancing**: Applies optimization strategies from deployment experience

### **🤝 Synergy Between Systems**

- **Unified Context**: Both systems access the same knowledge base for consistent decisions
- **Cross-Optimization**: AppLayout DAG complexity informs ScaleLayout resource allocation
- **Model Sharing**: Multiple cameras processed together for model sharing optimization
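Input grouping and model sharing can be illustrated together: cameras that need the same use case are grouped so each use-case pipeline is loaded once and shared, and large groups are split into sets of bounded size. The per-set limit of 4 cameras and the camera and use-case names below are hypothetical; the real limits come from the set-creation and grouping rules in the knowledge base (e.g. `SetCreationRules.md`, `UsecaseGroupingRules.md`).

```python
from collections import defaultdict

def group_cameras(camera_usecases, max_cameras_per_set=4):
    """camera_usecases: {camera_id: use_case}. Returns a list of
    (use_case, [camera_ids]) sets; each set shares one pipeline instance."""
    by_usecase = defaultdict(list)
    for cam, usecase in sorted(camera_usecases.items()):
        by_usecase[usecase].append(cam)
    sets = []
    for usecase in sorted(by_usecase):
        cams = by_usecase[usecase]
        # Split large groups so no set exceeds the per-set camera limit
        for i in range(0, len(cams), max_cameras_per_set):
            sets.append((usecase, cams[i:i + max_cameras_per_set]))
    return sets

cameras = {f"cam{i:02d}": ("stationary_vehicle" if i < 6 else "face_recognition")
           for i in range(9)}
for usecase, cams in group_cameras(cameras):
    print(usecase, cams)
```

Nine cameras collapse into three pipeline instances instead of nine, which is the saving model sharing aims for.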

## 🎓 Next Steps: Using the RAG System


### **🔄 Integration with Retrieval Block**:
The indexed knowledge will be used by the `block_retriever.py` component to:
- **Answer complex technical questions**
- **Provide deployment recommendations**
- **Generate optimized configurations**
- **Suggest alternative approaches**

- **Query embedding**: the query is embedded and used to retrieve the `topk` most relevant chunks from the knowledge base, using the sequential-node property.
- **Graph expansion**: for every retrieved sequential node, up to `edge_limit` similarity edges are traversed to pull in related chunks from the knowledge base.
- **Reranking**: a reranker selects the `reranking_topk` best chunks from the retrieved set.
- This will be the context for the LLM to generate the response. 
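These retrieval steps can be sketched end to end: seed with the `topk` chunks closest to the query embedding, expand each seed along at most `edge_limit` of its strongest similarity edges, then rerank the pooled candidates and keep `reranking_topk`. This is a toy, dependency-free illustration: a simple word-overlap score stands in for the cross-encoder (`cross-encoder/ms-marco-MiniLM-L-12-v2`) configured below, and in-memory dictionaries stand in for Weaviate's `PassageNode`/`PassageEdge` classes.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def graph_rag_retrieve(query_text, query_vec, nodes, edges,
                       topk=2, edge_limit=1, reranking_topk=2):
    """nodes: {node_id: {"text": str, "embedding": list[float]}}
    edges: {node_id: [(neighbor_id, weight), ...]} (similarity edges)."""
    # 1. Seed: top-k nodes by cosine similarity to the query embedding
    seeds = sorted(nodes, key=lambda n: cosine(query_vec, nodes[n]["embedding"]),
                   reverse=True)[:topk]
    # 2. Expand: follow up to edge_limit strongest similarity edges per seed
    pool = list(seeds)
    for s in seeds:
        for nb, _ in sorted(edges.get(s, []), key=lambda e: e[1], reverse=True)[:edge_limit]:
            if nb not in pool:
                pool.append(nb)
    # 3. Rerank: toy word-overlap score (stand-in for a cross-encoder)
    q_words = set(query_text.lower().split())
    def score(n):
        return len(q_words & set(nodes[n]["text"].lower().split()))
    return sorted(pool, key=score, reverse=True)[:reranking_topk]

nodes = {
    "gpu_rules": {"text": "assign gpu memory per pod", "embedding": [1.0, 0.1]},
    "grouping": {"text": "group cameras by use case", "embedding": [0.9, 0.3]},
    "pod_metrics": {"text": "pod gpu memory utilization table", "embedding": [0.2, 1.0]},
    "model_card": {"text": "detector model card", "embedding": [0.0, 1.0]},
}
edges = {"gpu_rules": [("pod_metrics", 0.7), ("model_card", 0.2)],
         "grouping": [("model_card", 0.4)]}
print(graph_rag_retrieve("how much gpu memory does a pod need", [1.0, 0.0], nodes, edges))
```

In this toy data, `pod_metrics` is not among the embedding seeds but is reached through a similarity edge and survives reranking, which is the benefit the graph expansion is meant to provide.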


## 💬 Interactive RAG Chat Interface

Now let's set up an interactive chat interface to query our knowledge base in real-time. This interface allows you to:

- **🎯 Ask specific questions** about AppLayout DAG creation
- **⚖️ Query ScaleLayout deployment strategies** 
- **🔍 Explore the knowledge base** interactively
- **📊 Get real-time answers** with source citations
- **💡 Test different query approaches** for optimization

### **Chat Interface Features**:
- **Context-aware responses** using RAG retrieval
- **Source citations** showing which documents informed the answer
- **Session management** with conversation history
- **Command support** (help, history, clear, quit)
- **Real-time processing** of complex technical queries

In [None]:
# Import the interactive chat components
from block_retriever import RagQAServiceBlock
import time
from datetime import datetime
from aios_instance import TestContext, BlockTester

# Setup RAG retriever for interactive chat
def setup_rag_chat_interface():
    """Setup the RAG retriever with CV pipeline knowledge"""
    
    # Create a unique session ID for this chat
    session_id = f"cv_pipeline_chat_{int(time.time())}"
    
    # Configure the model and generation parameters
    modelType = "gemini"  # Can be "gemini" or "openai"
    
    if modelType == "gemini":
        #llm_model = 'gemini-2.0-flash-exp'  # Faster, lighter alternative for chat
        llm_model = 'gemini-2.5-pro'  # Stronger reasoning; slower than the Flash models
        generation_kwargs = {
            "max_tokens": 16384,  # Increased for full passage processing
            "temperature": 0.2,
            "top_p": 0.95
        }
    elif modelType == "openai":
        llm_model = 'gpt-4o-mini'  # Cost-effective for chat interactions
        generation_kwargs = {
            "max_tokens": 16384,
            "temperature": 0.25,
            "top_p": 0.95,
            "frequency_penalty": 0.1,
            "presence_penalty": 0.05
        }
    
    # Configure the RAG block
    context = TestContext()
    context.block_init_data = {
        "weaviate_url": "http://localhost:8080",
        "node_class": "PassageNode", 
        "edge_class": "PassageEdge",
        "llm_model": llm_model,
        "embed_model": "openai/text-embedding-3-large",
    }
    
    # Set embedding dimension and reranking model
    EMBEDDING_DIM = 1024
    RERANKING_MODEL = "cross-encoder/ms-marco-MiniLM-L-12-v2"
    PASSAGES_JSON = ""
    
    # Configure RAG parameters
    context.block_init_parameters = {
        "topk": 200,              # Initial retrieval count
        "reranking_topk": 30,     # After reranking
        "max_length": EMBEDDING_DIM,
        "edge_limit": 50,
        "debug": False,
        "passages_json": PASSAGES_JSON,
        "similarity_threshold": 0.45,
        "auto_references": True,
        "generation_config": generation_kwargs,
        "reranking_model": RERANKING_MODEL
    }
    
    # Initialize the RAG block
    tester = BlockTester.init_with_context(RagQAServiceBlock, context)
    
    return tester, session_id

# Initialize the chat interface
print("🔧 Setting up RAG chat interface...")
chat_tester, chat_session_id = setup_rag_chat_interface()
print("✅ Chat interface ready!")
print(f"📱 Session ID: {chat_session_id}")

In [None]:
class NotebookChatInterface:
    """Interactive chat interface for Jupyter notebook"""
    
    def __init__(self, tester, session_id):
        self.tester = tester
        self.session_id = session_id
        self.conversation_history = []
        
        # Initialize chat session with system message
        self.tester.run({
            "mode": "chat",
            "session_id": self.session_id,
#             "system_message": """You are a specialized assistant for answering questions based on provided context. Your primary directive is to adhere strictly to the following rules:

# 1. **Context is King:** You must base your answers exclusively on the information present in the provided context chunks. Do not use any prior knowledge or external information.

# 2. **Cite Your Sources Precisely:** For every piece of information, data point, or decision step you take, you must cite its source. The citation must include **both the name of the source** (e.g., `pod_metrics`) **and the chunk number** it came from in brackets (e.g., `[4]`).

# 3. **No Assumptions:** If the context does not provide the necessary information to answer a question, state that the information is not available. Never ask the user to estimate or provide missing details.

# 4. **Topic-Specific Logic:**
#    * **For AppLayout Creation:** When a question is about creating an `AppLayout`, you must ignore any context or information related to `ScaleLayout`.
#    * **For Deployment Planning:** When a question is about deployment planning, you must follow the exact five-step process, providing precise citations for each step.
# """,
            "message": "Hello! I'm ready to help with your ScaleLayout and AppLayout questions."
        })
    
    def ask(self, question):
        """Ask a question and get a response from the RAG system"""
        try:
            # Process the query using RAG-enhanced chat
            result = self.tester.run({
                "mode": "rag-chat",
                "session_id": self.session_id,
                "message": question
            })
            
            if result and len(result) > 0:
                response = result[0].get("reply", "Sorry, I couldn't generate a response.")
            else:
                response = "Sorry, I couldn't process your query. Please try rephrasing."
            
            # Save to conversation history
            timestamp = datetime.now().strftime("%H:%M:%S")
            self.conversation_history.append({
                "timestamp": timestamp,
                "question": question,
                "response": response
            })
            
            return response
            
        except Exception as e:
            error_msg = f"Error processing query: {str(e)}"
            print(f"❌ {error_msg}")
            return error_msg
    
    def show_history(self):
        """Display conversation history"""
        if not self.conversation_history:
            print("No conversation history yet.")
            return
        
        print("📝 Conversation History")
        print("=" * 60)
        
        for i, entry in enumerate(self.conversation_history, 1):
            print(f"\n[{i}] {entry['timestamp']}")
            print(f"Q: {entry['question']}")
            print(f"A: {entry['response'][:200]}{'...' if len(entry['response']) > 200 else ''}")
        print("=" * 60)
    
    def clear_history(self):
        """Clear conversation history"""
        self.conversation_history = []
        print("✅ Conversation history cleared.")

# Create the chat interface
chat_interface = NotebookChatInterface(chat_tester, chat_session_id)
print("🎯 Chat interface created and ready for questions!")
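`NotebookChatInterface.ask` depends on a small contract. `tester.run()` takes a dict with `mode`, `session_id`, and `message`, and returns a list whose first element holds the answer under `reply`. The sketch below exercises that contract offline with a hypothetical `FakeTester` stub (not part of `aios_instance`), so the reply-handling logic can be checked without Weaviate or an LLM. `extract_reply` is also a hypothetical helper. It reproduces the fallback strings from the class above.

```python
from typing import Any


class FakeTester:
    """Hypothetical stand-in for BlockTester: records requests and returns canned results."""

    def __init__(self, replies: list[Any]):
        self.replies = list(replies)
        self.requests: list[dict] = []

    def run(self, payload: dict) -> Any:
        self.requests.append(payload)
        return self.replies.pop(0)


def extract_reply(result: Any) -> str:
    """Same reply handling as NotebookChatInterface.ask."""
    if result:
        return result[0].get("reply", "Sorry, I couldn't generate a response.")
    return "Sorry, I couldn't process your query. Please try rephrasing."


tester = FakeTester([
    [{"reply": "Stub answer about AppLayout DAGs."}],
    [{}],  # result without a "reply" key
    [],    # empty result
])

for question in ["DAG for face recognition?", "second question", "third question"]:
    result = tester.run({"mode": "rag-chat", "session_id": "demo", "message": question})
    print(extract_reply(result))
```

Swapping the stub for the real `BlockTester` should not change the parsing logic, because only the shape of the returned list matters here.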

### 🎯 How to Use the Chat Interface

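Run the cell below to start an interactive question loop. Type a question at the prompt, or enter `history` to review past questions, `clear` to reset the history, or `quit` (also `exit` or `q`) to stop.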

In [None]:
# 💬 Interactive RAG Chat - Unified AppLayout & ScaleLayout Assistant
print("💬 Interactive RAG Chat Assistant")
print("=" * 60)
print("Ask questions about AppLayout DAG creation OR ScaleLayout deployment planning.")
print("Retrieval picks the relevant context for each question, so you don't need to say whether it is about AppLayout or ScaleLayout.")
print()
print("🎯 Example Questions:")
print("   AppLayout: 'How do I create a DAG for person detection with face recognition?'")
print("   ScaleLayout: 'How should I deploy 15 cameras across 3 servers?'")
print("   General: 'What parameters should I use for low-light vehicle detection?'")
print("   Mixed: 'Plan deployment for retail analytics with face recognition'")
print()
print("Commands: Type 'quit', 'exit', or 'q' to stop | 'history' to view past questions | 'clear' to reset")
print("-" * 60)

while True:
    try:
        question = input("\n🎯 Your Question: ").strip()
        
        if not question:
            continue
            
        # Handle special commands
        if question.lower() in ['quit', 'exit', 'q']:
            print("👋 Thanks for using the RAG Chat Assistant! Goodbye!")
            break
            
        elif question.lower() == 'history':
            chat_interface.show_history()
            continue
            
        elif question.lower() == 'clear':
            chat_interface.clear_history()
            continue
            
        # Process the question
        print("🤔 Processing your question...")
        response = chat_interface.ask(question)
        print(f"\n🤖 Assistant: {response}")
        print("\n" + "-" * 60)
        
    except KeyboardInterrupt:
        print("\n\n👋 Chat session interrupted. Goodbye!")
        break
    except Exception as e:
        print(f"\n❌ Error: {str(e)}")
        print("Please try again or type 'quit' to exit.")
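The `input()` loop above blocks the kernel, so it stalls a *Run All* pass over the notebook. For a fixed set of questions, a non-interactive helper is simpler. In the sketch below, `ask_all` is a hypothetical helper. It accepts any callable with the same shape as `chat_interface.ask` (question in, answer string out) and collects the answers in order, printing the same 200-character preview that `show_history` uses.

```python
from typing import Callable


def ask_all(ask: Callable[[str], str], questions: list[str], preview: int = 200) -> dict[str, str]:
    """Ask each question in order, print a truncated preview, and return all full answers."""
    answers: dict[str, str] = {}
    for question in questions:
        answer = ask(question)
        answers[question] = answer
        suffix = "..." if len(answer) > preview else ""
        print(f"Q: {question}\nA: {answer[:preview]}{suffix}\n")
    return answers


# Offline demo: a lambda stands in for chat_interface.ask
answers = ask_all(lambda q: f"echo: {q}", [
    "How do I create a DAG for person detection with face recognition?",
    "How should I deploy 15 cameras across 3 servers?",
])
```

In the notebook itself you would pass the bound method, for example `ask_all(chat_interface.ask, [...])`. The answers then also land in `chat_interface.conversation_history`, because `ask` records every exchange.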

### 🖥️ Command-Line Chat Interface

For those who prefer a command-line interactive experience, you can also run the standalone chat script:

In [None]:
%%bash
# This cell only prints instructions. Run the interactive script from a terminal,
# because a notebook cell cannot forward keyboard input to it:
#   python test_retriever_interactive.py

echo "💡 To run the full interactive command-line interface:"
echo "   python test_retriever_interactive.py"
echo ""
echo "🎯 Features of the command-line interface:"
echo "   • Full conversation management"
echo "   • Command support (help, history, clear, quit)"
echo "   • Real-time streaming responses"
echo "   • Better for extended conversations"

## 🌐 Streamlit Web Chat Interface

For a more user-friendly, browser-based experience, you can also deploy the RAG chat system as a Streamlit web application. This provides:

- **🌐 Web-based Interface**: A modern chat UI accessible from any browser
- **📱 Mobile-Friendly**: Works in mobile browsers as well as on desktop
- **🎨 Rich Formatting**: Support for markdown, code blocks, and formatted responses
- **💾 Session Persistence**: Chat history kept in `st.session_state` for the life of a browser session (reloading the page starts a new one)
- **🔄 Progress Indicators**: A spinner while each answer is being generated
- **👥 Multi-user Support**: Multiple users can connect at once, each with their own session

In [20]:
# Create a Streamlit web application for the RAG chat interface
streamlit_app_code = '''
import streamlit as st
import time
from datetime import datetime
import os
import sys
import logging

# Suppress verbose logging when running from notebook
logging.getLogger("streamlit").setLevel(logging.WARNING)
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("weaviate").setLevel(logging.WARNING)
logging.getLogger("openai").setLevel(logging.WARNING)

# Add the current directory to path to import our modules
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from block_retriever import RagQAServiceBlock
from aios_instance import TestContext, BlockTester

# Configure Streamlit page
st.set_page_config(
    page_title="RAG Chat Assistant - AppLayout & ScaleLayout",
    page_icon="🤖",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Custom CSS for better styling
st.markdown("""
<style>
    .main-header {
        text-align: center;
        padding: 1rem 0;
        background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
        color: white;
        border-radius: 10px;
        margin-bottom: 2rem;
    }
    .chat-message {
        padding: 1rem;
        border-radius: 10px;
        margin: 1rem 0;
        box-shadow: 0 2px 4px rgba(0,0,0,0.1);
    }
    .user-message {
        background-color: #e3f2fd;
        border-left: 4px solid #2196f3;
    }
    .assistant-message {
        background-color: #f3e5f5;
        border-left: 4px solid #9c27b0;
    }
    .sidebar-content {
        background-color: #f8f9fa;
        padding: 1rem;
        border-radius: 10px;
    }
</style>
""", unsafe_allow_html=True)

class StreamlitChatInterface:
    """Streamlit-based chat interface for RAG system"""
    
    def __init__(self):
        self.setup_session_state()
        self.setup_rag_interface()
    
    def setup_session_state(self):
        """Initialize Streamlit session state variables"""
        if 'messages' not in st.session_state:
            st.session_state.messages = []
        if 'chat_interface' not in st.session_state:
            st.session_state.chat_interface = None
        if 'session_id' not in st.session_state:
            # Nanosecond timestamp makes it unlikely that two browser sessions share one backend chat session
            st.session_state.session_id = f"streamlit_chat_{time.time_ns()}"
    
    def setup_rag_interface(self):
        """Setup the RAG retriever for Streamlit"""
        if st.session_state.chat_interface is None:
            with st.spinner("🔧 Setting up RAG chat interface..."):
                try:
                    # Configure the model and generation parameters
                    modelType = "gemini"  # Can be "gemini" or "openai"
                    llm_model = None
                    generation_kwargs = {}
                    if modelType == "gemini":
                        llm_model = 'gemini-2.5-pro'
                        generation_kwargs = {
                            "max_tokens": 16384,
                            "temperature": 0.2,
                            "top_p": 0.95
                        }
                    elif modelType == "openai":
                        llm_model = 'gpt-4o-mini'
                        generation_kwargs = {
                            "max_tokens": 16384,
                            "temperature": 0.25,
                            "top_p": 0.95,
                            "frequency_penalty": 0.1,
                            "presence_penalty": 0.05
                        }
                    
                    # Configure the RAG block
                    context = TestContext()
                    context.block_init_data = {
                        "weaviate_url": "http://localhost:8080",
                        "node_class": "PassageNode", 
                        "edge_class": "PassageEdge",
                        "llm_model": llm_model,
                        "embed_model": "openai/text-embedding-3-large",
                    }
                    
                    # Set embedding dimension and reranking model
                    EMBEDDING_DIM = 1024
                    RERANKING_MODEL = "cross-encoder/ms-marco-MiniLM-L-12-v2"
                    
                    # Configure RAG parameters
                    context.block_init_parameters = {
                        "topk": 200,
                        "reranking_topk": 30,
                        "max_length": EMBEDDING_DIM,
                        "edge_limit": 50,
                        "debug": False,
                        "passages_json": "",
                        "similarity_threshold": 0.45,
                        "auto_references": True,
                        "generation_config": generation_kwargs,
                        "reranking_model": RERANKING_MODEL
                    }
                    
                    # Initialize the RAG block
                    tester = BlockTester.init_with_context(RagQAServiceBlock, context)
                    
                    # Wrap the block in a chat interface that mirrors NotebookChatInterface
                    st.session_state.chat_interface = self.create_chat_interface(tester)
                    st.success("✅ RAG Chat Interface ready!")
                    
                except Exception as e:
                    st.error(f"❌ Error setting up chat interface: {str(e)}")
                    st.session_state.chat_interface = None
    
    def create_chat_interface(self, tester):
        """Create chat interface similar to NotebookChatInterface"""
        class StreamlitRAGInterface:
            def __init__(self, tester, session_id):
                self.tester = tester
                self.session_id = session_id
                
                # Initialize chat session with system message
                self.tester.run({
                    "mode": "chat",
                    "session_id": self.session_id,
#                     "system_message": \"\"\"You are a specialized assistant for answering questions based on provided context. Your primary directive is to adhere strictly to the following rules:

# 1. **Context is King:** You must base your answers exclusively on the information present in the provided context chunks. Do not use any prior knowledge or external information.

# 2. **Cite Your Sources Precisely:** For every piece of information, data point, or decision step you take, you must cite its source. The citation must include **both the name of the source** (e.g., `pod_metrics`) **and the chunk number** it came from in brackets (e.g., `[4]`).

# 3. **No Assumptions:** If the context does not provide the necessary information to answer a question, state that the information is not available. Never ask the user to estimate or provide missing details.

# 4. **Topic-Specific Logic:**
#    * **For AppLayout Creation:** When a question is about creating an `AppLayout`, you must ignore any context or information related to `ScaleLayout`.
#    * **For Deployment Planning:** When a question is about deployment planning, you must follow the exact five-step process, providing precise citations for each step.
# \"\"\",
                    "message": "Hello! I'm ready to help with your ScaleLayout and AppLayout questions."
                })
            
            def ask(self, question):
                \"\"\"Ask a question and get a response from the RAG system\"\"\"
                try:
                    result = self.tester.run({
                        "mode": "rag-chat",
                        "session_id": self.session_id,
                        "message": question
                    })
                    
                    if result and len(result) > 0:
                        return result[0].get("reply", "Sorry, I couldn't generate a response.")
                    else:
                        return "Sorry, I couldn't process your query. Please try rephrasing."
                        
                except Exception as e:
                    return f"Error processing query: {str(e)}"
        
        return StreamlitRAGInterface(tester, st.session_state.session_id)
    
    def render_sidebar(self):
        """Render the sidebar with information and controls"""
        with st.sidebar:
            st.markdown('<div class="sidebar-content">', unsafe_allow_html=True)
            
            st.markdown("### 🎯 RAG Chat Assistant")
            st.markdown("**AppLayout & ScaleLayout Expert**")
            
            st.markdown("---")
            
            st.markdown("### 📚 What can I help with?")
            st.markdown("""
            **🏗️ AppLayout Questions:**
            - DAG creation for use cases
            - Pipeline architecture design
            - Component selection
            - Parameter optimization
            
            **⚖️ ScaleLayout Questions:**
            - Deployment planning
            - Resource allocation
            - Camera grouping strategies
            - Hardware optimization
            
            **💡 General Questions:**
            - Best practices
            - Troubleshooting
            - Performance tuning
            """)
            
            st.markdown("---")
            
            # Chat controls
            if st.button("🗑️ Clear Chat History"):
                st.session_state.messages = []
                st.rerun()
            
            # Session info
            st.markdown("### 📊 Session Info")
            st.markdown(f"**Messages:** {len(st.session_state.messages)}")
            st.markdown(f"**Session ID:** `{st.session_state.session_id}`")
            
            st.markdown('</div>', unsafe_allow_html=True)
    
    def render_chat_interface(self):
        """Render the main chat interface"""
        # Header
        st.markdown("""
        <div class="main-header">
            <h1>🤖 RAG Chat Assistant</h1>
            <p>Ask questions about AppLayout DAG creation or ScaleLayout deployment planning</p>
        </div>
        """, unsafe_allow_html=True)
        
        avatars = {"user": "🎯", "assistant": "🤖"}
        
        # Replay the conversation. st.chat_message + st.markdown renders each answer as
        # Markdown, so multi-line replies, lists and code blocks display correctly
        # (interpolating them into an indented HTML <div> breaks on blank lines).
        for message in st.session_state.messages:
            with st.chat_message(message["role"], avatar=avatars[message["role"]]):
                st.markdown(message["content"])
        
        # Chat input
        if prompt := st.chat_input("Ask about AppLayout or ScaleLayout..."):
            # Add and display the user message
            st.session_state.messages.append({"role": "user", "content": prompt})
            with st.chat_message("user", avatar=avatars["user"]):
                st.markdown(prompt)
            
            # Generate and display the assistant response
            if st.session_state.chat_interface:
                with st.chat_message("assistant", avatar=avatars["assistant"]):
                    with st.spinner("🤔 Processing your question..."):
                        response = st.session_state.chat_interface.ask(prompt)
                    st.markdown(response)
                st.session_state.messages.append({"role": "assistant", "content": response})
            else:
                st.error("❌ Chat interface not available. Please refresh the page.")
    
    def run(self):
        """Main function to run the Streamlit app"""
        self.render_sidebar()
        self.render_chat_interface()

def render_examples():
    """Render example questions in a collapsible section below the chat"""
    with st.expander("🎯 Example Questions to Try"):
        st.markdown("""
**🏗️ AppLayout Examples:**
- "How do I create a DAG for person detection with face recognition?"
- "What components are needed for vehicle counting in a parking lot?"
- "Design a pipeline for retail analytics with privacy compliance"

**⚖️ ScaleLayout Examples:**
- "How should I deploy 15 cameras across 3 servers for mixed use cases?"
- "What's the optimal camera grouping strategy for resource efficiency?"
- "Plan deployment for 50 cameras with different FPS requirements"

**💡 Mixed Questions:**
- "Plan deployment for retail analytics with face recognition across 5 servers"
- "Create DAG and deployment plan for vehicle counting in multiple parking lots"
- "Design pipeline and scale for privacy-compliant person detection system"
""")

# Initialize and run the Streamlit chat interface, then show the example questions
if __name__ == "__main__":
    chat_app = StreamlitChatInterface()
    chat_app.run()
    render_examples()
'''

# Save the Streamlit application to a file
# Write as UTF-8 explicitly: the app source contains emoji
with open("streamlit_rag_chat.py", "w", encoding="utf-8") as f:
    f.write(streamlit_app_code)

print("✅ Streamlit RAG Chat Application created!")
print("📁 File saved as: streamlit_rag_chat.py")
print()
print("🚀 To run the Streamlit web application:")
print("   streamlit run streamlit_rag_chat.py")
print()
print("🌐 The web interface will be available at:")
print("   Local URL: http://localhost:8501")
print("   Network URL: http://[your-ip]:8501")

✅ Streamlit RAG Chat Application created!
📁 File saved as: streamlit_rag_chat.py

🚀 To run the Streamlit web application:
   streamlit run streamlit_rag_chat.py

🌐 The web interface will be available at:
   Local URL: http://localhost:8501
   Network URL: http://[your-ip]:8501
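The app configures retrieval with `topk=200`, `similarity_threshold=0.45`, `reranking_topk=30`, and a cross-encoder `reranking_model`. This notebook does not show how `RagQAServiceBlock` applies them internally. The sketch below assumes a common funnel design: fetch the `topk` nearest passages, drop those below the similarity threshold, rescore the survivors with the reranker, and keep the best `reranking_topk`. The `Candidate` type, the `retrieval_funnel` function, and the scores are all hypothetical. A plain scoring function stands in for the cross-encoder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    passage_id: str
    similarity: float  # first-stage vector similarity


def retrieval_funnel(
    candidates: list[Candidate],
    rerank_score: Callable[[Candidate], float],  # stand-in for a cross-encoder
    topk: int = 200,
    similarity_threshold: float = 0.45,
    reranking_topk: int = 30,
) -> list[str]:
    """Assumed funnel: top-k by similarity -> threshold filter -> rerank -> keep reranking_topk."""
    shortlist = sorted(candidates, key=lambda c: c.similarity, reverse=True)[:topk]
    kept = [c for c in shortlist if c.similarity >= similarity_threshold]
    reranked = sorted(kept, key=rerank_score, reverse=True)
    return [c.passage_id for c in reranked[:reranking_topk]]


pool = [Candidate("p1", 0.90), Candidate("p2", 0.50), Candidate("p3", 0.40), Candidate("p4", 0.70)]
# Made-up reranker scores: p3 would win, but it never reaches the reranker
fake_scores = {"p1": 0.2, "p2": 0.9, "p3": 0.99, "p4": 0.5}
print(retrieval_funnel(pool, lambda c: fake_scores[c.passage_id], topk=3, reranking_topk=2))
# -> ['p2', 'p4']
```

Under this assumption, the first-stage settings (`topk`, `similarity_threshold`) act as a recall gate. A passage filtered out there cannot be recovered by the reranker, however relevant the cross-encoder would judge it.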


### 🚀 Running Streamlit from Jupyter Notebook

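You can run the Streamlit web application directly from this notebook, with `StreamlitManager` handling its lifecycle. The Streamlit server will:

- **🟢 Start automatically** when you run the cell below
- **🔗 Open a browser tab** at the local URL (this only works when the kernel runs on your own machine)
- **🔄 Restart on demand** via `streamlit_manager.restart_server()`; file watching is disabled (`--server.fileWatcherType none`), so edits to the app file are not detected automatically
- **🛑 Stop automatically** when the kernel shuts down or restarts cleanly (via an `atexit` hook)
- **📊 Show errors and warnings** from the Streamlit log in the notebook output

This ties the server to the notebook session and helps avoid orphaned processes.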
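The manager below starts Streamlit in its own process group, so `stop_server()` can signal the server together with any children it spawned, escalating from `SIGTERM` to `SIGKILL` if needed. Here is a minimal, POSIX-only sketch of that pattern, with a sleeping Python process standing in for Streamlit. `start_in_new_group` and `stop_group` are hypothetical helper names. `start_new_session=True` is the standard-library equivalent of `preexec_fn=os.setsid`.

```python
import os
import signal
import subprocess
import sys


def start_in_new_group(cmd: list[str]) -> subprocess.Popen:
    """Start cmd as the leader of a new session, hence of a new process group (POSIX only)."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop_group(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """SIGTERM the whole group, escalate to SIGKILL after timeout, and return the exit code."""
    try:
        # A session leader's pid is also its process-group id
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # group already gone; just collect the exit code
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        return proc.wait()


# A sleeping Python process stands in for the Streamlit server
proc = start_in_new_group([sys.executable, "-c", "import time; time.sleep(60)"])
print("own process group:", os.getpgid(proc.pid) == proc.pid)
code = stop_group(proc)
print("exit code:", code)  # -15 on POSIX: terminated by SIGTERM
```

Signalling the group rather than the single pid also covers helper processes that the server may fork.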

In [None]:
%pip install streamlit
import subprocess
import threading
import time
import os
import signal
import atexit
from IPython.display import display, HTML
import webbrowser

class StreamlitManager:
    """Manage Streamlit server lifecycle from Jupyter notebook"""
    
    def __init__(self):
        self.process = None
        self.thread = None
        self.is_running = False
        self.port = 8501
        
        # Register cleanup function to run when notebook kernel stops
        atexit.register(self.stop_server)
    
    def start_server(self, app_file="streamlit_rag_chat.py", port=8501, auto_open=True):
        """Start the Streamlit server"""
        if self.is_running:
            print("⚠️  Streamlit server is already running!")
            self._show_server_info()
            return
        
        self.port = port
        
        # Check if the app file exists
        if not os.path.exists(app_file):
            print(f"❌ Error: {app_file} not found!")
            print("Please run the cell above to create the Streamlit app file first.")
            return
        
        print(f"🚀 Starting Streamlit server...")
        print(f"📁 App file: {app_file}")
        print(f"🔌 Port: {port}")
        
        try:
            # Start Streamlit in a subprocess
            cmd = [
                "streamlit", "run", app_file,
                "--server.port", str(port),
                "--server.headless", "true",
                "--server.fileWatcherType", "none",
                "--browser.gatherUsageStats", "false"
            ]
            
            self.process = subprocess.Popen(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                start_new_session=True  # New session/process group so stop_server() can kill the whole group (POSIX)
            )
            
            self.is_running = True
            
            # Start thread to monitor output
            self.thread = threading.Thread(target=self._monitor_output, daemon=True)
            self.thread.start()
            
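            # Wait a moment for the server to start
            time.sleep(3)
            
            # Bail out if Streamlit exited during startup (e.g. the port is already in use)
            if self.process.poll() is not None:
                print(f"❌ Streamlit exited during startup (exit code {self.process.returncode})")
                self.process = None
                self.is_running = False
                return
            
            # Show server information
            self._show_server_info()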
            
            # Auto-open browser if requested
            if auto_open:
                self._open_browser()
                
        except FileNotFoundError:
            print("❌ Error: Streamlit not found!")
            print("Please install streamlit: pip install streamlit")
        except Exception as e:
            print(f"❌ Error starting server: {str(e)}")
            self.is_running = False
    
    def stop_server(self):
        """Stop the Streamlit server and cleanup"""
        if not self.is_running:
            return
        
        print("🛑 Stopping Streamlit server...")
        
        try:
            if self.process:
                # Kill the entire process group
                os.killpg(os.getpgid(self.process.pid), signal.SIGTERM)
                self.process.wait(timeout=5)
        except (ProcessLookupError, subprocess.TimeoutExpired):
            # Force kill if needed
            try:
                if self.process:
                    os.killpg(os.getpgid(self.process.pid), signal.SIGKILL)
            except ProcessLookupError:
                pass
        except Exception as e:
            print(f"⚠️  Warning during shutdown: {str(e)}")
        
        self.process = None
        self.is_running = False
        print("✅ Streamlit server stopped")
    
    def restart_server(self, app_file="streamlit_rag_chat.py"):
        """Restart the Streamlit server"""
        print("🔄 Restarting Streamlit server...")
        self.stop_server()
        time.sleep(2)
        self.start_server(app_file, self.port)
    
    def _monitor_output(self):
        """Monitor Streamlit output in a separate thread"""
        if not self.process:
            return
            
        for line in iter(self.process.stdout.readline, ''):
            if not self.is_running:
                break
            
            # Filter out verbose logs, show only important ones
            if any(keyword in line.lower() for keyword in ['error', 'warning', 'failed', 'exception']):
                print(f"📋 Streamlit: {line.strip()}")
    
    def _show_server_info(self):
        """Display server information with clickable links"""
        local_url = f"http://localhost:{self.port}"
        # 0.0.0.0 is a bind address, not a URL other machines can open, so show a placeholder instead
        network_hint = f"http://&lt;server-ip&gt;:{self.port}"
        
        print("✅ Streamlit server is running!")
        print("🌐 Access your RAG Chat Interface at:")
        
        # Display clickable links in Jupyter
        display(HTML(f"""
        <div style="background-color: #f0f8ff; padding: 15px; border-radius: 10px; margin: 10px 0;">
            <h4>🚀 RAG Chat Interface is Ready!</h4>
            <p><strong>🌐 Local URL:</strong> <a href="{local_url}" target="_blank">{local_url}</a></p>
            <p><strong>📡 Network URL:</strong> {network_hint} (use this machine's IP address or hostname)</p>
            <p><em>Click the local link above to open the chat interface in a new tab</em></p>
        </div>
        """))
        
        print(f"📊 Server Status: {'🟢 Running' if self.is_running else '🔴 Stopped'}")
        print(f"🔌 Port: {self.port}")
        print("\n💡 Commands:")
        print("   • streamlit_manager.stop_server() - Stop the server")
        print("   • streamlit_manager.restart_server() - Restart the server")
    
    def _open_browser(self):
        """Attempt to open browser automatically"""
        try:
            local_url = f"http://localhost:{self.port}"
            threading.Timer(2.0, lambda: webbrowser.open(local_url)).start()
            print(f"🌐 Opening browser to {local_url}...")
        except Exception as e:
            print(f"⚠️  Could not auto-open browser: {str(e)}")
    
    def status(self):
        """Show current server status"""
        if self.is_running:
            self._show_server_info()
        else:
            print("🔴 Streamlit server is not running")
            print("💡 Run streamlit_manager.start_server() to start it")

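# Stop a server left over from a previous run of this cell, then create a fresh manager
if "streamlit_manager" in globals():
    streamlit_manager.stop_server()
streamlit_manager = StreamlitManager()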

# Auto-start the server
print("🎯 Streamlit Manager initialized!")
print("🚀 Starting RAG Chat Interface...")
streamlit_manager.start_server()

print("\n" + "="*60)
print("🎉 STREAMLIT INTEGRATION READY!")
print("="*60)
print("The Streamlit web interface is now running alongside your notebook.")
print("Both interfaces use the same knowledge base; the web app runs its own RAG block in a separate Streamlit process.")
print("="*60)
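`start_server()` waits a fixed three seconds before declaring the server up. That can be too short on a cold start and wastes time on a warm one. An alternative is to poll until the port accepts TCP connections or a deadline passes. The sketch below does this with a hypothetical `wait_for_port` helper, using only the standard library. The demo opens a throwaway local listener, so it runs without Streamlit.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll until host:port accepts a TCP connection; return False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connecting (and immediately closing) proves something is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # refused or timed out: not listening yet
    return False


# Demo: a throwaway listener on an OS-assigned free port stands in for Streamlit
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]
    ready = wait_for_port("127.0.0.1", port, timeout=2)

print("listener detected:", ready)
```

Inside the manager, you could call `wait_for_port("localhost", self.port)` in place of `time.sleep(3)` and report a failed start when it returns `False`.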

In [None]:
# Convenience functions for easy server management
def start_streamlit():
    """Start the Streamlit server"""
    streamlit_manager.start_server()

def stop_streamlit():
    """Stop the Streamlit server"""
    streamlit_manager.stop_server()

def restart_streamlit():
    """Restart the Streamlit server"""
    streamlit_manager.restart_server()

def streamlit_status():
    """Show Streamlit server status"""
    streamlit_manager.status()

print("🎛️  Streamlit Control Functions Available:")
print("   • start_streamlit() - Start the web interface")
print("   • stop_streamlit() - Stop the web interface") 
print("   • restart_streamlit() - Restart the web interface")
print("   • streamlit_status() - Check server status")
print("\n💡 Example usage:")
print("   start_streamlit()  # Start the server")
print("   streamlit_status() # Check status")
print("   stop_streamlit()   # Stop when done")
# stop_streamlit()  # Uncomment to stop the web interface when you are done

## 🎉 Summary

✅ **RAG Knowledge Base Created**: All ScaleLayout and AppLayout documentation indexed  
✅ **Semantic Search Ready**: Vector embeddings enable intelligent retrieval  
✅ **Graph Connections Built**: Related concepts linked for comprehensive answers  
✅ **Multi-Format Support**: PDFs, Markdown, code files, and structured data processed  
✅ **Production Ready**: Optimized for technical queries and deployment planning  

Your RAG system is now ready to power intelligent AppLayout DAG creation and ScaleLayout deployment planning! 🚀