# 🔍 Cortex Search Interactive Tutorial
## Learn by Doing: Semantic Vector Search in Snowflake

**Author:** Li Ma  
**Date:** February 24, 2026  
**Project:** DIA v2.0 - Direct Marketing Analytics Intelligence

---

## 📚 What You'll Learn

This interactive notebook teaches you how to:
1. ✅ Perform semantic (meaning-based) search
2. ✅ Find similar content using vector embeddings
3. ✅ Use the RAG (Retrieval-Augmented Generation) pattern
4. ✅ Apply filters to refine search results
5. ✅ Build intelligent Q&A systems

## 🎯 Prerequisites

- Docker containers running (`docker-compose up`)
- Snowflake credentials configured in `.env` file
- **Cortex Search Service created in Snowflake** (see setup below)

## 🧠 What is Cortex Search?

**Cortex Search** performs semantic search - finding content by **meaning**, not just keywords.

**Examples:**
- Search: "summer campaigns" → Finds: "seasonal promotions", "warm weather sales"
- Search: "email deliverability" → Finds: "inbox placement", "bounce rate reduction"

**Traditional Search (keyword):**
- "email campaign" only finds exact matches

**Semantic Search (meaning):**
- "email campaign" finds: newsletters, promotional emails, automated sequences

**Use Cases:**
- Knowledge base search
- Find similar past campaigns
- Customer support Q&A
- Content recommendations

---

## ⚙️ Setup: Create Cortex Search Service

**Before running this notebook, you must create a search service in Snowflake:**

```sql
-- Example: Create search service for campaign documents
CREATE CORTEX SEARCH SERVICE campaign_knowledge
    ON content_column
    ATTRIBUTES category  -- columns listed here can be used in search filters
    WAREHOUSE = COMPUTE_WH
    TARGET_LAG = '1 hour'
    AS (
        SELECT 
            campaign_name,
            description AS content_column,
            category,
            start_date
        FROM campaign_documents
    );
```

---

**💡 Tip:** Run each cell with `Shift + Enter` and experiment with different queries!

In [None]:
# Install required packages for this notebook
# Run this cell once to install dependencies
import sys
import subprocess

packages = [
    'structlog',
    'python-dotenv',
    'snowflake-snowpark-python'
]

print("📦 Installing required packages...")
for package in packages:
    print(f"   Installing {package}...")
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])
        print(f"   ✅ {package} installed")
    except subprocess.CalledProcessError as e:
        print(f"   ❌ Failed to install {package}: {e}")

print("\n✅ Installation complete!")
print("⚠️  If this is the first install, please RESTART THE KERNEL:")
print("   Jupyter menu: Kernel → Restart Kernel")

In [None]:
# Setup Python paths and import libraries
import sys
import os

# Calculate the project paths dynamically
notebook_dir = os.getcwd()
project_root = os.path.abspath(os.path.join(notebook_dir, '..'))
orchestrator_path = os.path.join(project_root, 'orchestrator')

# Add paths for both local and Docker environments
sys.path.insert(0, orchestrator_path)
sys.path.insert(0, project_root)
sys.path.insert(0, '/app')

print("📁 Python paths added:")
print(f"   Project Root: {project_root}")
print(f"   Orchestrator: {orchestrator_path}")

# Verify orchestrator path exists
if os.path.exists(orchestrator_path):
    print("   ✅ Orchestrator directory found")
else:
    print(f"   ⚠️  Orchestrator directory NOT found at: {orchestrator_path}")

# Core Python libraries
import json
from typing import Dict, List, Any, Optional
from dataclasses import dataclass

# Snowflake libraries
from snowflake.snowpark import Session

# Environment and logging
from dotenv import load_dotenv

# Try to import custom logger with fallback
try:
    from utils.logging import get_logger
    logger = get_logger(__name__)
    print("   ✅ Using custom structlog logger")
except ImportError as e:
    import logging
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
    logger = logging.getLogger(__name__)
    print("   ⚠️  Using standard logging (utils.logging not found)")

# Load environment variables from .env file
load_dotenv()

print("\n✅ All libraries imported successfully!")
print(f"   Python version: {sys.version.split()[0]}")
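
The prerequisites call for Snowflake credentials in `.env`, but this notebook does not list the variable names. A minimal sketch of a pre-flight check, assuming hypothetical names such as `SNOWFLAKE_ACCOUNT` (substitute whatever keys your `.env` actually defines):

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical variable names; replace with the keys your .env file defines.
REQUIRED_VARS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variables that are unset or blank."""
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print(f"Missing settings: {', '.join(missing)}")
else:
    print("All required settings are present")
```

Running this right after `load_dotenv()` surfaces a missing credential before the first search call fails with a less obvious error.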

## 📦 Understanding the Response Data Models

The search service uses these data structures to organize results.

In [None]:
@dataclass
class SearchResult:
    """
    Single search result with similarity score.
    
    Attributes:
        content (str): The matched content/text
        score (float): Similarity score (0.0 to 1.0, higher = more similar)
        rank (int): Position in results (1 = best match)
        metadata (Dict): Additional fields (campaign_name, date, etc.)
    """
    content: str
    score: float
    rank: int
    metadata: Optional[Dict[str, Any]] = None


@dataclass
class SearchResponse:
    """
    Complete search response with all results.
    
    Attributes:
        query (str): The search query
        results (List[SearchResult]): Ordered list of matches
        metadata (Dict): Search info (result count, execution time)
        error (str): Error message if something went wrong
    """
    query: str
    results: Optional[List[SearchResult]] = None
    metadata: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


# Test it out!
sample_result = SearchResult(
    content="Summer sale campaign with 25% discount on all products",
    score=0.89,
    rank=1,
    metadata={"campaign_name": "SUMMER_2025", "category": "promotional"}
)

sample_response = SearchResponse(
    query="seasonal promotions",
    results=[sample_result],
    metadata={"result_count": 1}
)

print("✅ Search data models created!")
print(f"   Query: {sample_response.query}")
print(f"   Top Result: {sample_result.content}")
print(f"   Similarity Score: {sample_result.score}")
print(f"   Rank: #{sample_result.rank}")
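
The `rank` field is simply the position of a result once hits are sorted by `score`. A minimal sketch of how raw hits could be turned into ranked results; the `(content, score)` tuple input and the `rank_hits` helper are assumptions for illustration, not part of the `CortexSearch` service:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SearchResult:
    """Mirror of the notebook's SearchResult so this block runs on its own."""
    content: str
    score: float
    rank: int
    metadata: Optional[Dict[str, Any]] = None


def rank_hits(hits: List[Tuple[str, float]]) -> List[SearchResult]:
    """Sort hits by descending score and assign 1-based ranks."""
    ordered = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return [
        SearchResult(content=text, score=score, rank=position)
        for position, (text, score) in enumerate(ordered, start=1)
    ]


# Made-up hits, deliberately out of order.
ranked = rank_hits([("Winter clearance", 0.41), ("Summer sale campaign", 0.89)])
for item in ranked:
    print(f"#{item.rank} ({item.score:.2f}): {item.content}")
```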

## 🔧 Import CortexSearch Service

Now let's import the `CortexSearch` class from the services module.

In [None]:
# Import the CortexSearch service class
try:
    from services.cortex_search import CortexSearch
    print("✅ CortexSearch class imported successfully!")
    print("   Ready to perform semantic search")
except ImportError as e:
    print(f"❌ Failed to import CortexSearch: {e}")
    print("\n💡 Troubleshooting:")
    print("   1. Make sure you ran Cell 2 (path setup)")
    print("   2. Check that orchestrator/services/cortex_search.py exists")

## 🔍 Example 1: Basic Semantic Search

Search for similar content in your knowledge base.

**⚠️ Note:** Replace the `SERVICE_NAME` value (`"campaign_knowledge"`) with your actual search service name!

In [None]:
# Replace with your actual search service name
SERVICE_NAME = "campaign_knowledge"  # Change this!

try:
    # Create search service instance
    search = CortexSearch(service_name=SERVICE_NAME)
    
    # Perform semantic search
    response = search.search(
        query="email campaigns about seasonal promotions",
        limit=5
    )
    
    if response.error:
        print(f"❌ Error: {response.error}")
        print("\n💡 Troubleshooting:")
        print("   1. Make sure your search service exists:")
        print("      SHOW CORTEX SEARCH SERVICES;")
        print("   2. Update SERVICE_NAME variable above")
        print("   3. Check that you have data in the service")
    else:
        print(f"✅ Found {len(response.results)} results for: '{response.query}'\n")
        
        for result in response.results:
            print(f"Rank #{result.rank} (Score: {result.score:.3f})")
            print(f"   Content: {result.content[:100]}...")
            if result.metadata:
                print(f"   Metadata: {result.metadata}")
            print()
            
except Exception as e:
    print(f"❌ Search failed: {e}")
    print("\n💡 Make sure you:")
    print("   1. Created a Cortex Search Service in Snowflake")
    print("   2. Updated SERVICE_NAME variable above")
    print("   3. Have Cortex Search enabled in your account")

## 🎯 Example 2: Search with Filters

Refine your search results with filters (category, date range, etc.).

In [None]:
# Search with filters
try:
    with CortexSearch(service_name=SERVICE_NAME) as search:
        response = search.search(
            query="product launch campaigns",
            limit=10,
            filters={"category": "promotional", "year": 2025}  # Keys must be columns your service exposes (the setup example has no `year` column)
        )
        
        if response.error:
            print(f"❌ Error: {response.error}")
        else:
            print("✅ Filtered search results:")
            print(f"   Query: '{response.query}'")
            print(f"   Filters: category=promotional, year=2025")
            print(f"   Results: {len(response.results)}\n")
            
            for result in response.results[:3]:  # Show top 3
                print(f"#{result.rank}: {result.content}")
                print(f"   Score: {result.score:.3f}\n")
                
except Exception as e:
    print(f"❌ Error: {e}")
    print("💡 Adjust the filters based on your actual data schema")
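
Cortex Search's own query API expresses filters as JSON with operators such as `@eq` and `@and`, and only on columns listed under `ATTRIBUTES` when the service was created. Whether the `CortexSearch` wrapper translates the simple `filters` dict used above into that form is not shown in this notebook, so the following is only a sketch of how such a translation could look; `to_cortex_filter` is a hypothetical helper:

```python
import json
from typing import Any, Dict


def to_cortex_filter(simple: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"column": value} pairs into an @eq / @and filter object."""
    clauses = [{"@eq": {column: value}} for column, value in simple.items()]
    if not clauses:
        return {}  # No filter requested
    if len(clauses) == 1:
        return clauses[0]  # A single condition needs no @and wrapper
    return {"@and": clauses}


filter_obj = to_cortex_filter({"category": "promotional"})
print(json.dumps(filter_obj))
```

Keeping the translation in one small function makes it easy to check the generated JSON before it is sent to Snowflake.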

## 🤖 Example 3: RAG - Retrieval Augmented Generation

Combine search + LLM to answer questions intelligently!

**How RAG works:**
1. 🔍 Search for relevant documents
2. 📄 Extract context from top results
3. 🤖 LLM generates answer using context
4. ✅ Result: An answer grounded in YOUR data

In [None]:
# RAG: Search + LLM = Intelligent Answers
try:
    with CortexSearch(service_name=SERVICE_NAME) as search:
        answer = search.search_with_llm(
            query="What are the best practices for summer email campaigns?",
            limit=5,            # Search top 5 relevant documents
            llm_model="llama3-70b"  # Use LLM to generate answer
        )
        
        print("🤖 RAG-Generated Answer:\n")
        print(answer)
        print("\n" + "=" * 70)
        print("💡 This answer is based on YOUR documents, not generic AI knowledge!")
        
except Exception as e:
    print(f"❌ RAG failed: {e}")
    print("\n💡 RAG requires:")
    print("   1. Cortex Search Service with content")
    print("   2. Cortex Complete enabled")
    print("   3. Semantic model or documents to search")

## 🎯 Example 4: Compare Keyword-Style vs Natural-Language Queries

Run the same semantic search with keyword-style, natural-language, and differently worded queries, then compare the results!

In [None]:
# Compare different search queries
queries = [
    "email campaign performance",          # Direct keywords
    "how well did our newsletters do",     # Natural language
    "inbox delivery success rate",         # Different terminology
]

print("📊 Semantic Search Comparison:\n")
print("=" * 70)

try:
    with CortexSearch(service_name=SERVICE_NAME) as search:
        for query in queries:
            print(f"\n🔍 Query: \"{query}\"")
            print("-" * 70)
            
            response = search.search(query, limit=3)
            
            if response.error:
                print(f"❌ Error: {response.error}")
            else:
                for result in response.results:
                    print(f"   • Score {result.score:.3f}: {result.content[:80]}...")
                    
except Exception as e:
    print(f"❌ Error: {e}")

print("\n" + "=" * 70)
print("💡 Notice how different phrasings find similar content!")
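
To make "different phrasings find similar content" measurable, you can compare the result sets directly. A minimal sketch using Jaccard overlap; the sample lists below are made-up placeholders standing in for the `content` values returned by each query:

```python
from itertools import combinations
from typing import Dict, List


def jaccard(a: List[str], b: List[str]) -> float:
    """Share of distinct items that two result lists have in common."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Placeholder results keyed by query text (not real search output).
results_by_query: Dict[str, List[str]] = {
    "email campaign performance": ["doc_a", "doc_b", "doc_c"],
    "how well did our newsletters do": ["doc_a", "doc_b", "doc_d"],
}

for (q1, r1), (q2, r2) in combinations(results_by_query.items(), 2):
    print(f"{jaccard(r1, r2):.2f} overlap: '{q1}' vs '{q2}'")
```

An overlap close to 1.0 means the two phrasings retrieved nearly the same documents; a value near 0.0 means they led to different content.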

## 🏆 Example 5: Find Similar Items

Find content similar to a specific campaign or document.

In [None]:
# Find similar campaigns
reference_campaign = "Black Friday 2025 email campaign with 30% discount and free shipping"

try:
    with CortexSearch(service_name=SERVICE_NAME) as search:
        response = search.search(
            query=reference_campaign,
            limit=10
        )
        
        if response.error:
            print(f"❌ Error: {response.error}")
        else:
            print("🔍 Finding campaigns similar to:")
            print(f"   \"{reference_campaign}\"\n")
            print("=" * 70)
            print("📊 Similar Campaigns:\n")
            
            for result in response.results:
                print(f"#{result.rank} - Similarity: {result.score:.1%}")
                print(f"   {result.content}\n")
                
            print("=" * 70)
            print(f"💡 Found {len(response.results)} similar campaigns!")
            
except Exception as e:
    print(f"❌ Error: {e}")
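
When the reference text is itself indexed, it is likely to come back as its own best match, and low-scoring hits add noise. A minimal sketch of post-filtering, assuming results are available as `(content, score)` pairs; the `keep_similar` helper is hypothetical and the `min_score` threshold of 0.5 is an arbitrary illustration, not a recommended value:

```python
from typing import List, Tuple


def keep_similar(
    hits: List[Tuple[str, float]],
    reference: str,
    min_score: float = 0.5,
) -> List[Tuple[str, float]]:
    """Drop the reference text itself and any hit scoring below min_score."""
    normalized_ref = reference.strip().lower()
    return [
        (content, score)
        for content, score in hits
        if score >= min_score and content.strip().lower() != normalized_ref
    ]


# Made-up hits for illustration.
hits = [
    ("Black Friday 2025 email campaign", 0.99),
    ("Cyber Monday flash sale with free shipping", 0.81),
    ("Quarterly HR newsletter", 0.22),
]
similar = keep_similar(hits, reference="Black Friday 2025 email campaign")
print(similar)
```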

## 🎓 Summary: What You Learned

Congratulations! You've learned:

✅ **Cortex Search Fundamentals**
- Semantic (meaning-based) search
- Vector embeddings and similarity scores
- Difference from keyword search

✅ **Advanced Features**
- Search with filters
- RAG (Retrieval Augmented Generation)
- Find similar content
- Compare search strategies

✅ **Practical Applications**
- Knowledge base search
- Campaign similarity matching
- Intelligent Q&A systems
- Content recommendations

---

## 🚀 Next Steps

**Try These Experiments:**
1. Create your own search service with campaign data
2. Build a Q&A bot using the RAG pattern
3. Find similar past campaigns before launching new ones
4. Search across different content types

**Advanced Use Cases:**

### 1. Smart Campaign Finder
```python
def find_similar_campaigns(description: str, top_k: int = 5):
    with CortexSearch(service_name="campaign_history") as search:
        return search.search(description, limit=top_k)
```

### 2. Knowledge Base Bot
```python
def answer_question(question: str):
    with CortexSearch(service_name="company_kb") as search:
        return search.search_with_llm(question, limit=5)
```

### 3. Content Recommendation
```python
def recommend_similar(content_id: str, limit: int = 10):
    # Get content description
    # Search for similar
    # Return recommendations
    pass
```
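
The `recommend_similar` stub above leaves its three steps as comments. One way to fill them in, sketched with the lookup and the search passed in as callables so the logic can be tried without a Snowflake connection; `get_description` and `search_fn` are hypothetical stand-ins (in the notebook, `search_fn` could wrap `CortexSearch.search`):

```python
from typing import Callable, List, Optional


def recommend_similar_sketch(
    content_id: str,
    get_description: Callable[[str], Optional[str]],
    search_fn: Callable[[str, int], List[str]],
    limit: int = 10,
) -> List[str]:
    """Look up an item's description, search with it, and return other matches."""
    description = get_description(content_id)
    if not description:
        return []  # Unknown item: nothing to base recommendations on
    # Ask for one extra hit because the item itself is likely to match.
    candidates = search_fn(description, limit + 1)
    return [text for text in candidates if text != description][:limit]


# Toy in-memory stand-ins for a catalog and a search service.
catalog = {"c1": "Summer sale email", "c2": "Summer clearance email"}


def fake_search(query: str, limit: int) -> List[str]:
    return list(catalog.values())[:limit]


print(recommend_similar_sketch("c1", catalog.get, fake_search, limit=5))
```

Injecting the two callables keeps the recommendation logic independent of how descriptions are stored, which this notebook does not specify.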

---

## 🔗 Related Resources

- **Documentation:** `guides/02_STEP_2.1_CORTEX_SERVICES.md`
- **Service Code:** `orchestrator/services/cortex_search.py`
- **Other Notebooks:**
  - `cortex_analyst_interactive.ipynb` - SQL generation
  - `cortex_complete_interactive.ipynb` - Text generation
  - `cortex_ml_interactive.ipynb` - Forecasting & anomalies

---

## 📝 Key Concepts

### Semantic Search vs Keyword Search

**Keyword Search:**
- Matches exact words
- "email campaign" ≠ "newsletter"
- Fast but limited

**Semantic Search:**
- Understands meaning
- "email campaign" = "newsletter" = "promotional message"
- Slower but smarter
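
The keyword limitation is easy to reproduce. A minimal sketch of exact-word matching (a deliberately naive matcher written for this illustration, not how any particular search engine works):

```python
from typing import List


def keyword_match(query: str, documents: List[str]) -> List[str]:
    """Return documents containing every word of the query (case-insensitive)."""
    terms = query.lower().split()
    return [
        doc for doc in documents
        if all(term in doc.lower().split() for term in terms)
    ]


# Made-up documents for illustration.
docs = [
    "Spring email campaign for loyalty members",
    "Monthly newsletter with product updates",
]
print(keyword_match("email campaign", docs))
```

The newsletter document is skipped even though it covers the same topic; semantic search is designed to surface it.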

### Vector Embeddings

Text is converted to numbers (vectors) that represent meaning:
- Similar meanings → Similar vectors
- The closer two vectors are, the higher the similarity score
- All handled automatically by Snowflake!
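
One common way to compare embeddings is cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings have far more dimensions, and the exact scoring Cortex Search applies internally is not described in this notebook.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example.
email_campaign = [0.9, 0.1, 0.3]
newsletter = [0.8, 0.2, 0.4]
payroll_report = [0.1, 0.9, 0.0]

print(f"email vs newsletter: {cosine_similarity(email_campaign, newsletter):.2f}")
print(f"email vs payroll:    {cosine_similarity(email_campaign, payroll_report):.2f}")
```

The first pair points in nearly the same direction and scores much higher than the second, which is the behavior the bullets above describe.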

### RAG Pattern

**Problem:** LLMs don't know your specific data  
**Solution:** RAG combines search + generation

```python
# Step 1: Search your documents
docs = search.search("email best practices")

# Step 2: LLM reads docs and generates answer
answer = llm.generate_with_context(docs)

# Result: Accurate answer based on YOUR knowledge
```
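
Step 2 usually comes down to placing the retrieved passages into the prompt. A minimal sketch of that assembly; `build_rag_prompt` and its prompt wording are illustrative assumptions, and how `search_with_llm` actually builds its prompt is not shown in this notebook:

```python
from typing import List


def build_rag_prompt(question: str, passages: List[str], max_chars: int = 2000) -> str:
    """Combine retrieved passages and a question into one grounded prompt."""
    context_lines = []
    used = 0
    for index, passage in enumerate(passages, start=1):
        if used + len(passage) > max_chars:
            break  # Stay within a rough context budget
        context_lines.append(f"[{index}] {passage}")
        used += len(passage)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up passages standing in for search results.
prompt = build_rag_prompt(
    "What are best practices for summer email campaigns?",
    ["Send seasonal offers early in the week.", "Keep subject lines short."],
)
print(prompt)
```

Numbering the passages lets the model (and the reader) trace each claim in the answer back to a retrieved document.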

---

## ⚙️ Creating Your Search Service

```sql
-- Step 1: Prepare your data table
CREATE TABLE campaign_documents AS
SELECT 
    campaign_name,
    description,
    category,
    start_date
FROM your_campaigns_table;

-- Step 2: Create search service
CREATE CORTEX SEARCH SERVICE campaign_knowledge
    ON description
    ATTRIBUTES category  -- columns listed here can be used in search filters
    WAREHOUSE = COMPUTE_WH
    TARGET_LAG = '1 hour'
    AS (
        SELECT 
            campaign_name,
            description,
            category,
            start_date
        FROM campaign_documents
    );

-- Step 3: Wait for indexing (check status)
SHOW CORTEX SEARCH SERVICES;

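-- Step 4: Test search (use the fully qualified service name if it lives in another schema)
SELECT PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'campaign_knowledge',
        '{
            "query": "summer promotional campaigns",
            "columns": ["campaign_name", "description"],
            "limit": 10
        }'
    )
)['results'] AS results;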
```

---

**Status:** ✅ Tutorial Complete  
**Next:** Try `cortex_ml_interactive.ipynb` for forecasting and anomaly detection!