# Layer 6: Semantic Search

### What We're Building:
- First: Manual embeddings table (to show the maintenance pain)
- Then: Cortex Search Service (self-maintaining, agent-ready)
- Natural language queries across email corpus
- Pattern discovery for investigations

In [None]:
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()
session.use_warehouse('COMPLIANCE_DEMO_WH')
session.use_database('COMPLIANCE_DEMO')
session.use_schema('SEARCH')

print("Layer 6: Building semantic search...")

---

## Part 1: The Manual Approach (The Pain)

Let's build embeddings manually to understand what we're replacing.

In [None]:
import time

print("Creating manual embeddings table...")
start = time.time()

session.sql("""
CREATE OR REPLACE TABLE COMPLIANCE_DEMO.SEARCH.EMAIL_EMBEDDINGS_MANUAL AS
SELECT 
    EMAIL_ID,
    SUBJECT,
    BODY,
    COMPLIANCE_LABEL,
    SENDER_DEPT,
    RECIPIENT_DEPT,
    SENT_AT,
    SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', 
        CONCAT(SUBJECT, ' ', LEFT(BODY, 1000))
    )::VECTOR(FLOAT, 768) AS EMBEDDING
FROM COMPLIANCE_DEMO.EMAIL_SURVEILLANCE.EMAILS
LIMIT 500
""").collect()

elapsed = time.time() - start
count = session.sql('SELECT COUNT(*) as cnt FROM EMAIL_EMBEDDINGS_MANUAL').collect()[0]['CNT']

print(f"Created embeddings for {count} emails in {elapsed:.1f}s")

In [None]:
query = 'confidential merger acquisition tip'

# Pass the search text as a bind parameter (?) so that quotes in the
# query cannot break or alter the SQL statement.
results = session.sql("""
SELECT 
    EMAIL_ID,
    COMPLIANCE_LABEL,
    SUBJECT,
    LEFT(BODY, 300) as BODY_PREVIEW,
    ROUND(VECTOR_COSINE_SIMILARITY(
        EMBEDDING,
        SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', ?)::VECTOR(FLOAT, 768)
    ), 3) AS SIMILARITY
FROM EMAIL_EMBEDDINGS_MANUAL
ORDER BY SIMILARITY DESC
LIMIT 3
""", params=[query]).to_pandas()

print(f"Manual search: '{query}'")
print("="*70)
for _, row in results.iterrows():
    print(f"\n[{row['COMPLIANCE_LABEL']}] Similarity: {row['SIMILARITY']}")
    print(f"Subject: {row['SUBJECT']}")
    print(f"Body: {row['BODY_PREVIEW'][:200]}...")
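
For reference, `VECTOR_COSINE_SIMILARITY` in the query above scores each email by the cosine of the angle between its embedding and the query embedding. The sketch below reproduces that calculation in plain Python. The 3-dimensional vectors and the `email_a`/`email_b`/`email_c` names are made up for illustration; real embeddings from `snowflake-arctic-embed-m` have 768 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (illustrative values, not real embeddings).
query_vec = [1.0, 0.0, 1.0]
emails = {
    "email_a": [1.0, 0.0, 1.0],  # same direction as the query
    "email_b": [1.0, 1.0, 0.0],  # partially aligned with the query
    "email_c": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Rank emails from most to least similar, mirroring ORDER BY SIMILARITY DESC.
ranked = sorted(
    ((round(cosine_similarity(query_vec, vec), 3), email_id)
     for email_id, vec in emails.items()),
    reverse=True,
)
for score, email_id in ranked:
    print(email_id, score)
```

A score near 1 means the two texts point in the same semantic direction, and a score near 0 means they are unrelated.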

### The Problem with Manual Embeddings

| Issue | Impact |
|-------|--------|
| **No auto-refresh** | New emails aren't searchable until you rebuild |
| **Manual maintenance** | Need to schedule jobs, handle failures |
| **No hybrid search** | Can't combine text + vector matching |
| **Not agent-ready** | Can't use with Cortex Agents directly |
| **Query complexity** | Write VECTOR_COSINE_SIMILARITY every time |
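
To make the first two rows concrete, here is a rough sketch of the refresh job you would have to write and schedule yourself. The table and column names come from the cells above. The `NOT EXISTS` strategy, the `refresh_embeddings` helper, and its retry policy are assumptions for illustration, and the approach ignores emails that are updated or deleted after they were embedded.

```python
# Hypothetical maintenance job: embed only emails missing from the manual table.
INCREMENTAL_REFRESH_SQL = """
INSERT INTO COMPLIANCE_DEMO.SEARCH.EMAIL_EMBEDDINGS_MANUAL
SELECT
    e.EMAIL_ID, e.SUBJECT, e.BODY, e.COMPLIANCE_LABEL,
    e.SENDER_DEPT, e.RECIPIENT_DEPT, e.SENT_AT,
    SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m',
        CONCAT(e.SUBJECT, ' ', LEFT(e.BODY, 1000)))
FROM COMPLIANCE_DEMO.EMAIL_SURVEILLANCE.EMAILS e
WHERE NOT EXISTS (
    SELECT 1
    FROM COMPLIANCE_DEMO.SEARCH.EMAIL_EMBEDDINGS_MANUAL m
    WHERE m.EMAIL_ID = e.EMAIL_ID
)
"""


def refresh_embeddings(session, max_attempts=3):
    """Run the incremental refresh, retrying on failure.

    `session` is any object exposing a Snowpark-style `.sql(text).collect()`.
    Returns the number of attempts used; re-raises the last error if all fail.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            session.sql(INCREMENTAL_REFRESH_SQL).collect()
            return attempt
        except Exception as exc:  # demo only: a real job should catch narrower errors
            last_error = exc
    raise last_error
```

Even this small job still needs a scheduler, alerting, and a plan for changed rows, which is the maintenance burden the table describes.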

---

## Part 2: Cortex Search Service (The Solution)

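Cortex Search addresses each of these issues: the service builds and refreshes its own index, combines keyword and vector retrieval, and exposes attribute filters through a single query API.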

In [None]:
session.sql("""
CREATE OR REPLACE CORTEX SEARCH SERVICE COMPLIANCE_DEMO.SEARCH.EMAIL_SEARCH_SERVICE
ON BODY
ATTRIBUTES SUBJECT, COMPLIANCE_LABEL, SENDER_DEPT, RECIPIENT_DEPT
WAREHOUSE = COMPLIANCE_DEMO_WH
TARGET_LAG = '1 hour'
AS (
    SELECT 
        EMAIL_ID,
        SUBJECT,
        BODY,
        COMPLIANCE_LABEL,
        SENDER_DEPT,
        RECIPIENT_DEPT,
        SENT_AT::STRING as SENT_AT
    FROM COMPLIANCE_DEMO.EMAIL_SURVEILLANCE.EMAILS
)
""").collect()

print("Cortex Search Service created!")
print("\n  -> Auto-maintains embeddings (TARGET_LAG = 1 hour)")
print("  -> Hybrid search (text + semantic)")
print("  -> Ready for Cortex Agents")
print("  -> Filterable by attributes")

In [None]:
from snowflake.core import Root

root = Root(session)
search_service = root.databases['COMPLIANCE_DEMO'].schemas['SEARCH'].cortex_search_services['EMAIL_SEARCH_SERVICE']

print("SearchService connected!")

## Part 3: Natural Language Search

In [None]:
query = "confidential merger acquisition before public announcement"

response = search_service.search(
    query=query,
    columns=['EMAIL_ID', 'SUBJECT', 'BODY', 'COMPLIANCE_LABEL', 'SENDER_DEPT', 'RECIPIENT_DEPT'],
    limit=5
)

print(f"Search: '{query}'")
print("="*80)

results = response.results
for i, result in enumerate(results, 1):
    print(f"\n--- Result {i} [{result['COMPLIANCE_LABEL']}] ---")
    print(f"Route: {result['SENDER_DEPT']} -> {result['RECIPIENT_DEPT']}")
    print(f"Subject: {result['SUBJECT']}")
    print(f"Body: {result['BODY'][:300]}...")

In [None]:
query = "research analyst sharing recommendation with trading before clients"

response = search_service.search(
    query=query,
    columns=['EMAIL_ID', 'SUBJECT', 'BODY', 'COMPLIANCE_LABEL', 'SENDER_DEPT', 'RECIPIENT_DEPT'],
    limit=5
)

print(f"Search: '{query}'")
print("="*80)

results = response.results
for i, result in enumerate(results, 1):
    print(f"\n--- Result {i} [{result['COMPLIANCE_LABEL']}] ---")
    print(f"Route: {result['SENDER_DEPT']} -> {result['RECIPIENT_DEPT']}")
    print(f"Subject: {result['SUBJECT']}")
    print(f"Body: {result['BODY'][:300]}...")
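
The two search cells above repeat the same display loop. One way to factor it out is a small helper like the one below. The `format_results` name and the `UNLABELED` fallback are hypothetical, and the sketch assumes each hit behaves like a dict, as the indexing in the cells above suggests. The sample hits are made up so the block runs without a Snowflake connection.

```python
def format_results(results, body_chars=300):
    """Render search hits as text, tolerating missing or NULL fields."""
    lines = []
    for i, result in enumerate(results, 1):
        label = result.get('COMPLIANCE_LABEL') or 'UNLABELED'
        body = result.get('BODY') or ''
        lines.append(f"\n--- Result {i} [{label}] ---")
        lines.append(f"Route: {result.get('SENDER_DEPT')} -> {result.get('RECIPIENT_DEPT')}")
        lines.append(f"Subject: {result.get('SUBJECT')}")
        lines.append(f"Body: {body[:body_chars]}...")
    return "\n".join(lines)


# Made-up hits standing in for `response.results`.
sample_hits = [
    {
        'COMPLIANCE_LABEL': 'VIOLATION',
        'SENDER_DEPT': 'Research',
        'RECIPIENT_DEPT': 'Trading',
        'SUBJECT': 'Heads up',
        'BODY': 'Upgrade goes out tomorrow morning.',
    },
    {'SUBJECT': 'Lunch?', 'BODY': None},  # missing label and NULL body
]
print(format_results(sample_hits, body_chars=20))
```

In the notebook this would be called as `print(format_results(response.results))` after each search.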

## Part 4: Filtered Search (Investigation Use Case)

In [None]:
query = "delete this message keep quiet"

response = search_service.search(
    query=query,
    columns=['EMAIL_ID', 'SUBJECT', 'BODY', 'COMPLIANCE_LABEL', 'SENDER_DEPT', 'RECIPIENT_DEPT'],
    filter={"@eq": {"SENDER_DEPT": "Research"}},
    limit=5
)

print(f"Search: '{query}' (filtered to SENDER_DEPT = Research)")
print("="*80)

results = response.results
for i, result in enumerate(results, 1):
    print(f"\n--- Result {i} [{result['COMPLIANCE_LABEL']}] ---")
    print(f"Route: {result['SENDER_DEPT']} -> {result['RECIPIENT_DEPT']}")
    print(f"Subject: {result['SUBJECT']}")
    print(f"Body: {result['BODY'][:300]}...")
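
Investigations often need more than one condition. Cortex Search filters can be nested with `@and`, so a Research-to-Trading filter could be assembled as in the sketch below. The `eq` and `all_of` helpers are hypothetical conveniences, and the department value `Trading` is an assumption about the data. Only columns listed under `ATTRIBUTES` when the service was created can be used in a filter.

```python
import json


def eq(column, value):
    """Build an equality clause on a single attribute column."""
    return {"@eq": {column: value}}


def all_of(*clauses):
    """Combine clauses so that every one of them must match."""
    return {"@and": list(clauses)}


# Emails sent by Research AND received by Trading.
research_to_trading = all_of(
    eq("SENDER_DEPT", "Research"),
    eq("RECIPIENT_DEPT", "Trading"),
)

print(json.dumps(research_to_trading, indent=2))
```

The resulting dict can be passed as the `filter` argument of `search_service.search(...)` in place of the single `@eq` clause used above.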

## Cortex Search: Ready for Agents

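This Search Service can be attached to **Cortex Agents** as a search tool. The statement below is illustrative shorthand rather than exact DDL; consult the current Cortex Agents documentation for the supported syntax: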

```sql
CREATE CORTEX SEARCH AGENT COMPLIANCE_ASSISTANT
  SEARCH_SERVICES = (COMPLIANCE_DEMO.SEARCH.EMAIL_SEARCH_SERVICE)
  LLM_MODEL = 'claude-3-5-sonnet'
  ...
```

The agent can then answer questions like:
- "Find all emails about the ACME merger"
- "Show me Research-to-Trading communications from last week"
- "What violations involve the word 'confidential'?"

## Manual vs Cortex Search

| Feature | Manual Embeddings | Cortex Search |
|---------|------------------|---------------|
| Auto-refresh | No | **Yes** (TARGET_LAG) |
| Maintenance | Manual jobs | **Managed by the service** |
| Search type | Vector only | **Hybrid** (text + vector) |
| Filtering | Manual SQL | **Built-in** |
| Agent-ready | No | **Yes** |
| API | SQL only | **Python SDK**, REST API, SQL preview function |
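
To give a feel for what "hybrid" means in the table above, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF), a common fusion technique. This is not a description of how Cortex Search combines results internally. The email IDs, the two input rankings, and the constant `k=60` are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by more than one retriever rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["e3", "e1", "e7"]  # made-up keyword (text match) ranking
vector_hits = ["e1", "e5", "e3"]   # made-up semantic (embedding) ranking

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Emails that appear in both lists (`e1`, `e3`) outrank those found by only one retriever, which is the behavior a vector-only search over the manual table cannot provide.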

## Layer 6 Complete

**What we built:**
- Manual embeddings (to show the pain)
- **Cortex Search Service** (self-maintaining, agent-ready)
- Natural language search across 10K emails
- Filtered search for investigations

**Why Cortex Search matters:**
- **Low maintenance** - the service refreshes embeddings within `TARGET_LAG`
- **Hybrid search** - combines semantic + keyword
- **Agent-ready** - plug into Cortex Agents
- **Investigation support** - "Find all emails like this one"

---

## THE FULL SYSTEM IS NOW COMPLETE