# Examples Notebook - SCM Logistics Case Examples

This notebook demonstrates how to work with the generated example cases in the SCM logistics assistant system. It covers:

1. Loading and exploring example case studies from the knowledge base
2. Understanding the structure of example data
3. Retrieving relevant examples for few-shot learning
4. Using examples in prompt construction for the LLM assistant

Examples are the foundation of the MedPrompt-inspired approach, providing the LLM with concrete patterns to follow when solving new logistics scenarios.

## 1. Setup and Imports

Import the required libraries and the Qdrant helper functions from `utils.qdrant_client`.

In [2]:
# Import necessary libraries
import sys
import os
import json
import pandas as pd
from pathlib import Path
from IPython.display import display, Markdown, HTML
from dotenv import load_dotenv

# Add the project root to the path
sys.path.append("..")

# Import utility functions
from utils.qdrant_client import get_qdrant_client, get_embedding, search_datapoints, test_connection, COLLECTION_NAME

# Load environment variables
load_dotenv()

print("Libraries imported successfully!")
print(f"Current working directory: {os.getcwd()}")

Libraries imported successfully!
Current working directory: /Users/max/Documents/code/scmprompt/Notebooks


## 2. Load Example Data

Load the generated case examples from the training dataset (`train_cases.parquet`).

In [3]:
# Load generated case examples from the training dataset
generated_cases_file = Path("../Data/GeneratedCases/train_cases.parquet")

if generated_cases_file.exists():
    # Load the parquet file with generated cases
    examples_df = pd.read_parquet(generated_cases_file)
    
    print(f"Loaded {len(examples_df)} generated case examples from {generated_cases_file}")
    
    # Display structure of the dataset
    print("\nDataset structure:")
    print(f"Columns: {list(examples_df.columns)}")
    print(f"Shape: {examples_df.shape}")
    
    # Show a sample of the data structure
    if not examples_df.empty:
        print("\nSample case structure (first example):")
        first_case = examples_df.iloc[0]
        for column in ['case_id', 'title', 'realism_score', 'complexity_score', 'educational_value']:
            if column in first_case:
                value = first_case[column]
                if isinstance(value, str) and len(value) > 100:
                    print(f"  {column}: {value[:100]}...")
                else:
                    print(f"  {column}: {value}")
    
    # Convert to list of dictionaries for easier processing
    examples_data = examples_df.to_dict('records')
else:
    print(f"Generated cases file not found at {generated_cases_file}")
    examples_data = []
    examples_df = pd.DataFrame()

Loaded 245 generated case examples from ../Data/GeneratedCases/train_cases.parquet

Dataset structure:
Columns: ['case_id', 'title', 'enhanced_case', 'solution', 'file_path', 'enhanced_case_length', 'solution_length', 'realism_score', 'complexity_score', 'educational_value', 'solution_quality', 'overall_qualification', 'evaluation_summary', 'improvement_suggestions', 'case_for_embedding']
Shape: (245, 15)

Sample case structure (first example):
  case_id: case-20250330-081720-dj177a
  title: **Baltic Salmon Run: Navigating Regulatory Hurdles and Logistical Storms to Reach Asian Gourmet**
  realism_score: 8.0
  complexity_score: 7.0
  educational_value: 9.0
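Section 3 simply takes the first `QUALIFIED` case for demonstration. When building few-shot prompts you may want a stricter pool. The sketch below filters records shaped like `examples_data` (a list of dicts) by qualification status and minimum scores. The helper names and the thresholds are illustrative assumptions, not project settings.

```python
import math
from numbers import Real

def meets_threshold(score, threshold):
    """True if score is a real, non-NaN number at or above threshold."""
    return isinstance(score, Real) and not math.isnan(score) and score >= threshold

def select_examples(records, min_scores=None, require_qualified=True):
    """Hypothetical helper: keep records that pass qualification and every minimum score."""
    min_scores = min_scores or {}
    selected = []
    for record in records:
        if require_qualified and record.get("overall_qualification") != "QUALIFIED":
            continue
        # A missing, non-numeric, or NaN score fails the check
        if all(meets_threshold(record.get(m), t) for m, t in min_scores.items()):
            selected.append(record)
    return selected

# Toy records with the same keys as examples_data (values are made up)
toy_records = [
    {"case_id": "a", "overall_qualification": "QUALIFIED", "realism_score": 8.0, "educational_value": 9.0},
    {"case_id": "b", "overall_qualification": "QUALIFIED", "realism_score": 7.5, "educational_value": 9.0},
    {"case_id": "c", "overall_qualification": None, "realism_score": 9.0, "educational_value": 9.0},
]
picked = select_examples(toy_records, {"realism_score": 8.0, "educational_value": 8.5})
print([r["case_id"] for r in picked])  # ['a']
```

In the notebook this would be called as, for example, `select_examples(examples_data, {"realism_score": 8.0})`.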


## 3. Explore Case Examples

Summarize the quality scores, qualification status, and text lengths of the generated cases, then preview one qualified case.

In [4]:
if not examples_df.empty:
    print("Dataset Overview:")
    print(f"Total generated cases: {len(examples_df)}")
    print(f"Columns: {list(examples_df.columns)}")
    
    # Analyze case quality metrics
    print("\nCase Quality Metrics:")
    for metric in ['realism_score', 'complexity_score', 'educational_value', 'solution_quality']:
        if metric in examples_df.columns:
            print(f"  {metric}: mean={examples_df[metric].mean():.1f}, std={examples_df[metric].std():.1f}")
    
    # Analyze qualification status
    if 'overall_qualification' in examples_df.columns:
        print("\nQualification status:")
        print(examples_df['overall_qualification'].value_counts())
    
    # Show case length statistics
    if 'enhanced_case_length' in examples_df.columns:
        print(f"\nCase length statistics:")
        print(f"  Enhanced case length: mean={examples_df['enhanced_case_length'].mean():.0f}, std={examples_df['enhanced_case_length'].std():.0f}")
    
    if 'solution_length' in examples_df.columns:
        print(f"  Solution length: mean={examples_df['solution_length'].mean():.0f}, std={examples_df['solution_length'].std():.0f}")
    
    # Show a sample case
    print("\n" + "="*80)
    print("SAMPLE GENERATED CASE:")
    print("="*80)
    
    # Find a well-qualified case for demonstration
    qualified_cases = examples_df[examples_df['overall_qualification'] == 'QUALIFIED'] if 'overall_qualification' in examples_df.columns else examples_df
    if not qualified_cases.empty:
        sample_case = qualified_cases.iloc[0]
    else:
        sample_case = examples_df.iloc[0]
    
    print(f"\nCASE ID: {sample_case.get('case_id', 'N/A')}")
    print(f"TITLE: {sample_case.get('title', 'N/A')}")
    
    if 'realism_score' in sample_case:
        print(f"QUALITY SCORES: Realism={sample_case['realism_score']}, Complexity={sample_case['complexity_score']}, Educational={sample_case['educational_value']}")
    
    # Show case content preview
    case_content = sample_case.get('enhanced_case', sample_case.get('case_for_embedding', ''))
    if case_content:
        print(f"\nCASE PREVIEW:")
        preview = case_content[:800] + "..." if len(case_content) > 800 else case_content
        print(preview)
    
    # Show solution preview
    solution_content = sample_case.get('solution', '')
    if solution_content:
        print(f"\nSOLUTION PREVIEW:")
        solution_preview = solution_content[:500] + "..." if len(solution_content) > 500 else solution_content
        print(solution_preview)
else:
    print("No generated case examples available.")

Dataset Overview:
Total generated cases: 245
Columns: ['case_id', 'title', 'enhanced_case', 'solution', 'file_path', 'enhanced_case_length', 'solution_length', 'realism_score', 'complexity_score', 'educational_value', 'solution_quality', 'overall_qualification', 'evaluation_summary', 'improvement_suggestions', 'case_for_embedding']

Case Quality Metrics:
  realism_score: mean=8.1, std=0.3
  complexity_score: mean=7.1, std=0.3
  educational_value: mean=8.7, std=0.5
  solution_quality: mean=7.7, std=0.4

Qualification status:
overall_qualification
QUALIFIED    243
Name: count, dtype: int64

Case length statistics:
  Enhanced case length: mean=7004, std=1200
  Solution length: mean=59525, std=60537

SAMPLE GENERATED CASE:

CASE ID: case-20250330-081720-dj177a
TITLE: **Baltic Salmon Run: Navigating Regulatory Hurdles and Logistical Storms to Reach Asian Gourmet**
QUALITY SCORES: Realism=8.0, Complexity=7.0, Educational=9.0

CASE PREVIEW:
**Scenario:** Baltic Breeze Seafood, a rapidly expan
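The qualification counts above list 243 `QUALIFIED` cases out of 245. Because `Series.value_counts()` excludes missing values by default (`dropna=True`) and no other label appears, the remaining two rows have no qualification value; `examples_df['overall_qualification'].value_counts(dropna=False)` would show them. The same check over the `examples_data` records can be sketched in plain Python. The helper name is hypothetical, and any non-string value (None, NaN) is treated as missing.

```python
from collections import Counter

def qualification_counts(records, key="overall_qualification"):
    """Count qualification labels, grouping None/NaN/non-string values as '<missing>'."""
    counts = Counter()
    for record in records:
        value = record.get(key)
        counts[value if isinstance(value, str) else "<missing>"] += 1
    return counts

toy_rows = [{"overall_qualification": "QUALIFIED"}, {"overall_qualification": float("nan")}, {}]
print(qualification_counts(toy_rows))  # Counter({'<missing>': 2, 'QUALIFIED': 1})
```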

## 3.1. Export Sample Case to Text File

Write the full sample case (quality scores, scenario, solution, and evaluation notes) to a text file for offline review.
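The cell below exports a single sample case. To export a whole set (for example, every `QUALIFIED` case), the same layout can be factored into a function. This is a simplified sketch assuming records shaped like `examples_data` (plain dicts); `render_case_text` and `export_cases` are hypothetical names, and the quality and evaluation sections are omitted for brevity.

```python
from pathlib import Path

def render_case_text(record):
    """Render one case record as plain text (simplified version of the export layout)."""
    lines = ["=" * 80, "MARITIME LOGISTICS CASE STUDY", "=" * 80, ""]
    lines.append(f"Case ID: {record.get('case_id', 'unknown')}")
    lines.append(f"Title: {record.get('title', 'Untitled Case')}")
    lines.append("")
    # Only emit sections whose text is present and non-blank
    for heading, key in [("CASE SCENARIO:", "enhanced_case"), ("SOLUTION:", "solution")]:
        text = record.get(key)
        if isinstance(text, str) and text.strip():
            lines += [heading, "=" * 80, "", text, ""]
    return "\n".join(lines)

def export_cases(records, output_dir):
    """Write one <case_id>.txt per record and return the written paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for record in records:
        path = output_dir / f"{record.get('case_id', 'unknown')}.txt"
        path.write_text(render_case_text(record), encoding="utf-8")
        paths.append(path)
    return paths
```

In the notebook this could be called as `export_cases([r for r in examples_data if r.get('overall_qualification') == 'QUALIFIED'], '../Data/GeneratedCases/txt')`. Records without a `case_id` all map to `unknown.txt` and would overwrite each other.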

In [7]:
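import numpy as np
from datetime import datetime

def safe_get_value(series, key, default=''):
    """Safely extract a value from a pandas Series as a string.

    Returns `default` for missing keys and None/NaN values; array-like values
    (e.g. numpy arrays from Parquet list columns) are converted with str().
    """
    value = series.get(key, default)
    # Check arrays first: pd.isna() on an array returns an array, whose truth value is ambiguous
    if isinstance(value, (np.ndarray, list, tuple)):
        return str(value)
    if value is None or pd.isna(value):
        return default
    return str(value)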

# Export the sample case to a text file
if not examples_df.empty and 'sample_case' in locals():
    # Create output directory if it doesn't exist
    # Path relative to the Notebooks/ directory, as in the loading cell; resolve() makes it absolute
    output_dir = Path("../Data/GeneratedCases/txt").resolve()
    output_dir.mkdir(parents=True, exist_ok=True)
    
    # Get case details safely
    case_id = safe_get_value(sample_case, 'case_id', 'unknown')
    title = safe_get_value(sample_case, 'title', 'Untitled Case')
    
    # Use the case ID alone as the filename
    filename = f"{case_id}.txt"
    filepath = output_dir / filename
    
    # Prepare the content
    content_lines = []
    content_lines.append("=" * 80)
    content_lines.append("MARITIME LOGISTICS CASE STUDY")
    content_lines.append("=" * 80)
    content_lines.append("")
    content_lines.append(f"Case ID: {case_id}")
    content_lines.append(f"Title: {title}")
    content_lines.append(f"Exported: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    content_lines.append("")
    
    # Add quality scores if available
    realism_score = safe_get_value(sample_case, 'realism_score')
    if realism_score:
        content_lines.append("QUALITY METRICS:")
        content_lines.append("-" * 40)
        content_lines.append(f"Realism Score: {safe_get_value(sample_case, 'realism_score')}")
        content_lines.append(f"Complexity Score: {safe_get_value(sample_case, 'complexity_score')}")
        content_lines.append(f"Educational Value: {safe_get_value(sample_case, 'educational_value')}")
        content_lines.append(f"Solution Quality: {safe_get_value(sample_case, 'solution_quality', 'N/A')}")
        content_lines.append(f"Overall Qualification: {safe_get_value(sample_case, 'overall_qualification', 'N/A')}")
        content_lines.append("")
    
    # Add case content
    case_content = safe_get_value(sample_case, 'enhanced_case')
    if not case_content:
        case_content = safe_get_value(sample_case, 'case_for_embedding')
    
    if case_content and case_content.strip():
        content_lines.append("CASE SCENARIO:")
        content_lines.append("=" * 80)
        content_lines.append("")
        content_lines.append(case_content)
        content_lines.append("")
    
    # Add solution
    solution_content = safe_get_value(sample_case, 'solution')
    if solution_content and solution_content.strip():
        content_lines.append("SOLUTION:")
        content_lines.append("=" * 80)
        content_lines.append("")
        content_lines.append(solution_content)
        content_lines.append("")
    
    # Add additional metadata if available
    evaluation_summary = safe_get_value(sample_case, 'evaluation_summary')
    if evaluation_summary and evaluation_summary.strip():
        content_lines.append("EVALUATION SUMMARY:")
        content_lines.append("-" * 40)
        content_lines.append(evaluation_summary)
        content_lines.append("")
    
    improvement_suggestions = safe_get_value(sample_case, 'improvement_suggestions')
    if improvement_suggestions and improvement_suggestions.strip():
        content_lines.append("IMPROVEMENT SUGGESTIONS:")
        content_lines.append("-" * 40)
        content_lines.append(improvement_suggestions)
        content_lines.append("")
    
    # Write to file
    try:
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write('\n'.join(content_lines))
        
        print(f"✓ Case exported successfully!")
        print(f"  File: {filepath}")
        print(f"  Case ID: {case_id}")
        print(f"  Title: {title}")
        print(f"  File size: {filepath.stat().st_size} bytes")
        
        # Show file preview
        print(f"\nFile preview (first 500 characters):")
        print("-" * 60)
        with open(filepath, 'r', encoding='utf-8') as f:
            preview_content = f.read(500)
            print(preview_content)
            if len(preview_content) == 500:
                print("...")
                
    except Exception as e:
        print(f"❌ Error writing file: {e}")
        import traceback
        traceback.print_exc()
else:
    print("No sample case available to export.")

✓ Case exported successfully!
  File: /Users/max/Documents/code/scmprompt/Data/GeneratedCases/txt/case-20250330-081720-dj177a.txt
  Case ID: case-20250330-081720-dj177a
  Title: **Baltic Salmon Run: Navigating Regulatory Hurdles and Logistical Storms to Reach Asian Gourmet**
  File size: 24286 bytes

File preview (first 500 characters):
------------------------------------------------------------
MARITIME LOGISTICS CASE STUDY

Case ID: case-20250330-081720-dj177a
Title: **Baltic Salmon Run: Navigating Regulatory Hurdles and Logistical Storms to Reach Asian Gourmet**
Exported: 2025-07-19 11:04:12

QUALITY METRICS:
----------------------------------------
Realism Score: 8.0
Complexity Score: 7.0
Educational Value: 9.0
Solution Qua
...


## 4. Connect to Vector Database

Test the Qdrant connection and inspect the examples collection (point count, vector size, and distance metric).
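The cell below prints the collection's distance metric. Assuming it is cosine (the cell's output is not shown here), Qdrant ranks hits by the cosine similarity between the query embedding and each stored vector, so the `Score` values in Section 5 would be cosine similarities, with higher meaning more similar. A minimal stdlib sketch of that quantity, cos(a, b) = a·b / (‖a‖ ‖b‖):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity dot(a, b) / (|a| * |b|), a value in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # about 0.7071 (1 / sqrt(2))
```

Because only the angle matters, vectors pointing the same way score 1.0 regardless of their length.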

In [None]:
# Test Qdrant connection
try:
    print("Testing Qdrant connection...")
    test_connection()
    
    # Get client
    client = get_qdrant_client()
    
    # Get collection info
    try:
        collection_info = client.get_collection(COLLECTION_NAME)
        print(f"\nCollection '{COLLECTION_NAME}' info:")
        print(f"  Points count: {collection_info.points_count}")
        print(f"  Vector size: {collection_info.config.params.vectors.size}")
        print(f"  Distance metric: {collection_info.config.params.vectors.distance}")
    except Exception as e:
        print(f"Could not get collection info: {e}")
        
except Exception as e:
    print(f"Qdrant connection failed: {e}")
    print("You can still explore the examples data without vector search.")

## 5. Search for Relevant Examples

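Search the vector database for the examples most similar to a sample scenario. If vector search fails, the cell falls back to simple keyword matching over the loaded cases.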
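Conceptually, this retrieval mirrors MedPrompt's dynamic few-shot step: embed the new scenario and pick the k training examples whose embeddings are nearest. The in-memory sketch below uses toy 3-dimensional vectors in place of real embeddings; in the notebook they would come from the project's `get_embedding` helper, whose signature is not shown here. `top_k_examples` and the case IDs are hypothetical.

```python
import heapq
import math

def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def top_k_examples(query_vec, case_vectors, k=3):
    """Return the k (case_id, score) pairs most similar to query_vec, best first."""
    scored = ((case_id, _cosine(query_vec, vec)) for case_id, vec in case_vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy vectors standing in for real embeddings
case_vectors = {
    "reefer-baltic": [0.9, 0.1, 0.0],
    "dg-hamburg": [0.1, 0.9, 0.2],
    "bulk-grain": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]
print(top_k_examples(query, case_vectors, k=2))  # dg-hamburg ranks first
```

A vector database performs the same ranking with an approximate index, so it scales to many more cases than this brute-force loop.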

In [None]:
# Example search query
sample_scenario = "Container shipping from Hamburg to Singapore with dangerous goods"

print(f"Searching for examples relevant to: '{sample_scenario}'")
print("\n" + "-"*60)

try:
    # Search using the vector database
    results = search_datapoints(
        query=sample_scenario,
        limit=3,
        content_type_filter="example"
    )
    
    print(f"Found {len(results)} relevant examples:")
    
    for i, result in enumerate(results, 1):
        print(f"\n{i}. Example (Score: {result.score:.3f})")
        print(f"   ID: {result.payload.get('chunk_id', 'N/A')}")
        print(f"   Type: {result.payload.get('example_type', 'N/A')}")
        
        # Show content preview
        content = result.payload.get('content', '')
        if content:
            preview = content[:200] + "..." if len(content) > 200 else content
            print(f"   Preview: {preview}")
        
        # Show summary if available
        summary = result.payload.get('summary', '')
        if summary:
            summary_preview = summary[:150] + "..." if len(summary) > 150 else summary
            print(f"   Summary: {summary_preview}")
            
except Exception as e:
    print(f"Vector search failed: {e}")
    print("\nFalling back to simple text search in loaded generated cases...")
    
    # Simple fallback search using generated cases
    if not examples_df.empty:
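        # Drop common stopwords; otherwise words like "from" and "with" would match nearly every case
        stopwords = {"a", "an", "and", "the", "from", "to", "with", "of", "in", "for", "on"}
        search_terms = [term for term in sample_scenario.lower().split() if term not in stopwords]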
        relevant_cases = []
        
        for idx, row in examples_df.iterrows():
            # Search in case content and title
            case_text = str(row.get('enhanced_case', row.get('case_for_embedding', ''))).lower()
            title_text = str(row.get('title', '')).lower()
            
            # Simple keyword matching
            matches = sum(1 for term in search_terms if term in case_text or term in title_text)
            if matches > 0:
                relevant_cases.append((row, matches))
        
        # Sort by number of matches
        relevant_cases.sort(key=lambda x: x[1], reverse=True)
        
        print(f"Found {len(relevant_cases)} potentially relevant cases:")
        for i, (case, matches) in enumerate(relevant_cases[:3], 1):
            print(f"\n{i}. {case.get('title', 'Untitled Case')} (Matches: {matches})")
            print(f"   Case ID: {case.get('case_id', 'N/A')}")
            if 'realism_score' in case:
                print(f"   Quality: Realism={case['realism_score']}, Complexity={case['complexity_score']}")
            
            case_content = case.get('enhanced_case', case.get('case_for_embedding', ''))
            if case_content:
                content_preview = case_content[:300] + "..." if len(case_content) > 300 else case_content
                print(f"   Preview: {content_preview}")
    else:
        print("No generated cases available for search.")

## 6. Example Usage in Prompt Construction

Show how retrieved examples can be used to construct few-shot prompts for the LLM.

In [None]:
def format_case_for_prompt(case_data):
    """
    Format a generated case for use in a few-shot prompt.
    """
    formatted = "EXAMPLE CASE:\n"
    formatted += "=" * 50 + "\n"
    
    # Add case title and ID
    title = case_data.get('title', 'Untitled Case')
    case_id = case_data.get('case_id', 'N/A')
    formatted += f"Title: {title}\n"
    formatted += f"Case ID: {case_id}\n\n"
    
    # Add case scenario
    case_content = case_data.get('enhanced_case', case_data.get('case_for_embedding', ''))
    if case_content:
        # Limit case content for prompt size
        if len(case_content) > 1000:
            case_content = case_content[:1000] + "..."
        formatted += f"Scenario:\n{case_content}\n\n"
    
    # Add solution
    solution = case_data.get('solution', '')
    if solution:
        # Limit solution for prompt size  
        if len(solution) > 800:
            solution = solution[:800] + "..."
        formatted += f"Solution:\n{solution}\n\n"
    
    # Add quality metrics if available
    if 'realism_score' in case_data:
        formatted += f"Quality Metrics: Realism={case_data['realism_score']}, "
        formatted += f"Complexity={case_data['complexity_score']}, "
        formatted += f"Educational Value={case_data['educational_value']}\n\n"
    
    return formatted

# Demonstrate prompt construction with generated cases
print("EXAMPLE OF FEW-SHOT PROMPT CONSTRUCTION WITH GENERATED CASES:")
print("=" * 70)

if not examples_df.empty:
    # Use a qualified case for demonstration
    demo_case = None
    if 'overall_qualification' in examples_df.columns:
        qualified_cases = examples_df[examples_df['overall_qualification'] == 'QUALIFIED']
        if not qualified_cases.empty:
            demo_case = qualified_cases.iloc[0]
    
    if demo_case is None:
        demo_case = examples_df.iloc[0]
    
    example_prompt = format_case_for_prompt(demo_case)
    
    # Construct a sample few-shot prompt
    few_shot_prompt = f"""
You are an expert in maritime logistics and supply chain management. 
Based on the following example cases, analyze new scenarios and provide structured solutions.

{example_prompt}

Now analyze this new scenario:
Scenario: {sample_scenario}

Provide a structured analysis following the pattern shown in the example above.
"""
    
    print(few_shot_prompt)
    print("\n" + "="*70)
    print("Note: In practice, you would use the 2-3 most relevant cases retrieved from the vector database.")
else:
    print("No generated cases available for prompt construction demonstration.")
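The demonstration above uses a single case, while the note suggests 2-3. Solutions average about 59,500 characters (see the length statistics in Section 3), which is why `format_case_for_prompt` truncates them. When combining several examples, a total budget keeps the prompt bounded. The sketch below takes already-formatted example strings in relevance order; `build_few_shot_prompt` and the default budget are hypothetical.

```python
def build_few_shot_prompt(example_blocks, scenario, max_example_chars=6000):
    """Assemble a few-shot prompt from pre-formatted example strings.

    Examples are added in the given order (most relevant first) until the next
    one would exceed max_example_chars; the rest are dropped whole rather than
    cut mid-text. Returns (prompt, number_of_examples_used).
    """
    header = (
        "You are an expert in maritime logistics and supply chain management.\n"
        "Based on the following example cases, analyze new scenarios and provide structured solutions.\n\n"
    )
    used, kept = 0, []
    for block in example_blocks:
        if used + len(block) > max_example_chars:
            break
        kept.append(block)
        used += len(block)
    footer = (
        f"Now analyze this new scenario:\nScenario: {scenario}\n\n"
        "Provide a structured analysis following the pattern shown in the examples above."
    )
    return header + "\n".join(kept) + "\n" + footer, len(kept)
```

In the notebook, the blocks could come from `[format_case_for_prompt(case) for case in retrieved_cases]`, where `retrieved_cases` is a hypothetical list of rows returned by the search step.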

## 7. Summary and Next Steps

This notebook demonstrated the basic workflow for working with examples in the SCM logistics assistant system.

In [None]:
print("EXAMPLES NOTEBOOK SUMMARY:")
print("=" * 40)
print("✓ Loaded generated case examples from train_cases.parquet")
print("✓ Explored the structure and quality metrics of generated cases")
print("✓ Demonstrated vector-based similarity search")
print("✓ Showed how to construct few-shot prompts with real cases")
print()
print("Available data:")
if not examples_df.empty:
    print(f"  - {len(examples_df)} total generated cases")
    if 'overall_qualification' in examples_df.columns:
        qualified_count = len(examples_df[examples_df['overall_qualification'] == 'QUALIFIED'])
        print(f"  - {qualified_count} qualified cases ready for use")
    print(f"  - Quality scores available for evaluation")
else:
    print("  - No cases loaded (check file path)")
    
print()
print("NEXT STEPS:")
print("  1. Use vector search to find relevant cases for specific scenarios; try different queries")
print("  2. Construct few-shot prompts with the 2-3 most relevant cases")
print("  3. Apply the MedPrompt approach for improved LLM performance")
print("  4. Use examples in the main case generation pipeline (04_Case_Generation.ipynb)")
print("  5. Evaluate example relevance and quality for your specific use cases")
print("\nRELATED NOTEBOOKS:")
print("- 02_Test_Queries.ipynb: Advanced search and retrieval")
print("- 03_Embed_Examples_Guidelines.ipynb: Creating and updating example embeddings")
print("- 04_Case_Generation.ipynb: Using examples in case generation")