# 🧠 Inference Tests

This notebook demonstrates how to use the `text2sql-eval-toolkit` inference tools for SQL generation.

## Overview

The toolkit provides several inference client APIs:
- **WXAIClientChatAPI**: IBM watsonx.ai models
- **VLLMClientChatAPI**: vLLM-hosted models (OpenAI-compatible)
- **ClaudeClientChatAPI**: Anthropic Claude models
- **OpenAIClientChatAPI**: OpenAI or LiteLLM proxy models

It also provides two helpers, a prompt class and a clean-up function:
- **Text2SQLPrompt**: Constructs prompts from questions and schemas
- **postprocess_sql**: Cleans up generated SQL

## Setup

In [11]:
# Auto-reload for development
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [12]:
import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables for LLM API credentials from project root
# The .env file should be in the project root directory (works when cwd is project root or notebooks/)
project_root = Path.cwd() if (Path.cwd() / ".env").exists() else Path.cwd().parent
env_path = project_root / '.env'
load_dotenv(env_path, override=True)

print(f"File exists: {env_path.exists()}")
print(f"Loading .env from: {env_path}")

File exists: True
Loading .env from: /Users/oktie/code/text2sql/text2sql-eval-toolkit/.env
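
Before calling any client, it can help to confirm which credentials were actually loaded. The following is a minimal sketch and not part of the toolkit: `missing_env_vars` and the `REQUIRED_VARS` grouping are hypothetical names introduced here, and the variable names are the ones this notebook lists as required in the later examples.

```python
import os
from typing import Mapping, Optional

# Variable names as listed under "Required environment variables" in this notebook.
# The grouping keys ("watsonx", "claude", "openai") are labels chosen for this sketch.
REQUIRED_VARS = {
    "watsonx": ["WATSONX_APIKEY", "WATSONX_API_BASE", "WATSONX_PROJECTID"],
    "claude": ["ANTHROPIC_API_KEY"],
    "openai": ["OPENAI_API_KEY", "OPENAI_BASE_URL"],
}


def missing_env_vars(names, env: Optional[Mapping[str, str]] = None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report, per backend, whether its credentials appear to be present.
for backend, names in REQUIRED_VARS.items():
    missing = missing_env_vars(names)
    status = "ready" if not missing else f"missing {', '.join(missing)}"
    print(f"{backend}: {status}")
```

Running this right after `load_dotenv` shows which of the later examples can be expected to run and which will be skipped.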


## Example 1: Using Text2SQLPrompt Helper

The `Text2SQLPrompt` class helps construct prompts from a question and database schema.

In [13]:
from text2sql_eval_toolkit.inference.inference_tools import Text2SQLPrompt

# Define a simple schema
schema = {
    "description": "Concert and singer database",
    "tables": [
        {
            "name": "singer",
            "description": "Information about singers",
            "columns": [
                {"name": "Singer_ID", "type": "INT", "primary_key": True, "samples": [1, 2, 3, 4, 5]},
                {"name": "Name", "type": "TEXT", "samples": ["Joe Sharp", "Timbaland", "Justin Brown"]},
                {"name": "Country", "type": "TEXT", "samples": ["Netherlands", "United States", "France"]},
                {"name": "Song_Name", "type": "TEXT", "samples": ["You", "Dangerous", "Gentleman"]},
                {"name": "Song_release_year", "type": "TEXT", "samples": ["1992", "2008", "2014"]},
                {"name": "Age", "type": "INT", "samples": [52, 32, 29, 41, 43]},
                {"name": "Is_male", "type": "BOOL", "samples": ["F", "T"]}
            ]
        },
        {
            "name": "concert",
            "description": "Information about concerts",
            "columns": [
                {"name": "concert_ID", "type": "INT", "primary_key": True},
                {"name": "concert_Name", "type": "TEXT"},
                {"name": "Theme", "type": "TEXT"},
                {"name": "Stadium_ID", "type": "TEXT"},
                {"name": "Year", "type": "TEXT"}
            ]
        }
    ]
}

# Create a prompt
question = "What were the names and ages of all singers who released the song 'Gentleman'?"
db_type = "sqlite"

prompt = Text2SQLPrompt(question, schema, db_type)

# View the generated prompt
print("Generated Prompt:")
print("=" * 80)
print(prompt.prompt)

Generated Prompt:
Your task is to convert a natural language question into an accurate SQL query using the given sqlite database schema.

**Question:**:
What were the names and ages of all singers who released the song 'Gentleman'?

**Database Engine / Dialect:**:
sqlite

**Schema:**
Database description: Concert and singer database

Table: singer
  Description: Information about singers
  Columns:
    - Singer_ID (INT) (Primary Key) # Example values: 1, 2, 3, 4, 5
    - Name (TEXT) # Example values: Joe Sharp, Timbaland, Justin Brown
    - Country (TEXT) # Example values: Netherlands, United States, France
    - Song_Name (TEXT) # Example values: You, Dangerous, Gentleman
    - Song_release_year (TEXT) # Example values: 1992, 2008, 2014
    - Age (INT) # Example values: 52, 32, 29, 41, 43
    - Is_male (BOOL) # Example values: F, T

Table: concert
  Description: Information about concerts
  Columns:
    - concert_ID (INT) (Primary Key)
    - concert_Name (TEXT)
    - Theme (TEXT)
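
To make the mapping from the schema dictionary to the printed schema text explicit, here is a rough sketch of a renderer that produces the same layout. It is an approximation written for illustration, not the toolkit's implementation: `render_schema_sketch` is a hypothetical name, and only the keys used in the cell above (`description`, `tables`, `columns`, `name`, `type`, `primary_key`, `samples`) are assumed.

```python
def render_schema_sketch(schema: dict) -> str:
    """Render a schema dict in roughly the text layout printed above."""
    lines = [f"Database description: {schema.get('description', '')}"]
    for table in schema.get("tables", []):
        lines.append("")  # blank line before each table
        lines.append(f"Table: {table['name']}")
        if table.get("description"):
            lines.append(f"  Description: {table['description']}")
        lines.append("  Columns:")
        for col in table.get("columns", []):
            entry = f"    - {col['name']} ({col['type']})"
            if col.get("primary_key"):
                entry += " (Primary Key)"
            if col.get("samples"):
                # Sample values are appended as a trailing comment.
                entry += " # Example values: " + ", ".join(str(s) for s in col["samples"])
            lines.append(entry)
    return "\n".join(lines)


# A cut-down schema in the same shape as the one defined in the cell above.
tiny_schema = {
    "description": "Concert and singer database",
    "tables": [
        {
            "name": "singer",
            "description": "Information about singers",
            "columns": [
                {"name": "Singer_ID", "type": "INT", "primary_key": True, "samples": [1, 2, 3]},
                {"name": "Name", "type": "TEXT"},
            ],
        }
    ],
}
print(render_schema_sketch(tiny_schema))
```

Columns without `samples` get no trailing example comment, which matches how the `concert` table appears in the output.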
    

## Example 2: Using WXAIClientChatAPI (IBM watsonx.ai)

Generate SQL using IBM watsonx.ai models.

**Required environment variables:**
- `WATSONX_APIKEY`
- `WATSONX_API_BASE`
- `WATSONX_PROJECTID`

In [14]:
from text2sql_eval_toolkit.inference.inference_tools import WXAIClientChatAPI

# Configure model
model_name = "meta-llama/llama-3-3-70b-instruct"
model_parameters = {
    "max_new_tokens": 512,
    "temperature": 0.0,
}

# Create client
client = WXAIClientChatAPI(model_name, model_parameters)

# Generate SQL using Text2SQLPrompt
predicted_sql, token_usage = client.generate_sql(prompt)

print("\nGenerated SQL:")
print("=" * 80)
print(predicted_sql)
print("\nToken Usage:")
print(token_usage)

14:56:49 - DEBUG - Inference with constructed chat prompt: [{'role': 'system', 'content': 'You are a SQL expert. Your task is to convert natural language questions into accurate SQL queries using the given database schema and instructions.'}, {'role': 'user', 'content': "Your task is to convert a natural language question into an accurate SQL query using the given sqlite database schema.\n\n**Question:**:\nWhat were the names and ages of all singers who released the song 'Gentleman'?\n\n**Database Engine / Dialect:**:\nsqlite\n\n**Schema:**\nDatabase description: Concert and singer database\n\nTable: singer\n  Description: Information about singers\n  Columns:\n    - Singer_ID (INT) (Primary Key) # Example values: 1, 2, 3, 4, 5\n    - Name (TEXT) # Example values: Joe Sharp, Timbaland, Justin Brown\n    - Country (TEXT) # Example values: Netherlands, United States, France\n    - Song_Name (TEXT) # Example values: You, Dangerous, Gentleman\n    - Song_release_year (TEXT) # Example value

## Example 3: Using Custom Chat Prompts

Instead of a `Text2SQLPrompt`, you can pass `generate_sql` your own chat messages as a list of `role`/`content` dictionaries.

In [15]:
# Define custom chat messages
custom_chat_prompt = [
    {
        "role": "system",
        "content": "You are a SQL expert. Generate only SQL queries without explanations."
    },
    {
        "role": "user",
        "content": (
            "Convert this question to SQL:\n\n"
            "Question: List all singers from the United States\n\n"
            "Schema:\n"
            "Table: singer\n"
            "  - Singer_ID (INT, Primary Key)\n"
            "  - Name (TEXT)\n"
            "  - Country (TEXT)\n"
            "  - Age (INT)\n\n"
            "Output format: ```sql\n<query>\n```"
        )
    }
]

# Generate SQL with custom prompt
predicted_sql, token_usage = client.generate_sql(custom_chat_prompt)

print("Generated SQL:")
print("=" * 80)
print(predicted_sql)

14:56:50 - DEBUG - Inference with provided chat prompt: [{'role': 'system', 'content': 'You are a SQL expert. Generate only SQL queries without explanations.'}, {'role': 'user', 'content': 'Convert this question to SQL:\n\nQuestion: List all singers from the United States\n\nSchema:\nTable: singer\n  - Singer_ID (INT, Primary Key)\n  - Name (TEXT)\n  - Country (TEXT)\n  - Age (INT)\n\nOutput format: ```sql\n<query>\n```'}]


14:56:50 - DEBUG - Token usage: {'prompt_tokens': 108, 'completion_tokens': 18, 'total_tokens': 126}

14:56:50 - DEBUG - Generated SQL: SELECT Name 
FROM singer 
WHERE Country = 'United States'

Generated SQL:
SELECT Name 
FROM singer 
WHERE Country = 'United States'
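
When several questions share the same message layout, the list can be assembled by a small helper. The sketch below is one possible way to do that, assuming the same system and user wording as the cell above; `build_chat_prompt`, `DEFAULT_SYSTEM`, and `FENCE` are hypothetical names and not part of the toolkit.

```python
FENCE = "`" * 3  # markdown code fence, built this way to keep the sketch fence-free

DEFAULT_SYSTEM = "You are a SQL expert. Generate only SQL queries without explanations."


def build_chat_prompt(question: str, schema_text: str, system: str = DEFAULT_SYSTEM) -> list:
    """Return a two-message chat prompt in the role/content format used above."""
    user_content = (
        "Convert this question to SQL:\n\n"
        f"Question: {question}\n\n"
        f"Schema:\n{schema_text}\n\n"
        f"Output format: {FENCE}sql\n<query>\n{FENCE}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_content},
    ]


schema_text = (
    "Table: singer\n"
    "  - Singer_ID (INT, Primary Key)\n"
    "  - Name (TEXT)\n"
    "  - Country (TEXT)\n"
    "  - Age (INT)"
)
messages = build_chat_prompt("List all singers from the United States", schema_text)
print(messages[1]["content"])
```

The returned list has the same shape as `custom_chat_prompt`, so it could be passed to `client.generate_sql` in the same way.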


## Example 4: Using ClaudeClientChatAPI (Anthropic)

Generate SQL using Anthropic's Claude models.

**Required environment variables:**
- `ANTHROPIC_API_KEY`

In [21]:
from text2sql_eval_toolkit.inference.inference_tools import ClaudeClientChatAPI

try:
    # Configure model
    model_name = "claude-sonnet-4-5"
    model_parameters = {
        "max_tokens": 512,
        "temperature": 0.0,
    }

    # Create client
    claude_client = ClaudeClientChatAPI(model_name, model_parameters)

    # Generate SQL
    predicted_sql, token_usage = claude_client.generate_sql(prompt)

    print("Generated SQL:")
    print("=" * 80)
    print(predicted_sql)
    print("\nToken Usage:")
    print(token_usage)
except (ValueError, ImportError) as e:
    print(f"Skipped (set ANTHROPIC_API_KEY to run): {e}")

14:58:34 - DEBUG - Inference with constructed chat prompt: [{'role': 'user', 'content': "Your task is to convert a natural language question into an accurate SQL query using the given sqlite database schema.\n\n**Question:**:\nWhat were the names and ages of all singers who released the song 'Gentleman'?\n\n**Database Engine / Dialect:**:\nsqlite\n\n**Schema:**\nDatabase description: Concert and singer database\n\nTable: singer\n  Description: Information about singers\n  Columns:\n    - Singer_ID (INT) (Primary Key) # Example values: 1, 2, 3, 4, 5\n    - Name (TEXT) # Example values: Joe Sharp, Timbaland, Justin Brown\n    - Country (TEXT) # Example values: Netherlands, United States, France\n    - Song_Name (TEXT) # Example values: You, Dangerous, Gentleman\n    - Song_release_year (TEXT) # Example values: 1992, 2008, 2014\n    - Age (INT) # Example values: 52, 32, 29, 41, 43\n    - Is_male (BOOL) # Example values: F, T\n\nTable: concert\n  Description: Information about concerts\n  

## Example 5: Using OpenAIClientChatAPI with Ollama (DeepSeek-R1)

Generate SQL with a DeepSeek-R1 model served locally by Ollama, through Ollama's OpenAI-compatible API.

**Required environment variables:**
- `OPENAI_API_KEY` (can be any value like "ollama")
- `OPENAI_BASE_URL` (e.g., "http://localhost:11434/v1"); the code cell's comment also names `OLLAMA_BASE_URL` as an alternative

**Prerequisites:**
- Ollama must be installed and running: `brew install ollama && ollama serve`
- DeepSeek-R1:14b model must be pulled: `ollama pull deepseek-r1:14b`

In [17]:
from text2sql_eval_toolkit.inference.inference_tools import OpenAIClientChatAPI

try:
    # Configure Ollama model (DeepSeek-R1:14b)
    model_name = "deepseek-r1:14b"
    model_parameters = {
        "max_tokens": 512,
        "temperature": 0.0,
    }

    # Create client (uses OPENAI_BASE_URL or OLLAMA_BASE_URL from .env)
    openai_client = OpenAIClientChatAPI(model_name, model_parameters)

    # Generate SQL
    predicted_sql, token_usage = openai_client.generate_sql(prompt)

    print("Generated SQL:")
    print("=" * 80)
    print(predicted_sql)
    print("\nToken Usage:")
    print(token_usage)
except (ImportError, ValueError) as e:
    print(f"Skipped (install openai and set OPENAI_BASE_URL/OPENAI_API_KEY or OLLAMA_BASE_URL to run): {e}")

14:56:50 - DEBUG - Inference with constructed chat prompt: [{'role': 'system', 'content': 'You are a SQL expert. Your task is to convert natural language questions into accurate SQL queries using the given database schema and instructions.'}, {'role': 'user', 'content': "Your task is to convert a natural language question into an accurate SQL query using the given sqlite database schema.\n\n**Question:**:\nWhat were the names and ages of all singers who released the song 'Gentleman'?\n\n**Database Engine / Dialect:**:\nsqlite\n\n**Schema:**\nDatabase description: Concert and singer database\n\nTable: singer\n  Description: Information about singers\n  Columns:\n    - Singer_ID (INT) (Primary Key) # Example values: 1, 2, 3, 4, 5\n    - Name (TEXT) # Example values: Joe Sharp, Timbaland, Justin Brown\n    - Country (TEXT) # Example values: Netherlands, United States, France\n    - Song_Name (TEXT) # Example values: You, Dangerous, Gentleman\n    - Song_release_year (TEXT) # Example value

## Example 6: Post-processing SQL

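The `postprocess_sql` function cleans up generated SQL by removing markdown fences and extra whitespace. As the examples below show, it also drops a stray leading `sql` language tag and the trailing semicolon.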

In [18]:
from text2sql_eval_toolkit.inference.inference_tools import postprocess_sql

# Example raw outputs from LLMs
raw_outputs = [
    "```sql\nSELECT * FROM users;\n```",
    "```\nSELECT name FROM products WHERE price > 100;\n```",
    "SELECT id, email FROM customers;",
    "sql\nSELECT COUNT(*) FROM orders;\n",
]

print("Post-processing Examples:")
print("=" * 80)
for i, raw in enumerate(raw_outputs, 1):
    cleaned = postprocess_sql(raw)
    print(f"\nExample {i}:")
    print(f"Raw:     {repr(raw)}")
    print(f"Cleaned: {repr(cleaned)}")

Post-processing Examples:

Example 1:
Raw:     '```sql\nSELECT * FROM users;\n```'
Cleaned: 'SELECT * FROM users'

Example 2:
Raw:     '```\nSELECT name FROM products WHERE price > 100;\n```'
Cleaned: 'SELECT name FROM products WHERE price > 100'

Example 3:
Raw:     'SELECT id, email FROM customers;'
Cleaned: 'SELECT id, email FROM customers'

Example 4:
Raw:     'sql\nSELECT COUNT(*) FROM orders;\n'
Cleaned: 'SELECT COUNT(*) FROM orders'
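
The four cases above can be reproduced with a few regular expressions. The following is a simplified sketch that mimics only the behavior visible in this output. It is not the toolkit's `postprocess_sql`, which may handle more cases; `clean_sql_sketch` and `FENCE` are hypothetical names.

```python
import re

FENCE = "`" * 3  # markdown code fence, built this way to keep the sketch fence-free


def clean_sql_sketch(raw: str) -> str:
    """Strip fences, a bare leading 'sql' tag, whitespace, and trailing semicolons."""
    text = raw.strip()
    # Opening fence, optionally followed by a language tag on the same line.
    text = re.sub(r"^" + re.escape(FENCE) + r"(?:[A-Za-z]+[ \t]*\n)?", "", text)
    # Closing fence at the very end of the text.
    text = re.sub(re.escape(FENCE) + r"$", "", text)
    text = text.strip()
    # Bare language tag on its own first line (no fence around it).
    text = re.sub(r"^sql[ \t]*\n", "", text, flags=re.IGNORECASE)
    return text.strip().rstrip(";").strip()


sample = FENCE + "sql\nSELECT * FROM users;\n" + FENCE
print(repr(clean_sql_sketch(sample)))
```

Requiring a newline after the language tag keeps the sketch from mistaking the first SQL keyword of a single-line fenced query for a tag.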


## Example 7: Using with Evidence/Hints

`Text2SQLPrompt` accepts an optional `evidence` argument for hints that help the model, such as how a column's values are encoded or which function to use.

In [19]:
# Create prompt with evidence
question_with_hint = "What is the average age of male singers?"
evidence = "Is_male = 'T' means male; Use AVG() function for average"

prompt_with_evidence = Text2SQLPrompt(
    utterance=question_with_hint,
    schema=schema,
    db_type="sqlite",
    evidence=evidence
)

print("Prompt with Evidence:")
print("=" * 80)
print(prompt_with_evidence.prompt)

Prompt with Evidence:
Your task is to convert a natural language question into an accurate SQL query using the given sqlite database schema.

**Question:**:
What is the average age of male singers?

**Database Engine / Dialect:**:
sqlite

**Schema:**
Database description: Concert and singer database

Table: singer
  Description: Information about singers
  Columns:
    - Singer_ID (INT) (Primary Key) # Example values: 1, 2, 3, 4, 5
    - Name (TEXT) # Example values: Joe Sharp, Timbaland, Justin Brown
    - Country (TEXT) # Example values: Netherlands, United States, France
    - Song_Name (TEXT) # Example values: You, Dangerous, Gentleman
    - Song_release_year (TEXT) # Example values: 1992, 2008, 2014
    - Age (INT) # Example values: 52, 32, 29, 41, 43
    - Is_male (BOOL) # Example values: F, T

Table: concert
  Description: Information about concerts
  Columns:
    - concert_ID (INT) (Primary Key)
    - concert_Name (TEXT)
    - Theme (TEXT)
    - Stadium_ID (TEXT)
    - Year (TE
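
To see what the evidence is steering the model toward, the query it suggests can be run against a small in-memory SQLite table. This is only a sketch: the rows are illustrative values picked from the `samples` lists, not real data, the table is cut down to three columns, and `candidate_sql` is one query consistent with the hint rather than actual model output.

```python
import sqlite3

# Build a throwaway in-memory database with a cut-down singer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (Singer_ID INT PRIMARY KEY, Age INT, Is_male BOOL)")

# Illustrative rows only: ages and flags taken from the schema's sample values.
rows = [(1, 52, "F"), (2, 32, "T"), (3, 29, "T")]
conn.executemany("INSERT INTO singer VALUES (?, ?, ?)", rows)

# A query that follows the evidence: Is_male = 'T' means male, AVG() for the average.
candidate_sql = "SELECT AVG(Age) FROM singer WHERE Is_male = 'T'"
average_age = conn.execute(candidate_sql).fetchone()[0]
print(average_age)  # 30.5 for the three illustrative rows

conn.close()
```

Without the hint, a model could plausibly filter on `Is_male = 1` or `Is_male = 'true'`, which would match no rows in a table encoded this way.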

## Example 8: Comparing Multiple Models

Generate SQL with multiple models and compare results.

In [20]:
import pandas as pd

# Define test question
test_question = "How many singers are from each country?"
test_prompt = Text2SQLPrompt(test_question, schema, "sqlite")

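# List of clients to test (uncomment the ones you have credentials for).
# VLLMClientChatAPI is not imported in this notebook, so import it before uncommenting that line.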
clients_to_test = [
    ("WatsonX Llama-3.3-70B", WXAIClientChatAPI("meta-llama/llama-3-3-70b-instruct", {"max_new_tokens": 512})),
    # ("vLLM Qwen2.5-7B", VLLMClientChatAPI("Qwen/Qwen2.5-7B-Instruct", {"max_tokens": 512})),
    # ("Claude Sonnet", ClaudeClientChatAPI("claude-3-5-sonnet-20241022", {"max_tokens": 512})),
    # ("GPT-4o-mini", OpenAIClientChatAPI("gpt-4o-mini", {"max_tokens": 512})),
]

results = []
for model_name, client in clients_to_test:
    try:
        sql, usage = client.generate_sql(test_prompt)
        results.append({
            "Model": model_name,
            "SQL": sql,
            "Tokens": usage.get("total_tokens") if usage else "N/A"
        })
    except Exception as e:
        results.append({
            "Model": model_name,
            "SQL": f"Error: {e}",
            "Tokens": "N/A"
        })

# Display results
if results:
    df = pd.DataFrame(results)
    print("\nComparison Results:")
    print("=" * 80)
    for _, row in df.iterrows():
        print(f"\n{row['Model']}:")
        print(f"SQL: {row['SQL']}")
        print(f"Tokens: {row['Tokens']}")
else:
    print("No clients configured. Uncomment client configurations above to test.")

14:56:57 - DEBUG - Inference with constructed chat prompt: [{'role': 'system', 'content': 'You are a SQL expert. Your task is to convert natural language questions into accurate SQL queries using the given database schema and instructions.'}, {'role': 'user', 'content': 'Your task is to convert a natural language question into an accurate SQL query using the given sqlite database schema.\n\n**Question:**:\nHow many singers are from each country?\n\n**Database Engine / Dialect:**:\nsqlite\n\n**Schema:**\nDatabase description: Concert and singer database\n\nTable: singer\n  Description: Information about singers\n  Columns:\n    - Singer_ID (INT) (Primary Key) # Example values: 1, 2, 3, 4, 5\n    - Name (TEXT) # Example values: Joe Sharp, Timbaland, Justin Brown\n    - Country (TEXT) # Example values: Netherlands, United States, France\n    - Song_Name (TEXT) # Example values: You, Dangerous, Gentleman\n    - Song_release_year (TEXT) # Example values: 1992, 2008, 2014\n    - Age (INT) # 
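
The comparison loop depends only on each client exposing `generate_sql(prompt)` and returning a `(sql, token_usage)` pair, so it can be dry-run without any credentials. The sketch below uses a hypothetical `StubClient` with canned answers; the class, the `compare_clients` function, and the canned SQL and token counts are all made up for illustration and are not model output.

```python
class StubClient:
    """Hypothetical stand-in that mimics generate_sql(prompt) -> (sql, usage)."""

    def __init__(self, canned_sql, usage=None, fail=False):
        self.canned_sql = canned_sql
        self.usage = usage
        self.fail = fail

    def generate_sql(self, prompt):
        if self.fail:
            raise RuntimeError("backend unavailable")
        return self.canned_sql, self.usage


def compare_clients(clients, prompt):
    """Collect one result row per client, recording errors instead of raising."""
    results = []
    for name, client in clients:
        try:
            sql, usage = client.generate_sql(prompt)
            tokens = usage.get("total_tokens") if usage else "N/A"
            results.append({"Model": name, "SQL": sql, "Tokens": tokens})
        except Exception as exc:
            results.append({"Model": name, "SQL": f"Error: {exc}", "Tokens": "N/A"})
    return results


canned = "SELECT Country, COUNT(*) FROM singer GROUP BY Country"
stub_clients = [
    ("stub-with-usage", StubClient(canned, {"total_tokens": 126})),
    ("stub-no-usage", StubClient(canned)),
    ("stub-failing", StubClient(canned, fail=True)),
]
for row in compare_clients(stub_clients, "How many singers are from each country?"):
    print(row)
```

Catching the exception per client keeps one unavailable backend from aborting the whole comparison, which is the same choice the cell above makes.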

## Summary

This notebook demonstrated:

1. **Text2SQLPrompt**: Helper class for constructing prompts from questions and schemas
2. **WXAIClientChatAPI**: IBM watsonx.ai integration
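3. **VLLMClientChatAPI**: vLLM (OpenAI-compatible) integration (listed in the overview; it appears only as a commented-out entry in Example 8)
4. **ClaudeClientChatAPI**: Anthropic Claude integration
5. **OpenAIClientChatAPI**: OpenAI/LiteLLM integration (shown here against a local Ollama server)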
6. **postprocess_sql**: SQL cleaning utility
7. **Custom prompts**: Using chat message format directly
8. **Evidence/hints**: Adding context to prompts
9. **Model comparison**: Testing multiple models

### Next Steps

- See `scripts/inference/run_inference.py` for batch inference examples
- Check `src/text2sql_eval_toolkit/inference/baseline_llm_pipeline.py` for pipeline usage
- Explore `src/text2sql_eval_toolkit/inference/agentic_pipeline.py` for advanced agentic workflows