# Smart Schema: Integration Quickbook

This notebook walks through three integration scenarios for the `smart-schema` library: validating CSV data in an ingestion pipeline, validating API requests, and validating application configuration files. Each one shows how to generate a Pydantic model with `smart-schema` and use it to validate data.

## Setup

First, make sure `smart-schema` is installed, then import the necessary modules.

In [1]:
# Ensure you have smart-schema installed
# !pip install ../

import os
import json
import pandas as pd
from pathlib import Path
from pydantic import BaseModel, ValidationError
from typing import List, Dict, Optional, Any
from datetime import datetime

from smart_schema import ModelGenerator, ModelValidator
from smart_schema.core.model_utils import save_model_to_file, load_and_validate_json_as_model

# For OpenAI-based generation (optional). "API_KEY" is a placeholder;
# setdefault leaves an OPENAI_API_KEY that is already set in your environment untouched.
os.environ.setdefault("OPENAI_API_KEY", "API_KEY")

MODELS_DIR = Path("generated_integration_models")
MODELS_DIR.mkdir(exist_ok=True)

DATA_DIR = Path("integration_data")

## Use Case 1: CSV Data Ingestion Pipeline

Scenario: You receive daily sales data as CSV files. Before loading this data into your database or analytics platform, you need to validate its structure and data types.

In [2]:
csv_file_path = DATA_DIR / "sales_records.csv"
sales_df = pd.read_csv(csv_file_path)
print("Sample Sales Data:")
print(sales_df.head())

Sample Sales Data:
         Date ProductID     ProductName     Category  QuantitySold  UnitPrice  \
0  2024-07-01      P001      Laptop Pro  Electronics          10.0     1200.0   
1  2024-07-01      P002  Wireless Mouse  Accessories          50.0       25.0   
2  2024-07-01      P003    Office Chair    Furniture           5.0      150.0   
3  2024-07-02      P001      Laptop Pro  Electronics           8.0     1200.0   
4  2024-07-02      P004        Keyboard  Accessories          30.0       75.0   

   TotalPrice Region  
0     12000.0  North  
1      1250.0  North  
2       750.0   West  
3      9600.0  South  
4      2250.0   East  


### Step 1.1: Generate a Pydantic Model from the CSV

We can use `ModelGenerator` to infer a schema from the CSV data. For columns like `Date`, the `datetime_columns` argument tells the generator to treat the values as datetimes. The cell below first tries `smart_inference=True`, which uses OpenAI for more robust type detection. If that raises a `ValueError` (for example, because no API key is configured), it falls back to basic inference with `smart_inference=False`.

In [3]:
sales_model_generator = ModelGenerator(name="SalesRecord", smart_inference=True) # Set to False if no OpenAI key

try:
    # Attempt smart inference first
    SalesRecordModel = sales_model_generator.from_dataframe(
        sales_df, 
        datetime_columns=["Date"]
    )
    print("Successfully generated SalesRecordModel using smart inference.")
except ValueError as e:
    print(f"Smart inference failed (likely missing API key or invalid config): {e}")
    print("Falling back to basic inference.")
    sales_model_generator_basic = ModelGenerator(name="SalesRecord", smart_inference=False)
    SalesRecordModel = sales_model_generator_basic.from_dataframe(
        sales_df, 
        datetime_columns=["Date"]
    )
    print("Successfully generated SalesRecordModel using basic inference.")

# Display the generated model schema
print("SalesRecordModel Schema:")
SalesRecordModel.model_json_schema()

Successfully generated SalesRecordModel using smart inference.
SalesRecordModel Schema:


{'properties': {'Date': {'description': 'The date of the sales record in ISO format.',
   'format': 'date-time',
   'title': 'Date',
   'type': 'string'},
  'ProductID': {'description': 'The unique identifier for the product.',
   'title': 'Productid',
   'type': 'string'},
  'ProductName': {'description': 'The name of the product sold.',
   'title': 'Productname',
   'type': 'string'},
  'Category': {'description': 'The category under which the product falls.',
   'title': 'Category',
   'type': 'string'},
  'QuantitySold': {'description': 'The quantity of the product sold.',
   'title': 'Quantitysold',
   'type': 'number'},
  'UnitPrice': {'description': 'The price per unit of the product.',
   'title': 'Unitprice',
   'type': 'number'},
  'TotalPrice': {'description': 'The total price for the quantity sold.',
   'title': 'Totalprice',
   'type': 'number'},
  'Region': {'description': 'The region where the sale took place.',
   'title': 'Region',
   'type': 'string'}},
 'required': [

### Step 1.2: Save the Generated Model (Optional but Recommended)

Saving the model to a Python file allows you to reuse it without regenerating it every time.

In [4]:
# Use the imported utility from smart_schema.core.model_utils
sales_model_file_path = MODELS_DIR / "sales_record_model.py"

save_model_to_file(SalesRecordModel, output_path=str(sales_model_file_path), model_name="SalesRecord")
print(f"SalesRecordModel saved to {sales_model_file_path}")

SalesRecordModel saved to generated_integration_models/sales_record_model.py
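
To reuse a saved model in a later session, one option is to import the generated file by path. The sketch below follows the standard `importlib` recipe for loading a source file directly. The helper name `load_class_from_file` is hypothetical, and it assumes the generated file defines a class named `SalesRecord`; check the file's contents before relying on that name.

```python
import importlib.util
import sys
from pathlib import Path


def load_class_from_file(file_path, class_name):
    """Import a Python file by path and return the named attribute from it."""
    path = Path(file_path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the importlib docs recommend
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return getattr(module, class_name)
```

With the file saved above, a call such as `load_class_from_file(MODELS_DIR / "sales_record_model.py", "SalesRecord")` would return the class, provided the generated file uses that class name.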


### Step 1.3: Validate the CSV Data

Now, use `ModelValidator` to validate the DataFrame against the generated `SalesRecordModel`.

In [5]:
sales_validator = ModelValidator(SalesRecordModel)
valid_records_df, invalid_records_list = sales_validator.validate_dataframe(sales_df)

print(f"Number of valid sales records: {len(valid_records_df)}")
print(f"Number of invalid sales records: {len(invalid_records_list)}")

if invalid_records_list:
    print("Invalid Sales Records Details:")
    for record in invalid_records_list[:3]: # Print details for the first 3 invalid records
        print(f"Record Index: {record['index']}")
        print(f"Data: {record['record']}")
        print(f"Errors: {record['errors']}")
        print("---")

Number of valid sales records: 12
Number of invalid sales records: 0


This demonstrates a typical data ingestion validation step. You could then proceed to load `valid_records_df` into your system and log/handle `invalid_records_list` for review or correction.
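
As a minimal sketch of the "log/handle" step, the helper below appends each rejected record to a JSON Lines file for later review. It assumes each item in `invalid_records_list` is a dict with the `index`, `record`, and `errors` keys used in the cell above. The function name `write_rejects` and the log path are hypothetical.

```python
import json
from pathlib import Path


def write_rejects(invalid_records, log_path):
    """Append one JSON line per rejected record and return how many were written."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        for item in invalid_records:
            entry = {
                "index": item["index"],
                "record": item["record"],
                "errors": item["errors"],
            }
            # default=str keeps values json cannot encode natively (e.g. dates) loggable
            fh.write(json.dumps(entry, default=str) + "\n")
    return len(invalid_records)
```

A call such as `write_rejects(invalid_records_list, "rejects/sales_rejects.jsonl")` could run right after validation, while `valid_records_df` continues on to the load step.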

## Use Case 2: API Request/Response Validation

Scenario: You are developing or integrating with an API. `smart-schema` can help define and enforce data contracts for API requests and responses.

### Step 2.1: Define Schema from API Documentation or Example JSON

Imagine you have an example JSON request body for creating a product in an e-commerce system.

In [6]:
api_request_path = DATA_DIR / "product_api_request.json"
with open(api_request_path, 'r') as f:
    product_api_example = json.load(f)

print("Sample Product API Request:")
print(json.dumps(product_api_example, indent=2))

Sample Product API Request:
{
  "product_id": "P008",
  "name": "Gaming Keyboard RGB",
  "category": "Electronics",
  "price": 129.99,
  "stock": 50,
  "description": "Mechanical gaming keyboard with customizable RGB lighting.",
  "specifications": {
    "switch_type": "Blue",
    "layout": "Full-size",
    "connectivity": [
      "USB-C",
      "Bluetooth"
    ]
  },
  "supplier_info": {
    "supplier_id": "SUPP-003",
    "name": "TechGear Inc."
  },
  "is_active": true,
  "tags": [
    "gaming",
    "keyboard",
    "rgb",
    "mechanical"
  ]
}


In [7]:
product_api_model_generator = ModelGenerator(name="ProductCreateRequest", smart_inference=True) # Set to False if no OpenAI key

try:
    ProductCreateRequestModel = product_api_model_generator.from_json(product_api_example)
    print("Successfully generated ProductCreateRequestModel using smart inference.")
except ValueError as e: # Smart inference may raise other exception types; widen this clause if needed
    print(f"Smart inference failed: {e}")
    print("Falling back to basic inference.")
    product_api_model_generator_basic = ModelGenerator(name="ProductCreateRequest", smart_inference=False)
    ProductCreateRequestModel = product_api_model_generator_basic.from_json(product_api_example)
    print("Successfully generated ProductCreateRequestModel using basic inference.")

print("ProductCreateRequestModel Schema:")
ProductCreateRequestModel.model_json_schema()

Successfully generated ProductCreateRequestModel using smart inference.
ProductCreateRequestModel Schema:


{'$defs': {'Specifications': {'properties': {'switch_type': {'description': 'Type of the switch in the keyboard.',
     'title': 'Switch Type',
     'type': 'string'},
    'layout': {'description': 'The layout of the keyboard.',
     'title': 'Layout',
     'type': 'string'},
    'connectivity': {'description': 'Supported connectivity options for the product.',
     'items': {'type': 'string'},
     'title': 'Connectivity',
     'type': 'array'}},
   'required': ['switch_type', 'layout', 'connectivity'],
   'title': 'Specifications',
   'type': 'object'},
  'SupplierInfo': {'properties': {'supplier_id': {'description': 'The unique identifier for the supplier.',
     'title': 'Supplier Id',
     'type': 'string'},
    'name': {'description': 'The name of the supplier.',
     'title': 'Name',
     'type': 'string'}},
   'required': ['supplier_id', 'name'],
   'title': 'SupplierInfo',
   'type': 'object'}},
 'properties': {'product_id': {'description': 'The unique identifier for the produ

In [8]:
products_model_file_path = MODELS_DIR / "product_create_request_model.py"
save_model_to_file(ProductCreateRequestModel, output_path=str(products_model_file_path), model_name="ProductCreateRequest")

### Step 2.2: Validate an Incoming API Request

In your API endpoint (e.g., using FastAPI or Flask), you would parse the incoming JSON and validate it using the generated model. We will use `ModelValidator` for this.

**Note:** `ProductCreateRequestModel` was generated from the single example in `product_api_request.json`, so fields present in that example, including nested ones such as `specifications.switch_type`, are required. A request that omits any of them fails validation, even when the field is irrelevant to the product. For that reason, `valid_request_data` below includes `switch_type` and `layout` with the value `"N/A"`. Extra fields such as `display_type` are accepted.
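
One way to see which fields a single-example model may be over-constraining is to compare several example payloads before generating the model. The sketch below is plain Python, not `smart-schema` functionality. The helper name `split_required_optional` and the sample payloads are hypothetical, and it only looks at top-level keys.

```python
def split_required_optional(examples):
    """Return (required, optional) top-level keys across a list of example dicts."""
    if not examples:
        return set(), set()
    key_sets = [set(example) for example in examples]
    required = set.intersection(*key_sets)       # present in every example
    optional = set.union(*key_sets) - required   # present in only some examples
    return required, optional


examples = [
    {"product_id": "P008", "name": "Gaming Keyboard RGB", "tags": ["gaming"]},
    {"product_id": "P009", "name": "Smart Watch Series X"},
]
required, optional = split_required_optional(examples)
print(sorted(required))  # ['name', 'product_id']
print(sorted(optional))  # ['tags']
```

Keys that land in `optional` are candidates for `Optional[...]` fields if you hand-edit the generated model file.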

In [9]:
# Simulate an incoming valid request
valid_request_data = {
  "product_id": "P009",
  "name": "Smart Watch Series X",
  "category": "Wearables",
  "price": 299.99,
  "stock": 150,
  "description": "Latest smart watch with advanced health tracking.",
  "specifications": {
    "switch_type": "N/A",  
    "layout": "N/A",     
    "display_type": "AMOLED", 
    "water_resistance": "5ATM",
    "connectivity": ["Bluetooth 5.2", "NFC"] 
  },
  "supplier_info": {
    "supplier_id": "SUPP-001",
    "name": "ElectroGadgets Ltd."
  },
  "is_active": True,
  "tags": ["smartwatch", "health", "wearable"]
}

# Simulate an incoming invalid request (e.g., missing required field, wrong type)
# This request is missing 'category', 'supplier_info', 'tags' which are top-level required fields
# and 'specifications' is missing 'connectivity' and has different fields than the example.
invalid_request_data = {
  "product_id": "P010",
  "name": "Budget Tablet",
  # 'category' is missing
  "price": "ninety-nine", # Wrong type
  "stock": -10, # Semantically invalid, but the inferred model has no range constraint to catch it
  "description": "Affordable tablet for basic use.",
  "specifications": { 
      # "switch_type": "Membrane", # If we add these, other errors would remain
      # "layout": "Compact",
      "screen_size": "10 inch", 
      "ram": "4GB"              
      # "connectivity" is missing, which was in the example.
  },
  "is_active": "yes" # Wrong type
  # supplier_info is missing
  # tags is missing
}

# Create a validator instance for the Product API model
product_api_validator = ModelValidator(ProductCreateRequestModel)

print("--- Validating a Correct Request (now adjusted) ---")
is_valid, errors = product_api_validator.validate_record(valid_request_data)
if is_valid:
    validated_model = ProductCreateRequestModel(**valid_request_data) 
    print(f"Request is valid! Product: {validated_model.name}")
    # print(validated_model.model_dump_json(indent=2)) # Optional: view the validated data
else:
    print("Request is invalid!")
    print(errors)

print("\n--- Validating an Incorrect Request ---")
is_valid, errors = product_api_validator.validate_record(invalid_request_data)
if is_valid:
    # This case should ideally not be reached for invalid_request_data
    print("Request is valid (but was expected to be invalid)!")
else:
    print("Request is invalid!")
    print(errors)

--- Validating a Correct Request (now adjusted) ---
Request is valid! Product: Smart Watch Series X

--- Validating an Incorrect Request ---
Request is invalid!
[{'type': 'missing', 'loc': ('category',), 'msg': 'Field required', 'input': {'product_id': 'P010', 'name': 'Budget Tablet', 'price': 'ninety-nine', 'stock': -10, 'description': 'Affordable tablet for basic use.', 'specifications': {'screen_size': '10 inch', 'ram': '4GB'}, 'is_active': 'yes'}, 'url': 'https://errors.pydantic.dev/2.6/v/missing'}, {'type': 'float_parsing', 'loc': ('price',), 'msg': 'Input should be a valid number, unable to parse string as a number', 'input': 'ninety-nine', 'url': 'https://errors.pydantic.dev/2.6/v/float_parsing'}, {'type': 'missing', 'loc': ('specifications', 'switch_type'), 'msg': 'Field required', 'input': {'screen_size': '10 inch', 'ram': '4GB'}, 'url': 'https://errors.pydantic.dev/2.6/v/missing'}, {'type': 'missing', 'loc': ('specifications', 'layout'), 'msg': 'Field required', 'input': {'sc

This illustrates how you'd integrate `smart-schema` into an API's request handling logic to ensure data integrity. The same principle applies to validating API responses before sending them out.
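
A framework-agnostic sketch of that request-handling logic is shown below. It takes any callable that returns `(is_valid, errors)`, the same shape `validate_record` returns in the cell above, and maps the result to an HTTP-style status and body. The names `handle_create_product` and `stub_validate` are hypothetical. The extra stock check illustrates a business rule that a model inferred from one example is unlikely to encode.

```python
def handle_create_product(payload, validate_record):
    """Return an (http_status, body) pair for a product-creation payload."""
    is_valid, errors = validate_record(payload)
    if not is_valid:
        return 422, {"detail": errors}
    # Business rule checked after schema validation has passed
    if payload["stock"] < 0:
        return 422, {"detail": [{"loc": ("stock",), "msg": "stock must be >= 0"}]}
    return 201, {"product_id": payload["product_id"]}


def stub_validate(payload):
    """Hypothetical stand-in for ModelValidator(...).validate_record."""
    missing = [key for key in ("product_id", "stock") if key not in payload]
    errors = [{"loc": (key,), "msg": "Field required"} for key in missing]
    return (not errors), errors
```

In the notebook, `product_api_validator.validate_record` could be passed in place of `stub_validate`. In a FastAPI or Flask endpoint, the returned pair would map directly onto the response status and JSON body.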

## Use Case 3: Configuration File Management

Scenario: Your application relies on a JSON configuration file. `smart-schema` can validate this configuration on startup to catch errors early.

### Step 3.1: Generate a Model from an Example Configuration

In [10]:
config_file_path = DATA_DIR / "app_config.json"
with open(config_file_path, 'r') as f:
    app_config_example = json.load(f)

print("Sample Application Configuration:")
print(json.dumps(app_config_example, indent=2))

Sample Application Configuration:
{
  "application_name": "SmartApp",
  "version": "1.2.3",
  "debug_mode": true,
  "server_settings": {
    "host": "localhost",
    "port": 8080,
    "timeout_seconds": 30
  },
  "database_connection": {
    "type": "postgresql",
    "host": "db.example.com",
    "port": 5432,
    "username": "admin_user",
    "password_env_var": "DB_PASSWORD",
    "database_name": "app_data",
    "connection_options": {
      "ssl_mode": "require",
      "max_connections": 100
    }
  },
  "feature_flags": {
    "new_dashboard": true,
    "beta_user_access": false,
    "enable_analytics": true
  },
  "logging": {
    "level": "INFO",
    "format": "%(asctime)s - %(levelname)s - %(message)s",
    "file_path": "/var/log/smart_app.log"
  },
  "api_keys": {
    "payment_gateway": "env:PAYMENT_API_KEY",
    "geocoding_service": "env:GEO_API_KEY"
  }
}


In [11]:
app_config_smart_inference = False  # Set to True if an OpenAI key is available
app_config_model_generator = ModelGenerator(name="AppConfig", smart_inference=app_config_smart_inference)

try:
    AppConfigModel = app_config_model_generator.from_json(app_config_example)
    # Report the inference mode that was actually requested
    inference_mode = "smart" if app_config_smart_inference else "basic"
    print(f"Successfully generated AppConfigModel using {inference_mode} inference.")
except ValueError as e:
    print(f"Smart inference failed: {e}")
    print("Falling back to basic inference.")
    app_config_model_generator_basic = ModelGenerator(name="AppConfig", smart_inference=False)
    AppConfigModel = app_config_model_generator_basic.from_json(app_config_example)
    print("Successfully generated AppConfigModel using basic inference.")

print("AppConfigModel Schema:")
AppConfigModel.model_json_schema()

Successfully generated AppConfigModel using basic inference.
AppConfigModel Schema:


{'$defs': {'ApiKeys': {'properties': {'payment_gateway': {'title': 'Payment Gateway',
     'type': 'string'},
    'geocoding_service': {'title': 'Geocoding Service', 'type': 'string'}},
   'required': ['payment_gateway', 'geocoding_service'],
   'title': 'ApiKeys',
   'type': 'object'},
  'ConnectionOptions': {'properties': {'ssl_mode': {'title': 'Ssl Mode',
     'type': 'string'},
    'max_connections': {'title': 'Max Connections', 'type': 'integer'}},
   'required': ['ssl_mode', 'max_connections'],
   'title': 'ConnectionOptions',
   'type': 'object'},
  'DatabaseConnection': {'properties': {'type': {'title': 'Type',
     'type': 'string'},
    'host': {'title': 'Host', 'type': 'string'},
    'port': {'title': 'Port', 'type': 'integer'},
    'username': {'title': 'Username', 'type': 'string'},
    'password_env_var': {'title': 'Password Env Var', 'type': 'string'},
    'database_name': {'title': 'Database Name', 'type': 'string'},
    'connection_options': {'$ref': '#/$defs/Connectio

In [12]:
app_config_model_file_path = MODELS_DIR / "app_config_model.py"
save_model_to_file(AppConfigModel, output_path=str(app_config_model_file_path), model_name="AppConfig")

### Step 3.2: Load and Validate Configuration

On application startup, you would load the JSON config file and validate it against the `AppConfigModel`.

In [13]:
print("--- Loading and Validating Correct Configuration ---")
# Using the new utility from smart_schema
config = load_and_validate_json_as_model(config_file_path, AppConfigModel) # Ensure config_file_path is defined from previous cell

if config:
    # The function already prints basic errors, we can add a success confirmation here
    print(f"Configuration for '{config.application_name}' loaded and validated successfully by the utility.")
    if hasattr(config, 'database_connection') and config.database_connection:
        print(f"DB Host: {config.database_connection.host}")
    else:
        print("DB Host: Not available (database_connection might be missing or None in the config)")
else:
    print("Loading/validation of correct configuration failed (see errors above from the utility).")


# Create a temporary invalid config file for demonstration
import copy

invalid_config_path = DATA_DIR / "app_config_invalid.json"
# Deep copy so that editing nested sections does not mutate app_config_example
invalid_config_data = copy.deepcopy(app_config_example)
invalid_config_data["server_settings"]["port"] = "not-a-port" # Invalid type

# Also remove the database_connection section entirely
invalid_config_data.pop("database_connection", None)
    
with open(invalid_config_path, 'w') as f:
    json.dump(invalid_config_data, f, indent=2)

print("\n--- Loading and Validating Incorrect Configuration ---")
# The function will print error messages internally
invalid_config_result = load_and_validate_json_as_model(invalid_config_path, AppConfigModel)

if invalid_config_result is None:
    print("Validation of incorrect config failed as expected (see errors above from the utility).")
else:
    print("Incorrect config was unexpectedly validated by the utility.")


# Clean up temporary invalid config
if invalid_config_path.exists(): # Check if file exists before trying to unlink
    invalid_config_path.unlink()

--- Loading and Validating Correct Configuration ---
Configuration for 'SmartApp' loaded and validated successfully by the utility.
DB Host: db.example.com

--- Loading and Validating Incorrect Configuration ---
Error: Configuration validation failed for integration_data/app_config_invalid.json
Validation of incorrect config failed as expected (see errors above from the utility).


This ensures your application starts with a valid configuration, preventing runtime errors due to misconfigured settings.
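
To make that guarantee explicit, a startup routine can stop the process as soon as validation fails. The sketch below is a minimal illustration. `load_config_or_exit` is a hypothetical helper, and it assumes the loader returns `None` on failure, as `load_and_validate_json_as_model` appears to do in the cell above.

```python
import sys


def load_config_or_exit(config_path, model, loader):
    """Return the validated config, or stop the process if validation fails.

    loader is any callable taking (path, model) and returning a model
    instance, or None when loading or validation fails.
    """
    config = loader(config_path, model)
    if config is None:
        # Passing a string to sys.exit prints it to stderr and exits with status 1
        sys.exit(f"Invalid configuration: {config_path}")
    return config
```

In an application entry point, this might be called as `load_config_or_exit(config_file_path, AppConfigModel, load_and_validate_json_as_model)` before any other component starts.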

## Conclusion

This quickbook has demonstrated several practical use cases for `smart-schema`:

1.  **Validating CSV data** during ingestion pipelines.
2.  **Enforcing data contracts** for API requests (and responses).
3.  **Validating application configuration files** on startup.

`smart-schema` simplifies the process of creating Pydantic models from various data sources, enabling robust data validation across different parts of your applications and workflows. The smart inference capabilities (powered by OpenAI when available) further streamline model generation, especially for complex and nested data structures.