# UNSPSC Hierarchical Classification with AI_CLASSIFY

This notebook demonstrates hierarchical product classification with Snowflake's AI_CLASSIFY function.

In [None]:
USE DATABASE demodb;
USE SCHEMA UNSPSC_CODE_PROJECT;

## Hierarchical Classification

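UNSPSC codes have four levels, from broadest to most specific: Segment → Family → Class → Commodity. Each level occupies two digits of the 8-digit code.

Given a product's known segment and family codes, we use AI_CLASSIFY to predict its class.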
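
As a rough illustration of how the levels nest, the sketch below derives the segment, family, and class codes from a commodity code by zeroing out trailing digit pairs. It assumes codes are stored as zero-padded 8-digit integers, which matches the filter values used in this notebook (for example 10000000 and 10100000). The commodity code 10101501 is only an illustrative value.

```sql
-- Assumption: codes are 8-digit integers, two digits per level
SELECT
    code                            AS commodity_code,
    FLOOR(code / 1000000) * 1000000 AS segment_code,  -- keeps digits 1-2: 10000000
    FLOOR(code / 10000) * 10000     AS family_code,   -- keeps digits 1-4: 10100000
    FLOOR(code / 100) * 100         AS class_code     -- keeps digits 1-6: 10101500
FROM (SELECT 10101501 AS code);
```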

## Ground Truth Hierarchy Table

The UNSPSC_CODES_UNDP table stores the complete hierarchical taxonomy. This is valuable because:

- **Grounded System**: AI_CLASSIFY chooses only from real taxonomy entries, not made-up categories
- **Search Space Reduction**: Instead of every class in the taxonomy, AI_CLASSIFY sees only the classes that exist within the product's segment and family
- **Easy Updates**: When UNSPSC releases new codes, just update the table
- **Consistent Classifications**: Everyone uses the same standardized hierarchy
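
One way to quantify the search space reduction is to compare the number of distinct classes in the whole table with the number inside a single family. This sketch reuses the Live Animals segment and family codes from the examples in this notebook. The output column names are arbitrary.

```sql
-- Compare the full class count with the count inside one segment/family
SELECT
    COUNT(DISTINCT CLASS) AS classes_in_taxonomy,
    COUNT(DISTINCT IFF(SEGMENT = 10000000 AND FAMILY = 10100000, CLASS, NULL))
        AS classes_in_family  -- NULLs are ignored by COUNT(DISTINCT ...)
FROM UNSPSC_CODES_UNDP;
```

The gap between the two numbers is the number of options AI_CLASSIFY no longer has to consider for products in that family.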


In [None]:
-- Example: Show how many classes are available for a specific segment/family
SELECT 
    SEGMENT,
    FAMILY, 
    SEGMENT_TITLE,
    FAMILY_TITLE,
    COUNT(DISTINCT CLASS) as classes_available,
    COUNT(DISTINCT COMMODITY) as commodities_available
FROM UNSPSC_CODES_UNDP 
WHERE SEGMENT = 10000000 AND FAMILY = 10100000  -- Live Animals example
GROUP BY SEGMENT, FAMILY, SEGMENT_TITLE, FAMILY_TITLE;


In [None]:
-- Show the actual class options that would be passed to AI_CLASSIFY
SELECT DISTINCT 
    CLASS,
    CLASS_TITLE
FROM UNSPSC_CODES_UNDP 
WHERE SEGMENT = 10000000 AND FAMILY = 10100000  -- Live Animals
ORDER BY CLASS;


## Synthetic Test Products

These are synthetic example products with known segment and family codes, plus ground-truth class codes for testing.

In [None]:
SELECT product_id, 
       product_description,
       known_segment_code, 
       known_family_code,
       actual_class_code
FROM classification_test_products;

## AI Classification with CTE

The query uses Common Table Expressions (CTEs) to:
1. Get the test products
2. Build an array of class options from the UNSPSC data for each product's segment/family
3. Call AI_CLASSIFY with each product's description and its class options

To adapt this to your data, change the table and column names.
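
To see the shape of the value that the later steps parse, AI_CLASSIFY can be called on a literal string. The description and category names here are hypothetical, and the label chosen may vary from run to run. What matters is that the result is an object containing a `labels` array.

```sql
-- Hypothetical input and categories, used only to inspect the return shape
WITH demo AS (
    SELECT AI_CLASSIFY(
        'Adult dairy cow for milk production',
        ['Livestock', 'Domestic pet products', 'Live fish']
    ) AS classify_result
)
SELECT
    classify_result,                                   -- object with a "labels" array
    classify_result:labels[0]::STRING AS top_label     -- first label as plain text
FROM demo;
```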

In [None]:
-- Create table with AI classification results
CREATE OR REPLACE TABLE ai_classification_results AS
WITH all_products AS (
    SELECT 
        product_id,
        product_description,
        known_segment_code,
        known_family_code,
        actual_class_code
    FROM classification_test_products
),
products_with_class_options AS (
    SELECT 
        p.product_id,
        p.product_description,
        p.known_segment_code,
        p.known_family_code,
        p.actual_class_code,
        ARRAY_AGG(DISTINCT u.CLASS_TITLE) as class_options_array,
        COUNT(DISTINCT u.CLASS) as available_class_count
    FROM all_products p
    JOIN UNSPSC_CODES_UNDP u ON u.SEGMENT = p.known_segment_code 
                             AND u.FAMILY = p.known_family_code
                             AND u.CLASS IS NOT NULL
    GROUP BY p.product_id, p.product_description, p.known_segment_code, p.known_family_code, p.actual_class_code
)
SELECT 
    product_id,
    product_description,
    known_family_code,
    actual_class_code as ground_truth_class_code,
    available_class_count as search_space_size,
    AI_CLASSIFY(product_description, class_options_array) as ai_classify_result
FROM products_with_class_options
ORDER BY product_id;

-- Preview the classification results
SELECT * FROM ai_classification_results
LIMIT 10;

## Pre-Process for Accuracy Evaluation 

AI_CLASSIFY returns its prediction as a class title (like "Livestock") in the `labels` array of its result, but our ground truth uses class codes (like 10101500). We map each predicted title back to its class code before comparing.
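
Mapping titles back to codes is only unambiguous if a title identifies a single class. The check below is a sketch for finding titles that are shared by more than one class code; whether any exist depends on the contents of UNSPSC_CODES_UNDP.

```sql
-- List class titles that map to more than one class code
SELECT
    CLASS_TITLE,
    COUNT(DISTINCT CLASS) AS class_codes_sharing_title
FROM UNSPSC_CODES_UNDP
WHERE CLASS IS NOT NULL
GROUP BY CLASS_TITLE
HAVING COUNT(DISTINCT CLASS) > 1
ORDER BY class_codes_sharing_title DESC;
```

If this returns rows, a join on the title alone can produce more than one code for a single prediction, so the join should also be restricted to the product's family.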


In [None]:
-- Create temporary table with mapped predictions
CREATE OR REPLACE TEMPORARY TABLE accuracy_results AS
SELECT 
    a.product_id,
    a.known_family_code,
    a.ground_truth_class_code,
    a.search_space_size,
    a.ai_classify_result:labels[0]::STRING as predicted_class_title,
    u.CLASS as predicted_class_code
FROM ai_classification_results a
-- Match titles only within the product's family: the same class title may appear
-- in other families, and a title-only join could then return duplicate rows
LEFT JOIN (SELECT DISTINCT FAMILY, CLASS, CLASS_TITLE FROM UNSPSC_CODES_UNDP) u 
    ON u.FAMILY = a.known_family_code
   AND u.CLASS_TITLE = a.ai_classify_result:labels[0]::STRING;

SELECT * FROM accuracy_results;


## Accuracy Measurement

Compare AI_CLASSIFY results with ground truth to measure accuracy.

In [None]:
-- General accuracy using preprocessed results
SELECT 
    COUNT(*) as total_products,
    COUNT(CASE WHEN predicted_class_code = ground_truth_class_code THEN 1 END) as correct_predictions,
    ROUND(COUNT(CASE WHEN predicted_class_code = ground_truth_class_code THEN 1 END) * 100.0 / COUNT(*), 1) as accuracy_pct
FROM accuracy_results;

In [None]:
-- Per family accuracy using preprocessed results
SELECT 
    known_family_code,
    COUNT(*) as products_tested,
    COUNT(CASE WHEN predicted_class_code = ground_truth_class_code THEN 1 END) as correct_predictions,
    ROUND(COUNT(CASE WHEN predicted_class_code = ground_truth_class_code THEN 1 END) * 100.0 / COUNT(*), 1) as family_accuracy_pct
FROM accuracy_results
GROUP BY known_family_code
ORDER BY family_accuracy_pct DESC;
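
The aggregate figures say how often the prediction matched, not which products were missed. As a possible follow-up, this sketch lists the mismatches from accuracy_results, including predictions whose title could not be mapped to a class code.

```sql
-- List products where the predicted class is missing or differs from ground truth
SELECT
    product_id,
    known_family_code,
    ground_truth_class_code,
    predicted_class_title,
    predicted_class_code
FROM accuracy_results
WHERE predicted_class_code IS NULL
   OR predicted_class_code <> ground_truth_class_code
ORDER BY known_family_code, product_id;
```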