# Vizard Preprocessing Feature Test Suite

**Test Coverage:**
- Preprocessing keywords: FILTER, SELECT, DROP, SORT, ADD, GROUP, SAVE
- Delimiter syntax: `||`
- Operation ordering and dependencies
- State management (ephemeral vs persistent)
- Context detection (standalone preprocessing)
- Backwards compatibility
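
The `||` delimiter separates the preprocessing stage from the plotting stage. The sketch below is a rough mental model for reading the tests that follow. It is hypothetical and is not Vizard's actual parser. It splits on the first `||`, ignores whitespace around the delimiter, and treats an empty right-hand side as "preprocess only". When a command has no `||`, it returns `None` for the preprocessing part.

```python
def split_pipeline(command):
    """Split a Vizard command into (preprocessing, plotting) parts.

    Hypothetical sketch: splits on the first `||` only, tolerates missing
    whitespace around the delimiter, and returns None for the preprocessing
    part when no `||` is present (the pre-delimiter command form).
    """
    if "||" not in command:
        return None, command.strip()
    pre, _, plot = command.partition("||")
    # An empty plotting part (trailing `||`) means "preprocess only".
    return pre.strip(), plot.strip()


print(split_pipeline("DATA data/genes.csv FILTER pvalue < 0.05 || PLOT scatter X expression Y pvalue"))
# → ('DATA data/genes.csv FILTER pvalue < 0.05', 'PLOT scatter X expression Y pvalue')
```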

**Datasets:** genes.csv, sales.csv, diff_expression.csv, timeseries.csv from test/data/

## 1. Setup & Configuration

### Load imports

In [None]:
import altair as alt
import matplotlib.pyplot as plt
import seaborn as sns
import polars as pl
import pandas as pd
import numpy as np
from altair.datasets import data

### Configure theme

In [None]:
alt.renderers.enable('html')

@alt.theme.register('bioinformatics_theme', enable=True)
def bioinformatics_theme():
    return alt.theme.ThemeConfig({'width': 600, 'height': 400})

### Load Claude Code magic

In [None]:
%load_ext vizard_magic

### Verify Vizard is working

In [None]:
%%time
%cc HELP

In [None]:
%%time
%cc --model haiku

In [None]:
%%time
%cc RESET

## 2. Basic Preprocessing Tests

In [None]:
%%time
%cc DATA data/genes.csv FILTER pvalue < 0.05 || PLOT scatter X expression Y pvalue

In [None]:
%%time
%cc DATA data/genes.csv SELECT gene_name, expression || PLOT bar X gene_name Y expression

In [None]:
%%time
%cc DATA data/genes.csv ADD log2_expr as log2(expression) || PLOT bar X gene_name Y log2_expr
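
`ADD log2_expr as log2(expression)` derives a new column from an existing one. The plain-Python reference below can be used to cross-check the generated column on a few hypothetical toy rows; the real `genes.csv` contents are not reproduced here. `log2` is undefined for non-positive values. This sketch returns `None` for those rows, which is one possible convention and not necessarily Vizard's.

```python
import math


def add_log2(rows, src, dest):
    """Return copies of rows with dest = log2(src), or None where src <= 0."""
    out = []
    for row in rows:
        new_row = dict(row)
        value = row[src]
        # log2 is undefined for non-positive values; None is one possible convention.
        new_row[dest] = math.log2(value) if value > 0 else None
        out.append(new_row)
    return out


# Hypothetical toy rows; not the contents of test/data/genes.csv.
toy = [{"gene_name": "g1", "expression": 8.0}, {"gene_name": "g2", "expression": 0.0}]
print([r["log2_expr"] for r in add_log2(toy, "expression", "log2_expr")])  # → [3.0, None]
```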

In [None]:
%%time
%cc DATA data/sales.csv GROUP by category aggregating sum(revenue) || PLOT bar X category Y revenue

In [None]:
%%cc DATA data/genes.csv SELECT gene_name, expression, pvalue ADD log2_expr as log2(expression) FILTER pvalue < 0.05 and abs(log2_expr) > 1.5 
ADD neg_log10_pv as -log10(pvalue) SORT by neg_log10_pv descending || PLOT scatter X log2_expr Y neg_log10_pv TITLE Volcano-Style Plot
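
The volcano-style cell chains five operations. To check the generated output, the same chain can be mirrored step by step in plain Python, with each step commented with the keyword it stands for. The toy rows are hypothetical. The sketch assumes every `expression` value is positive, because `log2` is undefined otherwise.

```python
import math


def volcano_reference(rows, p_cut=0.05, lfc_cut=1.5):
    """Plain-Python mirror of the volcano-style chain (assumes expression > 0)."""
    # SELECT gene_name, expression, pvalue
    out = [{k: r[k] for k in ("gene_name", "expression", "pvalue")} for r in rows]
    # ADD log2_expr as log2(expression)
    for r in out:
        r["log2_expr"] = math.log2(r["expression"])
    # FILTER pvalue < 0.05 and abs(log2_expr) > 1.5
    out = [r for r in out if r["pvalue"] < p_cut and abs(r["log2_expr"]) > lfc_cut]
    # ADD neg_log10_pv as -log10(pvalue)
    for r in out:
        r["neg_log10_pv"] = -math.log10(r["pvalue"])
    # SORT by neg_log10_pv descending
    return sorted(out, key=lambda r: r["neg_log10_pv"], reverse=True)


# Hypothetical toy rows; not the contents of test/data/genes.csv.
toy = [
    {"gene_name": "g1", "expression": 8.0, "pvalue": 0.001, "significant": True},
    {"gene_name": "g2", "expression": 0.25, "pvalue": 0.01, "significant": True},
    {"gene_name": "g3", "expression": 2.0, "pvalue": 0.001, "significant": False},
    {"gene_name": "g4", "expression": 16.0, "pvalue": 0.2, "significant": False},
]
print([r["gene_name"] for r in volcano_reference(toy)])  # → ['g1', 'g2']
```

Here `g3` fails the fold-change cut (log2 of 2.0 is 1) and `g4` fails the p-value cut.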

## 3. Preprocessing Only Tests

In [None]:
%%time
%cc DATA data/genes.csv FILTER pvalue < 0.05 SELECT gene_name, expression SAVE data/significant_genes.csv ||
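
This cell should write a file and draw nothing. After running it, the saved file can be checked with a small helper. The helper is hypothetical and not part of Vizard. It confirms that the header matches the SELECTed columns exactly and counts the data rows. It does not re-check the `pvalue` condition, because `pvalue` is not among the saved columns.

```python
import csv


def check_saved_csv(path, expected_columns):
    """Return the number of data rows; raise AssertionError if the header differs."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames != list(expected_columns):
            raise AssertionError(f"unexpected header: {reader.fieldnames}")
        return sum(1 for _ in reader)


# After running the cell above:
# check_saved_csv("data/significant_genes.csv", ["gene_name", "expression"])
```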

In [None]:
%%time
%cc THRESHOLD 0.05 DATA data/genes.csv FILTER pvalue < THRESHOLD || PLOT scatter X expression Y pvalue
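
Here `THRESHOLD 0.05` defines a user variable that the FILTER expression then references. One possible mechanism is whole-word textual substitution before the expression is translated. This is an assumption, and Vizard may resolve such variables differently. Under that assumption, the substitution can be sketched as follows:

```python
import re


def substitute_variables(expression, variables):
    """Replace whole-word occurrences of each variable name with its value."""
    for name, value in variables.items():
        replacement = str(value)
        # \b boundaries keep e.g. MY_THRESHOLD or THRESHOLD_2 untouched.
        pattern = rf"\b{re.escape(name)}\b"
        # A function replacement avoids backslash-escape surprises in the value.
        expression = re.sub(pattern, lambda _match: replacement, expression)
    return expression


print(substitute_variables("pvalue < THRESHOLD", {"THRESHOLD": 0.05}))  # → pvalue < 0.05
```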

## 4. Backwards Compatibility Tests

In [None]:
%%time
%cc DATA data/genes.csv PLOT scatter X expression Y pvalue

## 5. Additional Preprocessing Keywords

In [None]:
%%time
%cc DATA data/genes.csv DROP columns significant || PLOT scatter X expression Y pvalue

In [None]:
%%time
%cc DATA data/genes.csv SORT by expression descending || PLOT bar X gene_name Y expression

In [None]:
%%time
%cc DATA data/genes.csv ADD log2_expr as log2(expression) ADD abs_log2 as abs(log2_expr) || PLOT bar X gene_name Y abs_log2

## 6. State Management Tests

In [None]:
%%time
%cc KEYS

**Expected:** Should show the persistent keys DATA, PLOT, X, and Y, but NOT the ephemeral FILTER key
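
A small model helps illustrate the split between ephemeral and persistent keys. After each command, preprocessing keywords are discarded and all other keys are carried forward. The keyword set comes from this notebook's coverage list. Treating exactly that set as ephemeral, along with the dict-based state and the `update_state` helper, is a hypothetical simplification.

```python
# Preprocessing keywords from this notebook's coverage list, assumed ephemeral.
EPHEMERAL_KEYS = {"FILTER", "SELECT", "DROP", "SORT", "ADD", "GROUP", "SAVE"}


def update_state(state, command_keys):
    """Return a new state with the command's persistent keys merged in."""
    new_state = dict(state)
    for key, value in command_keys.items():
        if key not in EPHEMERAL_KEYS:
            new_state[key] = value
    return new_state


# Keys from "%cc DATA data/genes.csv FILTER pvalue < 0.05 || PLOT scatter X expression Y pvalue"
state = update_state({}, {
    "DATA": "data/genes.csv", "FILTER": "pvalue < 0.05",
    "PLOT": "scatter", "X": "expression", "Y": "pvalue",
})
print(sorted(state))  # → ['DATA', 'PLOT', 'X', 'Y']
```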

In [None]:
%%time
%cc KEYS

**Expected:** Should show THRESHOLD: 0.05

## 7. Complex Tests

In [None]:
%%cc DATA data/sales.csv GROUP by category aggregating sum(revenue) as total, count() as n_products
|| PLOT bar X category Y total
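
This cell aggregates two values per group. The plain-Python reference below computes the expected `total` and `n_products` columns for hypothetical toy rows. Polars `group_by` does not guarantee group order unless `maintain_order=True` is passed, so it is safer to compare results after sorting by the group key. This sketch does that sorting itself.

```python
from collections import defaultdict


def group_sum_count(rows, key, value):
    """Mirror of GROUP by <key> aggregating sum(<value>) as total, count() as n_products."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
        counts[row[key]] += 1
    # Sort by group key so the result order is deterministic.
    return [{key: k, "total": totals[k], "n_products": counts[k]} for k in sorted(totals)]


# Hypothetical toy rows; not the contents of test/data/sales.csv.
toy = [
    {"category": "A", "revenue": 100.0},
    {"category": "B", "revenue": 40.0},
    {"category": "A", "revenue": 250.0},
]
print(group_sum_count(toy, "category", "revenue"))
# → [{'category': 'A', 'total': 350.0, 'n_products': 2}, {'category': 'B', 'total': 40.0, 'n_products': 1}]
```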

In [None]:
%%time
%cc DATA data/genes.csv FILTER pvalue < 0.05 FILTER expression > 2.0 || PLOT scatter X expression Y pvalue
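
Two consecutive FILTER keywords should keep exactly the rows that satisfy both conditions. In Polars terms, two chained `.filter()` calls should be equivalent to one `.filter()` whose conditions are combined with `&`. The equivalence can be checked in plain Python on hypothetical toy rows:

```python
def apply_filters(rows, *predicates):
    """Apply predicates one after another, like consecutive FILTER keywords."""
    for predicate in predicates:
        rows = [r for r in rows if predicate(r)]
    return rows


# Hypothetical toy rows; not the contents of test/data/genes.csv.
toy = [
    {"expression": 3.0, "pvalue": 0.01},  # passes both filters
    {"expression": 1.0, "pvalue": 0.01},  # fails expression > 2.0
    {"expression": 3.0, "pvalue": 0.20},  # fails pvalue < 0.05
]
sequential = apply_filters(toy, lambda r: r["pvalue"] < 0.05, lambda r: r["expression"] > 2.0)
combined = apply_filters(toy, lambda r: r["pvalue"] < 0.05 and r["expression"] > 2.0)
print(sequential == combined)  # → True
```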

## 8. Context Detection Tests

In [None]:
%%time
%cc DATA data/sales.csv SELECT category, revenue FILTER revenue > 1000 GROUP by category aggregating sum(revenue) as total SAVE data/aggregated.csv

**Expected:** Even without `||`, should generate preprocessing code only (no chart) and leave the `df` variable available
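
The cell above contains preprocessing keywords but no `||` and no PLOT, so it should be treated as standalone preprocessing. The sketch below shows one simple heuristic that reproduces the expected behaviour across this notebook's cases. The heuristic is hypothetical and may not match Vizard's actual context detection. It only considers data commands, not control commands such as HELP or KEYS.

```python
import re

PREPROCESSING_KEYWORDS = ("FILTER", "SELECT", "DROP", "SORT", "ADD", "GROUP", "SAVE")


def detect_mode(command):
    """Classify a data command as 'chart' (legacy path), 'preprocess', or 'both'.

    Hypothetical heuristic: matches whole-word, upper-case keywords only.
    """
    has_pre = any(re.search(rf"\b{kw}\b", command) for kw in PREPROCESSING_KEYWORDS)
    has_plot = re.search(r"\bPLOT\b", command) is not None
    if has_pre and has_plot:
        return "both"
    return "preprocess" if has_pre else "chart"


print(detect_mode("DATA data/genes.csv PLOT scatter X expression Y pvalue"))  # → chart
```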

## 9. Operation Ordering Tests

In [None]:
%%time
%cc DATA data/genes.csv ADD log2_expr as log2(expression) ADD abs_log2 as abs(log2_expr) FILTER abs_log2 > 1.5 || PLOT bar X gene_name Y abs_log2

**Expected:** Operations execute in the order written, so columns created by ADD are available to subsequent operations
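
The ordering requirement can be made concrete in two steps. First, split the preprocessing segment into ordered `(keyword, argument)` pairs. Second, walk the pairs and flag any ADD or FILTER that references a column before it has been defined. Both helpers are hypothetical sketches. Splitting on keyword names would misbehave if a column were itself named like a keyword, and identifiers followed by `(` are treated as function names rather than columns.

```python
import re

OP_PATTERN = re.compile(r"\b(FILTER|SELECT|DROP|SORT|ADD|GROUP|SAVE)\b")
# Identifiers not immediately followed by "(", i.e. column names rather than functions.
IDENT_PATTERN = re.compile(r"\b[A-Za-z_]\w*\b(?!\s*\()")
BOOLEAN_WORDS = {"and", "or", "not"}


def split_operations(preprocessing):
    """Split a preprocessing segment into ordered (keyword, argument) pairs."""
    parts = OP_PATTERN.split(preprocessing)
    # parts == [prefix, kw1, arg1, kw2, arg2, ...]; the prefix holds "DATA ...".
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def undefined_references(ops, base_columns):
    """Return (keyword, column) pairs used by ADD/FILTER before being defined."""
    available = set(base_columns)
    missing = []
    for keyword, arg in ops:
        if keyword == "ADD":
            name, _, expr = arg.partition(" as ")
        elif keyword == "FILTER":
            name, expr = None, arg
        else:
            continue
        for ident in IDENT_PATTERN.findall(expr):
            if ident not in BOOLEAN_WORDS and ident not in available:
                missing.append((keyword, ident))
        if name:
            # The new column is available to every later operation.
            available.add(name.strip())
    return missing


ops = split_operations("DATA data/genes.csv ADD log2_expr as log2(expression) "
                       "ADD abs_log2 as abs(log2_expr) FILTER abs_log2 > 1.5")
print([kw for kw, _ in ops])  # → ['ADD', 'ADD', 'FILTER']
```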

In [None]:
%%time
%cc DATA df PLOT bar X gene_name Y expression

**Expected:** Should use the `df` variable from the previous cell

In [None]:
%%cc DATA data/genes.csv SELECT gene_name, expression, pvalue ADD log_expr as log2(expression)
FILTER log_expr > 3.0
SORT by log_expr descending
|| PLOT bar X gene_name Y log_expr
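
In a `%%cc` cell, IPython passes the text after `%%cc` on the first line and the rest of the cell body to the magic as separate `line` and `cell` strings. How Vizard combines them is not shown here. One plausible approach, assumed for illustration, is to join the two strings and collapse all whitespace, so a multi-line command reads like a single `%cc` line. A side effect of this approach is that it would also collapse whitespace inside quoted values.

```python
def join_cell_command(line, cell):
    """Join a %%cc magic's first line and body into one single-line command.

    Hypothetical: collapses all runs of whitespace (including newlines) to
    single spaces, so multi-line commands parse like one-line %cc commands.
    """
    return " ".join(f"{line}\n{cell}".split())


print(join_cell_command("DATA data/genes.csv FILTER x > 1", "SORT by x descending\n|| PLOT bar X a Y x\n"))
# → DATA data/genes.csv FILTER x > 1 SORT by x descending || PLOT bar X a Y x
```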

## 10. Combined Tests

In [None]:
%%time
%cc DATA data/genes.csv FILTER pvalue < 0.05 SELECT gene_name, expression SAVE data/significant.csv || PLOT bar X gene_name Y expression

**Expected:** Should save CSV AND generate chart

## Test Summary

Run all tests above and verify:
- ✓ All preprocessing keywords work independently
- ✓ Complex multi-step chains generate correct code
- ✓ Preprocessing-only mode generates the `df` variable
- ✓ Backwards compatibility maintained (commands without `||` work as before)
- ✓ State management correct (ephemeral vs persistent)
- ✓ Natural language expressions convert correctly to Polars
- ✓ Operations are chained in the generated code (no intermediate variables)
- ✓ Operation ordering preserved (ADD dependencies work)
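
For the "convert correctly to Polars" check, it helps to write down the Polars expressions one would expect for the function calls these tests use. The actual translation appears to be performed by the Claude model selected with `%cc --model`, so the template-based translator below only pins down expectations. It is hypothetical and deliberately tiny. `Expr.log(base=...)`, `Expr.log10()`, and `Expr.abs()` are real Polars methods. The mapping itself and the `to_polars` helper are illustrative.

```python
import re

# Hypothetical templates for the functions used in this notebook.
FUNCTION_TEMPLATES = {
    "log2": 'pl.col("{col}").log(base=2)',
    "log10": 'pl.col("{col}").log10()',
    "abs": 'pl.col("{col}").abs()',
}
# Matches "fn(column)" with an optional leading minus sign.
CALL_PATTERN = re.compile(r"^(-?)(\w+)\((\w+)\)$")


def to_polars(expr):
    """Translate 'fn(column)' or '-fn(column)' into Polars expression source text."""
    match = CALL_PATTERN.match(expr.strip())
    if not match:
        raise ValueError(f"unsupported expression: {expr!r}")
    sign, func, col = match.groups()
    if func not in FUNCTION_TEMPLATES:
        raise ValueError(f"unknown function: {func!r}")
    code = FUNCTION_TEMPLATES[func].format(col=col)
    return f"-{code}" if sign else code


print(to_polars("-log10(pvalue)"))  # → -pl.col("pvalue").log10()
```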