# 🚀 Semantix Demo

Welcome to **Semantix**, a high-performance semantic data cleaning library.
This notebook demonstrates how to normalize, transform, and clean messy data using local LLMs.

In [None]:
import polars as pl
import semantix

### 1. Normalization: Weights -> kg
Convert various weight formats (g, kg) into a standardized `kg` float.

In [None]:
df_weights = pl.DataFrame({"raw_weight": ["500g", "10kg", "1000g"]})
print("Input Data:")
print(df_weights)

print("\nCleaning...")
df_clean = semantix.clean(df_weights, target_col="raw_weight", instruction="Convert to kg")
print(df_clean)
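
Since the cleaned values come from an LLM, it can help to have a deterministic reference to compare against. The following is a minimal sketch in plain Python, independent of Semantix: `weight_to_kg` is a hypothetical helper that only understands the `g` and `kg` suffixes used in this demo.

```python
import re

# Hypothetical reference helper for sanity checks; not part of Semantix.
_DIVISOR_TO_KG = {"g": 1000.0, "kg": 1.0}

def weight_to_kg(raw: str) -> float:
    """Parse strings such as '500g' or '10 kg' and return kilograms."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(kg|g)\s*", raw, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized weight: {raw!r}")
    value, unit = match.groups()
    return float(value) / _DIVISOR_TO_KG[unit.lower()]

expected_kg = [weight_to_kg(w) for w in ["500g", "10kg", "1000g"]]
print(expected_kg)  # [0.5, 10.0, 1.0]
```

Comparing these values with the cleaned column is a quick way to spot a row the model got wrong.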

### 2. Transformation: Currency -> USD
Convert mixed currency strings into USD, assuming `1 EUR = 1.1 USD`.

In [None]:
df_price = pl.DataFrame({"raw_price": ["100 EUR", "$50", "200eur"]})
print("Input Data:")
print(df_price)

print("\nCleaning...")
df_clean_price = semantix.clean(
    df_price,
    target_col="raw_price",
    instruction="Convert to USD. Assume 1 EUR = 1.1 USD.",
)
print(df_clean_price)
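
For reference, the arithmetic the instruction asks for can be written out by hand. This is a sketch under the demo's stated assumption of `1 EUR = 1.1 USD`. `price_to_usd` and `EUR_TO_USD` are hypothetical names, and the parser only covers the three formats in the sample data.

```python
import re

EUR_TO_USD = 1.1  # rate assumed by the demo text, not a live exchange rate

def price_to_usd(raw: str) -> float:
    """Hypothetical reference converter for strings like '100 EUR' or '$50'."""
    text = raw.strip().lower()
    number = re.search(r"[0-9]+(?:\.[0-9]+)?", text)
    if number is None:
        raise ValueError(f"no amount found in {raw!r}")
    amount = float(number.group())
    if "eur" in text:
        # Round to cents to hide floating-point noise from the multiplication.
        return round(amount * EUR_TO_USD, 2)
    if "$" in text or "usd" in text:
        return amount
    raise ValueError(f"unrecognized currency in {raw!r}")

expected_usd = [price_to_usd(p) for p in ["100 EUR", "$50", "200eur"]]
print(expected_usd)  # [110.0, 50.0, 220.0]
```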

### 3. Logic: Temperature -> Celsius
Convert temperatures (Fahrenheit/Celsius) to Celsius.

In [None]:
df_temp = pl.DataFrame({"raw_temp": ["100C", "32F", "212 F"]})
print("Input Data:")
print(df_temp)

print("\nCleaning...")
df_clean_temp = semantix.clean(df_temp, target_col="raw_temp", instruction="Convert to Celsius")
print(df_clean_temp)
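
The conversion behind this instruction is `C = (F - 32) * 5 / 9`. A small reference sketch, assuming every value ends in a single `C` or `F` unit letter as in the sample data (`temp_to_celsius` is a hypothetical helper, not a Semantix function):

```python
def temp_to_celsius(raw: str) -> float:
    """Convert strings like '100C' or '212 F' to degrees Celsius."""
    text = raw.strip().upper()
    if not text:
        raise ValueError("empty temperature string")
    # float() tolerates the whitespace left over in inputs such as '212 F'.
    value, unit = float(text[:-1]), text[-1]
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unrecognized unit in {raw!r}")

expected_celsius = [temp_to_celsius(t) for t in ["100C", "32F", "212 F"]]
print(expected_celsius)  # [100.0, 0.0, 100.0]
```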

### ⚡ Persistent Caching
Semantix automatically caches results. Try re-running the cells above!
The second run should be nearly instant (microsecond latency) because it skips LLM inference.
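
To illustrate the general idea, here is a minimal sketch of a persistent cache keyed on the instruction and the raw value. This notebook does not describe how Semantix implements its cache, so everything below is an assumption for illustration: `DiskCache`, `get_or_compute`, and `fake_llm` are hypothetical names, and SQLite is just one convenient storage choice.

```python
import hashlib
import os
import sqlite3
import tempfile

class DiskCache:
    """Tiny persistent key-value cache; a hypothetical stand-in, not Semantix's code."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )
        self.conn.commit()

    @staticmethod
    def _key(instruction: str, raw: str) -> str:
        # Hash instruction and input together so each pair gets its own entry.
        return hashlib.sha256(f"{instruction}\x00{raw}".encode("utf-8")).hexdigest()

    def get_or_compute(self, instruction: str, raw: str, compute) -> str:
        key = self._key(instruction, raw)
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: the expensive call is skipped
        value = compute(raw)
        self.conn.execute("INSERT INTO cache (key, value) VALUES (?, ?)", (key, value))
        self.conn.commit()
        return value

calls = []

def fake_llm(raw: str) -> str:
    """Stand-in for slow model inference; records each real invocation."""
    calls.append(raw)
    return raw.upper()

db_path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")
cache = DiskCache(db_path)
first = cache.get_or_compute("Uppercase", "abc", fake_llm)
second = cache.get_or_compute("Uppercase", "abc", fake_llm)

# Reopening the same file simulates a fresh session hitting the stored entry.
third = DiskCache(db_path).get_or_compute("Uppercase", "abc", fake_llm)
print(first, second, third, len(calls))  # ABC ABC ABC 1
```

Keying on both the instruction and the value means changing the instruction text triggers a fresh computation instead of returning a stale result.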