# 📋 Log Shredding

Parse unstructured logs into multiple relational tables using `loclean.shred_to_relations`.

**Use case:** You have a single column of raw server logs. The LLM infers a relational schema (events, users, errors) and generates a parser that separates the entries into normalized DataFrames.
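To make "shredding" concrete, here is a minimal hand-written sketch of the kind of parser such a step might produce. It is not loclean's generated code: the `shred` function, the two regular expressions, and the `events`/`attributes` table names are all assumptions chosen for illustration, and it uses only the standard library.

```python
import re

# Hypothetical stand-in for a generated parser; not loclean's actual output.
# Header layout assumed: "<timestamp> <LEVEL> [<service>] <message>"
HEADER = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<service>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)
# key=value pairs embedded in the free-text message
KEY_VALUE = re.compile(r"(?P<key>\w+)=(?P<value>[^\s,]+)")


def shred(lines):
    """Split raw log lines into two related row lists linked by event_id."""
    events, attributes = [], []
    for event_id, line in enumerate(lines, start=1):
        match = HEADER.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed header layout
        events.append({"event_id": event_id, **match.groupdict()})
        for kv in KEY_VALUE.finditer(match["message"]):
            attributes.append(
                {
                    "event_id": event_id,
                    "key": kv["key"],
                    # drop the sentence-ending period, if any
                    "value": kv["value"].rstrip("."),
                }
            )
    return {"events": events, "attributes": attributes}


sample = [
    "2024-01-15 08:26:00 INFO  [inventory] "
    "Stock update: SKU=WDG-1001, warehouse=US-EAST-1, "
    "qty_before=250, qty_after=248, order_id=ORD-2024-5678."
]
relations = shred(sample)
for name, rows in relations.items():
    print(name, rows)
```

A fixed regex like this only handles one log layout. The appeal of the LLM-driven approach is that the schema and parser are inferred from sampled rows instead of being written by hand.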

In [None]:
import polars as pl

import loclean

## Create raw log data

Five realistic server log entries mixing auth, API, payment, inventory, and ML events:

In [None]:
df = pl.DataFrame(
    {
        "log_entry": [
            (
                "2024-01-15 08:23:11 INFO  [auth-service] "
                "User john.doe@corp.com logged in from 192.168.1.42 "
                "using Chrome/120.0 on Windows 11. "
                "Session: sess_abc123. MFA: enabled."
            ),
            (
                "2024-01-15 08:24:05 WARN  [api-gateway] "
                "Rate limit approaching for client_id=clt_789 "
                "(plan: enterprise, limit: 10000/min, "
                "current: 8500/min). Endpoint: /v2/search."
            ),
            (
                "2024-01-15 08:25:30 ERROR [payment-svc] "
                "Transaction tx_456def failed for user jane.smith "
                "- amount: $149.99 USD, method: visa_*4242, "
                "reason: insufficient_funds. Retry #2 of 3."
            ),
            (
                "2024-01-15 08:26:00 INFO  [inventory] "
                "Stock update: SKU=WDG-1001, warehouse=US-EAST-1, "
                "qty_before=250, qty_after=248, "
                "order_id=ORD-2024-5678."
            ),
            (
                "2024-01-15 08:27:45 DEBUG [ml-pipeline] "
                "Model inference complete: model=fraud_v3.2, "
                "latency_ms=42, input_features=128, "
                "prediction=0.02, threshold=0.5, decision=ALLOW."
            ),
        ]
    }
)

for entry in df["log_entry"].to_list():
    print(entry)
    print()

## Shred into relational tables

In [None]:
tables = loclean.shred_to_relations(df, "log_entry", sample_size=5, max_retries=3)

print(f"Shredded 1 column → {len(tables)} relational tables\n")

for name, tbl in tables.items():
    print(f"━━━ {name} ({len(tbl)} rows) ━━━")
    print(tbl)
    print()