v0.1.0

@ramannanda9 tagged this 30 Mar 19:45

Core architecture with provider-based sketch profiling and LLM interpretation:

- Sketch engine: MinHash, HLL, KLL via datasketches with mergeable baselines
- Sketch providers: pandas, spark (distributed), streaming
- Column profiling: deterministic scalar metrics per column per run
- Signal detection: Jaccard decay, cardinality shifts, null rate, schema drift
- Two-tier LLM interpretation: Anthropic and OpenAI providers
- Storage backends: Parquet (zero-infra) and DuckDB (optional)
- Heuristic rules engine for fast severity classification
- CI pipeline with per-provider test matrix and ruff/mypy checks
- Tag-triggered PyPI publish workflow with OIDC trusted publishing
- Pre-commit hooks with ruff linting and formatting (line-length 120)
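
As an illustration of the "mergeable baselines" and "Jaccard decay" items above, here is a minimal standard-library sketch of the underlying idea. It is not this project's code and does not use the datasketches API; the names `minhash`, `merge`, `jaccard`, and `NUM_PERM` are hypothetical, and the hashing scheme (salted BLAKE2b) is an assumption chosen only to keep the example self-contained.

```python
import hashlib
from typing import Iterable

NUM_PERM = 128  # hypothetical signature length; more slots = lower estimation error
_MAX_HASH = 2**64 - 1


def _hash(value: str, seed: int) -> int:
    """64-bit hash of `value`, salted with `seed` to emulate independent hash functions."""
    digest = hashlib.blake2b(
        value.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(values: Iterable[object], num_perm: int = NUM_PERM) -> list[int]:
    """Build a MinHash signature: the minimum hash seen per salted hash function."""
    sig = [_MAX_HASH] * num_perm
    for v in values:
        s = str(v)
        for i in range(num_perm):
            h = _hash(s, i)
            if h < sig[i]:
                sig[i] = h
    return sig


def merge(a: list[int], b: list[int]) -> list[int]:
    """Merge two signatures; the result equals the signature of the union of both inputs."""
    return [min(x, y) for x, y in zip(a, b)]


def jaccard(a: list[int], b: list[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Baseline built from two partitions (e.g. two earlier runs), then merged.
baseline = merge(minhash(range(0, 500)), minhash(range(500, 1000)))

# Current run: half of the baseline values have been replaced by new ones.
current = minhash(range(500, 1500))

# A falling estimate between baseline and current is the "Jaccard decay" signal.
similarity = jaccard(baseline, current)
```

Because merging is an element-wise minimum, per-partition signatures can be combined in any order, which is what makes a baseline cheap to maintain across distributed or streaming inputs. In this example the true Jaccard similarity is 500 / 1500, and the estimate should land near that value with an error that shrinks as `NUM_PERM` grows.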