Home
Welcome to the tsauditor wiki!
A data quality auditing library for time-series tabular data, with a focus on financial and sensor domains.
tsauditor scans a pandas DataFrame and returns a structured report of structural problems, anomalies, and — its core contribution — data leakage between features and the prediction target.
A same-day percentage-change feature (ChangeP) in an OGDC stock-direction model was mathematically near-identical to the prediction target. With it included, a Random Forest classifier reached 99.68% accuracy. Removing it dropped accuracy to 69.81% — a more honest number. Nothing about the feature looked wrong on inspection. No standard profiling tool caught it, because standard tools treat tabular data as i.i.d. and don't reason about when information was actually available relative to the prediction point.
tsauditor exists to catch this class of mistake automatically.
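The effect described above can be reproduced on synthetic data. The sketch below is not tsauditor's implementation and does not use the OGDC data: it assumes the target is simply "did the close rise today", builds a random-walk price series, and scores a single feature against the target with a rank-based AUC. The helper name `auc` and the variable names are illustrative only.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


random.seed(0)

# Synthetic daily closes (a random walk), standing in for real prices.
close = [100.0]
for _ in range(300):
    close.append(close[-1] * (1 + random.gauss(0, 0.01)))

# Same-day percentage change, and a target built from the same two prices.
# Assumption: Direction is 1 when today's close is above yesterday's.
change_p = [(close[i] - close[i - 1]) / close[i - 1] * 100
            for i in range(1, len(close))]
direction = [1 if close[i] > close[i - 1] else 0
             for i in range(1, len(close))]

# Same-day feature: its sign *is* the target, so it separates the classes.
leaky_auc = auc(change_p, direction)

# Lagged feature: yesterday's change paired with today's direction.
# This value was actually available before the prediction point.
lagged_auc = auc(change_p[:-1], direction[1:])

print(f"same-day ChangeP AUC: {leaky_auc:.4f}")
print(f"lagged ChangeP AUC:   {lagged_auc:.4f}")
```

Under these assumptions the same-day feature separates the two classes perfectly, while the lagged version of the same column carries little signal on a random walk. The values in the column look equally plausible in both cases; only their timing relative to the prediction point differs.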
| Page | What it covers |
|---|---|
| Installation | pip install, requirements, development setup |
| Quickstart | Your first scan in five minutes |
| How It Works | The three modules explained |
| Issue Code Reference | Every PRF / ANO / LEK code |
| API Reference | scan(), GuardReport, Issue |
| Domain Presets | finance vs sensor differences |
| Contributing | How to open a PR or propose a feature |
import tsauditor as tsa
report = tsa.scan(df, target="Direction", domain="finance")
report.summary() # rich CLI table
report.critical # list of issues that block modeling
report.to_json("out.json")
────────────────── tsauditor Report ──────────────────
Dataset
Rows : 299
Columns : 4
Time range : 2020-01-02 → 2021-02-23
Frequency : daily
Critical: 1 Warnings: 4 Info: 1
Severity Code Module Column Description
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CRITICAL LEK001 leakage ChangeP Feature near-deterministically
reproduces target 'Direction'
(auc=1.0000 >= 0.95)
- Detect, never modify. tsauditor reports and suggests. It never drops rows, fills gaps, or removes columns. Every remediation decision stays with the user.
- Time-aware. Every check reasons about the temporal order of your data, not just its distribution.
- Domain-aware. Finance and sensor data have different thresholds for "normal." The domain parameter tunes every check accordingly.
- Programmatic-first. The report is a structured Python object, not just a printed table. Filter by code, module, or severity; export to JSON; integrate into pipelines.
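To illustrate what a programmatic-first report makes possible, here is a minimal sketch of filtering issues and gating a pipeline on them. It does not use tsauditor's own classes: the `Issue` dataclass below is a hypothetical stand-in whose fields mirror the columns of the printed report (severity, code, module, column, description), and `PRFxxx` is a placeholder rather than a real issue code. See the API Reference page for the library's actual objects.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Issue:
    """Hypothetical stand-in for a report entry; not tsauditor's Issue class."""
    severity: str
    code: str
    module: str
    column: str
    description: str


# Illustrative records. Only the LEK001 row comes from the sample report;
# the second row uses a placeholder code.
issues = [
    Issue("CRITICAL", "LEK001", "leakage", "ChangeP",
          "Feature near-deterministically reproduces target 'Direction'"),
    Issue("WARNING", "PRFxxx", "profiling", "Volume",
          "Placeholder warning for illustration"),
]

# Filter by severity, module, or code with ordinary comprehensions.
critical = [i for i in issues if i.severity == "CRITICAL"]
leakage = [i for i in issues if i.module == "leakage"]
by_code = [i for i in issues if i.code.startswith("LEK")]

# Export to JSON for storage or downstream tooling.
payload = json.dumps([asdict(i) for i in issues], indent=2)


def should_block(found):
    """Pipeline gate: refuse to train while any critical issue is open."""
    return any(i.severity == "CRITICAL" for i in found)


if should_block(issues):
    print(f"{len(critical)} critical issue(s); fix before modeling:")
    for issue in critical:
        print(f"  {issue.code} on {issue.column}: {issue.description}")
```

Because the report is data rather than text, the same list can drive a CI check, a dashboard, or a log entry without parsing the printed table.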