Iman edited this page Jun 21, 2026 · 1 revision

Welcome to the tsauditor wiki!

tsauditor

A data quality auditing library for time-series tabular data, with a focus on financial and sensor domains.

tsauditor scans a pandas DataFrame and returns a structured report covering structural problems, anomalies, and data leakage between features and the prediction target. Leakage detection is its core contribution.


Why tsauditor exists

A same-day percentage-change feature (ChangeP) in an OGDC stock-direction model was mathematically near-identical to the prediction target. With it included, a Random Forest classifier reached 99.68% accuracy; with it removed, accuracy fell to 69.81%, a more honest number. Nothing about the feature looked wrong on inspection, and no standard profiling tool caught it: standard tools treat tabular data as i.i.d. and don't reason about when information was actually available relative to the prediction point.

tsauditor exists to catch this class of mistake automatically.
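
The failure mode described above can be reproduced on synthetic data. The following is a minimal sketch in plain Python (no pandas, to keep it self-contained), and it rests on assumptions that are not taken from the original project: prices follow a random walk, and the direction label is 1 when today's close is above yesterday's. The names `closes`, `change_p`, and `direction` are illustrative only.

```python
import random

random.seed(0)

# Synthetic daily closes: a random walk with independent daily returns.
closes = [100.0]
for _ in range(2000):
    closes.append(closes[-1] * (1.0 + random.gauss(0.0, 0.01)))

# Same-day percentage change and the direction label for each day t >= 1.
# Assumption: direction is 1 when today's close is above yesterday's close.
change_p = [(closes[t] - closes[t - 1]) / closes[t - 1] * 100.0
            for t in range(1, len(closes))]
direction = [1 if closes[t] > closes[t - 1] else 0
             for t in range(1, len(closes))]


def accuracy(predictions, labels):
    """Fraction of predictions that equal the label."""
    hits = sum(p == y for p, y in zip(predictions, labels))
    return hits / len(labels)


# Leaky rule: uses today's change to "predict" today's direction.
leaky_acc = accuracy([1 if c > 0 else 0 for c in change_p], direction)

# Honest rule: uses yesterday's change, which is known before today closes.
lagged_acc = accuracy([1 if c > 0 else 0 for c in change_p[:-1]], direction[1:])

print(f"same-day ChangeP rule: {leaky_acc:.2%}")
print(f"lagged ChangeP rule:   {lagged_acc:.2%}")
```

Under these assumptions the same-day rule is right every time, because the feature's sign is the label. The lagged rule lands near chance, since the synthetic returns are independent from day to day. The gap between the two comes entirely from when the information becomes available, not from the feature's distribution.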


Quick navigation

Page                   What it covers
Installation           pip install, requirements, development setup
Quickstart             Your first scan in five minutes
How It Works           The three modules explained
Issue Code Reference   Every PRF / ANO / LEK code
API Reference          scan(), GuardReport, Issue
Domain Presets         Finance vs. sensor differences
Contributing           How to open a PR or propose a feature

At a glance

import tsauditor as tsa
 
report = tsa.scan(df, target="Direction", domain="finance")
report.summary()          # rich CLI table
report.critical           # list of issues that block modeling
report.to_json("out.json")

Example output of report.summary() (only the critical row is shown):

────────────────── tsauditor Report ──────────────────
 
Dataset
  Rows       : 299
  Columns    : 4
  Time range : 2020-01-02 → 2021-02-23
  Frequency  : daily
 
Critical: 1  Warnings: 4  Info: 1
 
 Severity   Code    Module    Column    Description
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 CRITICAL   LEK001  leakage   ChangeP   Feature near-deterministically
                                        reproduces target 'Direction'
                                        (auc=1.0000 >= 0.95)
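
The LEK001 line above reports a single-feature AUC compared against a 0.95 threshold. One way such a check could work is sketched below. This is not tsauditor's implementation: the function names `single_feature_auc` and `flag_near_deterministic` are hypothetical, the pairwise AUC is chosen for readability rather than speed, and treating a very low AUC the same as a very high one is an assumption made here.

```python
def single_feature_auc(feature, target):
    """AUC of ranking a binary target by one feature, via pairwise comparison."""
    pos = [x for x, y in zip(feature, target) if y == 1]
    neg = [x for x, y in zip(feature, target) if y == 0]
    if not pos or not neg:
        raise ValueError("target must contain both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def flag_near_deterministic(feature, target, threshold=0.95):
    """Return (flagged, auc). Hypothetical helper, not part of tsauditor."""
    auc = single_feature_auc(feature, target)
    # Assumption: an inverted feature leaks just as much, so fold the AUC.
    auc = max(auc, 1.0 - auc)
    return auc >= threshold, auc


direction = [0, 0, 1, 1, 0, 1]
change_p = [-1.2, -0.4, 0.7, 2.1, -0.1, 0.3]  # sign matches the label exactly
volume = [5.0, 9.0, 6.0, 8.0, 7.0, 4.0]       # made-up, unrelated values

print(flag_near_deterministic(change_p, direction))
print(flag_near_deterministic(volume, direction))
```

In this toy data the `change_p` column separates the two classes perfectly and is flagged, while `volume` stays well below the threshold. A real check would also need to handle missing values and non-binary targets, which this sketch ignores.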

Design philosophy

  • Detect, never modify. tsauditor reports and suggests. It never drops rows, fills gaps, or removes columns. Every remediation decision stays with the user.
  • Time-aware. Every check reasons about the temporal order of your data — not just its distribution.
  • Domain-aware. Finance and sensor data have different thresholds for "normal." The domain parameter tunes every check accordingly.
  • Programmatic-first. The report is a structured Python object, not just a printed table. Filter by code, module, or severity; export to JSON; integrate into pipelines.
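
To make the programmatic-first point concrete, here is a sketch of filtering issues and gating a pipeline on them. It does not use tsauditor's actual classes: the `Issue` dataclass below is a hypothetical stand-in whose fields simply mirror the columns of the report table, and `select` and `exit_code` are illustrative helpers. The `PRFxxx` and `ANOxxx` codes, the `profiling` and `anomaly` module names, and the `Close` and `Volume` columns are placeholders, not real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:  # hypothetical stand-in, not tsauditor's Issue class
    severity: str
    code: str
    module: str
    column: str
    description: str


def select(issues, *, severity=None, module=None, code_prefix=None):
    """Return the issues that match every filter that was supplied."""
    return [
        i for i in issues
        if (severity is None or i.severity == severity)
        and (module is None or i.module == module)
        and (code_prefix is None or i.code.startswith(code_prefix))
    ]


def exit_code(issues):
    """1 if any critical issue is present, else 0 (for a pipeline step)."""
    return 1 if select(issues, severity="CRITICAL") else 0


issues = [
    Issue("CRITICAL", "LEK001", "leakage", "ChangeP",
          "Feature near-deterministically reproduces target 'Direction'"),
    # Placeholder entries: codes, modules, and columns are made up.
    Issue("WARNING", "PRFxxx", "profiling", "Close", "placeholder"),
    Issue("WARNING", "ANOxxx", "anomaly", "Volume", "placeholder"),
]

for issue in select(issues, code_prefix="LEK"):
    print(issue.severity, issue.code, issue.column)

print("exit code:", exit_code(issues))
```

With the real library, the list passed to a gate like this would presumably come from `report.critical`, and a pipeline step could hand the result to `sys.exit`. Because nothing is modified, the decision to drop or fix `ChangeP` stays with the user.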
