feat: strengthen CSV analysis safety, error handling, and robustness #7

Merged: rad1092 merged 2 commits into main from codex/start-project-according-to-readme.md on Feb 13, 2026

rad1092 (Owner) commented Feb 13, 2026

Motivation

  • Make CSV ingestion more robust against empty input, missing headers, mixed types, different encodings and large UI payloads.
  • Prevent long-running external calls by adding timeouts to ollama invocations from CLI and web UI.
  • Provide richer numeric diagnostics (IQR-based outlier counts) to improve downstream prompt clarity.
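The timeout motivation can be illustrated with a minimal sketch. It assumes ollama is invoked as a subprocess, which this PR description does not confirm; `run_with_timeout` is a hypothetical helper name, not a function in bitnet_tools, and the demo uses a child Python process as a stand-in for the real command.

```python
import subprocess
import sys


def run_with_timeout(cmd, prompt, timeout_s):
    """Run an external command with `prompt` on stdin; give up after timeout_s seconds.

    Hypothetical helper for illustration only (not part of bitnet_tools).
    """
    try:
        completed = subprocess.run(
            cmd,
            input=prompt,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        raise RuntimeError(f"command timed out after {timeout_s} seconds") from None
    if completed.returncode != 0:
        raise RuntimeError(f"command failed: {completed.stderr.strip()}")
    return completed.stdout


# Stand-in for the real command: a child process that echoes stdin in upper case.
echo_cmd = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
print(run_with_timeout(echo_cmd, "summary", timeout_s=10))
```

Converting `subprocess.TimeoutExpired` into a single error type lets the CLI and the web UI report a stalled call the same way they report any other failure.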

Description

  • Added AnalysisError and explicit validation in bitnet_tools/analysis.py to surface errors for empty CSV content and missing headers, and trimmed/validated header names.
  • Implemented encoding fallbacks when reading files (utf-8-sig, utf-8, cp949) and unified CSV text parsing with csv.Sniffer delimiter detection.
  • Strengthened dtype inference (int, float, date, string) and expanded numeric stats to include outlier_count (IQR rule) alongside count/mean/min/q1/median/q3/max/std in summarize_rows.
  • Exposed AnalysisError and core helpers in bitnet_tools/__init__.py and added CLI (bitnet_tools/cli.py) with --timeout for ollama and friendly error handling for analysis failures.
  • Added a simple local UI server (bitnet_tools/web.py) with payload size guard (MAX_CSV_TEXT_CHARS) and ollama timeout support, plus static UI assets (ui/*.html/.js/.css).
  • Added pyproject.toml and expanded tests/test_analysis.py to cover semicolon-delimited CSVs, empty input, mixed-type column behavior, and the presence of outlier_count.
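The encoding fallback and delimiter detection described above can be sketched roughly as follows. This is not the code from bitnet_tools/analysis.py: the function names, the `AnalysisError` base class, the 4096-character sniffing sample, the candidate delimiter set, and the error messages for empty input and decoding failure are all assumptions made for illustration. Only the encoding order, the use of csv.Sniffer, and the "CSV header not found" message come from the PR.

```python
import csv
import io
import tempfile
from pathlib import Path


class AnalysisError(ValueError):
    """Stand-in for the AnalysisError named in the PR; its real base class is not shown."""


ENCODINGS = ("utf-8-sig", "utf-8", "cp949")  # order listed in the PR description


def read_text_with_fallback(path):
    """Decode a file by trying each encoding in turn (hypothetical helper)."""
    data = Path(path).read_bytes()
    for encoding in ENCODINGS:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise AnalysisError(f"could not decode {path} with any of {ENCODINGS}")


def parse_csv_text(text):
    """Return (columns, rows) with trimmed header names (hypothetical helper)."""
    if not text.strip():
        raise AnalysisError("CSV content is empty")
    try:
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel  # fall back to plain comma-separated parsing
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    columns = [name.strip() for name in (reader.fieldnames or [])]
    if not columns or any(not name for name in columns):
        raise AnalysisError("CSV header not found")
    reader.fieldnames = columns  # row keys now match the trimmed names
    return columns, list(reader)


# Usage: a semicolon-delimited file saved as cp949 falls through to the third encoding.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes("이름;값\n가;1\n나;2\n".encode("cp949"))
    columns, rows = parse_csv_text(read_text_with_fallback(sample))
print(columns, rows)
```

Because utf-8-sig also decodes plain UTF-8, the second entry rarely does any work; it is kept here only to mirror the order given in the description.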

Testing

  • Ran unit tests with python -m pytest -q which returned 5 passed.
  • Performed a CLI smoke test with a semicolon-delimited CSV using python -m bitnet_tools.cli analyze /tmp/next2.csv --question '요약' --out /tmp/next2_payload.json and verified inferred dtypes were {'id': 'int', 'value': 'int', 'dt': 'date'} and outlier_count printed as 0.0.
  • Verified the web API safeguards: a maximum CSV text size is enforced, and AnalysisError is returned as a clear HTTP error when input is invalid.
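For readers unfamiliar with the IQR rule behind outlier_count, a minimal sketch follows. The quartile method ("inclusive") and the conventional 1.5 multiplier are assumptions, since the PR does not say which the project uses. `iqr_outlier_count` is a hypothetical name, and it returns an int, whereas the smoke test above reports a float.

```python
import statistics


def iqr_outlier_count(values):
    """Count values outside [q1 - 1.5 * IQR, q3 + 1.5 * IQR] (hypothetical helper)."""
    if len(values) < 2:
        return 0  # quartiles are not meaningful for fewer than two points
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for value in values if value < low or value > high)


print(iqr_outlier_count([1, 2, 3, 4, 100]))
print(iqr_outlier_count([10, 20, 30]))
```

In the first call the quartiles are 2 and 4, so the fences are -1 and 7 and only 100 falls outside them; the second list has no value outside its fences.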

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91ecdaea92

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread bitnet_tools/analysis.py Outdated
if not columns:
    raise AnalysisError("CSV header not found")

rows = list(reader)

P1: Normalize row keys after stripping CSV headers

_parse_csv_text trims header names into columns but returns raw DictReader rows unchanged, so padded headers like " amount " are later queried as row.get("amount") and treated as entirely missing in summarize_rows. This silently corrupts dtype/missing/stat outputs for otherwise valid CSV files that contain whitespace around header names; map row keys to the normalized header names before returning.
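One way to apply this suggestion is sketched below. It is an illustration under assumptions, not the fix that was eventually merged: `parse_with_normalized_keys` is a hypothetical name, and the sketch assumes the parser builds rows with csv.DictReader as the quoted snippet implies.

```python
import csv
import io


def parse_with_normalized_keys(text):
    """Trim header names and re-key every row to the trimmed names (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    raw_names = reader.fieldnames or []
    columns = [name.strip() for name in raw_names]
    rows = [
        # Map each raw key (for example " amount ") to its trimmed form ("amount").
        {clean: row.get(raw) for raw, clean in zip(raw_names, columns)}
        for row in reader
    ]
    return columns, rows


columns, rows = parse_with_normalized_keys("id, amount \n1,10\n2,20\n")
print(columns)
print(rows[0].get("amount"))
```

With the keys remapped, a lookup such as row.get("amount") finds the value instead of reporting the whole column as missing. Headers that become identical after trimming would collapse into one key, so a real implementation would likely need to reject or disambiguate them.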

rad1092 merged commit 327b6e1 into main on Feb 13, 2026
0 of 4 checks passed
rad1092 deleted the codex/start-project-according-to-readme.md branch on February 13, 2026 12:52