test: increase coverage for ExclusionDetector (currently 59%) #144

@longieirl

Problem

`exclusion_detector.py` sits at ~59% coverage — the lowest of any file in `bankstatements_core`. There are no tests for this class at all. It actively drags overall coverage down.

File

`packages/parser-core/src/bankstatements_core/templates/detectors/exclusion_detector.py`

Uncovered paths (from coverage report)

| Lines | Description |
|---|---|
| 27 | `name` property |
| 57–60 | `crop()` exception fallback → `first_page.extract_text()` |
| 64–66 | `if not text` branch (no text in header area) |
| 81–110 | Template loop: no-exclusion-rules skip, keyword matching, excluded/allowed branches |

Why it is hard to test

`ExclusionDetector.detect()` takes a `pdfplumber.page.Page` as its second argument. `Page` is tightly coupled to an open PDF file handle — not easily instantiated in isolation.

Approach

Use `unittest.mock.MagicMock` to simulate `pdfplumber.page.Page`:

```python
from unittest.mock import MagicMock


def make_page(text: str, crop_raises: bool = False) -> MagicMock:
    """Build a mock standing in for pdfplumber.page.Page."""
    page = MagicMock()
    page.width = 800
    if crop_raises:
        # Simulate crop() failing so the detector takes its fallback path.
        page.crop.side_effect = AttributeError("crop failed")
    else:
        cropped = MagicMock()
        cropped.extract_text.return_value = text
        page.crop.return_value = cropped
    page.extract_text.return_value = text  # fallback
    return page
```
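
To sanity-check the helper before wiring it into real tests, the sketch below exercises both paths against a stand-in function. `read_header_text` is hypothetical: it only mirrors the crop-then-fallback behaviour listed in the uncovered paths, and the bounding box and caught exception types are assumptions, not the detector's actual code. The helper is repeated in compact form so the snippet runs on its own.

```python
from unittest.mock import MagicMock


def make_page(text, crop_raises=False):
    """Compact copy of the helper above, repeated for a runnable snippet."""
    page = MagicMock()
    page.width = 800
    if crop_raises:
        page.crop.side_effect = AttributeError("crop failed")
    else:
        page.crop.return_value.extract_text.return_value = text
    page.extract_text.return_value = text  # fallback path
    return page


def read_header_text(page):
    """Hypothetical stand-in for the detector's header extraction."""
    try:
        # Assumed bounding box; the real header height is not known here.
        return page.crop((0, 0, page.width, 100)).extract_text()
    except (AttributeError, ValueError):
        return page.extract_text()


# Normal path: text comes from the cropped region, fallback is untouched.
normal = make_page("IBAN: IE00 TEST")
assert read_header_text(normal) == "IBAN: IE00 TEST"
normal.extract_text.assert_not_called()

# Failure path: crop() raises, so the full-page extract_text() is used.
broken = make_page("IBAN: IE00 TEST", crop_raises=True)
assert read_header_text(broken) == "IBAN: IE00 TEST"
broken.extract_text.assert_called_once_with()
```

Asserting on `page.extract_text` call counts is what distinguishes the two paths, since both return the same text.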

Tests to add

Create `packages/parser-core/tests/detectors/test_exclusion_detector.py` covering:

  • Template with no `exclude_keywords` → returns empty list
  • Template excluded when keyword present in header text → confidence=0.0, `match_details` populated
  • Template allowed when keyword not present → returns empty list
  • Multiple templates, mix of excluded and allowed → only excluded templates in results
  • `crop()` raises `AttributeError` → falls back to `first_page.extract_text()`
  • `crop()` raises `ValueError` → falls back to `first_page.extract_text()`
  • No text in header area (`extract_text()` returns `None` or `""`) → returns empty list
  • Case-insensitive keyword matching (e.g. keyword `"IBAN"` matches `"iban"` in text)
  • Multiple exclude keywords, only one matches → template excluded, only matched keyword in `matched_keywords`
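
The keyword-related cases above (case-insensitive match, several keywords with a single hit, empty header text) reduce to a check like the one sketched below. `matched_keywords` is a hypothetical helper written for illustration, and plain substring matching is an assumption; the detector's real matching logic and the shape of `match_details` should be read from `exclusion_detector.py` before asserting against them.

```python
def matched_keywords(header_text, exclude_keywords):
    """Return the keywords that occur in header_text, ignoring case."""
    haystack = (header_text or "").lower()  # treat None like empty text
    return [kw for kw in exclude_keywords if kw.lower() in haystack]


# Case-insensitive: keyword "IBAN" matches "iban" in the text.
assert matched_keywords("Your iban is on file", ["IBAN"]) == ["IBAN"]

# Several keywords, one hit: only the matched keyword is reported.
assert matched_keywords(
    "Credit card statement", ["IBAN", "credit card"]
) == ["credit card"]

# No header text, or no exclusion rules: nothing matches.
assert matched_keywords(None, ["IBAN"]) == []
assert matched_keywords("anything", []) == []
```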

Expected outcome

Coverage on `exclusion_detector.py` rises from 59% to ≥ 90%. Overall project coverage stays at or above 94%.
