Investigate OCR fallback returning empty text for scanned Bentley minutes PDF

## Summary

While validating Anytown's adoption of `@happyvertical/pdf@0.62.28`, the live Praeco discovery test (which does not run in CI) exposed a scanned minutes PDF for which extraction returned empty text, even after the reader logged that it was attempting OCR fallback.

This should be investigated upstream rather than papered over in Praeco/Anytown.

## Repro Context

Repo: `anytown/anytown.ai`
Command:

```bash
pnpm --filter @happyvertical/praeco test
```

Note: this command runs the live `agents/praeco/src/discovery.spec.ts` spec when run locally. In CI, that spec is skipped via `CI=true`.

Failing test:

```text
src/discovery.spec.ts > Discovery - Basic Workflow > should fetch live Bentley agenda and scanned minutes text from WordPress download pages
```

Observed source page/document path:

```text
https://townofbentley.ca/download/regular-council-meeting-march-24-2026-signed-minutes/
```

Runtime log excerpt:

```text
→ Fetching minutes from https://townofbentley.ca/download/regular-council-meeting-march-24-2026-signed-minutes/
No direct text found, attempting OCR fallback...
✗ Failed to fetch minutes: PDF extraction produced no text for https://townofbentley.ca/download/regular-council-meeting-march-24-2026-signed-minutes/?wpdmdl=21101&refresh=69ec58cbcc67c1777096907
```

## Expected

For scanned PDFs where direct text extraction returns nothing, OCR fallback should either:

- return recognized text, or
- fail explicitly with the OCR/rendering/provider error that explains why OCR could not complete.

The fallback should not report a successful but empty extraction unless the document truly has no recognizable text.
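
To make the expected behavior concrete, here is a minimal TypeScript sketch of the contract. Every name in it (`extractWithOcrFallback`, `OcrFallbackError`, `OcrOutcome`, and the injected `direct`, `render`, and `ocr` functions) is hypothetical and does not reflect the actual `@happyvertical/pdf` API. It only illustrates keeping three outcomes distinct: recognized text, a genuinely empty document, and an explicit failure that names the stage that broke.

```typescript
// Hypothetical sketch: none of these names come from @happyvertical/pdf.
type OcrStage = "render" | "provider";

type OcrOutcome =
  | { kind: "text"; text: string }
  | { kind: "no-recognizable-text" };

class OcrFallbackError extends Error {
  readonly stage: OcrStage;
  readonly source: string;

  constructor(stage: OcrStage, source: string, cause: unknown) {
    const reason = cause instanceof Error ? cause.message : String(cause);
    super(`OCR fallback failed at ${stage} stage for ${source}: ${reason}`);
    this.name = "OcrFallbackError";
    this.stage = stage;
    this.source = source;
  }
}

// Injected dependencies (assumed shapes, for illustration only).
type DirectExtractor = (source: string) => Promise<string>;
type PageRenderer = (source: string) => Promise<Uint8Array[]>;
type OcrProvider = (pageImage: Uint8Array) => Promise<string>;

async function extractWithOcrFallback(
  source: string,
  direct: DirectExtractor,
  render: PageRenderer,
  ocr: OcrProvider,
): Promise<OcrOutcome> {
  // 1. Prefer the embedded text layer when it exists.
  const directText = (await direct(source)).trim();
  if (directText.length > 0) {
    return { kind: "text", text: directText };
  }

  // 2. Render pages to images; a rendering failure is surfaced, not swallowed.
  let pages: Uint8Array[];
  try {
    pages = await render(source);
  } catch (cause) {
    throw new OcrFallbackError("render", source, cause);
  }
  if (pages.length === 0) {
    throw new OcrFallbackError(
      "render",
      source,
      new Error("renderer produced no page images"),
    );
  }

  // 3. Run OCR page by page; a provider failure is surfaced with its stage.
  const parts: string[] = [];
  for (const page of pages) {
    try {
      parts.push(await ocr(page));
    } catch (cause) {
      throw new OcrFallbackError("provider", source, cause);
    }
  }

  // 4. Only a completed OCR run may report an empty document.
  const text = parts.join("\n").trim();
  return text.length > 0
    ? { kind: "text", text }
    : { kind: "no-recognizable-text" };
}
```

With a shape like this, a caller such as Praeco could tell "OCR ran and found nothing" apart from "OCR never completed", instead of seeing the same empty-text error in both cases.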

## Notes

This was seen after the child-process extraction release in `@happyvertical/pdf@0.62.28`, while working on Anytown PR https://github.com/anytown/anytown.ai/pull/264.

There was a separate local environment failure in the same live spec for missing Playwright browser dependencies. This issue is specifically about the PDF/OCR empty-text result after the document was fetched.

