
fix: add debug logging at silent exception swallows #162

Merged
KailasMahavarkar merged 2 commits into main from fix/silent-exception-logging
Apr 20, 2026

@KailasMahavarkar
Contributor

Summary

An audit of `except Exception: pass` sites in src/graphstore/ turned up sites in 7 files that swallowed real failures without any trace. Each site keeps identical runtime behaviour: the same except branch still runs, but it now emits logger.debug(err) so dev-time tracing works.
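The new messages only show up when the relevant loggers are at DEBUG level. A minimal sketch of turning them on during development, assuming the modules create their loggers with `logging.getLogger(__name__)` so that their names fall under a `graphstore` parent (an assumption; the PR does not state the logger names):

```python
import logging

# Attach a stderr handler to the root logger (no-op if one is already
# configured). The root level itself stays at its default, WARNING.
logging.basicConfig(format="%(name)s %(levelname)s %(message)s")

# Assumed logger namespace: "graphstore" is a guess based on src/graphstore/.
# Child loggers such as "graphstore.core.optimizer" inherit this level.
logging.getLogger("graphstore").setLevel(logging.DEBUG)
```

Scoping the level to one namespace keeps third-party libraries quiet while still surfacing the swallowed errors.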

Sites changed

| File | Line | What was swallowed |
| --- | --- | --- |
| core/optimizer.py | 635 | document_store.delete_document during eviction - could leak disk space |
| dsl/handlers/ingest.py | 263, 267 | NER entity put_node/put_edge - hides incomplete graphs |
| dsl/handlers/mutations.py | 314, 319 | Sentence-level entity links on UPDATE paths |
| dsl/sys/pipeline.py | 251 | Evidence edge obs -> ev - breaks evidence chains |
| core/evolve/_impl.py | 268 | memory.measure() failure zeros the metric |
| core/evolve/_impl.py | 284 | optimizer.health_check() failure zeros tombstone/bloat metrics |
| core/memory.py | 84 | document_store.stats() failure drops disk-usage line |
| ingest/connector.py | 43 | User progress_callback exception - no breadcrumb |

Added logger init to optimizer.py, mutations.py, memory.py (other files already had one).
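The shape of the change at each site can be sketched as follows. This is an illustration rather than the actual diff: `evict_document` and `FailingStore` are hypothetical names, and only `document_store.delete_document` and the logged slot + err come from the PR description.

```python
import logging

logger = logging.getLogger(__name__)  # module-level logger init


def evict_document(document_store, slot):
    """Best-effort delete: failures are still swallowed, but leave a trace."""
    try:
        document_store.delete_document(slot)
    except Exception as err:
        # Same except branch as before; only the debug breadcrumb is new.
        logger.debug("delete_document failed for slot %s: %s", slot, err)


class FailingStore:
    """Hypothetical stand-in for a document store whose delete always fails."""

    def delete_document(self, slot):
        raise OSError("disk unavailable")


evict_document(FailingStore(), 7)  # does not raise
```

Passing `slot` and `err` as arguments instead of pre-formatting the string means the message is only built when DEBUG is enabled.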

Not changed

The remaining ~17 except-pass sites in src/ are legitimate (cleanup-during-cleanup, hardware probing, decode-with-replacement, optional-feature detection). Left intentionally opaque.
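For contrast, two of the categories named above can be illustrated with generic standard-library patterns where staying silent is the intended behaviour. These helpers are hypothetical examples, not code from the repository:

```python
import contextlib
import os


def decode_lossy(raw: bytes) -> str:
    # Decode-with-replacement: invalid bytes become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


def remove_quietly(path: str) -> None:
    # Cleanup-during-cleanup: a file that is already gone is an acceptable
    # outcome, so the OSError carries no information worth logging.
    with contextlib.suppress(OSError):
        os.remove(path)
```

In both cases the failure mode is expected and the fallback is the correct result, which is what separates them from the sites listed earlier.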

Test plan

  • 574 tests pass
  • Pre-existing flake in test_subgraph_extraction is on main too, unrelated to this change

🤖 Generated with Claude Code

KailasMahavarkar and others added 2 commits April 20, 2026 15:37
Audit of `except Exception: pass` sites in src/graphstore/ turned up 7
spots that were swallowing real failures without any trace. Each site
stays behaviourally identical - the same except branch keeps running -
but now emits logger.debug(err) so dev-time tracing of "why is this
empty/missing/zero" works.

Sites touched:
  core/optimizer.py:635       document_store.delete_document eviction
                              could silently leak disk space when delete
                              failed. Now logs slot + err.
  dsl/handlers/ingest.py:263  put_node/put_edge for NER entity link
                              silently dropped, hiding incomplete graphs.
  dsl/handlers/mutations.py:  same as above for sentence-level entity
    314, 319                  links on UPDATE paths.
  dsl/sys/pipeline.py:251     evidence edge (obs -> ev) silently dropped,
                              breaking evidence chains.
  core/evolve/_impl.py:268    memory.measure() failure silently zeroed
                              total_bytes metric.
  core/evolve/_impl.py:284    optimizer.health_check() failure silently
                              zeroed tombstone/string_bloat metrics.
  core/memory.py:84           document_store.stats() failure silently
                              dropped disk-usage line in memory report.
  ingest/connector.py:43      user progress_callback exception silently
                              ignored with no breadcrumb.

Added logger initialization to optimizer.py, mutations.py, memory.py
(other files already had one).

The remaining ~17 except-pass sites in src/ are legitimate (cleanup-
during-cleanup, hardware probing, decode-with-replacement, optional-
feature detection) and intentionally opaque.

574 tests pass (pre-existing flake in test_subgraph_extraction is on
main too, unrelated to this change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tests/test_e2e_real_embedder.py uses a module-scoped `brain` fixture
(persistent GraphStore + real embedder) that Phase 1 tests populate
via INGEST and Phase 3+4 tests read via CREATE EDGE / RECALL / SUBGRAPH.

The module is tagged `pytest.mark.xdist_group("e2e_real_embedder")`
so all tests land on the same xdist worker - otherwise each worker
forks its own empty GraphStore and downstream tests hit NodeNotFound
on the IDs ingest created in a different worker.

But pytest-xdist's default scheduler (LoadScheduling) ignores
xdist_group markers. The grouping is only honored when the scheduler
is set to `loadgroup`. (`loadfile` would also keep a single module on
one worker, but it groups by file rather than by marker; neither was
configured here.)
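The effect of ignoring versus honoring the group marker can be shown with a toy model. This is a deliberately simplified round-robin sketch, not pytest-xdist's actual scheduling algorithm, and the test names are illustrative:

```python
from collections import defaultdict

# (test name, xdist_group) pairs; names are illustrative, not real test IDs.
TESTS = [(f"ingest_{i}", "e2e_real_embedder") for i in range(6)] + [
    ("recall_from_paper", "e2e_real_embedder"),
]


def distribute(tests, workers, honor_groups):
    """Toy scheduler: round-robin per test, or one worker per group."""
    assignment = defaultdict(list)
    group_worker = {}
    next_worker = 0
    for name, group in tests:
        if honor_groups and group in group_worker:
            # Group already pinned: reuse its worker.
            worker = group_worker[group]
        else:
            worker = next_worker % workers
            next_worker += 1
            group_worker[group] = worker
        assignment[worker].append(name)
    return dict(assignment)


spread = distribute(TESTS, workers=4, honor_groups=False)
grouped = distribute(TESTS, workers=4, honor_groups=True)
```

In `spread`, the worker that runs `recall_from_paper` has only executed one of the six ingest tests, so a module-scoped fixture in that process holds a fraction of the data. In `grouped`, every test shares one worker and therefore one fixture instance.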

Result before this fix: running the test module via the default
`uv run pytest` addopts (-n 4) split the six Ingest tests across four
workers, so the `brain` fixture seen by the Phase 3/4 tests was
populated by at most one Ingest test, and the other five `paper:` IDs
did not exist. Failures:

  TestKnowledgeGraph::test_create_topic_hierarchy
  TestKnowledgeGraph::test_recall_from_paper
  TestKnowledgeGraph::test_subgraph_extraction
  TestKnowledgeGraph::test_traverse_topic_graph
  TestAgentLifecycle::test_checkpoint_and_stats
  TestRealCognitiveRetrieval::test_what_do_transformers_and_bert...
  TestRealCognitiveRetrieval::test_cross_document_search
  TestSemanticRetrieval::test_similar_to_finds_attention_content

Serial or --dist loadgroup: all pass.

Fix: set `--dist loadgroup` in addopts. Full suite 1802 passed,
101 skipped under default invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@KailasMahavarkar KailasMahavarkar merged commit 95fe429 into main Apr 20, 2026
4 checks passed
@KailasMahavarkar KailasMahavarkar deleted the fix/silent-exception-logging branch April 20, 2026 10:12