perf: benchmark singletons — they're slower, keep them off#8
Conversation
|
why is it the case? cache check should be fast, and given we do see there are cache entries, i don't understand why it would be slower. |
50e9317 to
0c78429
Compare
Singleton benchmarking shows cache overhead makes them ~1.5-2x slower and use more memory than non-singletons. The dict lookup cost in __new__ and growing cache dominate any deduplication savings. Results (50 samples, 100 merges): - No singletons: 3.8s, 2.4 MB - Singletons: 7.2s, 5.3 MB, 7431 cache entries Add tests verifying singletons produce identical merges and aren't excessively slower (regression guard at 3x). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cache key tuple (cls, args, kwargs) built on every Node() call is more expensive than just allocating a fresh frozen dataclass with slots. Merge operations create thousands of nodes, so this overhead dominates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The cache lookup is fast, but the cache key construction on every key = (cls, args, tuple(sorted(kwargs.items())))This tuple creation + dict lookup costs more than just allocating a fresh frozen dataclass with Profiling shows:
The savings from singletons (fewer Microbenchmark: 50k |
|
wow, ok! would singletons help in scale? or even then, not worth it? i am thinking about a test like: create 10k linked lists for the word "The" (t -> h -> e) - then run |
The old key (cls,)+args missed kwargs — dataclass constructors pass fields as kwargs, so ALL instances of a class shared one key, creating self-referential cycles. New key includes kwargs.values(). Add 10k-word scale test verifying singletons produce identical merges. Overhead is now ~1.2x (was 1.4x with old sorted kwargs key). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Fixed the key — the bug was that dataclass constructors pass fields as kwargs, not args. So the fast key New key: Added a 10k-word scale test verifying identical merges with/without singletons. At this scale (10k "the"), singletons add ~15% overhead but are correct. The overhead comes from the dict lookup per Node() call — still more expensive than raw frozen dataclass allocation, but much better than the old |
- Remove _instances cache and __new__ override from GraphVertex - Remove identity-based __eq__/__hash__ from GraphVertex base class (dataclass subclasses already generate proper field-based versions) - Remove USE_SINGLETONS from GraphSettings - Delete test_singletons.py, test_singleton_perf.py, bench_singletons.py - Clean up conftest.py and bne.py Singletons added ~15-20% overhead due to cache key construction cost, with no measurable benefit for training performance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
benchmarks/bench_singletons.pycomparing singleton vs non-singleton performanceStacked on #7.
What improved
Results (50 samples, 100 merges)
Test plan
ruff check .passes🤖 Generated with Claude Code