fix(plot): Fix dataset_id invalidation in encoding and metadata methods by lmeyerov · Pull Request #797 · graphistry/pygraphistry

lmeyerov · 2025-10-17T04:13:14Z

Summary

Fixed dataset_id invalidation logic for encoding, metadata, and style methods. Dataset IDs now properly invalidate when encodings, name, description, or visualization styles change, preventing stale ID reuse after modifying visualization properties.

Analysis Approach

Used complementary analysis tools to ensure complete coverage:

AST pattern matching: Found 9 methods with direct attribute modifications
Pysa call graph analysis: Found 11 methods total, including 2 additional bugs AST missed (+17% bug detection)

Changes

Fixed 7 bugs in graphistry/PlotterBase.py:

__encode() helper - Now invalidates dataset_id (fixes all 8 encoding methods)
encode_axis() - Now invalidates dataset_id
name() - Now invalidates dataset_id
description() - Now invalidates dataset_id
bind() - Conditional invalidation (only when changing encodings, not when setting IDs)
style() - Now invalidates dataset_id (Pysa-discovered)
addStyle() - Now invalidates dataset_id (Pysa-discovered)

Why Pysa Found More:

AST finds direct patterns: res._attr = value
Pysa finds indirect patterns: res = self.bind() followed by modifications
style() and addStyle() use double-copy pattern via bind() that AST missed

Test Coverage

Comprehensive test suite with 25 tests in graphistry/tests/test_dataset_id_invalidation.py:

✅ 11 core tests for basic invalidation scenarios
✅ 3 Pysa-discovered bug tests (style, addStyle, infer_labels)
✅ 8 parametric tests covering ALL encoding methods (future-proof)
✅ 3 bind() edge case tests (multiple encodings, no params, mixed usage)

Test design principles:

Parametric over individual (one test covers all 8 encoding methods)
Future-proof (adding new encoding = add to parametrize list)
Non-brittle (clear assertions, independent tests, representative samples)

Validation

✅ All 25 tests passing (Python 3.8 and 3.12)
✅ PlotterBase tests passing (48 passed, no regressions)
✅ Type checking clean (mypy)
✅ Linting clean (flake8)

Impact

Server-side uploads now correctly detect when datasets have changed, preventing visualization artifacts from stale data. Multi-tool analysis (AST + Pysa) provided 17% better bug detection than AST alone.

🤖 Generated with Claude Code

Dataset IDs were not being invalidated when encoding or metadata attributes changed, causing stale ID reuse and incorrect server-side uploads. **Problem**: After upload, modifying encodings (encode_point_color, bind with point_*/edge_* params) or metadata (name, description) would reuse the old dataset_id, sending outdated visualization configurations to the server. **Root cause**: Invalidation was only implemented for DataFrame-changing methods (.nodes(), .edges(), .graph()) but not for encoding/metadata methods. **Fixed methods** (5 total, ~10 LOC): - __encode() helper: Invalidates dataset_id after modifying _complex_encodings → Fixes all 7+ .encode_*() methods (encode_point_color, encode_edge_color, encode_point_size, encode_point_icon, encode_edge_icon, encode_point_badge, encode_edge_badge) with single choke-point fix - encode_axis(): Invalidates dataset_id after modifying _complex_encodings - name(): Invalidates dataset_id after modifying _name - description(): Invalidates dataset_id after modifying _description - bind(): Conditionally invalidates dataset_id when changing encodings → Preserves IDs when explicitly setting them via bind(dataset_id='...') → Invalidates when changing encodings via bind(point_color='...', etc.) **Analysis approach**: Used Python AST analysis to identify all methods modifying dataset-relevant attributes, confirming manual audit findings. **Testing**: Added comprehensive test suite with 11 tests covering: - 5 bug reproduction tests (now passing) - 2 correct behavior tests (settings, bind with IDs) - 2 file ID invalidation tests (already working) - 2 real-world mixed scenario tests **Impact**: Server-side uploads now correctly detect dataset changes, preventing visualization artifacts from stale encodings and metadata. **Documentation**: Added pyre-check setup guide (ai/prompts/PYRE_ANALYSIS.md) and tool selection guidance (ai/README.md) demonstrating AST-based analysis as practical alternative for custom code patterns. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Pyre-check hangs when analyzing graphistry/compute/hop.py due to the hop() function's complexity (384 lines, nested loops, runtime TypeVar resolution). Excluding this single file allows pyre to successfully analyze the entire codebase. Changes: - Exclude compute/hop.py from pyre analysis in .pyre_configuration - Include tests/ in pyre analysis (previously excluded) - Add .pyre/ cache directory to .gitignore - Document exclusion rationale in hop.py docstring - Update PYRE_ANALYSIS.md with working configuration and quick start - Add pyre investigation results to PLAN.md template Results: - Pyre now analyzes 3711 functions (including tests) in ~2.5 seconds - Previously hung at 1631/1709 functions indefinitely - No other files require exclusion The hop() function exceeds pyre's hard-coded complexity limit for deep analysis. Mypy handles this file without issues. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Both style() and addStyle() modify the _style attribute which affects visualization rendering. These methods should invalidate dataset_id to ensure proper server-side upload behavior. Found by Pysa call graph analysis - these methods use indirect modification via bind() that AST pattern matching missed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Add 25 tests covering dataset_id and file_id invalidation logic: - 11 core tests for basic invalidation scenarios - 3 Pysa-discovered bug tests (style, addStyle, infer_labels) - 8 parametric tests covering ALL encoding methods - 3 bind() edge case tests (conditional invalidation) Uses parametric testing (@pytest.mark.parametrize) to ensure all encoding methods are tested and future methods are easily added. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Reduce from 373 to 65 lines by: - Removing generic pyre documentation - Focusing on PyGraphistry-specific commands - Adding timeout debugging section - Using $(pwd) instead of absolute paths - Removing future work section Addresses PR feedback for more concise, actionable documentation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Add ai/assets/pysa_extract_callers.py for extracting caller information from Pysa's call-graph.json output. Moved from gitignored plans/ to committed ai/assets/ for team reuse. Update .pyre_configuration with taint_models_path for Pysa analysis. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Remove Status line (project-specific findings) and redundant Pysa Workflow section. Keep as reusable workflow guide focused on commands and gotchas. Addresses PR feedback on lines 3 and 25. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Reduce verbose 5-line explanation to concise 1-line note. Still conveys essential information about pyre exclusion. Addresses PR feedback on line 9. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Remove taint_models_path referencing plans/ (gitignored directory). This path would break pyre for other developers. Taint models are not needed for call graph analysis (documented workflow in PYRE_ANALYSIS.md). Configuration now works on all systems. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Keep .pyre_configuration clean (no gitignored paths). Document how users can optionally add taint models locally. Taint models are experimental and user-specific, not committed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Add numbered workflow and missing commands: 1. Generate call graph (existing) 2. Verify call graph generated (new) 3. Extract callers with single/multiple method examples (enhanced) Makes the workflow clearer and more actionable. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>