
Conversation

@marcorudolphflex (Contributor) commented Oct 30, 2025

Issue
LocalCache._iter_entries is called when storing new data in the cache, which becomes inefficient once the cache holds many simulations. We should track changes incrementally and keep a file reflecting statistics such as total storage size and number of simulations.

PR

  • Maintain a stats.json that summarizes the information needed for cache manipulation (number of entries, total storage size, last_used timestamp per entry key); a sketch of its shape follows this list.
  • stats.json is updated in _store, _touch, _remove_entry, and _clear.
  • LocalCache._iter_entries is now a generator.
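
A minimal sketch of the stats.json contents implied by the bullets above, assuming a pydantic model with the two fields (last_used, total_size) that appear in the coverage excerpts later in this page; the real CacheStats model may carry additional fields.

    # Illustrative sketch only; field names follow the CacheStats usage shown in
    # the coverage excerpts below, not necessarily the full schema.
    from pydantic import BaseModel, Field


    class CacheStats(BaseModel):
        # ISO-8601 last-used timestamp keyed by cache entry key
        last_used: dict[str, str] = Field(default_factory=dict)
        # combined size of all cached artifacts, in bytes
        total_size: int = 0


    # The corresponding stats.json on disk would look roughly like:
    # {"last_used": {"a1b2c3": "2025-10-30T17:09:35+00:00"}, "total_size": 104857600}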

Greptile Overview

Updated On: 2025-10-30 17:09:35 UTC

Greptile Summary

Implements a stats.json file to track cache metadata (entry count, total size, last access times) to avoid iterating all cache entries during store/eviction operations, significantly improving performance for large caches.

Key Changes:

  • Added CacheStats pydantic model to represent the stats file structure
  • Modified _iter_entries() from returning a list to yielding entries (generator pattern)
  • Implemented _record_store_stats(), _record_touch_stats(), and _record_remove_stats() to update stats incrementally
  • Added sync_stats() method to rebuild stats from filesystem when inconsistencies are detected
  • Updated __len__() to read from stats instead of iterating entries (see the sketch after this list)
  • Modified eviction logic to use stats data instead of iterating all entries
  • Enhanced _ensure_limits() to calculate projected size/count before eviction
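
As a companion to the __len__ bullet above, a minimal standalone sketch (assumed shape, not the PR's exact code) of reading entry count and total size from the stats file instead of walking the cache directory:

    # Sketch only: counts and sizes come from the parsed stats file rather than a
    # directory scan. Function names are hypothetical, not part of the PR.
    import json
    from pathlib import Path

    def cache_len(stats_path: Path) -> int:
        stats = json.loads(stats_path.read_text(encoding="utf-8"))
        return len(stats.get("last_used", {}))

    def cache_size_bytes(stats_path: Path) -> int:
        stats = json.loads(stats_path.read_text(encoding="utf-8"))
        return int(stats.get("total_size", 0))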

Issues Found:

  • Critical logic bugs in all three _record_*_stats() methods: when total_size goes negative, sync_stats() is triggered but the method returns early without writing the updated entries dict, leaving stats in an inconsistent state (reconstructed as a sketch after this list)
  • Missing metadata write in _touch() early return path when file_size is missing
  • Race condition in _store() where _load_entry() is called inside a lock but doesn't use locking itself
  • Eviction loop may fail to free enough space when entries are missing (continues without incrementing reclaimed)
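
To make the first bullet concrete, here is a hypothetical reconstruction (a sketch, not the merged code) of the flagged early-return pattern; the excerpts further down instead clamp total_size to zero, schedule a sync, and still write the updated stats:

    # Hypothetical reconstruction of the flagged pattern (illustration only, not
    # the merged code). Uses the CacheStats sketch from above; write_stats and
    # sync_stats stand in for LocalCache._write_stats / LocalCache.sync_stats.
    def record_remove_stats_buggy(stats: CacheStats, key: str, file_size: int,
                                  write_stats, sync_stats) -> None:
        entries = dict(stats.last_used)
        entries.pop(key, None)
        total_size = stats.total_size - file_size
        if total_size < 0:
            sync_stats()
            return  # bug: the updated `entries` dict is never persisted
        write_stats(stats.model_copy(update={"last_used": entries, "total_size": total_size}))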

Confidence Score: 2/5

  • This PR has critical logic bugs that will cause cache stats to become corrupted, requiring careful fixes before merge
  • The performance improvement approach is sound, but there are multiple critical bugs in the stats update methods that cause incomplete writes when total_size goes negative. These bugs will lead to stats.json becoming out of sync with the actual cache state. Additionally, there's a potential race condition and an eviction bug that could prevent proper cache size management.
  • Pay close attention to tidy3d/web/cache.py, specifically the _record_store_stats, _record_touch_stats, _record_remove_stats, _touch, and _evict_by_size methods

Important Files Changed

File Analysis

  • tidy3d/web/cache.py (score 2/5): Adds stats.json to track cache metadata for performance, but has critical logic bugs in the stats update methods that cause incomplete writes when total_size goes negative, plus a race condition in _store.
  • tests/test_web/test_local_cache.py (score 4/5): Adds comprehensive tests for stats tracking and sync, and verifies that _iter_entries is not called during store/fetch operations (see the test sketch below).
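
A minimal sketch of the "_iter_entries is not called" check described for the test file, assuming a LocalCache fixture named cache and the _store(key, source_path, metadata) call shown in the sequence diagram below; the real tests may be structured differently:

    from unittest import mock

    def test_store_avoids_full_scan(cache, tmp_path):
        # `cache`, the artifact name, and the metadata argument are illustrative assumptions.
        artifact = tmp_path / "sim.hdf5"
        artifact.write_bytes(b"payload")
        with mock.patch.object(type(cache), "_iter_entries") as spy:
            cache._store("deadbeefcafef00d", artifact, metadata={})
            spy.assert_not_called()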

Sequence Diagram

sequenceDiagram
    participant Client
    participant LocalCache
    participant StatsJSON as stats.json
    participant FileSystem as File System
    
    Note over Client,FileSystem: Store Operation
    Client->>LocalCache: _store(key, source_path, metadata)
    LocalCache->>LocalCache: _lock acquired
    LocalCache->>LocalCache: _load_entry(key) - check existing
    LocalCache->>StatsJSON: _load_stats()
    StatsJSON-->>LocalCache: current stats
    LocalCache->>LocalCache: _ensure_limits(file_size, incoming_key, replacing_size)
    LocalCache->>StatsJSON: _load_stats() - check limits
    StatsJSON-->>LocalCache: stats for eviction check
    alt Eviction needed
        LocalCache->>LocalCache: _evict() or _evict_by_size()
        loop For each candidate
            LocalCache->>FileSystem: _remove_entry(entry)
            LocalCache->>StatsJSON: _record_remove_stats(key, file_size)
        end
    end
    LocalCache->>FileSystem: copy artifact to cache
    LocalCache->>StatsJSON: _record_store_stats(key, last_used, file_size, previous_size)
    StatsJSON->>StatsJSON: update last_used[key], total_size
    LocalCache->>LocalCache: _lock released
    LocalCache-->>Client: CacheEntry
    
    Note over Client,FileSystem: Fetch Operation
    Client->>LocalCache: _fetch(key)
    LocalCache->>LocalCache: _lock acquired
    LocalCache->>LocalCache: _load_entry(key)
    LocalCache->>FileSystem: read metadata.json
    FileSystem-->>LocalCache: entry metadata
    LocalCache->>LocalCache: verify() - checksum
    alt Valid entry
        LocalCache->>LocalCache: _touch(entry)
        LocalCache->>FileSystem: update metadata.json
        LocalCache->>StatsJSON: _record_touch_stats(key, last_used, file_size)
        LocalCache-->>Client: CacheEntry
    else Invalid/missing
        LocalCache-->>Client: None
    end
    LocalCache->>LocalCache: _lock released
    
    Note over Client,FileSystem: Sync Stats Recovery
    Client->>LocalCache: sync_stats()
    LocalCache->>LocalCache: _lock acquired, _syncing_stats = True
    LocalCache->>LocalCache: _iter_entries() - scan all
    loop For each entry
        LocalCache->>FileSystem: read metadata.json
        FileSystem-->>LocalCache: file_size, last_used
    end
    LocalCache->>StatsJSON: _write_stats(rebuilt)
    StatsJSON->>StatsJSON: persist complete stats
    LocalCache->>LocalCache: _syncing_stats = False, _lock released
    LocalCache-->>Client: CacheStats

@greptile-apps bot left a comment

2 files reviewed, 6 comments

github-actions bot commented Oct 30, 2025

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/web/cache.py (74.2%): Missing lines 97-100,142,170,174-175,196-198,202,204-207,209-210,226-227,239,241-242,253-254,260,265-266,272-273,424-425,427,450-451,458-460,464,467,475-480,482-483,485-490,493,498,515,524,527-528,539,554-556,558-560,742-745

Summary

  • Total: 275 lines
  • Missing: 71 lines
  • Coverage: 74%

tidy3d/web/cache.py

   93     def get(self, key: str, default: Any = None) -> Any:
   94         return self.as_dict().get(key, default)
   95 
   96     def __getitem__(self, key: str) -> Any:
!  97         data = self.as_dict()
!  98         if key not in data:
!  99             raise KeyError(key)
! 100         return data[key]
  101 
  102 
  103 @dataclass
  104 class CacheEntry:

  138                 "Simulation cache checksum mismatch for key '%s'. Removing stale entry.", self.key
  139             )
  140             return False
  141         if self.metadata.file_size != file_size:
! 142             self.metadata.file_size = file_size
  143             _write_metadata(self.metadata_path, self.metadata)
  144         return True
  145 
  146     def materialize(self, target: Path) -> Path:

  166     def _stats_path(self) -> Path:
  167         return self._root / CACHE_STATS_NAME
  168 
  169     def _schedule_sync(self) -> None:
! 170         self._sync_pending = True
  171 
  172     def _run_pending_sync(self) -> None:
  173         if self._sync_pending and not self._syncing_stats:
! 174             self._sync_pending = False
! 175             self.sync_stats()
  176 
  177     @contextmanager
  178     def _with_lock(self) -> Iterator[None]:
  179         self._run_pending_sync()

  192 
  193     def _load_stats(self, *, rebuild: bool = False) -> CacheStats:
  194         path = self._stats_path
  195         if not path.exists():
! 196             if not self._syncing_stats:
! 197                 self._schedule_sync()
! 198             return CacheStats()
  199         try:
  200             data = json.loads(path.read_text(encoding="utf-8"))
  201             if "last_used" not in data and "entries" in data:
! 202                 data["last_used"] = data.pop("entries")
  203             stats = CacheStats.model_validate(data)
! 204         except Exception:
! 205             if rebuild and not self._syncing_stats:
! 206                 self._schedule_sync()
! 207             return CacheStats()
  208         if stats.total_size < 0:
! 209             self._schedule_sync()
! 210             return CacheStats()
  211         return stats
  212 
  213     def _record_store_stats(
  214         self,

  222         entries = dict(stats.last_used)
  223         entries[key] = last_used
  224         total_size = stats.total_size - previous_size + file_size
  225         if total_size < 0:
! 226             total_size = 0
! 227             self._schedule_sync()
  228         updated = stats.model_copy(update={"last_used": entries, "total_size": total_size})
  229         self._write_stats(updated)
  230 
  231     def _record_touch_stats(

  235         entries = dict(stats.last_used)
  236         existed = key in entries
  237         total_size = stats.total_size
  238         if not existed and file_size is not None:
! 239             total_size += file_size
  240         if total_size < 0:
! 241             total_size = 0
! 242             self._schedule_sync()
  243         entries[key] = last_used
  244         updated = stats.model_copy(update={"last_used": entries, "total_size": total_size})
  245         self._write_stats(updated)

  249         entries = dict(stats.last_used)
  250         entries.pop(key, None)
  251         total_size = stats.total_size - file_size
  252         if total_size < 0:
! 253             total_size = 0
! 254             self._schedule_sync()
  255         updated = stats.model_copy(update={"last_used": entries, "total_size": total_size})
  256         self._write_stats(updated)
  257 
  258     def _enforce_limits_post_sync(self, entries: list[CacheEntry]) -> None:
  259         if not entries:
! 260             return
  261 
  262         entries_map = {entry.key: entry.metadata.last_used.isoformat() for entry in entries}
  263 
  264         if self.max_entries > 0 and len(entries) > self.max_entries:
! 265             excess = len(entries) - self.max_entries
! 266             self._evict(entries_map, remove_count=excess, exclude_keys=set())
  267 
  268         max_size_bytes = int(self.max_size_gb * (1024**3))
  269         if max_size_bytes > 0:
  270             total_size = sum(entry.metadata.file_size for entry in entries)
  271             if total_size > max_size_bytes:
! 272                 bytes_to_free = total_size - max_size_bytes
! 273                 self._evict_by_size(entries_map, bytes_to_free, exclude_keys=set())
  274 
  275     def sync_stats(self) -> CacheStats:
  276         with self._lock:
  277             self._syncing_stats = True

  420         max_size_bytes = int(self.max_size_gb * (1024**3))
  421 
  422         try:
  423             incoming_size_int = int(incoming_size)
! 424         except (TypeError, ValueError):
! 425             incoming_size_int = 0
  426         if incoming_size_int < 0:
! 427             incoming_size_int = 0
  428 
  429         stats = self._load_stats()
  430         entries_info = dict(stats.last_used)
  431         existing_keys = set(entries_info)

  446 
  447         existing_size = stats.total_size
  448         try:
  449             replacing_size_int = int(replacing_size)
! 450         except (TypeError, ValueError):
! 451             replacing_size_int = 0
  452         if incoming_key and incoming_key in existing_keys:
  453             projected_size = existing_size - replacing_size_int + incoming_size_int
  454         else:
  455             projected_size = existing_size + incoming_size_int
  456 
  457         if max_size_bytes > 0 and projected_size > max_size_bytes:
! 458             bytes_to_free = projected_size - max_size_bytes
! 459             exclude = {incoming_key} if incoming_key else set()
! 460             self._evict_by_size(entries_info, bytes_to_free, exclude_keys=exclude)
  461 
  462     def _evict(self, entries: dict[str, str], *, remove_count: int, exclude_keys: set[str]) -> None:
  463         if remove_count <= 0:
! 464             return
  465         candidates = [(key, entries.get(key, "")) for key in entries if key not in exclude_keys]
  466         if not candidates:
! 467             return
  468         candidates.sort(key=lambda item: item[1] or "")
  469         for key, _ in candidates[:remove_count]:
  470             self._remove_entry_by_key(key)

  471 
  472     def _evict_by_size(
  473         self, entries: dict[str, str], bytes_to_free: int, *, exclude_keys: set[str]
  474     ) -> None:
! 475         if bytes_to_free <= 0:
! 476             return
! 477         candidates = [(key, entries.get(key, "")) for key in entries if key not in exclude_keys]
! 478         if not candidates:
! 479             return
! 480         candidates.sort(key=lambda item: item[1] or "")
  481         reclaimed = 0
! 482         for key, _ in candidates:
! 483             if reclaimed >= bytes_to_free:
  484                 break
! 485             entry = self._load_entry(key)
! 486             if entry is None:
! 487                 log.debug("Could not find entry for eviction.")
! 488                 self._schedule_sync()
! 489                 break
! 490             size = entry.metadata.file_size
  491             self._remove_entry(entry)
  492             reclaimed += size
! 493             log.info(f"Simulation cache evicted entry '{key}' to reclaim {size} bytes.")
  494 
  495     def _iter_entries(self) -> Iterator[CacheEntry]:
  496         """Iterate lazily over all cache entries, including those in prefix subdirectories."""
  497         if not self._root.exists():
! 498             return
  499 
  500         for prefix_dir in self._root.iterdir():
  501             if not prefix_dir.is_dir() or prefix_dir.name.startswith(
  502                 (TMP_PREFIX, TMP_BATCH_PREFIX)

  511             for child in subdirs:
  512                 if not child.is_dir():
  513                     continue
  514                 if child.name.startswith((TMP_PREFIX, TMP_BATCH_PREFIX)):
! 515                     continue
  516 
  517                 meta_path = child / CACHE_METADATA_NAME
  518                 if not meta_path.exists():
  519                     continue

  520 
  521                 try:
  522                     metadata = _read_metadata(meta_path, child / CACHE_ARTIFACT_NAME)
  523                 except Exception:
! 524                     log.debug(
  525                         "Failed to parse metadata for '%s'; scheduling stats sync.", child.name
  526                     )
! 527                     self._schedule_sync()
! 528                     continue
  529 
  530                 yield CacheEntry(key=child.name, root=self._root, metadata=metadata)
  531 
  532     def _load_entry(self, key: str) -> Optional[CacheEntry]:

  535             return None
  536         try:
  537             metadata = _read_metadata(entry.metadata_path, entry.artifact_path)
  538         except Exception:
! 539             return None
  540         return CacheEntry(key=key, root=self._root, metadata=metadata)
  541 
  542     def _touch(self, entry: CacheEntry) -> None:
  543         entry.metadata.bump_last_used()

  550 
  551     def _remove_entry_by_key(self, key: str) -> None:
  552         entry = self._load_entry(key)
  553         if entry is None:
! 554             path = get_cache_entry_dir(self._root, key)
! 555             if path.exists():
! 556                 shutil.rmtree(path, ignore_errors=True)
  557             else:
! 558                 log.debug("Could not find entry for key '%s' to delete.", key)
! 559             self._record_remove_stats(key, 0)
! 560             return
  561         self._remove_entry(entry)
  562 
  563     def _remove_entry(self, entry: CacheEntry) -> None:
  564         file_size = entry.metadata.file_size

  738 
  739 def _read_metadata(meta_path: Path, artifact_path: Path) -> CacheEntryMetadata:
  740     raw = json.loads(meta_path.read_text(encoding="utf-8"))
  741     if "file_size" not in raw:
! 742         try:
! 743             raw["file_size"] = artifact_path.stat().st_size
! 744         except FileNotFoundError:
! 745             raw["file_size"] = 0
  746     raw.setdefault("created_at", _now())
  747     raw.setdefault("last_used", raw["created_at"])
  748     raw.setdefault("cache_key", meta_path.parent.name)
  749     return CacheEntryMetadata.model_validate(raw)

@marcorudolphflex force-pushed the FXC-3902-make-manipulations-on-local-cache-more-performant branch from b077d1c to 149a936 on October 30, 2025 17:52
@marcorudolphflex force-pushed the FXC-3902-make-manipulations-on-local-cache-more-performant branch 2 times, most recently from 39c81d8 to 2113337, on November 4, 2025 17:46
@marcorudolphflex force-pushed the FXC-3902-make-manipulations-on-local-cache-more-performant branch from 2113337 to 5166930 on November 5, 2025 09:29
@marcorudolphflex force-pushed the FXC-3902-make-manipulations-on-local-cache-more-performant branch from 5166930 to e3361d4 on November 5, 2025 10:23
@yaugenst-flex (Collaborator) left a comment:

LGTM

@yaugenst-flex added this pull request to the merge queue on Nov 5, 2025
Merged via the queue into develop with commit e73fbb4 on Nov 5, 2025 (37 checks passed)
@yaugenst-flex deleted the FXC-3902-make-manipulations-on-local-cache-more-performant branch on November 5, 2025 12:38