st.cache_data on class methods intermittently raises UnserializableReturnValueError #14593

@v-pi

Description

Checklist

  • I have searched the existing issues for similar issues.
  • I added a very descriptive title to this issue.
  • I have provided sufficient information below to help reproduce this issue.

Summary

When @st.cache_data decorates a method that returns a custom class instance, pickle.dumps() intermittently fails with PicklingError: Can't pickle <class 'X'>: it's not the same object as X. This happens because Streamlit's LocalSourcesWatcher deletes watched modules from sys.modules on file changes, creating a class identity mismatch between the instance being pickled and the class found by pickle's resolution mechanism.
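The identity mismatch described in the summary can be reproduced with plain `pickle`, without Streamlit. The following is a minimal sketch, assuming only the standard library; the module name `stale_class_demo_model` and the function `demonstrate_stale_class` are hypothetical names chosen for illustration, and the `del sys.modules[...]` line stands in for what the report attributes to the file watcher.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "stale_class_demo_model"  # hypothetical module name for this demo


def demonstrate_stale_class() -> str:
    """Return the name of the exception raised when pickling a stale instance."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write a tiny module to disk so it can be imported twice.
        Path(tmp, f"{MODULE_NAME}.py").write_text("class MyResult:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            old_module = importlib.import_module(MODULE_NAME)
            instance = old_module.MyResult()
            pickle.dumps(instance)  # succeeds: class identity matches

            # Stand-in for the watcher: drop the module, then re-import it.
            del sys.modules[MODULE_NAME]
            new_module = importlib.import_module(MODULE_NAME)
            assert new_module.MyResult is not old_module.MyResult

            try:
                # type(instance) is the OLD class, but pickle looks up
                # sys.modules[MODULE_NAME].MyResult, which is the NEW class.
                pickle.dumps(instance)
            except pickle.PicklingError as exc:
                return type(exc).__name__
            return "no error"
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
```

Calling `demonstrate_stale_class()` exercises the same sequence as the summary: an instance created before the module is dropped from `sys.modules` can no longer be pickled once the module has been imported again.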

Reproducible Code Example

### `repro_model.py`


"""Local module — simulates any user-defined class in a local .py file."""

class MyResult:
    def __init__(self, output: str = "", value: int = 0):
        self.output = output
        self.value = value


### `repro_agent.py`


"""
Helper module imported once by repro.py.
Simulates agents/__init__.py — only loaded once, holds a stale class reference
after sys.modules is invalidated by the file watcher.
"""
import streamlit as st
from repro_model import MyResult  # bound once at import time


@st.cache_data(show_spinner=False)
def get_result(key: str) -> MyResult:
    import time
    time.sleep(0.1)
    return MyResult(output=f"result for {key}", value=42)


### `repro.py` — run with `streamlit run repro.py`


import streamlit as st
import sys
import importlib
import repro_agent  # imported once — simulates agents/__init__.py


st.title("cache_data pickle race condition repro")

if st.button("1) Normal call (works)"):
    try:
        result = repro_agent.get_result("test")
        st.success(f"Got: {result.output}")
    except Exception as e:
        st.error(f"{type(e).__name__}: {e}")

if st.button("2) Simulate file-watcher then call (fails, and keeps failing)"):
    # This is exactly what LocalSourcesWatcher.on_path_changed does:
    #   for wm in self._watched_modules.values():
    #       del sys.modules[wm.module_name]
    if "repro_model" in sys.modules:
        del sys.modules["repro_model"]

    # Re-import creates a NEW class object in sys.modules
    importlib.import_module("repro_model")

    # Clear cache so pickle.dumps is attempted again
    repro_agent.get_result.clear()

    try:
        # repro_agent still uses the OLD MyResult class.
        # pickle resolves sys.modules["repro_model"].MyResult → NEW class.
        # OLD is not NEW → PicklingError → UnserializableReturnValueError
        result = repro_agent.get_result("test")
        st.success(f"Got: {result.output}")
    except Exception as e:
        st.error(f"{type(e).__name__}: {e}")

Steps To Reproduce

Steps:

  1. streamlit run repro.py
  2. Click "1) Normal call" — succeeds.
  3. Click "2) Simulate file-watcher" — fails with UnserializableReturnValueError.
  4. Click "1) Normal call" again — also fails, until Streamlit is restarted.

Expected Behavior

Neither step 3 nor step 4 should raise UnserializableReturnValueError.

Step 4 shows that the failure persists: repro_agent is never re-imported, so it keeps
using the old class, while sys.modules holds the new one.

Current Behavior

UnserializableReturnValueError is raised in steps 3 and 4.

Is this a regression?

  • Yes, this used to work in a previous version.

Debug info

  • Streamlit: 1.55.0
  • Python: 3.13.7
  • OS: Windows 11 / Linux (Kubernetes)
  • Occurs more frequently in local dev (active filesystem) than in deployed containers

Additional Information

Disclosure: This bug report was written by an LLM (Claude) after a deep investigation into a production issue. The root cause analysis, reproduction script, and identified problematic code path are all LLM-generated.

Suggested Fixes

Option A: Chain the original exception (low-effort, improves DX)

In _handle_cache_miss, chain the original exception so users can see the real error:

raise UnserializableReturnValueError(
    return_value=computed_value, func=self._info.func
) from ex  # <-- add "from ex"
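To illustrate what Option A buys, here is a minimal sketch of exception chaining, assuming nothing about Streamlit internals. `UnserializableValueError`, `serialize`, and `describe_failure` are hypothetical stand-ins, not Streamlit APIs; the point is only that `raise ... from ex` keeps the original error reachable through `__cause__` and visible in the traceback.

```python
import pickle
import threading


class UnserializableValueError(Exception):
    """Hypothetical stand-in for Streamlit's UnserializableReturnValueError."""


def serialize(value: object) -> bytes:
    """Pickle a value, wrapping any failure but keeping the original cause."""
    try:
        return pickle.dumps(value)
    except Exception as ex:
        # "from ex" stores the original exception on __cause__, so the
        # traceback shows the real pickling error, not just the wrapper.
        raise UnserializableValueError(
            f"cannot serialize {type(value).__name__}"
        ) from ex


def describe_failure(value: object) -> str:
    """Return the type name of the underlying error, or 'ok' on success."""
    try:
        serialize(value)
    except UnserializableValueError as err:
        return type(err.__cause__).__name__
    return "ok"
```

With chaining in place, a user hitting the wrapper error would also see the underlying `PicklingError` message ("it's not the same object as ..."), which points at the class identity problem directly.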

Option B: Guard write_result against stale class references

Before calling pickle.dumps() in DataCache.write_result, look up the module currently registered in sys.modules and verify class identity:

import sys

cls = type(value)
mod = sys.modules.get(cls.__module__)
if mod is not None:
    # A plain getattr only resolves top-level classes (no dots in __qualname__)
    current_cls = getattr(mod, cls.__qualname__, None)
    if current_cls is not None and current_cls is not cls:
        # Module was reloaded: point the instance at the current class object
        value.__class__ = current_cls
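The Option B guard can be tried outside Streamlit. The sketch below is one possible shape for it, under stated assumptions: `rebind_to_current_class` and `demo` are hypothetical helpers, the module name `rebind_demo_model` is made up for the demo, and only the top-level object is rebound (instances nested inside its attributes would still carry the stale class).

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def rebind_to_current_class(value):
    """If value's class was replaced in sys.modules, rebind value to the new class."""
    cls = type(value)
    current = sys.modules.get(cls.__module__)
    if current is None:
        return value
    # Walk the qualified name so nested classes (Outer.Inner) resolve too.
    for part in cls.__qualname__.split("."):
        current = getattr(current, part, None)
        if current is None:
            return value  # not resolvable (e.g. a class defined in a function)
    if isinstance(current, type) and current is not cls:
        try:
            value.__class__ = current
        except TypeError:
            pass  # incompatible layouts: leave the value untouched
    return value


def demo() -> bool:
    """Reload a module, rebind a stale instance, and pickle round-trip it."""
    name = "rebind_demo_model"  # hypothetical module name
    source = (
        "class MyResult:\n"
        "    def __init__(self, value=0):\n"
        "        self.value = value\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{name}.py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            stale = importlib.import_module(name).MyResult(value=42)
            del sys.modules[name]          # stand-in for the file watcher
            importlib.import_module(name)  # new class object in sys.modules
            restored = pickle.loads(pickle.dumps(rebind_to_current_class(stale)))
            return restored.value == 42
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
```

Mutating `__class__` changes the object the caller holds as well, so whether this is acceptable inside a cache layer is a design decision for the maintainers rather than something this sketch settles.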

Option C: Don't delete modules from sys.modules preemptively

Instead of deleting all watched modules on every file change, only invalidate the module that actually changed. Or defer the deletion to the start of the next script run rather than doing it immediately from the watcher thread.
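The deferral idea in Option C could look roughly like the following. This is a generic sketch, not Streamlit's design: `DeferredModuleInvalidator`, `mark_changed`, and `flush` are hypothetical names. The watcher thread would only record which module changed, and the script runner would perform the actual `sys.modules` removal at the start of the next run.

```python
import sys
import threading
import types


class DeferredModuleInvalidator:
    """Hypothetical helper: record changed modules now, remove them later."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: set[str] = set()

    def mark_changed(self, module_name: str) -> None:
        """Called from the watcher thread; does not touch sys.modules."""
        with self._lock:
            self._pending.add(module_name)

    def flush(self) -> list[str]:
        """Called at the start of a script run; removes only marked modules."""
        with self._lock:
            pending, self._pending = self._pending, set()
        removed = []
        for name in sorted(pending):
            if sys.modules.pop(name, None) is not None:
                removed.append(name)
        return removed
```

This only changes when and how much is invalidated. A module that imported the changed class once and is never re-imported, like repro_agent in the example, would still hold the old class after the flush, so the sketch narrows the window rather than closing it.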

Workaround

Return only pickle-safe primitives (dicts, strings, numbers) from @st.cache_data functions, and reconstruct custom objects outside the cache boundary:

@st.cache_data(show_spinner=False)
def _cached_call(key: str) -> dict:
    result = do_expensive_work(key)
    return dict(vars(result))  # plain dict copy; pickle-safe as long as every attribute value is

def get_result(key: str) -> MyResult:
    return MyResult(**_cached_call(key))
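A variant of the same workaround, sketched without Streamlit so it can be run on its own. It assumes MyResult can be written as a dataclass (the report defines it as a plain class), and uses a `pickle` round trip as a stand-in for the cache boundary; `to_payload`, `from_payload`, and `round_trip` are hypothetical helper names.

```python
import pickle
from dataclasses import asdict, dataclass


@dataclass
class MyResult:
    """Dataclass version of the report's MyResult (an assumption of this sketch)."""
    output: str = ""
    value: int = 0


def to_payload(result: MyResult) -> dict:
    """Convert to plain builtins before crossing the cache boundary."""
    return asdict(result)


def from_payload(payload: dict) -> MyResult:
    """Rebuild the object outside the cache, using the class as currently imported."""
    return MyResult(**payload)


def round_trip(result: MyResult) -> MyResult:
    """Stand-in for cache write and read: only the dict is ever pickled."""
    blob = pickle.dumps(to_payload(result))
    return from_payload(pickle.loads(blob))
```

Because the pickled bytes contain only the dict, pickle never has to resolve the class by module and name, so a reloaded module cannot trigger the identity check.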

Metadata

Labels

  • area:backend: Related to Python backend
  • feature:cache: Related to `st.cache_data` and `st.cache_resource`
  • feature:file-watcher: Related to file watching for hot reload
  • priority:P3: Medium priority
  • status:confirmed: Bug has been confirmed by the Streamlit team
  • type:bug: Something isn't working as expected
