fix: follow pycache_prefix for __marimo__ location#8797
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
for more information, see https://pre-commit.ci
| placed next to the notebook file. | ||
|
|
||
| Resolution order: | ||
| 1. ``sys.pycache_prefix`` (if set and path is absolute): mirror tree |
There was a problem hiding this comment.
how often is sys.pycache_prefix set? will this break existing notebooks given <notebook_parent>/__marimo__ is always the expectation.
There was a problem hiding this comment.
Unsure, it was unset in the 4-5 envs I just tried (including molab).
I can add some docs so this feels less magic
There was a problem hiding this comment.
Pull request overview
This PR centralizes computation of the per-notebook __marimo__ output directory and updates multiple subsystems (session cache, persistent cache, exports, and OpenGraph assets) to follow sys.pycache_prefix when set, addressing read-only filesystem usage and nested-notebook cache placement.
Changes:
- Add
notebook_output_dir()(andMARIMO_DIR_NAME) to compute the__marimo__directory, mirroring absolute notebook paths undersys.pycache_prefixwhen configured. - Route session cache paths, persistent cache store paths, auto-export output paths, and OpenGraph image resolution through
notebook_output_dir(). - Update/extend tests to cover pycache-prefix relocation and make FileStore’s default path resolution lazy.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| marimo/_utils/paths.py | Introduces MARIMO_DIR_NAME and notebook_output_dir() with sys.pycache_prefix support. |
| marimo/_session/state/serialize.py | Session cache files now go under notebook_output_dir(...)/session. |
| marimo/_save/stores/file.py | FileStore default cache path now uses notebook_output_dir() and is resolved lazily. |
| marimo/_server/export/exporter.py | Auto-export output directory now uses notebook_output_dir() (relocatable). |
| marimo/_server/api/endpoints/assets.py | OpenGraph thumbnail serving resolves __marimo__/... paths against the relocated output dir. |
| marimo/_metadata/opengraph.py | Adds absolute default image path helper and switches existence check to use relocated output dir. |
| marimo/_cli/export/thumbnail.py | Default thumbnail output now uses the relocated absolute path helper. |
| tests/_utils/test_paths.py | Adds unit tests for notebook_output_dir() including sys.pycache_prefix behavior. |
| tests/_session/state/test_serialize_session.py | Adds coverage for session cache location under sys.pycache_prefix. |
| tests/_save/stores/test_file.py | Adds test ensuring FileStore default path resolution is lazy. |
| tests/_metadata/test_opengraph.py | Updates tests to use the new absolute image path helper and adds sys.pycache_prefix coverage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # On Unix "/" -> parts[1:], on Windows "C:\\" -> parts[1:]. | ||
| relative = Path(*parent.parts[1:]) if len(parent.parts) > 1 else Path() |
There was a problem hiding this comment.
On Windows, building the mirrored path via Path(*parent.parts[1:]) drops the drive (e.g., C:/D:). That can cause collisions under sys.pycache_prefix when two notebooks live on different drives but share the same path suffix. Consider including a drive component when parent.drive is set (similar to Python’s __pycache__ mirroring) so C:\foo\bar and D:\foo\bar map to distinct trees.
| # On Unix "/" -> parts[1:], on Windows "C:\\" -> parts[1:]. | |
| relative = Path(*parent.parts[1:]) if len(parent.parts) > 1 else Path() | |
| # On Unix "/" -> parts[1:], on Windows mirror the drive so that | |
| # different drives map to distinct trees under the prefix. | |
| if parent.drive: | |
| # E.g. "C:" -> "C", matching how Python mirrors __pycache__. | |
| drive_component = parent.drive.replace(":", "") | |
| relative = ( | |
| Path(drive_component, *parent.parts[1:]) | |
| if len(parent.parts) > 1 | |
| else Path(drive_component) | |
| ) | |
| else: | |
| relative = ( | |
| Path(*parent.parts[1:]) | |
| if len(parent.parts) > 1 | |
| else Path() | |
| ) |
| def default_opengraph_image(filepath: str) -> str: | ||
| """Return the default relative image path for a given notebook.""" | ||
| stem = Path(filepath).stem | ||
| return f"__marimo__/assets/{stem}/{DEFAULT_OPENGRAPH_IMAGE_FILENAME}" | ||
| return str( | ||
| Path(MARIMO_DIR_NAME) | ||
| / "assets" | ||
| / stem | ||
| / DEFAULT_OPENGRAPH_IMAGE_FILENAME | ||
| ) |
There was a problem hiding this comment.
default_opengraph_image() now returns str(Path(...)), which will use OS-specific separators (backslashes on Windows). Since this value can be persisted in config/metadata and later interpreted on non-Windows hosts (where backslashes are not path separators), it’s safer to return a normalized POSIX-style relative path (e.g., via PurePosixPath(...).as_posix() or manual '/' joining).
| relative_parent = Path( | ||
| *notebook.parent.parts[1:] | ||
| ) # strip root (/ or C:\) |
There was a problem hiding this comment.
These tests mirror notebook_output_dir() by stripping only parts[0] from an absolute path. If notebook_output_dir() is updated to preserve the Windows drive letter (to avoid collisions across drives), the expected relative_parent computation here should be updated similarly (e.g., include a drive component when present).
| relative_parent = Path( | |
| *notebook.parent.parts[1:] | |
| ) # strip root (/ or C:\) | |
| parent = notebook.parent | |
| drive_component = (parent.drive,) if parent.drive else () | |
| relative_parent = Path(*drive_component, *parent.parts[1:]) # strip root (/ or C:\) but keep drive if present |
| path = tmp_path / "app" / "notebooks" / "notebook.py" | ||
| cache_file = get_session_cache_file(path) | ||
| relative_parent = Path(*path.parent.parts[1:]) # strip root | ||
| assert cache_file == ( | ||
| prefix | ||
| / relative_parent | ||
| / "__marimo__" | ||
| / "session" | ||
| / "notebook.py.json" | ||
| ) |
There was a problem hiding this comment.
This expected path mirrors the current implementation by stripping only the root component from path.parent.parts. If the implementation is adjusted to include a Windows drive component under sys.pycache_prefix (to prevent collisions across drives), this test should be updated accordingly.
| @@ -160,12 +160,27 @@ def derive_title_from_path(filepath: str) -> str: | |||
| def default_opengraph_image(filepath: str) -> str: | |||
There was a problem hiding this comment.
@peter-gy is this used in the file-system or a URL? should this be posix or detect the OS file separator?
There was a problem hiding this comment.
It is a notebook-relative internal path that we store in OpenGraph metadata and later resolve on disk when serving the thumbnail. I'd keep this posix and use default_opengraph_image_abs introduced by this pr for actual fs operations.
| `foo/bar/__marimo__/session/baz.py.json`. | ||
| """ | ||
| return path.parent / "__marimo__" / "session" / f"{path.name}.json" | ||
| from marimo._utils.paths import notebook_output_dir | ||
|
|
||
| return notebook_output_dir(path) / "session" / f"{path.name}.json" |
There was a problem hiding this comment.
The docstring/examples here hard-code foo/bar/__marimo__/session/..., but the implementation now relocates via notebook_output_dir() when sys.pycache_prefix is set. Update the docstring to mention the mirrored-under-prefix behavior (and that the shown path only applies when the prefix is unset).
|
@dmadisetti I don't see any missing opengraph dump. in #8097 the main related parts I introduced were (1) default thumbnail write path in |
|
We can do a breaking release next. |
|
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.21.2-dev62 |
📝 Summary
fixes #7173 #8676
The
__marimo__path was previously immutable. This PR introduces the convention of followingsys.pycache_prefix.Additionally, this PR fixes the cache location for notebooks in nested directories. Note, this will break cache, but that's fine because next release contains a different breaking PR #8793
There may be some opengraph dump location I'm missing @peter-gy