fix: Use decoding as a heuristic for binary data, and download accordingly by dmadisetti · Pull Request #8858 · marimo-team/marimo

dmadisetti · 2026-03-24T23:23:51Z

📝 Summary

We previously encoded blobs as base64 if they were not utf-8 compat. As a consequence, for files whose mimetype we could not dertermine, we saved the result as a base64 blob.

This PR introduces a is_base64 flag to inform the fe it should handle the file contents a little differently. The media download behavior is preserved if the mimetype is recognized- otherwise the raw bytes are downloaded.

Closes #5361

…ing on the frontend

vercel · 2026-03-24T23:23:56Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
marimo-docs	Ready	Preview, Comment	Mar 25, 2026 5:15pm

for more information, see https://pre-commit.ci

Copilot

Pull request overview

Adds an explicit isBase64/is_base64 flag to the file-details API so the frontend can distinguish textual contents from base64-encoded binary payloads and download files correctly when MIME type detection is missing/insufficient.

Changes:

Backend: FileDetailsResponse now includes is_base64; OSFileSystem.open_file returns bytes for non-decodable files and get_details base64-encodes those bytes while setting the flag.
OpenAPI + frontend: schema updated and download logic updated to decode base64 to bytes before creating a Blob.
Tests/smoke test: adds coverage for binary file behavior and a smoke test app for GH-5361.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/_server/files/test_os_file_system.py	Adds assertions for `open_file` returning `str` vs `bytes` and verifies `is_base64` behavior.
tests/_server/api/endpoints/test_file_explorer.py	Extends endpoint tests to assert `isBase64` and base64 contents for binary.
packages/openapi/src/api.ts	Updates generated TS types to include `isBase64`.
packages/openapi/api.yaml	Adds `isBase64` to the `FileDetailsResponse` schema.
marimo/_smoke_tests/issues/5361.py	New smoke test for binary download scenarios (GH-5361).
marimo/_server/models/files.py	Adds `is_base64` to `FileDetailsResponse` (camel-cased in JSON).
marimo/_server/files/os_file_system.py	Implements `is_base64` computation and changes `open_file` to return raw bytes on decode failure.
marimo/_server/files/file_system.py	Updates the abstract `open_file` return type to `str
marimo/_server/api/endpoints/file_explorer.py	Removes outdated TODO comment about raw bytes support.
frontend/src/components/editor/file-tree/file-viewer.tsx	Decodes base64 payloads to bytes for downloads; continues media download path.
frontend/src/components/editor/file-tree/file-explorer.tsx	Decodes base64 payloads to bytes for downloads from the tree dropdown.

Copilot · 2026-03-24T23:29:14Z

marimo/_server/files/os_file_system.py

+    def open_file(self, path: str, encoding: str | None = None) -> str | bytes:
        file_path = Path(path)
        try:
            return file_path.read_text(encoding=encoding)
        except UnicodeDecodeError:
-            # If its a UnicodeDecodeError, try as bytes and convert to base64
-            return base64.b64encode(file_path.read_bytes()).decode("utf-8")
+            return file_path.read_bytes()

    def create_file_or_directory(


open_file() currently falls back to Path.read_text() with encoding=None, which uses the platform’s preferred encoding (e.g. cp1252 on Windows). That encoding can successfully decode arbitrary bytes, meaning truly-binary files may be misclassified as text and never trigger the UnicodeDecodeError path. This breaks the new is_base64 behavior (and the associated tests) and can corrupt downloads. Consider always attempting a strict UTF-8 decode when encoding is not provided (or implement an explicit bytes-read + decode attempt with the desired encoding) so binary detection is consistent across OSes.

marimo/_server/files/file_system.py

frontend/src/components/editor/file-tree/file-viewer.tsx

JiwaniZakir

The change in os_file_system.py from file_path.read_text(encoding=encoding) to file_path.read_text(encoding=encoding or "utf-8") is a subtle but impactful behavioral shift: previously, passing encoding=None would use the platform's default encoding, which correctly handled Latin-1, Shift-JIS, or Windows-1252 text files on appropriately configured systems. Now, those files will silently fail the UTF-8 decode, get treated as binary, and be base64-encoded — meaning they'll download as raw bytes rather than being viewable as text in the file viewer. If the intent is to standardize on UTF-8, that's reasonable, but the downstream consequence (text files in other encodings now presented as binary) deserves a comment or explicit design decision note.

In file-viewer.tsx, the handleDownload path for isBase64 && !isMediaMime does a round-trip through base64ToDataURL → deserializeBlob, which converts base64 → data URL string → Blob. This could be done more directly by decoding the base64 string to a Uint8Array and constructing the Blob from that, avoiding the intermediate data URL string allocation — worth considering for large binary files.

The addition of the explicit is_base64: bool = False field to FileDetailsResponse is a clean improvement over the previous implicit convention where consumers had to guess whether contents was raw text or base64 depending on context.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Copilot · 2026-03-26T03:26:33Z

frontend/src/components/editor/file-tree/file-viewer.tsx

+    if (data.isBase64 && data.contents) {
+      if (isMediaMime(mimeType)) {
+        const dataURL = base64ToDataURL(
+          data.contents as Base64String,
+          mimeType,
+        );
+        downloadByURL(dataURL, data.file.name);
+      } else {
+        const blob = deserializeBlob(
+          base64ToDataURL(
+            data.contents as Base64String,
+            data.mimeType || "application/octet-stream",
+          ),
+        );
+        downloadBlob(blob, data.file.name);
+      }


isBase64 is only used to decide how to download, but the rest of this component still treats data.contents as text/base64 based purely on mimeType. In particular, media rendering later in the file passes data.contents as base64 even when isBase64 is false (e.g., image/svg+xml where contents is UTF-8 text), which will produce an invalid data URL and broken preview. Consider gating media rendering (and the editable text editor / copy-save actions) on data.isBase64, and for non-base64 media either render via a Blob/object URL or fall back to a text preview.

Copilot · 2026-03-26T03:26:34Z

frontend/src/components/editor/file-tree/file-explorer.tsx

+                  downloadBlob(
+                    new Blob([details.contents || ""]),
+                    node.data.name,
+                  );


When details.isBase64 is false, the download path creates new Blob([details.contents || ""]) without setting a MIME type. This can cause the downloaded file to be treated as an unknown type (and can break expected behavior for text files like SVG/XML). Prefer passing { type: details.mimeType || "text/plain" } (or a sensible default) when constructing the Blob.

Suggested change

downloadBlob(

new Blob([details.contents || ""]),

node.data.name,

);

const blob = new Blob([details.contents || ""], {

type: details.mimeType || "text/plain",

});

downloadBlob(blob, node.data.name);

Copilot · 2026-03-26T03:26:34Z

tests/_server/api/endpoints/test_file_explorer.py

+    with open(bin_path, "wb") as f:
+        f.write(raw_bytes)
+    response = client.post(
+        "/api/files/file_details",
+        headers=HEADERS,
+        json={"path": bin_path},
+    )
+    assert response.status_code == 200, response.text
+    data = response.json()
+    assert data["isBase64"] is True
+    assert data["contents"] == base64.b64encode(raw_bytes).decode("utf-8")
+    os.remove(bin_path)


This test manually writes binary_file.bin into the shared module-level TemporaryDirectory and only deletes it at the end. If an assertion fails before os.remove(bin_path), the file will be left behind and can affect subsequent tests. Use a try/finally for cleanup (or a per-test tmp_path fixture) to guarantee removal.

Suggested change

with open(bin_path, "wb") as f:

f.write(raw_bytes)

response = client.post(

"/api/files/file_details",

headers=HEADERS,

json={"path": bin_path},

)

assert response.status_code == 200, response.text

data = response.json()

assert data["isBase64"] is True

assert data["contents"] == base64.b64encode(raw_bytes).decode("utf-8")

os.remove(bin_path)

try:

with open(bin_path, "wb") as f:

f.write(raw_bytes)

response = client.post(

"/api/files/file_details",

headers=HEADERS,

json={"path": bin_path},

)

assert response.status_code == 200, response.text

data = response.json()

assert data["isBase64"] is True

assert data["contents"] == base64.b64encode(raw_bytes).decode("utf-8")

finally:

if os.path.exists(bin_path):

os.remove(bin_path)

Copilot · 2026-03-26T03:26:35Z

marimo/_smoke_tests/issues/5361.py

+    test_dir = os.path.dirname(os.path.abspath(__file__))
+
+    # 1. cProfile .prof file
+    prof_path = os.path.join(test_dir, "sample.prof")
+    profiler = cProfile.Profile()
+    profiler.enable()
+    sum(range(10000))
+    profiler.disable()
+    profiler.dump_stats(prof_path)
+
+    # 2. Generic binary file with non-UTF8 bytes
+    bin_path = os.path.join(test_dir, "sample.bin")
+    with open(bin_path, "wb") as f:
+        f.write(struct.pack("!4sIf16s", b"MAGIC", 42, 3.14, b"\xff\xfe\xfd" + b"\x00" * 13))
+
+    # 3. Pickle file
+    pkl_path = os.path.join(test_dir, "sample.pkl")
+    with open(pkl_path, "wb") as f:
+        pickle.dump({"key": "value", "numbers": [1, 2, 3], "bytes": b"\xff\xfe"}, f)
+
+    # 4. A plain text file (control: should still work)
+    txt_path = os.path.join(test_dir, "sample.txt")
+    with open(txt_path, "w") as f:
+        f.write("This is a plain text file.\nIt should download correctly.\n")
+
+    # 5. A minimal valid PNG (1x1 red pixel) — tests the isMediaMime/dataURL path
+    def _make_png():
+        def chunk(ctype, data):
+            c = ctype + data
+            return struct.pack(">I", len(data)) + c + struct.pack(">I", zlib.crc32(c) & 0xFFFFFFFF)
+
+        return (
+            b"\x89PNG\r\n\x1a\n"
+            + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0))
+            + chunk(b"IDAT", zlib.compress(b"\x00\xff\x00\x00"))
+            + chunk(b"IEND", b"")
+        )
+
+    png_path = os.path.join(test_dir, "sample.png")
+    with open(png_path, "wb") as f:
+        f.write(_make_png())
+
+    print(f"Created test files in {test_dir}:")
+    for name in ["sample.prof", "sample.bin", "sample.pkl", "sample.txt", "sample.png"]:
+        p = os.path.join(test_dir, name)
+        print(f"  {name}: {os.path.getsize(p)} bytes")


This smoke test writes several files (sample.prof, sample.bin, sample.pkl, sample.txt, sample.png) directly into the repository directory containing the test (os.path.dirname(__file__)) and never cleans them up. Running smoke tests locally/CI can leave the working tree dirty and may cause confusion if these artifacts are committed accidentally. Consider writing into a temporary directory (and pointing the user to it), or add cleanup logic at the end of the cell/script.

github-actions · 2026-03-26T17:54:17Z

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.21.2-dev59

dmadisetti added 2 commits March 24, 2026 15:58

fix: Use decoding as a heuristic for binary data, and download accord…

5731e9a

…ing on the frontend

test: added verification smoke test

15fd103

Copilot AI review requested due to automatic review settings March 24, 2026 23:23

github-actions bot added the bash-focus Area to focus on during release bug bash label Mar 24, 2026

dmadisetti changed the title ~~Dm/mo 3217~~ @dmadisetti fix: Use decoding as a heuristic for binary data, and download accordingly Mar 24, 2026

Copilot started reviewing on behalf of dmadisetti March 24, 2026 23:24 View session

dmadisetti changed the title ~~@dmadisetti fix: Use decoding as a heuristic for binary data, and download accordingly~~ fix: Use decoding as a heuristic for binary data, and download accordingly Mar 24, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

bec42ba

for more information, see https://pre-commit.ci

pre-commit-ci bot requested review from Light2Dark and manzt as code owners March 24, 2026 23:25

vercel bot deployed to Preview March 24, 2026 23:26 View deployment

dmadisetti added the bug Something isn't working label Mar 24, 2026

Copilot AI reviewed Mar 24, 2026

View reviewed changes

dmadisetti added 2 commits March 24, 2026 16:35

lint: type fix

3d7c848

fix: merge res

8c06920

vercel bot deployed to Preview March 24, 2026 23:39 View deployment

fix: typing

23004ee

vercel bot deployed to Preview March 25, 2026 00:15 View deployment

vercel bot deployed to Preview March 25, 2026 00:18 View deployment

Light2Dark previously approved these changes Mar 25, 2026

View reviewed changes

fix: frontend typing

abd9236

dmadisetti dismissed Light2Dark’s stale review via abd9236 March 25, 2026 17:09

dmadisetti force-pushed the dm/mo-3217 branch from 9616bfe to abd9236 Compare March 25, 2026 17:09

vercel bot deployed to Preview March 25, 2026 17:15 View deployment

JiwaniZakir reviewed Mar 25, 2026

View reviewed changes

dmadisetti requested a review from Light2Dark March 25, 2026 18:11

Light2Dark requested a review from Copilot March 26, 2026 03:20

Copilot started reviewing on behalf of Light2Dark March 26, 2026 03:21 View session

Light2Dark approved these changes Mar 26, 2026

View reviewed changes

Copilot AI reviewed Mar 26, 2026

View reviewed changes

dmadisetti merged commit 285f02c into main Mar 26, 2026
52 checks passed

dmadisetti deleted the dm/mo-3217 branch March 26, 2026 17:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Use decoding as a heuristic for binary data, and download accordingly#8858

fix: Use decoding as a heuristic for binary data, and download accordingly#8858
dmadisetti merged 7 commits intomainfrom
dm/mo-3217

dmadisetti commented Mar 24, 2026 •

edited

Loading

Uh oh!

vercel bot commented Mar 24, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 24, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JiwaniZakir left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

dmadisetti commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📝 Summary

Uh oh!

vercel bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JiwaniZakir left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

dmadisetti commented Mar 24, 2026 •

edited

Loading

vercel bot commented Mar 24, 2026 •

edited

Loading