improvement: patch polars I/O in wasm #9413

Merged
mscolnick merged 6 commits into main from ms/improvement-patch-polars-io-in-wasm on Apr 28, 2026

Conversation

@mscolnick
Contributor

@mscolnick mscolnick commented Apr 28, 2026

Polars I/O fails in pyodide because the Rust core can't reach the network and its Python fallbacks reference modules (fsspec/aiohttp) that aren't installed, surfacing as NameError and friends. URL-based read_csv / read_parquet / scan_* etc. are unusable today.

Add a small WASM patch framework (WasmPatchSet) that wraps a target callable: try original, on caught exception run a fallback; if the fallback also raises, re-raise the original with the fallback chained so users see the real error. Returns a single unpatch handle and is a no-op outside pyodide.
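
The wrapping behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual `WasmPatchSet` code: the names `wrap_with_fallback` and `PatchSetSketch`, the `enabled` flag standing in for pyodide detection, and the `fallback(original, *args, **kwargs)` calling convention are all assumptions made for the example.

```python
import functools
from typing import Any, Callable, List


def wrap_with_fallback(
    original: Callable[..., Any],
    fallback: Callable[..., Any],
) -> Callable[..., Any]:
    """Hypothetical helper: try `original`, and on failure try `fallback`."""

    @functools.wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return original(*args, **kwargs)
        except Exception as original_error:
            try:
                # Assumed convention: the fallback gets the original callable
                # followed by the caller's arguments.
                return fallback(original, *args, **kwargs)
            except Exception as fallback_error:
                # Re-raise the original error with the fallback failure
                # chained as its cause, so the user sees the real problem.
                raise original_error from fallback_error

    return wrapper


class PatchSetSketch:
    """Hypothetical patch set: records undo steps and exposes one unpatch()."""

    def __init__(self, enabled: bool) -> None:
        # `enabled` stands in for "are we running under pyodide?".
        self._enabled = enabled
        self._undo: List[Callable[[], None]] = []

    def wrap(self, owner: Any, name: str, fallback: Callable[..., Any]) -> None:
        if not self._enabled:
            return  # no-op outside the target environment
        original = getattr(owner, name)
        setattr(owner, name, wrap_with_fallback(original, fallback))
        self._undo.append(lambda: setattr(owner, name, original))

    def unpatch(self) -> None:
        # Restore patched attributes in reverse order of installation.
        while self._undo:
            self._undo.pop()()
```

Chaining with `raise ... from ...` keeps the original exception as the one the user sees, while the fallback failure remains reachable through `__cause__`.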

Use it to wrap polars read_csv/scan_csv, read_parquet/scan_parquet, read_ipc/scan_ipc, read_ndjson/scan_ndjson, and read_json. The fallback resolves the source to bytes (URLs via sync XMLHttpRequest, file-likes/paths/bytes locally), decodes via pyarrow, and returns polars via pl.from_arrow. scan_* materialize eagerly then .lazy() — true streaming wouldn't work in WASM anyway.
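
The "resolve the source to bytes" step could look something like the sketch below. It is illustrative only and uses the standard library alone: `resolve_to_bytes` and the injected `fetch_url` callable are hypothetical names, and the real fallback's URL fetching, pyarrow decoding, and `pl.from_arrow` conversion are deliberately left out.

```python
from pathlib import Path
from typing import Any, Callable


def _is_url(source: Any) -> bool:
    # Assumption for this sketch: only http(s) strings count as URLs.
    return isinstance(source, str) and source.startswith(("http://", "https://"))


def resolve_to_bytes(source: Any, fetch_url: Callable[[str], bytes]) -> bytes:
    """Hypothetical resolver: turn bytes, URLs, file-likes, or paths into bytes."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if _is_url(source):
        # The fetcher is injected so the sketch stays independent of how
        # the browser or runtime actually performs the request.
        return fetch_url(source)
    if hasattr(source, "read"):
        data = source.read()
        # Text-mode file-likes return str; normalise to bytes.
        return data.encode("utf-8") if isinstance(data, str) else data
    # Anything else is treated as a local filesystem path.
    return Path(source).read_bytes()
```

Once the payload is plain bytes, a decoder can be chosen per format, which is where the PR hands off to pyarrow before converting back to polars.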

Copilot AI review requested due to automatic review settings April 28, 2026 13:47
@vercel

vercel Bot commented Apr 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| marimo-docs | Ready | Preview, Comment | Apr 28, 2026 7:12pm |

Contributor

Copilot AI left a comment

Pull request overview

This PR improves Polars behavior in WASM/Pyodide by introducing a small WASM-only patching framework and using it to provide fallback implementations for Polars I/O (URL/file-like) and DataFrame.write_json when native paths fail in Pyodide.

Changes:

  • Add WasmPatchSet to install/undo WASM-only monkey patches with a single unpatch handle.
  • Add patch_polars_for_wasm() to patch Polars read/scan I/O and DataFrame.write_json with Pyodide-friendly fallbacks.
  • Update formatters and tests (unit + Pyodide acceptance) to exercise the new behavior and remove the old Polars JSON patch.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/_runtime/test_patches.py | Switch to patch_polars_for_wasm() and add targeted tests for the patch framework + Polars I/O fallbacks. |
| tests/_pyodide/test_pyodide_acceptance.mjs | Extend the acceptance harness to serve in-memory fixtures and verify Polars URL I/O in Pyodide. |
| marimo/_runtime/patches.py | Remove the legacy patch_polars_write_json implementation. |
| marimo/_runtime/_wasm_patches.py | New WASM-only patching utility with centralized unpatching. |
| marimo/_runtime/_polars_wasm.py | New Polars WASM fallback logic for read/scan I/O and write_json. |
| marimo/_output/formatters/df_formatters.py | Apply the new Polars WASM patching entry point during Polars formatter registration. |

Comment thread marimo/_runtime/_wasm/_polars.py
Comment thread marimo/_runtime/_wasm_patches.py Outdated
Comment thread marimo/_runtime/_polars_wasm.py Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="marimo/_runtime/_polars_wasm.py">

<violation number="1" location="marimo/_runtime/_polars_wasm.py:207">
P1: Naive `line.split(",")` CSV parsing silently corrupts data containing commas, quotes, or newlines, and coerces every value to a string. Use `self.to_dicts()` instead—it preserves types, handles all values correctly, and doesn't involve I/O so it works in WASM.</violation>
</file>
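
To make the failure mode in the violation above concrete, here is a small standard-library illustration. It only demonstrates why splitting on commas breaks quoted fields; the sample line is made up, and it is not the code under review or the suggested `to_dicts()` fix.

```python
import csv
import io

# A single CSV record whose second field contains a comma inside quotes.
line = '1,"Doe, Jane",3.5'

# Naive splitting cuts the quoted field in two and keeps the quote characters.
naive = line.split(",")

# A real CSV parser honours the quoting and yields three fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['1', '"Doe', ' Jane"', '3.5']
print(parsed)  # ['1', 'Doe, Jane', '3.5']
```

Both approaches still return every value as a string, which is the second half of the reviewer's concern about losing types.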
Architecture diagram
sequenceDiagram
    participant User as Python User Code
    participant Polars as Polars API
    participant Wrapper as NEW: WasmPatchSet Wrapper
    participant Rust as Polars Rust Core
    participant FB as NEW: Fallback Handler
    participant XHR as Browser XMLHttpRequest
    participant Arrow as Pyarrow (Fallback Decoder)
    participant Server as External Web Server

    Note over User,Server: Runtime behavior in WASM (Pyodide) environment

    User->>Polars: read_csv("https://data.csv")
    Polars->>Wrapper: Call intercepted
    
    Wrapper->>Rust: CHANGED: Attempt original call
    activate Rust
    Note right of Rust: Fails: No network access<br/>or missing modules (fsspec)
    Rust-->>Wrapper: Raise NameError/Exception
    deactivate Rust

    alt Catch specific exceptions
        Wrapper->>FB: NEW: Execute fallback(original, *args)
        activate FB
        
        opt Source is URL
            FB->>XHR: NEW: GET request (Synchronous)
            XHR->>Server: Fetch data
            Server-->>XHR: 200 OK (Bytes)
            XHR-->>FB: Return bytes
        end
        
        FB->>Arrow: NEW: Decode bytes via pyarrow.csv
        Arrow-->>FB: pyarrow.Table
        
        FB->>Polars: pl.from_arrow(table)
        Polars-->>FB: polars.DataFrame
        
        opt Is "scan_*" operation
            FB->>FB: NEW: materialize .lazy()
        end
        
        FB-->>Wrapper: Return DataFrame/LazyFrame
        deactivate FB
        Wrapper-->>User: Result returned to user
        
    else Fallback also fails
        FB-->>Wrapper: Raise Exception
        Wrapper-->>User: Reraise original exception chained with fallback error
    end

    Note over User,Polars: Outside WASM environment: WasmPatchSet is a no-op

Comment thread marimo/_runtime/_polars_wasm.py Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="marimo/_runtime/_polars_wasm.py">

<violation number="1" location="marimo/_runtime/_polars_wasm.py:207">
P1: `to_dicts()` preserves Python-native temporal/decimal types (`datetime`, `date`, `timedelta`, `Decimal`) that are not JSON-serializable. The old CSV path stringified everything, so `json.dumps` always succeeded. Now any DataFrame with a date or datetime column will raise `TypeError` here.

Pass `default=str` to all four downstream `json.dumps`/`json.dump` calls to maintain the same broad-type support.</violation>
</file>
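
The effect of `default=str` can be seen with the standard library alone. The rows below are made-up stand-ins for what a `to_dicts()` call might return for a frame with date, datetime, and decimal columns; they are an assumption for illustration, not data from the PR.

```python
import json
from datetime import date, datetime
from decimal import Decimal

# Hypothetical rows containing values that json cannot encode by default.
rows = [
    {
        "day": date(2026, 4, 28),
        "ts": datetime(2026, 4, 28, 13, 47),
        "price": Decimal("1.50"),
    }
]

try:
    json.dumps(rows)
    plain_dump_failed = False
except TypeError:
    # date, datetime, and Decimal are not JSON-serializable on their own.
    plain_dump_failed = True

# default=str is called for each unsupported object and its result is encoded.
encoded = json.dumps(rows, default=str)
print(encoded)
```

Note that `default=str` makes serialization succeed by writing strings, so a consumer reading the JSON back gets text rather than the original temporal or decimal types.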

Comment thread marimo/_runtime/_wasm/_polars.py
- pyarrow missing now raises ModuleNotFoundError from the WASM fallback
  via DependencyManager.pyarrow.require, so marimo can prompt to install
  it. The wasm wrapper no longer masks ModuleNotFoundError with the
  original exception.
- Preserve the original traceback when re-raising via with_traceback.
- write_json fallback uses DataFrame.to_dicts() instead of naive
  CSV splitting, which preserves types and handles quoted/embedded
  delimiters correctly.
- Pass default=str to write_json fallback's json dumps so temporal/decimal
  types (datetime, date, Decimal) round-trip like the old CSV path did.
- Replace URL-based polars test with a direct fallback exercise. Sync HTTP
  isn't available in Node.js pyodide (no XHR shim, no SharedArrayBuffer
  setup), so the URL path was failing both for original polars and the
  urllib-based fallback. URL fetching is covered by the unit tests.
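
The first two points of the commit message (letting `ModuleNotFoundError` through and preserving the original traceback) might be sketched like this. The function name `call_with_fallback` is hypothetical and the structure is an assumption; it is not the wrapper that was merged.

```python
from typing import Any, Callable


def call_with_fallback(
    original: Callable[..., Any],
    fallback: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Any:
    """Hypothetical wrapper body showing the two error-handling rules."""
    try:
        return original(*args, **kwargs)
    except Exception as original_error:
        # Capture the traceback of the original failure before trying again.
        original_tb = original_error.__traceback__
        try:
            return fallback(*args, **kwargs)
        except ModuleNotFoundError:
            # A missing dependency is not masked by the original error, so the
            # host application can react to it (for example, offer an install).
            raise
        except Exception as fallback_error:
            # Re-raise the original error with its captured traceback and
            # chain the fallback failure as the cause.
            raise original_error.with_traceback(original_tb) from fallback_error
```

The ordering of the `except` clauses matters here: `ModuleNotFoundError` is a subclass of `Exception`, so it has to be handled first to avoid being folded into the generic branch.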
Comment thread marimo/_runtime/_wasm/_polars.py
SignatureParamName.__init__ = enhanced_init


def patch_polars_write_json() -> Unpatch:
Collaborator

nice refactor

@mscolnick mscolnick requested a review from dmadisetti April 28, 2026 19:20
@mscolnick mscolnick merged commit 2fdba9d into main Apr 28, 2026
44 checks passed
@mscolnick mscolnick deleted the ms/improvement-patch-polars-io-in-wasm branch April 28, 2026 19:31
@github-actions

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.5-dev3

Labels

enhancement New feature or request

3 participants