improvement: patch polars I/O in wasm #9413
Pull request overview
This PR improves Polars behavior in WASM/Pyodide by introducing a small WASM-only patching framework and using it to provide fallback implementations for Polars I/O (URL/file-like) and DataFrame.write_json when native paths fail in Pyodide.
Changes:
- Add `WasmPatchSet` to install/undo WASM-only monkey patches with a single unpatch handle.
- Add `patch_polars_for_wasm()` to patch Polars read/scan I/O and `DataFrame.write_json` with Pyodide-friendly fallbacks.
- Update formatters and tests (unit + Pyodide acceptance) to exercise the new behavior and remove the old Polars JSON patch.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/_runtime/test_patches.py | Switch to patch_polars_for_wasm() and add targeted tests for the patch framework + Polars I/O fallbacks. |
| tests/_pyodide/test_pyodide_acceptance.mjs | Extend the acceptance harness to serve in-memory fixtures and verify Polars URL I/O in Pyodide. |
| marimo/_runtime/patches.py | Remove the legacy patch_polars_write_json implementation. |
| marimo/_runtime/_wasm_patches.py | New WASM-only patching utility with centralized unpatching. |
| marimo/_runtime/_polars_wasm.py | New Polars WASM fallback logic for read/scan I/O and write_json. |
| marimo/_output/formatters/df_formatters.py | Apply the new Polars WASM patching entry point during Polars formatter registration. |
1 issue found across 6 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="marimo/_runtime/_polars_wasm.py">
<violation number="1" location="marimo/_runtime/_polars_wasm.py:207">
P1: Naive `line.split(",")` CSV parsing silently corrupts data containing commas, quotes, or newlines, and coerces every value to a string. Use `self.to_dicts()` instead—it preserves types, handles all values correctly, and doesn't involve I/O so it works in WASM.</violation>
</file>
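To illustrate the concern above, here is a minimal standalone sketch using only the standard library. It is not the PR's code, and the sample payload is made up; it only shows how splitting on commas breaks a quoted field, while a real CSV parser does not.

```python
import csv
import io

# Hypothetical CSV payload: the first field contains a quoted comma.
raw = 'name,city\n"Doe, Jane",Paris\n'

# Naive parsing: splitting on "," tears the quoted field apart.
naive_rows = [line.split(",") for line in raw.splitlines()]

# A real CSV parser respects the quoting rules.
parsed_rows = list(csv.reader(io.StringIO(raw)))

print(naive_rows[1])   # three fragments instead of two fields
print(parsed_rows[1])  # two fields, quotes removed
```

Either way every value comes back as a string, which is the second half of the reviewer's point: a row-oriented export such as `to_dicts()` avoids the text round trip entirely.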
Architecture diagram
sequenceDiagram
participant User as Python User Code
participant Polars as Polars API
participant Wrapper as NEW: WasmPatchSet Wrapper
participant Rust as Polars Rust Core
participant FB as NEW: Fallback Handler
participant XHR as Browser XMLHttpRequest
participant Arrow as Pyarrow (Fallback Decoder)
participant Server as External Web Server
Note over User,Server: Runtime behavior in WASM (Pyodide) environment
User->>Polars: read_csv("https://data.csv")
Polars->>Wrapper: Call intercepted
Wrapper->>Rust: CHANGED: Attempt original call
activate Rust
Note right of Rust: Fails: No network access<br/>or missing modules (fsspec)
Rust-->>Wrapper: Raise NameError/Exception
deactivate Rust
alt Catch specific exceptions
Wrapper->>FB: NEW: Execute fallback(original, *args)
activate FB
opt Source is URL
FB->>XHR: NEW: GET request (Synchronous)
XHR->>Server: Fetch data
Server-->>XHR: 200 OK (Bytes)
XHR-->>FB: Return bytes
end
FB->>Arrow: NEW: Decode bytes via pyarrow.csv
Arrow-->>FB: pyarrow.Table
FB->>Polars: pl.from_arrow(table)
Polars-->>FB: polars.DataFrame
opt Is "scan_*" operation
FB->>FB: NEW: materialize .lazy()
end
FB-->>Wrapper: Return DataFrame/LazyFrame
deactivate FB
Wrapper-->>User: Result returned to user
else Fallback also fails
FB-->>Wrapper: Raise Exception
Wrapper-->>User: Reraise original exception chained with fallback error
end
Note over User,Polars: Outside WASM environment: WasmPatchSet is a no-op
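The "resolve source to bytes" step in the diagram can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `resolve_to_bytes` and `fetch_url` are hypothetical names, and the URL fetcher is injected so the sketch runs outside a browser (the PR describes a synchronous XMLHttpRequest for that step). The pyarrow decode and `pl.from_arrow` steps are left out to keep the example dependency-free.

```python
import io
from pathlib import Path
from typing import Callable


def resolve_to_bytes(source, fetch_url: Callable[[str], bytes]) -> bytes:
    """Hypothetical helper: turn a Polars-style source into raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if isinstance(source, str) and source.startswith(("http://", "https://")):
        # Assumption: the real fetch is a sync XMLHttpRequest in the browser.
        return fetch_url(source)
    if isinstance(source, (str, Path)):
        return Path(source).read_bytes()
    if hasattr(source, "read"):
        data = source.read()
        # Text-mode file-likes return str; normalize to bytes.
        return data.encode() if isinstance(data, str) else data
    raise TypeError(f"unsupported source type: {type(source).__name__}")


def fake_fetch(url: str) -> bytes:
    """Stand-in fetcher that returns a canned CSV payload."""
    return b"a,b\n1,2\n"


from_url = resolve_to_bytes("https://example.com/data.csv", fake_fetch)
from_filelike = resolve_to_bytes(io.StringIO("x,y\n"), fake_fetch)
from_bytes = resolve_to_bytes(b"raw", fake_fetch)
```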
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
1 issue found across 3 files (changes from recent commits).
<file name="marimo/_runtime/_polars_wasm.py">
<violation number="1" location="marimo/_runtime/_polars_wasm.py:207">
P1: `to_dicts()` preserves Python-native temporal/decimal types (`datetime`, `date`, `timedelta`, `Decimal`) that are not JSON-serializable. The old CSV path stringified everything, so `json.dumps` always succeeded. Now any DataFrame with a date or datetime column will raise `TypeError` here.
Pass `default=str` to all four downstream `json.dumps`/`json.dump` calls to maintain the same broad-type support.</violation>
</file>
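The failure mode and the suggested fix can be reproduced with the standard library alone. The rows below are made-up stand-ins shaped like `DataFrame.to_dicts()` output; this is a sketch of the behavior, not the PR's code.

```python
import json
from datetime import date, datetime
from decimal import Decimal

# Hypothetical rows shaped like the output of DataFrame.to_dicts().
rows = [
    {
        "day": date(2024, 1, 2),
        "ts": datetime(2024, 1, 2, 3, 4, 5),
        "price": Decimal("9.99"),
    }
]

try:
    json.dumps(rows)
    raised = False
except TypeError:
    # date, datetime and Decimal are not JSON-serializable by default.
    raised = True

# default=str is called for any object json cannot encode natively.
encoded = json.dumps(rows, default=str)
decoded = json.loads(encoded)
print(decoded)
```

Note that `default=str` stringifies these values, so they decode as strings rather than as their original types, which matches what the old CSV-based path produced.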
- pyarrow missing now raises ModuleNotFoundError from the WASM fallback via DependencyManager.pyarrow.require, so marimo can prompt to install it. The wasm wrapper no longer masks ModuleNotFoundError with the original exception.
- Preserve the original traceback when re-raising via with_traceback.
- write_json fallback uses DataFrame.to_dicts() instead of naive CSV splitting, which preserves types and handles quoted/embedded delimiters correctly.
187d22a to b8ce19c
- Pass default=str to write_json fallback's json dumps so temporal/decimal types (datetime, date, Decimal) round-trip like the old CSV path did.
- Replace URL-based polars test with a direct fallback exercise. Sync HTTP isn't available in Node.js pyodide (no XHR shim, no SharedArrayBuffer setup), so the URL path was failing both for original polars and the urllib-based fallback. URL fetching is covered by the unit tests.
SignatureParamName.__init__ = enhanced_init
def patch_polars_write_json() -> Unpatch:
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.5-dev3
Polars I/O fails in pyodide because the Rust core can't reach the network and its Python fallbacks reference modules (fsspec/aiohttp) that aren't installed, surfacing as NameError and friends. URL-based `read_csv` / `read_parquet` / `scan_*` etc. are unusable today.
Add a small WASM patch framework (`WasmPatchSet`) that wraps a target callable: try original, on caught exception run a fallback; if the fallback also raises, re-raise the original with the fallback chained so users see the real error. Returns a single unpatch handle and is a no-op outside pyodide.
Use it to wrap polars read_csv/scan_csv, read_parquet/scan_parquet, read_ipc/scan_ipc, read_ndjson/scan_ndjson, and read_json. The fallback resolves the source to bytes (URLs via sync XMLHttpRequest, file-likes/paths/bytes locally), decodes via pyarrow, and returns polars via pl.from_arrow. scan_* materialize eagerly then .lazy(); true streaming wouldn't work in WASM anyway.
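The try-original, then-fallback, then-chain behavior described above can be sketched as below. This is a simplified illustration, not the actual `WasmPatchSet` API: `wrap_with_fallback`, `catch`, and the `fake_pl` namespace are hypothetical names, a single patch stands in for the whole patch set, and the "no-op outside pyodide" check is omitted.

```python
import functools
import types


def wrap_with_fallback(owner, name, fallback, catch=(Exception,)):
    """Hypothetical helper: patch owner.<name>, return an unpatch callable."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        try:
            return original(*args, **kwargs)
        except catch as original_error:
            try:
                # The fallback receives the original callable first.
                return fallback(original, *args, **kwargs)
            except Exception as fallback_error:
                # Surface the original error, with the fallback failure chained.
                raise original_error from fallback_error

    setattr(owner, name, wrapper)

    def unpatch():
        setattr(owner, name, original)

    return unpatch


# Stand-in for a module whose reader fails the way Polars does in pyodide.
def read_csv(source):
    raise NameError("name 'fsspec' is not defined")


fake_pl = types.SimpleNamespace(read_csv=read_csv)

unpatch = wrap_with_fallback(
    fake_pl,
    "read_csv",
    lambda original, source: f"fallback:{source}",
    catch=(NameError,),
)
result = fake_pl.read_csv("https://example.com/data.csv")
unpatch()  # restores the original attribute
```

Passing the original callable into the fallback mirrors the `fallback(original, *args)` call in the review's diagram and lets a fallback delegate back to the native path for inputs it does not handle.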