
mo.ui.table crashes browser when used with very large dataframe #1311

Closed · delennc opened this issue May 3, 2024 · 4 comments · Fixed by #1314
Labels: bug (Something isn't working)

Comments

delennc commented May 3, 2024

Describe the bug

Using mo.ui.table with a pandas dataframe of 1M+ rows causes the browser tab to become unresponsive and requires force-killing the kernel.

Environment

{
  "marimo": "0.4.10",
  "OS": "Darwin",
  "OS Version": "22.6.0",
  "Processor": "arm",
  "Python Version": "3.11.4",
  "Binaries": {
    "Browser": "124.0.6367.119",
    "Node": "v20.3.1"
  },
  "Requirements": {
    "click": "8.1.7",
    "importlib-resources": "6.1.0",
    "jedi": "0.19.1",
    "markdown": "3.5.1",
    "pymdown-extensions": "10.3.1",
    "pygments": "2.16.1",
    "tomlkit": "0.12.1",
    "uvicorn": "0.23.2",
    "starlette": "0.27.0",
    "websocket": "missing",
    "typing-extensions": "4.10.0",
    "black": "23.10.1"
  }
}

Code to reproduce

mo.ui.table(very_large_df)
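
A fuller, self-contained reproduction might look like the following (the column names and row count are illustrative, not from the original report; any dataframe in the low millions of rows triggers the hang):

import marimo as mo
import numpy as np
import pandas as pd

# ~1.5M rows: enough to make the frontend table hang in marimo 0.4.10.
very_large_df = pd.DataFrame({
    "id": np.arange(1_500_000),
    "value": np.random.rand(1_500_000),
})

mo.ui.table(very_large_df)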

delennc added the bug (Something isn't working) label May 3, 2024
akshayka (Contributor) commented May 3, 2024

Thanks for reporting.

@mscolnick could we lazy-load large tables with RPCs? Though we might need to solve the problem of RPCs not working with unbound variables.

It might also be worth investigating sending Parquet instead of CSV.
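
For reference, a quick standalone way to compare the two wire formats (this sketch assumes pyarrow is installed for Parquet support; it is not marimo's serialization code):

import io

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": np.arange(1_000_000),
    "value": np.random.rand(1_000_000),
})

# CSV: text-based and row-oriented.
csv_size = len(df.to_csv(index=False).encode())

# Parquet: binary, columnar, and compressed (needs pyarrow or fastparquet).
buf = io.BytesIO()
df.to_parquet(buf, index=False)

print(f"CSV:     {csv_size / 1e6:.1f} MB")
print(f"Parquet: {buf.getbuffer().nbytes / 1e6:.1f} MB")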

mscolnick (Contributor) commented

> Though we might need to solve the problem of RPCs not working with unbound variables.

Yeah, this would be a blocker.

> It might also be worth investigating sending Parquet instead of CSV.

It's not the file size: I tested with 1 million rows, which is roughly 35 MB. It ends up being the table renderer/library we use. Specifically, it's some logic in the library for sorting, not the actual rendering (since we only render the page size of 10).

I have a PR up, #1314, that at least limits this. If people want to display above this limit, it would be good to ask why (and likely add a different feature instead of bumping the limit).
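
Until that lands, a user-side workaround in the same spirit is to truncate before rendering (the cap below is an arbitrary illustrative value, not the threshold chosen in #1314):

import marimo as mo

LIMIT = 20_000  # illustrative cap, not the value #1314 enforces

mo.ui.table(very_large_df.head(LIMIT))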

mscolnick self-assigned this May 5, 2024
akshayka (Contributor) commented May 5, 2024

> It ends up being the table renderer/library we use. Specifically, it's some logic in the library for sorting, not the actual rendering (since we only render the page size of 10).

Oh interesting. I'm surprised; sorting is usually cheap.

> I have a PR up, #1314, that at least limits this.

Nice, good first line of defense.

> If people want to display above this limit, it would be good to ask why.

I think it's just useful to be able to view the contents of a reasonably sized frame; whether you're using mo.ui.dataframe or mo.ui.table, the need is the same.

> I tested with 1 million rows, which is roughly 35 MB.

A reasonably sized frame with 1M rows can easily take >200 MB when stored as CSV. But I guess the wire format won't help much unless the frontend can operate on columnar data natively.

mscolnick (Contributor) commented
Sorting 1M rows is not cheap and blocks the main thread, even when virtualized. We can eventually move this to the backend for larger dataframes.

Viewing 1M rows is fine, but I don't see a use case for sending them all to the frontend.
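
A rough sketch of what backend-side sorting and paging could look like; get_page is a hypothetical helper, not a marimo API, and only one page of rows ever crosses the wire:

import pandas as pd

def get_page(df: pd.DataFrame, sort_by: str, page: int,
             page_size: int = 10, ascending: bool = True) -> pd.DataFrame:
    # Sort on the backend (off the browser's main thread) and return
    # only the rows for the requested page.
    ordered = df.sort_values(sort_by, ascending=ascending)
    start = page * page_size
    return ordered.iloc[start:start + page_size]

# The frontend would request, e.g., page 3 sorted by "value":
# get_page(very_large_df, sort_by="value", page=3)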
