Loading pandas is quite slow #347

Open
wlach opened this issue Mar 18, 2019 · 27 comments
Comments

@wlach
Contributor

wlach commented Mar 18, 2019

I think we discussed this a few months back but I can't remember what the resolution was.

pandas seems very slow to load. I don't think it's the network connection (at least here in Toronto, the pyodide assets download almost instantly, even when not cached), but rather something CPU-bound: I noticed that this problem was much less pronounced on jaredkerim's new MacBook on Friday (whereas on my machine it takes 20+ seconds to initialize pandas, and also pops up the slow script dialog).

I feel like we should spend some time resolving this (if possible), as I think pandas is probably the library many people would like to use inside an iodide notebook. I am ok with stopgap solutions that don't solve the fundamental issue but move the ball forward.

@vadimkantorov

Maybe it's pandas' fault: running https://alpha.iodide.io/notebooks/222/ hangs at:
Loading pandas, pytz, python-dateutil, numpy, matplotlib, pyparsing, kiwisolver, cycler

@rth
Member

rth commented Apr 9, 2019

@vadimkantorov Which browser (and versions) are you using? As far as I know, pandas works erratically with Chrome (#128) and the corresponding tests are currently marked as a known failure. Though maybe the situation has evolved since.

@vadimkantorov

Ah, it's unfortunate the hanging notebook is suggested as a demo in README.

I use Chrome for Windows. Help says: Google Chrome is up to date. Version 73.0.3683.103 (Official Build) (64-bit).

@vadimkantorov

Though running import pandas after executing https://alpha.iodide.io/notebooks/300/ works fine for me. So it may be that something else causes the hang.

@vadimkantorov

When I run import pandas from a code cell, everything hangs. When I run it from the REPL, it hangs a bit and typing is unresponsive for a few seconds, but then it works fine.

@vadimkantorov

(filed a separate issue #378)

@wlach
Contributor Author

wlach commented May 7, 2019

Some improvements: on Firefox nightly, pandas now only takes about 6 seconds to load:

https://definitely-staging.iodide.io/notebooks/44/

Still a little slow for my liking.

@rth
Member

rth commented May 9, 2020

@wlach when you experienced this, are you sure it wasn't partially due to download speed? The Netlify CDN seems to be quite slow now, around 600 kB/s, which would correspond to 20+ s for pandas: #651

Edit: I see you mentioned in the issue description that it shouldn't be the case.
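
As a rough sanity check of that estimate, here is a small sketch of the arithmetic. The 12 MB payload is an assumption, back-computed from the two figures quoted above (600 kB/s and 20 s); it is not a measured package size, and the function name is hypothetical.

```python
def download_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time needed to transfer size_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


# Assumed payload: 12 MB, i.e. what 600 kB/s sustained for 20 s would transfer.
ASSUMED_SIZE_BYTES = 12_000_000

for rate in (600_000, 1_200_000, 6_000_000):
    seconds = download_seconds(ASSUMED_SIZE_BYTES, rate)
    print(f"{rate / 1000:.0f} kB/s -> {seconds:.1f} s")
```

With a warm browser cache this cost should disappear entirely, which is why a delay that shows up on every load points at something other than the network.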

@vadimkantorov

Also, it was like that every time. If it were a download problem, the files would probably have been cached.

@rth
Member

rth commented May 9, 2020

Indeed, makes sense. Now however it seems there is a download issue in addition. @vadimkantorov Could you please try the download link from #651 to double-check?

@vadimkantorov

It took ~10 secs for me (on a French connection)

@aptiko

aptiko commented Jan 11, 2021

Hey, people, not certain I'm talking about the same problem (and I'm a pyodide beginner), but here are my findings. I'm running a CPython interpreter on my system and I'm telling it things like the following, which measures how long it takes to import numpy.

from time import perf_counter; start=perf_counter(); import numpy; print(perf_counter() - start)

I also run the same thing in pyodide. Here are some results:

         import numpy   import pandas   loop
CPython  0.10           0.28            0.43
pyodide  3              9               6

Times are in seconds.

(Notes: The CPython timings are with a warm disk cache, so importing is performed with little or no disk activity. Likewise, the pyodide timings are with the packages cached. In both cases, the pandas importing is performed when numpy is already imported.)

This is the loop mentioned in the last column of the table:

from time import perf_counter

start = perf_counter()
for i in range(10000000):
    pass
print(perf_counter() - start)

Therefore it doesn't look like there's anything wrong with loading pandas. It's slow alright, but this slowness seems consistent with the general slowness. Yes, importing numpy and pandas is 30 times slower than in CPython, and looping is only 14 times slower, but I'd worry more about the order of magnitude first.
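
For reference, the ratios quoted in this paragraph can be recomputed from the table with a few lines. This is only a sketch of the arithmetic; the timings are copied from the table and the variable names are made up for the illustration.

```python
# Timings in seconds, copied from the table above.
timings = {
    "import numpy": {"CPython": 0.10, "pyodide": 3.0},
    "import pandas": {"CPython": 0.28, "pyodide": 9.0},
    "loop": {"CPython": 0.43, "pyodide": 6.0},
}


def slowdown(row):
    """How many times slower pyodide is than CPython for one benchmark."""
    return row["pyodide"] / row["CPython"]


ratios = {name: slowdown(row) for name, row in timings.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}x slower in pyodide")
```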

Is it normal for the loop to be an order of magnitude slower than CPython?

@rth
Member

rth commented Jan 11, 2021

Thanks for doing the benchmarks @aptiko!

Is it normal for the loop to be an order of magnitude slower than CPython?

Yes it is expected to be slower #1120 but we should try to improve it.

Yes, importing numpy and pandas is 30 times slower than in CPython, and looping is only 14 times slower, but I'd worry more about the order of magnitude first

That's interesting. It could also mean that if it were only a matter of CPython run time performance, pandas would load in 0.28 s × (5 to 20) ≈ 1.4 to 5.6 s, not 9 s. So the remaining several seconds (about 5 s if we take the 14× loop slowdown measured above) must be due to something else, likely compilation / virtual file system operations.
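
A quick sketch of that estimate, using only the numbers already posted in this thread. The slowdown factors 5 and 20 are the assumed bounds from the comment above, and the middle value is the loop slowdown from the benchmark table; the helper name is hypothetical.

```python
CPYTHON_PANDAS_IMPORT = 0.28  # seconds, native CPython (from the table)
PYODIDE_PANDAS_IMPORT = 9.0   # seconds, measured in pyodide (from the table)
LOOP_SLOWDOWN = 6 / 0.43      # about 14x, from the loop benchmark


def unexplained_seconds(native, measured, slowdown):
    """Part of the measured time not explained by a uniform interpreter slowdown."""
    expected = native * slowdown
    return measured - expected


for factor in (5, LOOP_SLOWDOWN, 20):
    expected = CPYTHON_PANDAS_IMPORT * factor
    rest = unexplained_seconds(CPYTHON_PANDAS_IMPORT, PYODIDE_PANDAS_IMPORT, factor)
    print(f"slowdown {factor:4.1f}x: expected {expected:.1f} s, unexplained {rest:.1f} s")
```

Even with the most pessimistic factor, a few seconds remain unaccounted for, which is the gap attributed here to compilation and file system work.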

@rth
Member

rth commented Jan 11, 2021

This also means that saving a few hundred ms on package import in native environments could currently save seconds in pyodide. For instance, such a fix was recently contributed in scikit-learn/scikit-learn#19102. Such contributions upstream (likely starting from the core scientific packages) could be beneficial even if we make pyodide run faster.

@aptiko

aptiko commented Apr 12, 2021

For the time being I've abandoned pyodide and I have experimented with C. I'm mentioning my findings here in case they help.

Initially my C code was running too slowly in the browser, something like 10x or more slower. After investigating, I found out that it was because of the standard library strtod() and strtol() functions.

Naïve versions of these functions are trivial to write, are 10-20 lines long, and they run much faster. But glibc's (or maybe musl's; I don't remember clearly) strtod() is around 1.5k lines! Much of this has to do with working in different locales.

I didn't need locales, I solved my problem by writing naïve versions, and I didn't investigate further. However I'm speculating that what causes the slowness is interaction with JS when fetching locale information.

The Python interpreter needs to read numbers while parsing Python files. If it uses strtol() and strtod() for this purpose, this could be responsible for the slowness when importing stuff.
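
One way to probe this hypothesis from Python itself would be to compare how long it takes to compile source that is full of numeric literals against source of the same length without them. The sketch below is only that, a probe: whether CPython's number parsing goes through these libc functions is not established in this thread, the helper name and the sizes are arbitrary, and imports that hit cached bytecode skip parsing altogether, so this would only cover part of the import path.

```python
from time import perf_counter


def compile_seconds(source: str, repeats: int = 20) -> float:
    """Best-of-N time to compile source to bytecode (parsing included)."""
    best = float("inf")
    for _ in range(repeats):
        start = perf_counter()
        compile(source, "<probe>", "exec")
        best = min(best, perf_counter() - start)
    return best


N = 5000
# Same number of assignment statements; only the right-hand side differs.
with_numbers = "\n".join(f"x{i} = {i}.5" for i in range(N))
with_names = "x = None\n" + "\n".join(f"x{i} = x" for i in range(N))

print("numeric literals:", compile_seconds(with_numbers))
print("name references: ", compile_seconds(with_names))
```

Running the same script natively and in pyodide and comparing the ratio between the two lines would show whether numeric literals are disproportionately expensive there.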

@acreskeyMoz

I'm seeing similar behaviour when using Pyodide.

The first use is a bit slow (under 3 seconds), but when importing packages it becomes very slow (>10 seconds on fast hardware).
I used pandas as an example, but the problem is reproduced with other packages.

Scenario                                              Total time until executed (first run)
Run trivial python                                    2.7 seconds
from pandas import DataFrame and run trivial python   11.7 seconds
import pandas as pd and run trivial python            13.2 seconds

This profile shows that the long gap (while the DOM Worker thread is waiting) is mostly spent ion compiling the WASM.
https://share.firefox.dev/3FrCKUL

@rth
Member

rth commented Oct 5, 2021

@aptiko Thanks a lot for sharing your insights! Maybe we could come up with a minimal reproducible example, possibly with ctypes, then report it to emscripten.

@acreskeyMoz https://profiler.firefox.com is really neat, thanks for sharing it! Thanks also for confirming that WASM compilation is indeed the bottleneck there (and likely for other large scientific packages). That interactive profile is very helpful for understanding what is happening. Do you know if WASM compilation time is largely a question of the size of the produced .wasm binaries, or if there are any compilation flags that we could act on? We should probably go back to #1572 to reduce the binary size as a start.

Also for subsequent loads in Firefox, as far as I understand there is some hope that compilation caching could be enabled with mozilla#1487113 assuming all security concerns can be worked around after mozilla#1487113? Chrome already does that caching.

@acreskeyMoz

acreskeyMoz commented Oct 6, 2021

Thanks for looking into this @rth

I don't have a good understanding of the bottlenecks in WASM compilation, although a teammate has reached out to the JS team to discuss.
FWIW, I am seeing similar performance when running these tests in Chrome.

Since I'm seeing the WASM being compiled to ion, and to my understanding ion is not disk-serializable, we may still have a ways to go before this is performant enough for user-facing tasks.

I've also reproduced the findings with sklearn. For instance, importing datasets from sklearn takes about 17 seconds on my MacBook Pro.

wlach added a commit to wlach/pyodide that referenced this issue May 2, 2022
We found in pyodide#347 that one of the main culprits of slow load times
for pandas was WASM compilation times. Let's try using emscripten's
-Oz optimization just for this package, which seems to reduce
the size of the wheel from 5.0M -> 4.4M on my machine.
@wlach
Contributor Author

wlach commented May 2, 2022

Have been looking at this a bit today during the pyodide sprint. A few ideas have come up:

  • We could use a space-saving optimization when compiling pandas specifically. @rth has apparently tried this with pyodide as a whole and there are speed penalties, but the same might not apply to the case of pandas. Or we could decide that a small speed penalty during runtime might be worth it if we can load pandas faster. I've started to explore this in Use emscripten's -Oz option for pandas #2457
  • There are 41 separate .so files and my understanding is that there is a fair amount of redundant code between them. This is a known, open issue in cython without a clear resolution (see e.g. Use a shared library for shared code cython/cython#2356). One potential workaround is to concatenate the various modules into one and then compile them together (apparently other people have done this). Since they don't actually expose an external interface, it should theoretically be possible to optimize this. Though given the fact that the smallest generated version on my machine (reduction.cpython-310-x86_64-linux-gnu.so) is 18K, the potential speedups are probably limited.
  • We could try and see if there are any possible optimizations to be done in pandas itself (e.g. no longer used methods, opportunities to combine methods, etc.)
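
To make the second point easier to check on other machines, here is a minimal sketch for listing the compiled extension modules under a package directory together with their sizes. The function names are hypothetical, and it assumes the extensions carry a .so suffix, as in the native Linux build mentioned above.

```python
from pathlib import Path


def extension_module_sizes(root):
    """Return (relative path, size in bytes) for each .so under root, largest first."""
    root = Path(root)
    sizes = [
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*.so")
        if path.is_file()
    ]
    # Largest first; ties broken by name so the output is stable.
    return sorted(sizes, key=lambda item: (-item[1], item[0]))


def report(root, limit=10):
    """Print the number of extension modules, their total size and the largest ones."""
    sizes = extension_module_sizes(root)
    total = sum(size for _, size in sizes)
    print(f"{len(sizes)} extension modules, {total / 1024:.0f} KiB in total")
    for name, size in sizes[:limit]:
        print(f"{size / 1024:8.0f} KiB  {name}")
```

With pandas installed, calling report(Path(pandas.__file__).parent) should print the module count and the largest files, which gives a baseline for judging any size optimization.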

@vadimkantorov

vadimkantorov commented May 2, 2022

Also, maybe delay loading all those 41 so's somehow? And make sure that everything indeed loads from cache (chrome won't cache anything larger than ~40Mb)
Also may be worth importing by default only the most used packages, and keeping the rest of submodules as lazy imports somehow

On another note, maybe some modules can be split so that the WASM compiler doesn't have to compile everything at once...
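
For what it's worth, this is roughly what a lazy import looks like in plain CPython, following the importlib.util.LazyLoader recipe from the standard library documentation. It is a sketch only: the helper name is made up, and whether the same approach can defer loading compiled extension modules inside pyodide is an open question that this example does not answer.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module; execution is deferred
    return module


# colorsys is a small stdlib module, used here as a stand-in for a heavy package.
colorsys = lazy_import("colorsys")
# The module body is executed here, on first attribute access.
print(colorsys.rgb_to_hsv(1, 0, 0))
```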

@rth
Member

rth commented May 2, 2022

Thanks a lot for doing this analysis @wlach !

wlach added a commit to wlach/pyodide that referenced this issue May 3, 2022
We found in pyodide#347 that one of the main culprits of slow load times
for pandas was WASM compilation times. Let's try using emscripten's
-Oz optimization just for this package, which seems to reduce
the size of the wheel from 5.0M -> 4.4M on my machine.
@wlach
Contributor Author

wlach commented May 3, 2022

Also, maybe delay loading all those 41 so's somehow? And make sure that everything indeed loads from cache (chrome won't cache anything larger than ~40Mb)

Ah right! There's actually something really strange pandas does more or less as a developer convenience which would for sure cause most everything to get loaded all at once:

https://github.com/pandas-dev/pandas/blob/8bc083298103eb71ed0f8f9b713a35aa5264761b/pandas/__init__.py#L24

We can probably take that out in a patch and see if it helps.
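
To see whether such a patch changes anything, one could list which modules a fresh interpreter pulls in for a given import, before and after the change. A minimal sketch follows; the helper name is hypothetical and the example uses json so that it runs without pandas installed.

```python
import subprocess
import sys

# Code run in a child interpreter: record sys.modules before and after the import.
PROBE = (
    "import sys\n"
    "before = set(sys.modules)\n"
    "import {name}\n"
    "print('\\n'.join(sorted(set(sys.modules) - before)))\n"
)


def modules_loaded_by(name):
    """Names of the modules that a fresh interpreter loads when importing name."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE.format(name=name)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


loaded = modules_loaded_by("json")
print(len(loaded), "modules loaded by 'import json'")
```

Filtering the result of modules_loaded_by("pandas") for names starting with "pandas._libs" would show how many compiled submodules are loaded eagerly, assuming pandas is installed in that interpreter.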

@jbrockmendel

I spent some time optimizing pandas's import time in late 2019 to early 2020, and got most of the low-hanging fruit. There's a decent chance more has accumulated since then. For me python3 -X importtime -c "import pandas as pd" takes 456867 microseconds. Is the use case here measuring something different?

There are 41 separate .so files [...]

If you're going down a path of forking pandas with the goal of decreasing the .so size, the fastest way to trim it will be to look for ctypedef fused and trim out dtypes that you know you won't be using.
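
For anyone who wants to reproduce the native measurement, here is a minimal sketch that runs the same -X importtime flag in a subprocess and sorts the report by cumulative time. The helper name is made up, the parsing assumes the "import time: self | cumulative | package" line format that CPython writes to stderr, and json stands in for pandas so the example runs anywhere.

```python
import subprocess
import sys


def import_times(module, top=10):
    """Run `python -X importtime` for module; return (cumulative_us, name) rows."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    prefix = "import time:"
    for line in result.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        try:
            cumulative = int(fields[1])
        except ValueError:
            continue  # the header line has no numbers
        rows.append((cumulative, fields[2].strip()))
    # Slowest imports first.
    return sorted(rows, reverse=True)[:top]


for cumulative, name in import_times("json"):
    print(f"{cumulative:>10} us  {name}")
```

These numbers only describe native CPython. They are useful for finding expensive submodules, but they say nothing about WASM compilation cost.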

@wlach
Contributor Author

wlach commented May 4, 2022

I spent some time optimizing pandas's import time in late 2019 to early 2020, and got most of the low-hanging fruit. There's a decent chance more has accumulated since then. For me python3 -X importtime -c "import pandas as pd" takes 456867 microseconds. Is the use case here measuring something different?

The constraints and performance characteristics in this environment are a little different. In particular, loading code into memory for execution seems to be significantly slower (see #347 (comment))

If you're going down a path of forking pandas with the goal of decreasing the .so size, the fastest way to trim it will be to look for ctypedef fused and trim out dtypes that you know you won't be using.

I think the preference would be to keep things as compatible as possible with upstream pandas.

wlach added a commit to wlach/pyodide that referenced this issue May 4, 2022
…code

It's not certain that this will improve things, but it can't hurt to
try. See pyodide#347.
@hoodmane
Member

hoodmane commented May 4, 2022

I believe it is not possible to lazily load .so files unless we split pandas into several sub packages. To load a .so file requires an async call on the current thread which is very hard to arrange from import. Synclink or similar would provide no help here.

@hoodmane
Member

hoodmane commented May 4, 2022

In particular, loading code into memory for execution seems to be significantly slower

The loader needs to carefully enforce security guarantees. From a security point of view the loader is the main attack surface in the wasm runtime, any bugs are critical security hazards. So there are a large number of assertions checked in the browser. Native loaders would disable similar assertions in production builds.

@hoodmane
Member

hoodmane commented May 4, 2022

Thanks for investigating this @wlach! It is very much appreciated.

wlach added a commit to wlach/pyodide that referenced this issue May 4, 2022
We found in pyodide#347 that one of the main culprits of slow load times
for pandas was WASM compilation times. Let's try using emscripten's
-Oz optimization just for this package, which seems to reduce
the size of the wheel from 5.0M -> 4.4M on my machine.