df.take is much slower against pandas #6876
Labels: Performance 🚀 (performance-related issues and pull requests)
cc @dchigarev
# import pandas as pd
import modin.pandas as pd
import numpy as np
import time

# One-column frame of 100M random ints; squeeze(axis=1) turns it into a Series
df = pd.DataFrame(
    data=np.random.randint(99999, 99999999, size=(100000000, 1)),
    columns=['C1'],
).squeeze(axis=1)
# 80M random row positions to take (with repeats)
to_take = np.random.randint(0, 100000000, size=80000000)

t0 = time.time()
df.take(to_take, axis=0)
t1 = time.time()
print('time for take: ', t1 - t0)
# time for take: 37.6995530128479 in Modin
# time for take: 1.492713212966919 in pandas
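For anyone who wants to reproduce this on a smaller machine, here is a minimal sketch of a scaled-down harness. The helper name `time_take`, its parameters, and the default sizes are assumptions for illustration and are not part of the original report. It builds the Series directly instead of squeezing a DataFrame, and it uses `time.perf_counter` instead of `time.time`. The numbers it prints will not match the ones above.

```python
import time

import numpy as np
import pandas  # baseline; pass `modin.pandas` instead to compare


def time_take(pd_module, n_rows=1_000_000, n_take=800_000, seed=0):
    """Time Series.take on random positions.

    Hypothetical helper, not from the original report.
    Returns (elapsed_seconds, result).
    """
    rng = np.random.default_rng(seed)
    # Same value range as the report, but far fewer rows by default
    values = rng.integers(99999, 99999999, size=n_rows)
    ser = pd_module.Series(values, name="C1")
    # Random row positions, repeats allowed, as in the report
    positions = rng.integers(0, n_rows, size=n_take)

    start = time.perf_counter()
    result = ser.take(positions)
    elapsed = time.perf_counter() - start
    return elapsed, result


if __name__ == "__main__":
    elapsed, result = time_take(pandas)
    print(f"pandas take: {elapsed:.4f} s for {len(result)} rows")
```

To compare, call `time_take(modin.pandas)` with the same arguments. Modin may run part of the work asynchronously depending on the engine, so treat a single measurement as approximate and repeat it a few times.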
dchigarev added a commit to dchigarev/modin that referenced this issue (Jan 24, 2024): …icial Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
anmyachev added a commit that referenced this issue (Jan 24, 2024)
On a machine with 192 CPUs.