speed up _compute_dtypes function #2751

Closed
anmyachev opened this issue Feb 17, 2021 · 4 comments
Labels
P2 (Minor bugs or low-priority feature requests), Performance 🚀 (Performance related issues and pull requests)

Comments

@anmyachev
Collaborator

anmyachev commented Feb 17, 2021

`_compute_dtypes` is extremely inefficient when the number of columns is approximately equal to the number of rows.

For example, if df.shape = (5000, 5000) and MODIN_CPUS=44, then count_partitions = 44^2 = 1936.
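To make the arithmetic concrete, here is a minimal sketch of how the partition count grows for a near-square frame. It assumes, as the example above implies, that the frame is split into MODIN_CPUS chunks along each axis. The helper names `count_partitions` and `block_shape` are hypothetical and are not part of Modin's API; Modin's actual partitioning may differ.

```python
def count_partitions(n_cpus: int) -> int:
    """Number of blocks in a grid with n_cpus splits along each axis (assumption)."""
    return n_cpus * n_cpus


def block_shape(n_rows: int, n_cols: int, n_cpus: int) -> tuple:
    """Approximate shape of one block, using ceiling division on each axis."""
    rows_per_block = -(-n_rows // n_cpus)
    cols_per_block = -(-n_cols // n_cpus)
    return (rows_per_block, cols_per_block)


# The case from this issue: a (5000, 5000) frame with MODIN_CPUS=44.
print(count_partitions(44))         # 1936
print(block_shape(5000, 5000, 44))  # (114, 114)
```

Under these assumptions, each of the 1936 blocks holds only about 114 x 114 values, so any per-partition overhead is paid many times over relative to the amount of data in each block.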

@anmyachev anmyachev self-assigned this Feb 17, 2021
anmyachev added a commit to anmyachev/modin that referenced this issue Feb 18, 2021
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
anmyachev added a commit to anmyachev/modin that referenced this issue Feb 18, 2021
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
@anmyachev anmyachev added the Performance 🚀 Performance related issues and pull requests. label Apr 21, 2022
@pyrito
Collaborator

pyrito commented Aug 23, 2022

@anmyachev what's the status of this issue?

@pyrito pyrito added the P2 Minor bugs or low-priority feature requests label Aug 23, 2022
@anmyachev anmyachev removed their assignment Aug 25, 2022
@anmyachev
Collaborator Author

@pyrito someone needs to check how long this function takes on master for the case above.

@Retribution98
Collaborator

@anmyachev
This issue was fixed in PR #7245.
Timing results for a DataFrame with shape (5600, 5600) using 112 CPUs:

|         | before PR | after PR |
|---------|-----------|----------|
| Time, s | 4.6       | 1.8      |

@anmyachev
Collaborator Author

> @anmyachev This issue was fixed in PR #7245. Time results for DF with shape (5600, 5600) using 112 CPU:
>
> |         | before PR | after PR |
> |---------|-----------|----------|
> | Time, s | 4.6       | 1.8      |

Good!
