Increase the default value for MinPartitionSize variable #6384

Open
dchigarev opened this issue Jul 13, 2023 · 1 comment
Labels
P1 Important tasks that we should complete soon


@dchigarev
Collaborator

'32' seems like a very small value for such a variable.

I personally doubt that we actually need 4 partitions for a dataframe with a shape of (nrows=256, columns=4), since the overhead of the execution engine will likely outweigh any benefit from parallelizing such small data:

class MinPartitionSize(EnvironmentVariable, type=int):
    """
    Minimum number of rows/columns in a single pandas partition split.
    Once a partition for a pandas dataframe has more than this many elements,
    Modin adds another partition.
    """

    varname = "MODIN_MIN_PARTITION_SIZE"
    default = 32

We should run some tests and reconsider the default value for this variable.
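
To make the concern concrete, here is a rough sketch of how a minimum partition size could cap the number of splits along one axis. It is a simplified model, not Modin's actual splitting code: the function name `estimate_split_count` and the target of 4 partitions per axis are assumptions chosen for illustration.

```python
from math import ceil


def estimate_split_count(axis_len: int, n_partitions: int, min_partition_size: int) -> int:
    """Estimate how many partitions one axis is split into.

    Hypothetical helper (not a Modin API): aim for `n_partitions` equal
    chunks, but never let a chunk be smaller than `min_partition_size`.
    """
    if min_partition_size <= 0 or n_partitions <= 0:
        raise ValueError("n_partitions and min_partition_size must be positive")
    # Ideal chunk length for an even split, raised to the minimum if needed.
    chunk = max(ceil(axis_len / n_partitions), min_partition_size)
    return ceil(axis_len / chunk)


# Assumed setup: 4 target partitions per axis, frame of shape (256, 4).
rows_32 = estimate_split_count(256, 4, 32)    # minimum of 32 still allows 4 row splits
cols_32 = estimate_split_count(4, 4, 32)      # 4 columns fit in a single split
rows_256 = estimate_split_count(256, 4, 256)  # a larger minimum keeps the frame whole
print(rows_32, cols_32, rows_256)
```

Under this model a minimum of 32 still splits a 256-row frame four ways, while a larger minimum keeps it in a single partition. The right value would still need to come from benchmarks.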

@dchigarev dchigarev added the P1 Important tasks that we should complete soon label Jul 13, 2023
@dchigarev dchigarev assigned dchigarev and Garra1980 and unassigned dchigarev Jul 13, 2023
@anmyachev
Collaborator

anmyachev commented Sep 12, 2023

@dchigarev it seems to make sense, at least at first, to increase the minimum size only along rows. However, changing this parameter currently affects the column axis as well. We would apparently also need to move from minimal square partitions to rectangular ones (a pair of values describing the two dimensions).

UPD: #7284
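
As an illustration of the rectangular idea, the sketch below compares a single shared minimum with a per-axis pair. Everything in it is hypothetical: `MinPartitionShape`, `split_count`, and `grid_shape` are names invented for this example and are not existing Modin settings or functions, and the target of 4 partitions per axis is an assumption.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class MinPartitionShape:
    """Hypothetical per-axis minimums (not an existing Modin config)."""

    rows: int
    cols: int


def split_count(axis_len: int, n_partitions: int, min_size: int) -> int:
    # Aim for n_partitions even chunks, but keep each chunk >= min_size.
    chunk = max(ceil(axis_len / n_partitions), min_size)
    return ceil(axis_len / chunk)


def grid_shape(n_rows: int, n_cols: int, n_partitions: int, min_shape: MinPartitionShape):
    """Return (row_splits, col_splits) for a frame of shape (n_rows, n_cols)."""
    return (
        split_count(n_rows, n_partitions, min_shape.rows),
        split_count(n_cols, n_partitions, min_shape.cols),
    )


# A square minimum of 4096 collapses the 128 columns into one split...
square = grid_shape(10_000, 128, 4, MinPartitionShape(rows=4096, cols=4096))
# ...while a rectangular minimum raises only the row threshold.
rectangular = grid_shape(10_000, 128, 4, MinPartitionShape(rows=4096, cols=32))
print(square, rectangular)
```

With separate values, the row minimum can be raised without also disabling column-wise splitting for wide frames.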
