
The query_compiler.merge reconstructs the Right dataframe for every partition of Left Dataframe #6879

Closed
arunjose696 opened this issue Jan 25, 2024 · 0 comments · Fixed by #6880
Labels
Internals Internal modin functionality

Comments

@arunjose696
Collaborator

query_compiler.merge reconstructs the right DataFrame from its partitions once for every partition of the left DataFrame. This repeated concat operation increases memory consumption when the right DataFrame is large.

One possible fix is to combine the right DataFrame's partitions into a single-partition DataFrame by calling a remote function. This single-partition DataFrame is then passed to each partition of the left DataFrame, which avoids reconstructing it in every worker during the merge.
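
The difference between the two approaches can be sketched with plain pandas. This is only an illustration under stated assumptions, not Modin's actual implementation: partitions are modeled as a list of ordinary pandas DataFrames, there is no remote execution, and the function names `merge_rebuilding_right` and `merge_with_prebuilt_right` are hypothetical.

```python
import pandas as pd


def merge_rebuilding_right(left_parts, right_parts, **merge_kwargs):
    """Hypothetical model of the reported behavior: the right frame is
    concatenated from its partitions once per left partition."""
    results = []
    for left in left_parts:
        # Repeated for every left partition: this is the costly step.
        right = pd.concat(right_parts, ignore_index=True)
        results.append(left.merge(right, **merge_kwargs))
    return pd.concat(results, ignore_index=True)


def merge_with_prebuilt_right(left_parts, right_parts, **merge_kwargs):
    """Hypothetical model of the proposed fix: build the right frame once
    and reuse the same object for every left partition."""
    right = pd.concat(right_parts, ignore_index=True)  # done a single time
    results = [left.merge(right, **merge_kwargs) for left in left_parts]
    return pd.concat(results, ignore_index=True)


# Toy partitions (made-up data, for illustration only).
left_parts = [
    pd.DataFrame({"key": [1, 2], "a": [10, 20]}),
    pd.DataFrame({"key": [3, 4], "a": [30, 40]}),
]
right_parts = [
    pd.DataFrame({"key": [1, 3], "b": ["x", "y"]}),
    pd.DataFrame({"key": [4], "b": ["z"]}),
]

rebuilt = merge_rebuilding_right(left_parts, right_parts, on="key", how="inner")
prebuilt = merge_with_prebuilt_right(left_parts, right_parts, on="key", how="inner")
```

Both functions return the same rows. The second performs one concat instead of one per left partition, which is where the memory saving described above would come from.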

@anmyachev anmyachev added the Internals Internal modin functionality label Jan 26, 2024
anmyachev added a commit that referenced this issue Feb 13, 2024
…ng in query_compiler.merge (#6880)

Signed-off-by: arunjose696 <arunjose696@gmail.com>
Signed-off-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>
Co-authored-by: Anatoly Myachev <anatoliimyachev@mail.com>
Co-authored-by: Igoshev, Iaroslav <iaroslav.igoshev@intel.com>