df.loc much slower compared to pandas #2893

@ayushdas

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux 7 (Core)
  • Modin version (modin.__version__): 0.8.3
  • Python version: 3.6.8
  • Code we can use to reproduce:

    pandas:

    %%timeit -r 4
    import pandas as pd
    d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
    df = pd.DataFrame(d)
    df = df.set_index(['col1', 'col2'])
    df.loc[1]

    Modin:

    %%timeit -r 4
    import modin.pandas as pd
    d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
    df = pd.DataFrame(d)
    df = df.set_index(['col1', 'col2'])
    df.loc[1]

Describe the problem

df.loc takes much longer to run in Modin than in vanilla pandas. The timings were recorded with %%timeit in a Jupyter notebook, averaged over 4 runs (see the reproduction snippets above).

Source code / logs

3.41 ms ± 268 µs per loop using vanilla pandas df.loc
29.4 ms ± 514 µs per loop using Modin pandas df.loc
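
Note that the `%%timeit` cell magic times the whole cell, so the figures above also include building the DataFrame and calling `set_index`, not only `df.loc`. A minimal sketch for timing the `df.loc[1]` lookup on its own might look like the following. The helper name `time_loc` and the `repeat`/`number` values are hypothetical choices for illustration, not part of the original report, and no results are claimed here.

```python
import importlib
import timeit


def time_loc(module_name, repeat=4, number=20):
    """Best per-call time (seconds) of df.loc[1], or None if the module is missing.

    `time_loc` is a hypothetical helper written for this illustration.
    """
    try:
        pd = importlib.import_module(module_name)
    except ImportError:
        return None

    # Same data and index as in the report; built once, outside the timed region.
    d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
    df = pd.DataFrame(d).set_index(['col1', 'col2'])

    df.loc[1]  # warm-up call so any one-time startup cost is not measured

    # Time only the .loc lookup; each run executes it `number` times.
    runs = timeit.repeat(lambda: df.loc[1], repeat=repeat, number=number)
    return min(runs) / number


for name in ("pandas", "modin.pandas"):
    seconds = time_loc(name)
    if seconds is None:
        print(f"{name}: not installed")
    else:
        print(f"{name}: {seconds * 1e3:.3f} ms per df.loc[1]")
```

If the gap stays similar with this narrower measurement, the overhead is in the lookup itself; if it shrinks, part of it comes from DataFrame construction or `set_index`.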

    Labels

    bug 🦗Something isn't working
