BUG: Using vanilla NumPy methods on a Modin Dataframe fails #6287

Open
3 tasks done
RehanSD opened this issue Jun 23, 2023 · 0 comments
Labels
bug 🦗 (Something isn't working), P1 (Important tasks that we should complete soon)

RehanSD (Collaborator) commented Jun 23, 2023

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

# Modin
import modin.pandas as pd
import modin.numpy as np

# Vanilla pandas and NumPy
import pandas as vpd
import numpy as vnp

cols = [f"feature_{i}" for i in range(10)]
# random is not available in modin.numpy, so vanilla NumPy is used here
npy_array = vnp.random.rand(50, 10)

vdf = vpd.DataFrame(npy_array, columns=cols)
df = pd.DataFrame(vdf)

# Works (vanilla NumPy on a vanilla pandas DataFrame)
tmp = vnp.where(
    vdf["feature_0"] + vdf["feature_1"] > vdf["feature_2"] + vdf["feature_3"], 1, 0
)

# Doesn't work (vanilla NumPy on a Modin DataFrame)
tmp = vnp.where(
    df["feature_0"] + df["feature_1"] > df["feature_2"] + df["feature_3"], 1, 0
)
# Doesn't work (modin.numpy on a Modin DataFrame)
tmp = np.where(
    df["feature_0"] + df["feature_1"] > df["feature_2"] + df["feature_3"], 1, 0
)
# Doesn't work (vanilla NumPy on modin.numpy arrays built from Modin columns)
tmp = vnp.where(
    np.array(df["feature_0"]) + np.array(df["feature_1"])
    > np.array(df["feature_2"]) + np.array(df["feature_3"]),
    1,
    0,
)

Issue Description

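Some of the NumPy hooks that Modin provides need to be fixed: the where calls above fail on Modin objects, whether they go through vanilla NumPy or through modin.numpy.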
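The issue does not say which hooks are broken. As a minimal sketch, assuming the hooks in question are NumPy's interoperability protocols (`__array__` for conversion and `__array_function__` from NEP 18 for API calls such as `numpy.where`), the hypothetical `DeferredSeries` class below shows how a wrapper object can make `numpy.where` succeed by falling back to a locally materialized NumPy array. It only illustrates the protocol; it is not Modin's implementation.

```python
import numpy as np


class DeferredSeries:
    """Hypothetical stand-in for a distributed Series (not Modin's class)."""

    def __init__(self, values):
        self._values = list(values)

    def _to_numpy(self):
        # Materialize the (pretend-distributed) data as a local ndarray.
        return np.asarray(self._values)

    # Hook 1: lets np.asarray / np.array convert this object.
    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._values, dtype=dtype)

    # Hook 2 (NEP 18): intercepts NumPy API calls such as np.where.
    def __array_function__(self, func, types, args, kwargs):
        def convert(x):
            return x._to_numpy() if isinstance(x, DeferredSeries) else x

        # Fallback path: convert wrapper arguments, then call plain NumPy.
        args = tuple(convert(a) for a in args)
        kwargs = {k: convert(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)

    def _binary(self, other, op):
        rhs = other._to_numpy() if isinstance(other, DeferredSeries) else other
        return DeferredSeries(op(self._to_numpy(), rhs))

    def __add__(self, other):
        return self._binary(other, np.add)

    def __gt__(self, other):
        return self._binary(other, np.greater)


# Mirrors the shape of the failing calls in the reproducible example.
a = DeferredSeries([1, 5, 2])
b = DeferredSeries([1, 1, 1])
c = DeferredSeries([1, 1, 3])
d = DeferredSeries([0, 2, 1])
result = np.where(a + b > c + d, 1, 0)
```

Because `np.where` sees an argument that defines `__array_function__`, it hands control to the wrapper, which decides how to execute the call; here it simply defaults to NumPy.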

Expected Behavior

All of the calls above should work. With ExperimentalNumPyAPI off, they should default to pandas; with it on, they should run distributed.
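The expected behavior can be sketched with a hypothetical `where` dispatcher. `EXPERIMENTAL_NUMPY_API` is an assumed stand-in for Modin's ExperimentalNumPyAPI setting, and distributed execution is only simulated by evaluating row partitions separately; the point is that both modes should return the same result.

```python
import numpy as np

# Hypothetical stand-in for Modin's ExperimentalNumPyAPI setting.
EXPERIMENTAL_NUMPY_API = {"enabled": False}


def where(cond, x, y, n_partitions=4):
    """Hypothetical dispatcher illustrating the expected behavior."""
    cond = np.asarray(cond)
    if not EXPERIMENTAL_NUMPY_API["enabled"]:
        # Flag off: default to a single local computation.
        return np.where(cond, x, y)
    # Flag on: simulate distributed execution, one row partition at a time.
    parts = np.array_split(cond, n_partitions)
    return np.concatenate([np.where(p, x, y) for p in parts])


cond = [True, False, True, True, False]
local = where(cond, 1, 0)

EXPERIMENTAL_NUMPY_API["enabled"] = True
distributed = where(cond, 1, 0)
```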

RehanSD added the bug 🦗 (Something isn't working) and Triage 🩹 (Issues that need triage) labels on Jun 23, 2023.
RehanSD added a commit to RehanSD/modin that referenced this issue Jun 23, 2023
Signed-off-by: Rehan Durrani <rehan@ponder.io>
pyrito added the P1 (Important tasks that we should complete soon) label and removed the Triage 🩹 (Issues that need triage) label on Jun 26, 2023.