map() function removes columns when input_columns is not None #4858

Closed
pramodith opened this issue Aug 16, 2022 · 3 comments · Fixed by #4971 or #5008
Labels
bug Something isn't working

Comments

@pramodith

Describe the bug

The map function removes columns from the dataset that are not listed in input_columns, even though those columns are not passed to the remove_columns argument.

Steps to reproduce the bug

from datasets import Dataset

ds = Dataset.from_dict({"a": [1, 2, 3], "b": [0, 1, 0], "c": [2, 4, 5]})

def double(x, y):
    x = x * 2
    y = y * 2
    return {"d": x, "e": y}

ds.map(double, input_columns=["a", "c"])

Expected results

Dataset({
    features: ['a', 'b', 'c', 'd', 'e'],
    num_rows: 3
})

Actual results

Dataset({
    features: ['a', 'c', 'd', 'e'],
    num_rows: 3
})

In this specific example, column b should not be removed, because it is not passed to remove_columns.
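To pin down the semantics I would expect, here is a toy pure-Python model of the behaviour. It is only a sketch: `toy_map` is a hypothetical helper written for illustration, it works on a plain dict of lists, and it is not the datasets implementation or its API. The point is that only `remove_columns` decides which columns get dropped, while `input_columns` only decides what is passed to the function.

```python
# Toy model of the expected behaviour; NOT the datasets implementation.
def toy_map(columns, function, input_columns=None, remove_columns=None):
    """Apply `function` row by row to a dict of equal-length column lists."""
    num_rows = len(next(iter(columns.values())))
    new_columns = {}
    for i in range(num_rows):
        if input_columns is None:
            # Without input_columns the function receives the whole row as a dict.
            out = function({name: values[i] for name, values in columns.items()})
        else:
            # With input_columns the selected values are passed positionally.
            out = function(*(columns[name][i] for name in input_columns))
        for name, value in out.items():
            new_columns.setdefault(name, []).append(value)
    # Only columns named in remove_columns are dropped;
    # input_columns plays no part in this step.
    kept = {name: values for name, values in columns.items()
            if name not in (remove_columns or [])}
    kept.update(new_columns)
    return kept


def double(x, y):
    return {"d": x * 2, "e": y * 2}


data = {"a": [1, 2, 3], "b": [0, 1, 0], "c": [2, 4, 5]}
result = toy_map(data, double, input_columns=["a", "c"])
print(list(result))  # ['a', 'b', 'c', 'd', 'e']
```

In this model, b only disappears if it is passed explicitly, e.g. `remove_columns=["b"]`.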

Environment info

  • datasets version: 2.4.0
  • Platform: linux (colab)
  • Python version: 3.7.13
  • PyArrow version: 6.0.1
@mariosasko
Collaborator

Hi! Thanks for reporting! This looks like a bug. I've just opened a PR with the fix.

@pramodith
Author

Awesome! Thank you. I'll close the issue once the PR gets merged. :-)

@albertvillanova
Member

I guess we should reopen after the revert by:
