Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: DataFrame.lookup #35224

Merged
merged 36 commits into from Sep 17, 2020
Merged

DEPR: DataFrame.lookup #35224

merged 36 commits into from Sep 17, 2020

Conversation

erfannariman
Copy link
Member

  • xref DEPR: let's deprecate #18262
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@erfannariman
Copy link
Member Author

erfannariman commented Jul 11, 2020

This probably needs more discussion, but thought I open this PR as a starting point.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to change the existing tests to assert that there is a FutureWarning otherwise tests will fail

pandas/core/frame.py Outdated Show resolved Hide resolved
@jreback jreback added Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves labels Jul 11, 2020
@erfannariman
Copy link
Member Author

In the xref you mention "Maybe a standalone function somewhere", is this still the idea or are we going to mention melt and loc? @jreback

@jreback
Copy link
Contributor

jreback commented Jul 11, 2020

no just put a nice example in the indexing docs - reference this in the deprecation
you can melt and use loc there

don’t need a standalone function

doc/source/user_guide/indexing.rst Outdated Show resolved Hide resolved
pandas/core/frame.py Outdated Show resolved Hide resolved
@pep8speaks
Copy link

pep8speaks commented Jul 15, 2020

Hello @erfannariman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-16 17:58:12 UTC

@erfannariman
Copy link
Member Author

@WillAyd I put Label-based "fancy indexing" on top, but it still fails with the same message. The message is not really informative, "warning before the summary" really does not make sense to me.

pandas/core/frame.py Outdated Show resolved Hide resolved
doc/source/user_guide/indexing.rst Show resolved Hide resolved
'B': [80, 55, 76, 67]})
df
melt = df.melt('col')
df['lookup'] = melt.query('col == variable')['value'].to_numpy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than query, use .loc here as its more idiomatic (and plus its a single statement). Do we actually need to convert to numpy? (e.g. to avoid alignment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I used to_numpy is indeed for the index alignment. With .loc and without to_numpy, I would need reset_index. So that would look like:

melt = df.melt('col')
melt = melt.loc[melt['col'] == melt['variable'], 'value']
df['lookup'] = melt.reset_index(drop=True)

print(df)
  col     A   B  lookup
0   A  80.0  80    80.0
1   A  23.0  55    23.0
2   B   NaN  76    76.0
3   B  22.0  67    67.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one do you prefer? I can understand that this is more idiomatic, althought it's a bit more code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw now I'm typing it, I realize this method would fail if the index was not 0, .., n-1

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a horrible choice as an alternative for lookup. Essentially, melt = df.melt('col') duplicates the data (not to mention the extra variable column) while one can resolve to numpy indexing.

I'm not sure what's the implementation of lookup, but it's a real request emerging from what I could see on SO. To me pivot and pivot_table are more redundant than lookup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quanghm my response is to the tone of the argument

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not averse to the function of lookup but it's not going to be a top level api

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not averse to the function of lookup but it's not going to be a top level api

Yet another reason why I don't go and make a pull request. I don't understand what's other than top level api that I can implement lookup functionality, and still don't understand performant.

@quanghm my response is to the tone of the argument

Can you please be more specific? The way I see it, @erfannariman was basically saying that unless I made a pull request, I cannot criticize the suggested alternative as horrible even when I spent time to analyze, test and show that it is, in fact, horrible. OK, if horrible is an offensive word, my apology, English is not my first language. Let me re-phrase it is very bad.

I do feel that my effort to contribute here is not valued. That was the best I can afford to better Pandas. Not everyone can afford to make a pull request, modify the code, run and pass all the unit tests.

Thanks for all the work maintaining and developing Pandas anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I learned from the other discussion, you are not the one that has a say on bringing back lookup, and those that do don't seem to be willing to review the issue.

Opening a ticket with everything you wrote here has more chance to get reviewed by the core devs than to write all this on a closed PR. That way the other devs can see it as well, since I am quite sure they are not seeing this whole discussion.

If you don't feel comfortable open an issue, please let me know and I wil open one, since Jeff mentioned he's open for a (discussion) for re-implementing lookup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I already opened the ticket since it's better to have the discussion there: see #40140 @quanghm @jreback

doc/source/whatsnew/v1.1.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v1.1.0.rst Outdated Show resolved Hide resolved
pandas/core/frame.py Show resolved Hide resolved
pandas/core/frame.py Show resolved Hide resolved
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

doc/source/user_guide/indexing.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v1.1.2.rst Outdated Show resolved Hide resolved
@erfannariman erfannariman mentioned this pull request Sep 16, 2020
3 tasks
@jreback jreback added this to the 1.2 milestone Sep 17, 2020
@jreback jreback mentioned this pull request Sep 17, 2020
34 tasks
@jreback jreback merged commit 1a3a2c1 into pandas-dev:master Sep 17, 2020
@jreback
Copy link
Contributor

jreback commented Sep 17, 2020

thanks @erfannariman very nice!

@erfannariman erfannariman deleted the 18262-depr-lookup branch September 17, 2020 09:22
rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request Sep 17, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants