Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix item check for pandas Series #12973

Closed

Conversation

timhoffm
Copy link
Member

PR Summary

Fixes #12971.

This was a problem specific to pandas Series, which has a None data.dtype.names but still is index-accessible and can use in. Note that numpy arrays cannot use in and therefore must check for data.dtype.names.

PR Checklist

  • Has Pytest style unit tests

Copy link
Contributor

@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue is also fixed by #10928, which has been opened for more than 8 months, and which would need yet another rebase if this gets merged.

Please consider this a suggestion to reviewing and merging that PR instead of this one. However, if you can still ignore this review if you wish.

@timhoffm
Copy link
Member Author

timhoffm commented Dec 10, 2018

The same issue is also fixed by #10928, which has been opened for more than 8 months, and which would need yet another rebase if this gets merged.

@anntzer Just a short feedback:

I agree that we need more reviews to get open PRs down. But, no offence meant, you're not particularly the one leading the review/PR ratio. That's ok by me, but then you cannot expect to get all your PRs reviewed soon. On that background, I feel a little offended that you block this PR and moan about rebasing because nobody else reviewed #10928 (which by the way, I did).

@anntzer
Copy link
Contributor

anntzer commented Dec 11, 2018

I clarified my rejection above as a suggestion for reviewers to look at my PR instead to avoid duplication of work. As you have kindly already reviewed it, it is close to the finish line and it would seem less than optimal to review and merge piecemeal fixes.

However other devs should feel free to merge despite my rejection if reviewing #10928 is too difficult.

@anntzer
Copy link
Contributor

anntzer commented Dec 11, 2018

[As a tangent, you got me curious as to the number of merges per dev (which doesn't include the first review, so those who tend to review "first" get shortchanged, but the data is easily available from the git log). Looks like we really need to thank @jklymak for keeping things running :-)

$ git log --merges v3.0.0..HEAD --format=format:'%an|%s' | grep 'Merge pull request' | grep -v meeseeksmachine | cut -d'|' -f1 | sort | uniq -c
     42 Antony Lee
      9 Benjamin Root
     36 David Stansby
      1 Dietmar Schwertberger
      9 Elan Ernest
     36 Elliott Sales de Andrade
      2 Eric Firing
      2 hannah
    119 Jody Klymak
     34 Nelle Varoquaux
      6 Paul Hobson
      2 Paul Ivanov
     12 Ryan May
     64 Thomas A Caswell
     18 Tim Hoffmann

(excludes meeseeksdevs backports)]

@jklymak
Copy link
Member

jklymak commented Dec 11, 2018

Ha, that’s just because I’m more aggressive and trust more knowledgeable folks. Those first reviews really matter...

@dstansby
Copy link
Member

Would be nice to have a "higher level" test, ie. passing a pandas Series as the data kwarg to ax.plot. Otherwise 👍

re. PRs, personally I find it exponentially harder to review PRs with large diffs, especially if they include large reworks of code.

@anntzer
Copy link
Contributor

anntzer commented Dec 14, 2018

Not to derail the review of this PR, but the reason why #10928 is large is because it replaces a complex, 522-line implementation by a simpler 271-line implementation, by getting rid of overly general functionality. Admittedly there's 143 lines that come from deleting tests for unused functionality, but even ignoring these it's still 30% shorter (and it's not clear how that could have been broken into smaller chunks).

It's not as if that PR was implementing some intricate new functionality, it's just making code hopefully more readable.

@timhoffm
Copy link
Member Author

timhoffm commented Dec 15, 2018

OT: PR complexity

@dstansby @anntzer I agree with both of you. Long PR become indeed exponentially harder to review. And likewise the motivation to review drops (at least for in my case 😃). Therefore we should really aim at not too long PRs.

OTOH, sometimes you need larger changes, as in #10928. In these cases I would really try to minimize the changes. In particular, I‘d not include non-essential changes like import cleanups, formatting, list comprehensions, (unrelated) docstring updates, removing unnecessary kwargs. These are trivial to review on their own, but make an already complex PR much harder to overview. Moving them to separate PRs really helps.

While a certain complexity is inherent to a PR, I think it‘s really worth spending some extra attention on keeping longer PRs as simple as possible. But overall, I find the PRs to matplotlib mostly reasonable-sized. We‘re already not too bad at this 😃.

@timhoffm timhoffm added this to the v3.0.3 milestone Dec 22, 2018
@timhoffm timhoffm added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Dec 22, 2018
@timhoffm
Copy link
Member Author

Marking as release critical as it fixes a recently introduced bug. This or #10928 (which is preferable the long-term solution but still needs a second review) should go into the next patch release.

@timhoffm
Copy link
Member Author

Was about to close this as #10928 is merged now. However, #10928 is targeted at 3.1. should I Rebase this to 3.0.x so that #12971 is fixed in 3.0.3?

@tacaswell tacaswell modified the milestones: v3.0.3, v3.1 Dec 25, 2018
@tacaswell
Copy link
Member

This is a regression from 2.2 in 3.0 so fixing it in 3.0.3 is reasonable, however the full fix is to big to backport and I am not sure that it is worth the complexity of backporting a more limited fix and then dealing with the merge conflicts. I think that this is a pretty niche use of this functionality.

I am happy with milestoning all of this to 3.1, but am also OK if others disagree about the importance and/or effort involved and want to fix it for 3.0.3

I'm also not wild about checking the __name__ value....

@timhoffm timhoffm changed the base branch from master to v3.0.x December 25, 2018 19:58
@timhoffm timhoffm changed the base branch from v3.0.x to master December 25, 2018 19:58
@timhoffm
Copy link
Member Author

Closing as #10928 fixes the issue for 3.1. Please feel free to ping me if you think we should fix the issue for 3.0.3 (After all, I introduced the regression in 3.0 and I would work out a custom fix for 3.0.3 if somebody feels it's important to fix this sooner than 3.1).

@timhoffm timhoffm closed this Dec 25, 2018
@timhoffm timhoffm deleted the fix-item-check-for-pandas-series branch June 10, 2022 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants