Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/API: other object type check in Series/DataFrame.equals #34402

Merged
merged 9 commits into from
Jul 15, 2020

Conversation

jorisvandenbossche
Copy link
Member

xref geopandas/geopandas#1420

First, this PR is fixing the "bug" that we shouldn't rely on _constructor being a class that can be used in isinstance (see #32638 for the general discussion about this, this is the only place in our code where _constructor is used like this, AFAIK).
And even if _constructor would be a class, it wouldn't necessarily be the correct class to check with (or not more correct than type(self))

But, so this also brings up the API question: what are the "requirements" we put on the type of other ? Should it be the same type? (as then could also change it to if not type(self) is type(other): ...)
Or is a subclass sufficient (with isinstance) ?
The problem with an isinstance checks with subclasses is that then the order matters (eg subdf.equals(df) would not necessarily give the same answer as df.equals(subdf))

@TomAugspurger
Copy link
Contributor

I vaguely recall having a similar discussion on this recently, but I don't recall the conclusion. Perhaps it was on the PR changing .equals.

Checking that the type matches in .equals is probably safest.

Side-note: we should consider changing DataFrame._constructor to DataFrame.__init__ to ensure this doesn't happen again.

@jorisvandenbossche
Copy link
Member Author

Yes, we recently had the discussion about whether the dtype should be equal -> #33940 (which also touched on the subclasses)

I am fine with exact type checking. I would also be OK with instance check but then in both directions (isinstance(self, type(other)) or isinstance(other, type(self))), so subclasses work in both cases

we should consider changing DataFrame._constructor to DataFrame.init to ensure this doesn't happen again.

That's would actually be fine for me (geopandas is also a good test case ;), we just didn't have any "equals" tests).
I don't think there should be any implications of changing it to DataFrame.__init__?

@jreback
Copy link
Contributor

jreback commented May 27, 2020

is there a test that triggers this (so we don't regress)

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label May 27, 2020
@jbrockmendel
Copy link
Member

makes sense

@jorisvandenbossche
Copy link
Member Author

@jbrockmendel there are multiple options being discussed, so can you be more explicit in what makes sense to you?

@jorisvandenbossche
Copy link
Member Author

For example, do we want exact type equality, or are subclasses fine to be considered equal?

@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone May 28, 2020
@jorisvandenbossche jorisvandenbossche removed the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label May 28, 2020
@jbrockmendel
Copy link
Member

there are multiple options being discussed, so can you be more explicit in what makes sense to you?

I was referring to the edit in the PR. Haven't formed a strong opinion otherwise.

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label May 28, 2020
@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented May 29, 2020

@jreback since you keep adding back the Reshaping label, what has the equals method to do with Reshaping?
(not that this label is very important, but I don't see the link, so just wondering ;))

@jorisvandenbossche
Copy link
Member Author

Checking that the type matches in .equals is probably safest.

For now, I went with isinstance (but in both directions), as that is actually backwards compatible.
Rigth now series.equals(subclassed_series) passes, so doing an exact type match would break that.
(now I am also fine with seeing that as a bug fix, though, and require the exact type match)

@jreback
Copy link
Contributor

jreback commented May 29, 2020

why do u keep taking it off? this doesn’t have any labels and is weird otherwise

@jorisvandenbossche jorisvandenbossche added API Design and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 30, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a whatsnew note, yes?


class MySeries(pd.Series):
pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put this in tests/series/test_sublcass and similar for frame/test_sublcass

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a test about the subclass working correctly (anything related to the subclassing machinery, eg correct use of _constructor to preserve the subclass in operations etc), but only about our own equals method accepting subclasses, so I think the test rather fits here with the other equals tests.

But I can certainly use the tm.SubclassedSeries helper to avoid creating a dummy class here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this belongs with the rest of the tests & needs for DataFrame subclass as well. having 2 places is just very friendly to future folks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having 2 places where equals is tested is equally unfriendly I think ..
Both ways have some aspect of the tests splitted in two places, and as mentioned above, this test is actually about our equals implementation, not about the behaviour of a subclassing mechanism, and thus more belongs here IMO.

Will add dataframe to the test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls just move this, ALL subclass tests are in the same file. This is breaking the pattern for no reason.

@jorisvandenbossche
Copy link
Member Author

All good here?

@jreback
Copy link
Contributor

jreback commented Jul 3, 2020

no i asked to move the test quite a while ago

@TomAugspurger
Copy link
Contributor

Fixed the merge conflict.

@jreback jreback merged commit a1e028f into pandas-dev:master Jul 15, 2020
@jreback
Copy link
Contributor

jreback commented Jul 15, 2020

thanks @jorisvandenbossche

fangchenli pushed a commit to fangchenli/pandas that referenced this pull request Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants