Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add 'is_' method to Index for identity checks #4909

Merged
merged 1 commit into from Sep 22, 2013

Conversation

@jtratner
Copy link
Contributor

commented Sep 21, 2013

Closes #4859.

Adds a method that can quickly tell whether something is a view or not
(in the way that actually matters). Also changes checks in equals() to
use is_ rather than is.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 21, 2013

worth it to push these into one of the mixin classes so that could be used on a series/frame? I guess in that case easier just to do if df1 is df2 (as the only way they will be equal if they are actually the same object with 2 diff references)

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 21, 2013

@jreback it's really easy to do with index, because they always go through __new__ or __array_finalize__ and they aren't mutable, so it's very simple to get working. Series/Frame are mutable and don't make sense for this...too likely to have an edge case. Could do it with blockmanager (maybe) but I don't think it's worth it for speed boost.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 21, 2013

yep....as I was writing I was like...naw!

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 21, 2013

One upside to this: makes equals() and identical() much faster when you're comparing and it's the same object or a view.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 21, 2013

so we need identical then? which is equiv to equals & is_?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 21, 2013

I guess though could be identical but not a view...duh.....hmm

@cpcloud

This comment has been minimized.

Copy link
Member

commented Sep 21, 2013

vbench results:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
groupby_series_simple_cython                 |   4.3203 |   4.3933 |   0.9834 |
groupby_multi_cython                         |  14.0613 |  14.2470 |   0.9870 |
groupby_multi_different_functions            |  11.1340 |  11.2560 |   0.9892 |
groupby_frame_singlekey_integer              |   1.5600 |   1.5720 |   0.9924 |
groupby_multi_size                           |  24.0187 |  24.1320 |   0.9953 |
groupby_multi_different_numpy_functions      |   9.9383 |   9.9711 |   0.9967 |
groupby_multi_python                         | 115.3820 | 115.5843 |   0.9983 |
groupby_simple_compress_timing               |  20.6167 |  20.5903 |   1.0013 |
groupby_indices                              |   5.1444 |   5.1357 |   1.0017 |
groupby_last_float32                         |   2.7974 |   2.7874 |   1.0036 |
groupby_frame_median                         |   6.6326 |   6.5880 |   1.0068 |
groupby_multi_series_op                      |  12.7397 |  12.6499 |   1.0071 |
groupby_first_float32                        |   2.6823 |   2.6610 |   1.0080 |
groupby_last                                 |   2.8450 |   2.8173 |   1.0098 |
groupby_sum_booleans                         |   0.8566 |   0.8457 |   1.0130 |
groupby_first                                |   2.7236 |   2.6850 |   1.0144 |
groupby_pivot_table                          |  15.4137 |  15.1213 |   1.0193 |
groupby_transform                            | 127.4860 | 123.4310 |   1.0329 |
groupby_apply_dict_return                    |  29.0103 |  27.1187 |   1.0698 |
groupbym_frame_apply                         |  55.3214 |  45.1977 |   1.2240 |
groupby_frame_apply_overhead                 |  10.6590 |   8.5543 |   1.2460 |
groupby_frame_cython_many_columns            |   3.5886 |   2.7813 |   1.2903 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
@cpcloud

This comment has been minimized.

Copy link
Member

commented Sep 21, 2013

run with ./test_perf.sh -b master -t HEAD --repeats 5 --hrepeats 10 --burnin 10 -r 'groupby'

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 21, 2013

So mostly okay? I can't really tell. We can squeeze a bit more performance out by removing the function call, so the line would be:

if self.index._id is res.index._id

that said, could be extra overhead from view creation too. What do you think?

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 21, 2013

@jreback also, for most intents and purposes, it really doesn't matter if two indices have the same names or not, we just want to know if their labels are the same. And that's pretty much the entire function of is_, to allow for more views everywhere without totally sacrificing performance/equality testing.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 21, 2013

@jtratner yep...this all looks fine...perf is about the same (you might need a very specific test to pick up if you actually changing something)

@cpcloud

This comment has been minimized.

Copy link
Member

commented Sep 21, 2013

i'll run vbench one more time just to see if anything changes drastically ... biggest hit is 10 ms...

@cpcloud

This comment has been minimized.

Copy link
Member

commented Sep 21, 2013

orm aybe not if @jreback thinks its okay....honestly i feel unsure whenever vbench is not less than about 1.15 ish...i have no idea of the variability of the timings

@cpcloud

This comment has been minimized.

Copy link
Member

commented Sep 21, 2013

end of vbench:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
timeseries_large_lookup_value                |   0.0246 |   0.0220 |   1.1191 |
join_dataframe_index_single_key_small        |   4.8723 |   4.3330 |   1.1245 |
timeseries_timestamp_tzinfo_cons             |   0.0126 |   0.0110 |   1.1439 |
append_frame_single_homogenous               |   0.2307 |   0.2000 |   1.1538 |
mask_bools                                   |  13.8144 |  11.9720 |   1.1539 |
frame_xs_col                                 |   0.0290 |   0.0250 |   1.1624 |
dti_reset_index                              |   0.1973 |   0.1686 |   1.1701 |
reindex_multiindex                           |   1.0610 |   0.9023 |   1.1758 |
reshape_stack_simple                         |   1.7027 |   1.4450 |   1.1784 |
groupbym_frame_apply                         |  55.8174 |  44.3370 |   1.2589 |
groupby_frame_apply_overhead                 |  10.6933 |   8.4824 |   1.2607 |
frame_reindex_both_axes                      |  17.5817 |  13.7600 |   1.2777 |
frame_fillna_many_columns_pad                |  10.9213 |   6.3963 |   1.7074 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 21, 2013

that's a weird vbench...the frame_fillna_many_columns_pad.

@jtratner

This comment has been minimized.

Copy link
Contributor Author

commented Sep 21, 2013

this works, but if others could take a look (@jreback @cpcloud) and see whether this is useful. Without this, not sure how to do #4202 effectively.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 21, 2013

looks fine to me

ENH: Add 'is_' method to Index for identity checks
Adds a method that can quickly tell whether something is a view or not
(in the way that actually matters).

CLN: change 'is' check to 'is_' check in 'equals()'
jtratner added a commit that referenced this pull request Sep 22, 2013
Merge pull request #4909 from jtratner/add-is_-method-to-index
ENH: Add 'is_' method to Index for identity checks

@jtratner jtratner merged commit ebfb4c8 into pandas-dev:master Sep 22, 2013

@jtratner jtratner deleted the jtratner:add-is_-method-to-index branch Sep 22, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.