FIX use selected_obj rather than obj throughout groupby #6570

Merged
merged 2 commits into from
Mar 9, 2014

@hayd hayd commented Mar 7, 2014

Fixes many parts of #5264. Changes many uses of self.obj to self._selected_obj.

Probably should run vbench before merging?

cc @jreback @TomAugspurger

@jreback jreback added this to the 0.14.0 milestone Mar 7, 2014

hayd commented Mar 7, 2014

Ah, something subtle: df.groupby(as_index=False)[col] is a DataFrameGroupBy, hence the need for _obj_with_exclusions.
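
The difference between the full object, the selected object, and the object with exclusions can be sketched with a toy class. This is not the pandas implementation: `ToyGroupBy`, its dict-of-lists "frame", and the exact semantics of each property are assumptions made for illustration, and only loosely follow the attribute names used in this PR.

```python
class ToyGroupBy:
    """Hypothetical stand-in for a groupby over a dict-of-columns 'frame'."""

    def __init__(self, obj, key, selection=None):
        self.obj = obj                # the full, original object
        self.key = key                # name of the grouping column
        self._selection = selection   # column picked via gb[col], if any

    def __getitem__(self, col):
        # Selecting a column returns a new groupby that remembers the selection.
        return ToyGroupBy(self.obj, self.key, selection=col)

    @property
    def _selected_obj(self):
        # Honour the selection if there is one, otherwise use everything.
        if self._selection is None:
            return self.obj
        return {self._selection: self.obj[self._selection]}

    @property
    def _obj_with_exclusions(self):
        # Like _selected_obj, but the grouping column is always dropped.
        return {
            name: values
            for name, values in self._selected_obj.items()
            if name != self.key
        }

    def count(self):
        # Count non-None values per group, for the selected columns only.
        keys = self.obj[self.key]
        result = {}
        for name, values in self._obj_with_exclusions.items():
            counts = {}
            for k, v in zip(keys, values):
                counts[k] = counts.get(k, 0) + (v is not None)
            result[name] = counts
        return result


frame = {"A": ["x", "x", "y"], "B": [1, None, 3], "C": [4, 5, 6]}
gb = ToyGroupBy(frame, key="A")
print(gb.count())       # B and C are both counted
print(gb["B"].count())  # only B is counted, because the selection is honoured
```

A method written against `self.obj` would ignore the `gb["B"]` selection and report every column, which is the kind of bug this PR targets.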


hayd commented Mar 7, 2014

This should pass now. I will add a test iterating over the methods in #5264 and see what happens.

# TODO check groupby with > 1 col ?

methods = ['count', 'corr', 'cummax', 'cummin',
'cumprod', 'describe', 'rank']

This fixes these methods (at least).


hayd commented Mar 7, 2014

Passes Travis, and I think this probably fixes resample and maybe a few others too. Need to add some tests later.

Can't get vbench working on this machine; the perf change shouldn't be too bad...


hayd commented Mar 8, 2014

@jreback timings look OK (the tests are not that consistent). What do you think?

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
groupby_first                                |   3.2577 |   3.5373 |   0.9209 |
groupby_frame_cython_many_columns            |   2.8653 |   3.0453 |   0.9409 |
groupby_first_float32                        |   3.2770 |   3.4743 |   0.9432 |
groupby_indices                              |   7.7537 |   8.2003 |   0.9455 |
groupby_frame_singlekey_integer              |   2.0133 |   2.1056 |   0.9561 |
groupby_multi_size                           |  22.1736 |  22.8753 |   0.9693 |
groupby_last_float32                         |   3.5443 |   3.6490 |   0.9713 |
groupby_frame_median                         |   7.4780 |   7.6803 |   0.9737 |
groupby_multi_python                         | 113.6327 | 115.0453 |   0.9877 |
groupby_multi_cython                         |  15.5954 |  15.7886 |   0.9878 |
groupby_pivot_table                          |  16.5450 |  16.7054 |   0.9904 |
groupby_transform                            | 183.3767 | 184.9531 |   0.9915 |
groupby_sum_booleans                         |   0.9657 |   0.9660 |   0.9997 |
groupby_frame_apply_overhead                 |   8.2910 |   8.2756 |   1.0019 |
groupby_multi_different_functions            |  11.9800 |  11.9527 |   1.0023 |
groupby_apply_dict_return                    |  29.5260 |  29.4273 |   1.0034 |
groupby_frame_apply                          |  37.7584 |  37.3343 |   1.0114 |
groupby_multi_different_numpy_functions      |  10.9617 |  10.8347 |   1.0117 |
groupby_last                                 |   3.5717 |   3.5110 |   1.0173 |
groupby_simple_compress_timing               |  37.7727 |  37.1196 |   1.0176 |
groupby_multi_series_op                      |  12.7730 |  12.4243 |   1.0281 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
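
For reading the table, the ratio column appears to be head time divided by base time, so values below 1 mean the head commit ran faster. A minimal sketch that checks this against three rows copied from the table above; the `ratio` helper and the tolerance are assumptions, not part of vbench.

```python
def ratio(head_ms, base_ms):
    """Return head/base; below 1 means head was faster than base."""
    return head_ms / base_ms


# Rows copied from the table above: (name, head[ms], base[ms], reported ratio)
rows = [
    ("groupby_first", 3.2577, 3.5373, 0.9209),
    ("groupby_sum_booleans", 0.9657, 0.9660, 0.9997),
    ("groupby_multi_series_op", 12.7730, 12.4243, 1.0281),
]

for name, head, base, reported in rows:
    computed = ratio(head, base)
    # The printed timings are rounded, so allow a small tolerance.
    assert abs(computed - reported) < 1e-3
    print(f"{name:<28} {computed:.4f}")
```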


jreback commented Mar 8, 2014

These are fine.
There is some variability in these particular tests.


hayd commented Mar 8, 2014

I've included the other tests, as I realised that quite a few other functions use groupby under the hood. There is quite a bit of fluctuation here...

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
reindex_fillna_backfill_float32              |   0.3073 |   0.4017 |   0.7651 |
join_dataframe_index_single_key_bigger       |   6.0960 |   7.2437 |   0.8416 |
eval_frame_mult_python_one_thread            |  13.8293 |  15.7553 |   0.8778 |
join_dataframe_index_single_key_small        |   5.6837 |   6.3853 |   0.8901 |
frame_xs_col                                 |   0.0257 |   0.0287 |   0.8947 |
stat_ops_level_frame_sum_multiple            |   7.4847 |   8.3333 |   0.8982 |
concat_empty_frames2                         |   0.6217 |   0.6877 |   0.9041 |
join_dataframe_integer_2key                  |   5.0010 |   5.5126 |   0.9072 |
append_frame_single_homogenous               |   0.7333 |   0.8030 |   0.9132 |
frame_dtypes                                 |   0.1887 |   0.2043 |   0.9234 |
eval_frame_chained_cmp_all_threads           |  22.3497 |  24.1364 |   0.9260 |
frame_ctor_dtindex_BYearEnd(1)               |   0.3850 |   0.4146 |   0.9287 |
packers_read_pack                            |   2.1021 |   2.2566 |   0.9315 |
concat_empty_frames1                         |   0.5993 |   0.6386 |   0.9384 |
merge_2intkey_sort                           |  31.4830 |  33.4563 |   0.9410 |
append_frame_single_mixed                    |   1.0603 |   1.1230 |   0.9442 |
query_store_table                            |   4.6051 |   4.8773 |   0.9442 |
write_store_table_panel                      |  36.7020 |  38.8227 |   0.9454 |
groupby_last_float32                         |   3.5290 |   3.7280 |   0.9466 |
groupby_first_float32                        |   3.3860 |   3.5716 |   0.9480 |
merge_2intkey_nosort                         |  14.0420 |  14.7884 |   0.9495 |
series_align_int64_index                     |  25.9810 |  27.2744 |   0.9526 |
groupby_frame_apply_overhead                 |   7.7263 |   8.1030 |   0.9535 |
concat_series_axis1                          |  77.0020 |  80.7457 |   0.9536 |
join_dataframe_integer_key                   |   1.6843 |   1.7617 |   0.9561 |
index_datetime_intersection                  |   8.5763 |   8.9640 |   0.9568 |
concat_small_frames                          |  14.3650 |  15.0017 |   0.9576 |
frame_ctor_dtindex_BYearBegin(1)             |   0.3667 |   0.3819 |   0.9600 |
indexing_dataframe_boolean_rows              |   0.3690 |   0.3833 |   0.9627 |
dataframe_resample_mean_numpy                |   1.9047 |   1.9763 |   0.9638 |
read_store_table_panel                       |  20.8527 |  21.6309 |   0.9640 |
strings_strip                                |   3.2073 |   3.3240 |   0.9649 |
strings_replace                              |  11.5170 |  11.9354 |   0.9649 |
join_dataframe_index_multi                   |  17.0101 |  17.6187 |   0.9655 |
datetimeindex_normalize                      |   2.3206 |   2.3913 |   0.9704 |
frame_iloc_dups                              |   0.2530 |   0.2607 |   0.9704 |
eval_frame_add_all_threads                   |   9.3187 |   9.5979 |   0.9709 |
dataframe_resample_mean_string               |   1.8477 |   1.9017 |   0.9716 |
frame_constructor_ndarray                    |   0.0467 |   0.0480 |   0.9719 |
groupby_indices                              |   6.7987 |   6.9757 |   0.9746 |
sparse_frame_constructor                     |   8.4037 |   8.6173 |   0.9752 |
stats_rolling_mean                           |   0.8194 |   0.8397 |   0.9758 |
panel_from_dict_two_different_indexes        |  48.1570 |  49.3080 |   0.9767 |
groupby_multi_cython                         |  13.9600 |  14.2783 |   0.9777 |
reshape_pivot_time_series                    | 147.9640 | 151.1617 |   0.9788 |
series_drop_duplicates_string                |   0.4133 |   0.4220 |   0.9795 |
timeseries_asof_single                       |   0.0316 |   0.0323 |   0.9803 |
reindex_fillna_pad                           |   0.3537 |   0.3606 |   0.9806 |
series_value_counts_strings                  |   4.3784 |   4.4634 |   0.9810 |
series_getitem_pos_slice                     |   0.0513 |   0.0523 |   0.9818 |
datetime_index_intersection                  |   0.2553 |   0.2600 |   0.9820 |
frame_ctor_dtindex_Second(2)                 |   0.8634 |   0.8790 |   0.9823 |
timestamp_ops_diff1                          |   6.0760 |   6.1820 |   0.9829 |
frame_from_series                            |   0.0554 |   0.0563 |   0.9831 |
frame_iloc_big                               |   0.1734 |   0.1764 |   0.9833 |
stat_ops_level_series_sum                    |   2.4974 |   2.5384 |   0.9838 |
frame_boolean_row_select                     |   0.2660 |   0.2703 |   0.9841 |
ctor_index_array_string                      |   0.0197 |   0.0200 |   0.9841 |
indexing_dataframe_boolean                   |   7.3300 |   7.4470 |   0.9843 |
series_drop_duplicates_int                   |   0.6216 |   0.6307 |   0.9856 |
frame_ctor_dtindex_Day(2)                    |   0.8340 |   0.8460 |   0.9858 |
join_dataframe_index_single_key_bigger_sort  |  14.0401 |  14.2380 |   0.9861 |
index_str_boolean_series_indexer             |  13.5016 |  13.6890 |   0.9863 |
datetimeindex_unique                         |   0.1007 |   0.1020 |   0.9875 |
frame_reindex_axis1                          | 559.1563 | 566.0380 |   0.9878 |
stat_ops_frame_sum_float_axis_0              |   0.5447 |   0.5514 |   0.9879 |
eval_frame_add_python                        |  13.3873 |  13.5450 |   0.9884 |
frame_ctor_dtindex_QuarterBegin(1)           |   1.1160 |   1.1287 |   0.9887 |
index_datetime_union                         |   8.7073 |   8.8010 |   0.9894 |
index_str_boolean_indexer                    |  13.5457 |  13.6871 |   0.9897 |
read_store_table_wide                        |  14.0137 |  14.1594 |   0.9897 |
dataframe_resample_max_string                |   1.5150 |   1.5279 |   0.9915 |
frame_ctor_dtindex_Week(2)                   |   0.9410 |   0.9487 |   0.9919 |
frame_assign_timeseries_index                |   0.5864 |   0.5910 |   0.9921 |
groupby_frame_singlekey_integer              |   2.0166 |   2.0323 |   0.9923 |
frame_getitem_single_column2                 |  19.4636 |  19.6087 |   0.9926 |
frame_ctor_list_of_dict                      |  67.0710 |  67.5426 |   0.9930 |
series_value_counts_int64                    |   1.8243 |   1.8360 |   0.9936 |
read_parse_dates_iso8601                     |   1.1307 |   1.1373 |   0.9941 |
reindex_frame_level_reindex                  |   0.5380 |   0.5410 |   0.9946 |
series_align_left_monotonic                  |  12.6847 |  12.7363 |   0.9960 |
frame_loc_dups                               |   0.6750 |   0.6777 |   0.9960 |
frame_dropna_axis1_any                       | 153.7097 | 154.3113 |   0.9961 |
write_store_table_wide                       |  85.5070 |  85.8370 |   0.9962 |
frame_to_string_floats                       |  28.9767 |  29.0396 |   0.9978 |
read_csv_infer_datetime_format_custom        |  11.5033 |  11.5274 |   0.9979 |
stats_rank_pct_average                       |  27.3020 |  27.3527 |   0.9981 |
frame_fillna_many_columns_pad                |  12.6944 |  12.7000 |   0.9996 |
datetime_index_union                         |   0.0650 |   0.0650 |   1.0000 |
frame_get_dtype_counts                       |   0.0837 |   0.0837 |   1.0000 |
series_constructor_ndarray                   |   0.0153 |   0.0153 |   1.0000 |
frame_dropna_axis1_all                       | 286.1723 | 286.1033 |   1.0002 |
frame_add_no_ne                              |   4.4146 |   4.4113 |   1.0008 |
frame_isnull                                 |   1.1274 |   1.1264 |   1.0009 |
strings_contains_few                         |   4.3983 |   4.3920 |   1.0014 |
frame_ctor_dtindex_MonthEnd(2)               |   1.0953 |   1.0937 |   1.0015 |
frame_ctor_dtindex_Hour(2)                   |   0.8470 |   0.8454 |   1.0020 |
frame_ctor_dtindex_Micro(2)                  |   0.8173 |   0.8150 |   1.0028 |
strings_count                                |   5.2283 |   5.2117 |   1.0032 |
stat_ops_frame_sum_float_axis_1              |   0.7047 |   0.7024 |   1.0033 |
frame_ctor_dtindex_BusinessDay(1)            |   1.0670 |   1.0633 |   1.0034 |
frame_mult_no_ne                             |   4.4313 |   4.4160 |   1.0035 |
read_table_multiple_date_baseline            |  82.3259 |  82.0254 |   1.0037 |
frame_ctor_dtindex_CustomBusinessDay(1)      |   1.3784 |   1.3733 |   1.0037 |
frame_mult                                   |   4.3753 |   4.3573 |   1.0041 |
datetimeindex_infer_dst                      |   3.1750 |   3.1613 |   1.0043 |
packers_write_hdf_store                      |   7.0310 |   6.9989 |   1.0046 |
frame_ctor_dtindex_BQuarterEnd(1)            |   1.1430 |   1.1377 |   1.0046 |
series_getitem_label_slice                   |   0.0567 |   0.0563 |   1.0056 |
frame_ctor_dtindex_Hour(1)                   |   0.8333 |   0.8283 |   1.0059 |
frame_sort_index_by_columns                  |  33.8724 |  33.6493 |   1.0066 |
frame_ctor_dtindex_QuarterBegin(2)           |   0.3760 |   0.3734 |   1.0070 |
eval_frame_add_python_one_thread             |  13.6666 |  13.5690 |   1.0072 |
dtindex_from_series_ctor                     |   0.0110 |   0.0110 |   1.0072 |
write_store_table                            |  27.8997 |  27.6983 |   1.0073 |
frame_get_numeric_data                       |   0.0846 |   0.0840 |   1.0076 |
dti_reset_index                              |   0.2197 |   0.2180 |   1.0077 |
read_store_table                             |   1.9330 |   1.9183 |   1.0077 |
frame_ctor_dtindex_Milli(2)                  |   0.8323 |   0.8260 |   1.0077 |
read_store                                   |   1.6093 |   1.5970 |   1.0077 |
frame_add_st                                 |   4.3030 |   4.2697 |   1.0078 |
frame_ctor_dtindex_MonthBegin(1)             |   1.0930 |   1.0840 |   1.0083 |
indexing_dataframe_boolean_no_ne             |  75.3783 |  74.7070 |   1.0090 |
groupby_sum_booleans                         |   0.9833 |   0.9743 |   1.0092 |
strings_match                                |   4.9327 |   4.8857 |   1.0096 |
frame_drop_duplicates                        |  13.8110 |  13.6790 |   1.0096 |
frame_ctor_dtindex_BDay(1)                   |   1.0803 |   1.0697 |   1.0099 |
frame_ctor_dtindex_BusinessDay(2)            |   1.1240 |   1.1126 |   1.0102 |
eval_frame_and_all_threads                   |  28.2520 |  27.9600 |   1.0104 |
groupby_frame_apply                          |  38.6966 |  38.2837 |   1.0108 |
stat_ops_level_series_sum_multiple           |   6.7636 |   6.6894 |   1.0111 |
indexing_dataframe_boolean_rows_object       |   0.5737 |   0.5674 |   1.0112 |
frame_ctor_dtindex_BDay(2)                   |   1.1063 |   1.0936 |   1.0116 |
sort_level_one                               |   4.5073 |   4.4546 |   1.0118 |
strings_repeat                               |   3.7710 |   3.7266 |   1.0119 |
stats_corr_spearman                          |  78.5230 |  77.5753 |   1.0122 |
stats_rank_average_int                       |  20.8476 |  20.5924 |   1.0124 |
query_store_table_wide                       |  10.0893 |   9.9653 |   1.0124 |
stat_ops_frame_mean_int_axis_0               |   0.5810 |   0.5737 |   1.0127 |
frame_ctor_dtindex_Nano(2)                   |   1.1767 |   1.1610 |   1.0135 |
strings_extract                              |  34.8640 |  34.3997 |   1.0135 |
frame_add                                    |   4.4823 |   4.4223 |   1.0136 |
frame_apply_user_func                        |  90.4700 |  89.1867 |   1.0144 |
frame_repr_wide                              |  13.7884 |  13.5927 |   1.0144 |
index_int64_intersection                     |  22.9560 |  22.6297 |   1.0144 |
frame_mult_st                                |   4.3329 |   4.2710 |   1.0145 |
frame_ctor_dtindex_BMonthBegin(2)            |   1.1167 |   1.1000 |   1.0152 |
frame_ctor_dtindex_Day(1)                    |   0.8296 |   0.8167 |   1.0159 |
frame_ctor_dtindex_BQuarterBegin(1)          |   1.1920 |   1.1730 |   1.0162 |
index_from_series_ctor                       |   0.0199 |   0.0196 |   1.0162 |
frame_repr_tall                              |  19.8793 |  19.5593 |   1.0164 |
frame_float_equal                            |   3.3473 |   3.2920 |   1.0168 |
period_setitem                               |  97.4634 |  95.8176 |   1.0172 |
frame_ctor_dtindex_MonthEnd(1)               |   1.1137 |   1.0947 |   1.0174 |
frame_reindex_columns                        |   0.3694 |   0.3630 |   1.0175 |
frame_ctor_dtindex_CustomBusinessDay(2)      |   1.3810 |   1.3570 |   1.0177 |
series_align_irregular_string                |  52.6010 |  51.6787 |   1.0178 |
panel_from_dict_equiv_indexes                |  33.0609 |  32.4710 |   1.0182 |
indexing_dataframe_boolean_st                |   7.3637 |   7.2320 |   1.0182 |
stats_rank2d_axis1_average                   |  10.4193 |  10.2300 |   1.0185 |
frame_mask_floats                            |   4.6980 |   4.6086 |   1.0194 |
frame_ctor_dtindex_MonthBegin(2)             |   1.1173 |   1.0960 |   1.0194 |
frame_to_html_mixed                          | 231.4520 | 226.9681 |   1.0198 |
reshape_unstack_simple                       |   3.1296 |   3.0687 |   1.0199 |
replace_replacena                            |   1.2457 |   1.2213 |   1.0200 |
strings_lstrip                               |   2.9523 |   2.8944 |   1.0200 |
read_csv_vb                                  |  18.4686 |  18.1007 |   1.0203 |
stats_rank2d_axis0_average                   |  19.2573 |  18.8727 |   1.0204 |
timeseries_asof_nan                          |   7.0647 |   6.9167 |   1.0214 |
eval_frame_add_one_thread                    |   9.2577 |   9.0613 |   1.0217 |
match_strings                                |   0.3747 |   0.3667 |   1.0219 |
frame_apply_lambda_mean                      |   3.1657 |   3.0974 |   1.0221 |
frame_reindex_axis0                          |  84.2364 |  82.4027 |   1.0223 |
frame_drop_duplicates_na                     |  13.7281 |  13.4270 |   1.0224 |
frame_ctor_dtindex_BQuarterBegin(2)          |   0.3627 |   0.3547 |   1.0226 |
panel_from_dict_same_index                   |  32.3587 |  31.6316 |   1.0230 |
groupby_transform                            | 185.9623 | 181.7700 |   1.0231 |
groupby_simple_compress_timing               |  31.1086 |  30.4040 |   1.0232 |
frame_html_repr_trunc_si                     |  29.9733 |  29.2943 |   1.0232 |
frame_dropna_axis0_any                       |  25.1024 |  24.5264 |   1.0235 |
frame_fillna_inplace                         |   9.8739 |   9.6467 |   1.0236 |
frame_ctor_nested_dict                       |  55.5983 |  54.3067 |   1.0238 |
timeseries_timestamp_tzinfo_cons             |   0.0137 |   0.0134 |   1.0238 |
groupby_multi_python                         | 118.0639 | 115.2973 |   1.0240 |
frame_nonunique_equal                        |   4.0549 |   3.9586 |   1.0243 |
frame_insert_500_columns_end                 |  93.2833 |  91.0653 |   1.0244 |
frame_ctor_dtindex_BMonthEnd(2)              |   1.1226 |   1.0959 |   1.0244 |
reshape_stack_simple                         |   3.1093 |   3.0353 |   1.0244 |
panel_from_dict_all_different_indexes        |  60.7297 |  59.2647 |   1.0247 |
series_ctor_from_dict                        |   2.1270 |   2.0754 |   1.0249 |
frame_dropna_axis0_all                       |  49.7097 |  48.4956 |   1.0250 |
timeseries_infer_freq                        |   6.5007 |   6.3416 |   1.0251 |
frame_ctor_dtindex_Week(1)                   |   0.9780 |   0.9540 |   1.0252 |
strings_join_split                           |  23.9020 |  23.3134 |   1.0252 |
frame_reindex_both_axes                      |  32.8187 |  31.9977 |   1.0257 |
dataframe_reindex                            |   0.3367 |   0.3283 |   1.0257 |
timestamp_series_compare                     |   2.4193 |   2.3584 |   1.0258 |
frame_ctor_dtindex_YearBegin(2)              |   0.3720 |   0.3626 |   1.0259 |
frame_interpolate                            |  81.1390 |  79.0920 |   1.0259 |
eval_frame_chained_cmp_python_one_thread     |  24.5217 |  23.8960 |   1.0262 |
stat_ops_series_std                          |   0.2073 |   0.2020 |   1.0264 |
frame_ctor_dtindex_Nano(1)                   |   1.2163 |   1.1850 |   1.0264 |
write_store_mixed                            |  12.9497 |  12.6146 |   1.0266 |
sparse_series_to_frame                       | 114.0147 | 111.0574 |   1.0266 |
timeseries_period_downsample_mean            |  10.6574 |  10.3773 |   1.0270 |
frame_insert_100_columns_begin               |  17.8320 |  17.3617 |   1.0271 |
lib_fast_zip_fillna                          |   9.8276 |   9.5673 |   1.0272 |
eval_frame_mult_all_threads                  |   9.3607 |   9.1113 |   1.0274 |
groupby_multi_size                           |  22.6943 |  22.0790 |   1.0279 |
read_csv_standard                            |  10.1573 |   9.8807 |   1.0280 |
strings_findall                              |   7.6796 |   7.4690 |   1.0282 |
packers_write_pack                           |   3.3550 |   3.2630 |   1.0282 |
frame_ctor_dtindex_BQuarterEnd(2)            |   0.3720 |   0.3617 |   1.0286 |
packers_write_json_date_index                |  29.9714 |  29.1359 |   1.0287 |
frame_ctor_dtindex_QuarterEnd(2)             |   0.3697 |   0.3594 |   1.0287 |
frame_ctor_dtindex_BMonthEnd(1)              |   1.1240 |   1.0920 |   1.0293 |
sort_level_zero                              |   4.6117 |   4.4806 |   1.0293 |
series_xs_mi_ix                              |   0.3023 |   0.2937 |   1.0295 |
strings_pad                                  |   3.2630 |   3.1679 |   1.0300 |
frame_ctor_dtindex_Second(1)                 |   0.8707 |   0.8450 |   1.0304 |
eval_frame_and_python_one_thread             |  42.2813 |  41.0320 |   1.0304 |
frame_multi_and_st                           |  31.9180 |  30.9737 |   1.0305 |
strings_startswith                           |   2.9363 |   2.8490 |   1.0306 |
read_table_multiple_date                     | 178.6166 | 173.2580 |   1.0309 |
index_int64_union                            |  71.9277 |  69.7620 |   1.0310 |
datetimeindex_add_offset                     |   0.1740 |   0.1687 |   1.0311 |
read_csv_comment2                            |  14.3396 |  13.8950 |   1.0320 |
frame_object_equal                           |   4.0793 |   3.9527 |   1.0320 |
reindex_daterange_backfill                   |   0.6760 |   0.6549 |   1.0322 |
frame_reindex_both_axes_ix                   |  33.0767 |  32.0460 |   1.0322 |
stat_ops_frame_mean_int_axis_1               |   0.7767 |   0.7513 |   1.0337 |
strings_contains_few_noregex                 |   2.0263 |   1.9600 |   1.0338 |
frame_ctor_dtindex_YearBegin(1)              |   0.3743 |   0.3620 |   1.0340 |
groupby_frame_cython_many_columns            |   2.9657 |   2.8679 |   1.0341 |
frame_ctor_dtindex_CDay(2)                   |   1.4203 |   1.3733 |   1.0343 |
dataframe_resample_min_numpy                 |   1.5817 |   1.5287 |   1.0347 |
packers_read_hdf_table                       |   7.7473 |   7.4873 |   1.0347 |
timeseries_add_irregular                     |  18.2220 |  17.6020 |   1.0352 |
packers_read_csv                             |  40.9100 |  39.5079 |   1.0355 |
groupby_apply_dict_return                    |  30.3966 |  29.3534 |   1.0355 |
groupby_first                                |   3.3657 |   3.2496 |   1.0357 |
frame_iteritems_cached                       |   0.4823 |   0.4656 |   1.0358 |
groupby_multi_different_functions            |  11.1043 |  10.7190 |   1.0360 |
frame_multi_and_no_ne                        |  79.8867 |  77.0377 |   1.0370 |
frame_html_repr_trunc_mi                     |  39.3070 |  37.8977 |   1.0372 |
frame_drop_dup_inplace                       |   2.5160 |   2.4254 |   1.0374 |
groupby_frame_median                         |   6.5947 |   6.3557 |   1.0376 |
packers_write_hdf_table                      |  23.0580 |  22.2193 |   1.0377 |
read_csv_thou_vb                             |  16.1163 |  15.5300 |   1.0378 |
frame_iteritems                              |  30.8327 |  29.7010 |   1.0381 |
timeseries_sort_index                        |  21.0130 |  20.2417 |   1.0381 |
frame_apply_np_mean                          |   3.2276 |   3.1090 |   1.0382 |
groupby_multi_different_numpy_functions      |  11.1723 |  10.7604 |   1.0383 |
read_store_table_mixed                       |   5.5440 |   5.3360 |   1.0390 |
frame_ctor_dtindex_BYearBegin(2)             |   0.3740 |   0.3599 |   1.0391 |
frame_ctor_nested_dict_int64                 |  73.9667 |  71.1737 |   1.0392 |
stats_rank_average                           |  25.1520 |  24.1880 |   1.0399 |
plot_timeseries_period                       |  45.6260 |  43.8566 |   1.0403 |
strings_center                               |   3.3693 |   3.2376 |   1.0407 |
series_timestamp_compare                     |   2.4350 |   2.3371 |   1.0419 |
frame_reindex_upcast                         |   9.8690 |   9.4709 |   1.0420 |
lib_fast_zip                                 |   7.3510 |   7.0523 |   1.0423 |
replace_fillna                               |   1.8071 |   1.7330 |   1.0427 |
frame_ctor_dtindex_CDay(1)                   |   1.4170 |   1.3583 |   1.0432 |
frame_apply_axis_1                           |  86.4031 |  82.8043 |   1.0435 |
frame_ctor_dtindex_BMonthBegin(1)            |   1.1510 |   1.1030 |   1.0435 |
strings_lower                                |   2.9651 |   2.8397 |   1.0441 |
write_store                                  |   5.6700 |   5.4234 |   1.0455 |
frame_ctor_dtindex_Micro(1)                  |   0.8437 |   0.8067 |   1.0458 |
dti_reset_index_tz                           |  10.9527 |  10.4727 |   1.0458 |
reindex_frame_level_align                    |   0.5894 |   0.5634 |   1.0461 |
read_store_mixed                             |   3.9710 |   3.7920 |   1.0472 |
frame_multi_and                              |  32.6580 |  31.1853 |   1.0472 |
packers_read_hdf_store                       |   4.0323 |   3.8427 |   1.0493 |
timestamp_ops_diff2                          |  19.6323 |  18.6980 |   1.0500 |
frame_drop_dup_na_inplace                    |   2.3150 |   2.2043 |   1.0502 |
strings_cat                                  |   0.7737 |   0.7366 |   1.0503 |
packers_write_json                           |  22.5223 |  21.4440 |   1.0503 |
frame_apply_ref_by_name                      |  11.9843 |  11.4083 |   1.0505 |
eval_frame_mult_one_thread                   |   9.6304 |   9.1613 |   1.0512 |
groupby_last                                 |   3.6607 |   3.4820 |   1.0513 |
strings_upper                                |   2.8846 |   2.7427 |   1.0518 |
frame_ctor_dtindex_Minute(2)                 |   0.8726 |   0.8293 |   1.0522 |
reindex_daterange_pad                        |   0.6627 |   0.6290 |   1.0536 |
frame_apply_pass_thru                        |   4.9489 |   4.6950 |   1.0541 |
groupby_pivot_table                          |  17.4827 |  16.5723 |   1.0549 |
groupby_multi_series_op                      |  13.1153 |  12.4290 |   1.0552 |
packers_write_csv                            | 494.1684 | 468.2906 |   1.0553 |
dataframe_resample_max_numpy                 |   1.5830 |   1.5000 |   1.0554 |
frame_interpolate_some_good_infer            |   3.4677 |   3.2847 |   1.0557 |
strings_contains_many_noregex                |   2.1470 |   2.0326 |   1.0563 |
write_store_table_dc                         | 109.4467 | 103.5527 |   1.0569 |
frame_to_csv_mixed                           | 255.4664 | 241.6294 |   1.0573 |
stat_ops_frame_mean_float_axis_0             |   0.5803 |   0.5483 |   1.0584 |
write_store_table_mixed                      |  32.8927 |  31.0657 |   1.0588 |
reindex_fillna_pad_float32                   |   0.3167 |   0.2990 |   1.0593 |
frame_fancy_lookup_all                       |  16.2671 |  15.3427 |   1.0602 |
packers_read_pickle                          |   0.5830 |   0.5493 |   1.0613 |
melt_dataframe                               |   1.9093 |   1.7963 |   1.0629 |
strings_contains_many                        |   4.5703 |   4.2963 |   1.0638 |
write_csv_standard                           |  38.7787 |  36.3503 |   1.0668 |
timeseries_asof                              |   7.4713 |   6.9981 |   1.0676 |
timeseries_large_lookup_value                |   0.0260 |   0.0243 |   1.0686 |
stat_ops_frame_mean_float_axis_1             |   0.7970 |   0.7433 |   1.0722 |
frame_getitem_single_column                  |  20.2910 |  18.9237 |   1.0723 |
reindex_multiindex                           |   1.0790 |   1.0063 |   1.0723 |
frame_ctor_dtindex_QuarterEnd(1)             |   1.1903 |   1.1094 |   1.0730 |
timeseries_timestamp_downsample_mean         |   4.4437 |   4.1410 |   1.0731 |
strings_slice                                |   2.4680 |   2.2980 |   1.0740 |
frame_to_csv_date_formatting                 |  14.4720 |  13.4203 |   1.0784 |
frame_fancy_lookup                           |   3.4383 |   3.1860 |   1.0792 |
timeseries_to_datetime_YYYYMMDD              |   8.1871 |   7.5747 |   1.0808 |
stat_ops_level_frame_sum                     |   3.3980 |   3.1357 |   1.0837 |
frame_ctor_dtindex_BYearEnd(2)               |   0.3980 |   0.3664 |   1.0863 |
stat_ops_frame_sum_int_axis_1                |   0.3734 |   0.3434 |   1.0872 |
read_csv_infer_datetime_format_iso8601       |   1.6987 |   1.5580 |   1.0903 |
strings_get                                  |   2.4343 |   2.2297 |   1.0918 |
frame_to_csv                                 | 123.1730 | 112.7717 |   1.0922 |
packers_write_pickle                         |   2.9413 |   2.6880 |   1.0942 |
read_csv_infer_datetime_format_ymd           |   1.9953 |   1.8170 |   1.0981 |
unstack_sparse_keyspace                      |   1.6857 |   1.5350 |   1.0982 |
strings_endswith                             |   3.1357 |   2.8520 |   1.0995 |
packers_read_json_date_index                 |  42.3477 |  38.5133 |   1.0996 |
frame_ctor_dtindex_Minute(1)                 |   0.9023 |   0.8203 |   1.1000 |
series_string_vector_slice                   | 164.0153 | 149.0080 |   1.1007 |
frame_mask_bools                             |  14.2420 |  12.8624 |   1.1073 |
packers_read_json                            |  43.5286 |  39.2803 |   1.1082 |
frame_interpolate_some_good                  |   2.0710 |   1.8647 |   1.1106 |
frame_ctor_dtindex_Milli(1)                  |   0.9027 |   0.8120 |   1.1117 |
strings_title                                |   3.5849 |   3.2060 |   1.1182 |
frame_ctor_dtindex_YearEnd(2)                |   0.4036 |   0.3610 |   1.1182 |
reindex_fillna_backfill                      |   0.4167 |   0.3714 |   1.1220 |
timeseries_slice_minutely                    |   0.0670 |   0.0597 |   1.1225 |
indexing_panel_subset                        |   1.0247 |   0.9090 |   1.1273 |
frame_ctor_dtindex_YearEnd(1)                |   0.4090 |   0.3620 |   1.1300 |
strings_rstrip                               |   3.1290 |   2.7680 |   1.1304 |
timeseries_to_datetime_iso8601               |   4.2797 |   3.7626 |   1.1374 |
strings_len                                  |   2.3407 |   2.0426 |   1.1459 |
dataframe_resample_min_string                |   1.7790 |   1.5510 |   1.1470 |
frame_to_csv2                                | 108.2560 |  93.9113 |   1.1527 |
frame_xs_mi_ix                               |   0.3090 |   0.2557 |   1.2086 |
frame_xs_row                                 |   0.0574 |   0.0450 |   1.2756 |
stat_ops_frame_sum_int_axis_0                |   0.6594 |   0.4434 |   1.4872 |
timeseries_1min_5min_mean                    |   1.1110 |   0.5723 |   1.9414 |
timeseries_1min_5min_ohlc                    |   1.1543 |   0.5876 |   1.9644 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

@jreback
Contributor

jreback commented Mar 8, 2014

maybe repeat and see if last 3 show up
if so then take a look

@hayd
Contributor Author

hayd commented Mar 8, 2014

If you would, that'd be great!

groupby_first_float32                        |   3.7374 |   3.2806 |   1.1392 |
frame_boolean_row_select                     |   0.2897 |   0.2540 |   1.1405 |
timeseries_timestamp_downsample_mean         |   4.7660 |   4.0634 |   1.1729 |
timeseries_1min_5min_ohlc                    |   1.1263 |   0.5790 |   1.9454 |
timeseries_1min_5min_mean                    |   1.1007 |   0.5546 |   1.9845 |
packers_write_pickle                         |   6.5750 |   2.7454 |   2.3949 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
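
One way to check whether the slow entries repeat across runs is to intersect the benchmarks whose ratio exceeds a cut-off in both. This is a rough sketch using only the tail rows quoted in this thread; the `THRESHOLD` value and the helper name `slow` are assumptions for illustration, not part of vbench.

```python
# Ratios (head / base) copied from the tail rows shown in this thread.
first_run = {
    'stat_ops_frame_sum_int_axis_0': 1.4872,
    'timeseries_1min_5min_mean': 1.9414,
    'timeseries_1min_5min_ohlc': 1.9644,
}
second_run = {
    'timeseries_1min_5min_ohlc': 1.9454,
    'timeseries_1min_5min_mean': 1.9845,
    'packers_write_pickle': 2.3949,
}

THRESHOLD = 1.4  # hypothetical cut-off; pick whatever counts as "slow"


def slow(run, threshold=THRESHOLD):
    """Return the names of benchmarks whose ratio exceeds the threshold."""
    return {name for name, ratio in run.items() if ratio > threshold}


# Benchmarks that are slow in both runs are less likely to be noise.
repeated = sorted(slow(first_run) & slow(second_run))
print(repeated)
```

With these rows, only the two `timeseries_1min_5min_*` benchmarks show up in both runs.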

@jreback
Contributor

jreback commented Mar 8, 2014

Here's a patch to fix the perf issue; it's that __getattr__ was trying to look up the cached properties in the object index (which is wrong; instead it should just raise AttributeError if the cache is not yet created).

When I run this (even w/o this patch), there are some failing tests....?
(numpy 1.7.1 on 2.7)....odd

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index d418005..2f23935 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -208,6 +208,8 @@ class GroupBy(PandasObject):
         Number of groups
     """
     _apply_whitelist = _common_apply_whitelist
+    _internal_names = ['_cache']
+    _internal_names_set = set(_internal_names)

     def __init__(self, obj, keys=None, axis=0, level=None,
                  grouper=None, exclusions=None, selection=None, as_index=True,
@@ -288,10 +290,12 @@ class GroupBy(PandasObject):
         return sorted(set(self.obj._local_dir() + list(self._apply_whitelist)))

     def __getattr__(self, attr):
+        if attr in self._internal_names_set:
+            return object.__getattribute__(self, attr)
         if attr in self.obj:
             return self[attr]

-        if hasattr(self.obj, attr) and attr != '_cache':
+        if hasattr(self.obj, attr):
             return self._make_wrapper(attr)

         raise AttributeError("%r object has no attribute %r" %
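
The idea behind the patch can be sketched outside pandas. The classes below (`ToyFrame`, `ToyGroupBy`) are hypothetical stand-ins, not the real pandas classes; only the `_internal_names` short-circuit mirrors the diff above. The sketch assumes the wrapped object's membership test is the expensive step, and counts how often it runs.

```python
class ToyFrame:
    """Hypothetical stand-in for the wrapped object; not a pandas class."""

    def __init__(self, columns):
        self.columns = columns   # dict: column name -> list of values
        self.lookups = 0         # how many times `in` was evaluated

    def __contains__(self, name):
        self.lookups += 1
        return name in self.columns

    def total(self):
        return sum(len(values) for values in self.columns.values())


class ToyGroupBy:
    """Hypothetical wrapper mirroring the shape of the patched __getattr__."""

    _internal_names = ['_cache']
    _internal_names_set = set(_internal_names)

    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, attr):
        # Python calls __getattr__ only after normal lookup has failed.
        if attr in self._internal_names_set:
            # Re-run the normal lookup so a missing internal name raises
            # AttributeError without touching the wrapped object.
            return object.__getattribute__(self, attr)
        if attr in self.obj:
            return self.obj.columns[attr]
        if hasattr(self.obj, attr):
            return getattr(self.obj, attr)
        raise AttributeError("%r object has no attribute %r"
                             % (type(self).__name__, attr))


grouped = ToyGroupBy(ToyFrame({'a': [1, 2], 'b': [3, 4]}))

# Probing for the not-yet-created cache never consults the wrapped object.
cache_missing = not hasattr(grouped, '_cache')
lookups_after_cache_probe = grouped.obj.lookups

# An ordinary column access still goes through the wrapped object once.
column_a = grouped.a
lookups_after_column = grouped.obj.lookups
```

In this toy version, probing for `_cache` leaves the lookup counter at zero, while a column access increments it once; method names such as `total` still fall through to the `hasattr` branch.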

@hayd
Contributor Author

hayd commented Mar 8, 2014

Thanks, will apply this, and see what stuff is failing tonight... (weird travis wouldn't be picking them up :s ).

Will also add some more basic tests for some of the other methods in the checklist.

@jreback
Contributor

jreback commented Mar 8, 2014

I had to rebase off master
I think this had some conflicts from the head/tail pr

@hayd
Contributor Author

hayd commented Mar 9, 2014

@jreback weird, shouldn't be any conflicts (oh, yes there are, ok rebasing...)

@TomAugspurger Do you mind taking a look at this PR? I see you put some serious time in before trying to sort out this behaviour, so it would be great to hear what I'm missing (sorry if you were part way there and I've duplicated effort here; I'd missed your previous discussion in #5264 until just now).

(I had an example here with corr, but I think it's not an issue.)

@hayd
Contributor Author

hayd commented Mar 9, 2014

Rebasing, I bizarrely get one failed test. Doh, that was me not building...

travis is being flaky.

@jreback
Contributor

jreback commented Mar 9, 2014

rebase and I think this is good to go, pls squash a bit too

I assume release notes you will do all at once (ok..)

@hayd
Contributor Author

hayd commented Mar 9, 2014

Reordered commits and squashed (and it passes travis). Will add to the release notes when we close the main issue.

hayd added a commit that referenced this pull request Mar 9, 2014
FIX use selected_obj rather the obj throughout groupby
@hayd hayd merged commit 4119e04 into pandas-dev:master Mar 9, 2014
@hayd hayd deleted the groupby_selected_obj branch March 9, 2014 19:09
gouthambs pushed a commit to gouthambs/pandas that referenced this pull request Mar 12, 2014
maurosilber added a commit to maurosilber/pandas that referenced this pull request Dec 17, 2021
…#44821)

Fixes issue pandas-dev#44821.

When trying to iterate on a subset of columns in a GroupBy object,
it returned all columns, instead of the selected subset.

GroupBy.__iter__ used self.obj instead of self._selected_obj (see
PR pandas-dev#6570).
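
The difference between iterating over `obj` and over `_selected_obj` can be sketched with a toy class. This is a minimal illustration in plain Python, not pandas code: `ToyGroupBy` and its list-of-dicts data model are assumptions, and only the attribute names `obj` and `_selected_obj` come from the discussion above.

```python
class ToyGroupBy:
    """Hypothetical groupby over a list of row dicts; not the pandas class."""

    def __init__(self, rows, key, selection=None):
        self.obj = rows              # the full, unselected data
        self.key = key               # column name to group on
        self._selection = selection  # optional list of selected columns

    def __getitem__(self, columns):
        # Selecting columns returns a new groupby that remembers them.
        return ToyGroupBy(self.obj, self.key, selection=list(columns))

    @property
    def _selected_obj(self):
        if self._selection is None:
            return self.obj
        return [{c: row[c] for c in self._selection} for row in self.obj]

    def __iter__(self):
        # Group keys come from the full rows, but the yielded groups are
        # built from the selected view rather than from self.obj.
        groups = {}
        for full_row, selected_row in zip(self.obj, self._selected_obj):
            groups.setdefault(full_row[self.key], []).append(selected_row)
        return iter(sorted(groups.items()))


rows = [
    {'k': 'x', 'a': 1, 'b': 2},
    {'k': 'y', 'a': 3, 'b': 4},
    {'k': 'x', 'a': 5, 'b': 6},
]
result = list(ToyGroupBy(rows, 'k')[['a']])
print(result)
```

Had `__iter__` built the groups from `self.obj` instead, every yielded row would still carry `k` and `b`, which is the symptom described in the commit message above.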