REGR: revert "CLN: _consolidate_inplace less" / fix regression in fillna() #34407

Merged
merged 6 commits into from
Nov 26, 2020

Conversation

jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented May 27, 2020

Reverts #34389

Closes #36495

@jbrockmendel
Member

do the asvs show any of the perf issues mentioned in #34389?

@jorisvandenbossche
Member Author

The benchmark server isn't running at the moment, see #34389 (comment)

@jbrockmendel
Member

I was hoping you'd be able to run them locally. (As mentioned in the other thread, I'm wrestling with hardware issues ATM.)

@jorisvandenbossche
Member Author

Sorry, I currently don't have time to run the full ASV suite.

@jreback jreback added Clean Internals Related to non-user accessible pandas implementation labels May 27, 2020
@jreback jreback added this to the 1.1 milestone May 27, 2020
@jbrockmendel
Member

I've gotten asv working locally, will do a run on #34389 if we're OK with holding off on merging this

@jorisvandenbossche
Member Author

jorisvandenbossche commented May 27, 2020

As I also said in #34389, running asv is not necessarily sufficient, as it is not certain that the changed functions are actually covered (our benchmark suite does not cover 100% of cases, and even where it does, the benchmarks don't necessarily include cases that start with non-consolidated data).
I think you need to specifically check the functions you changed and compare them with and without consolidation.

@jorisvandenbossche
Member Author

Or, for some of them it might be sufficient to simply check in the code if it would matter or not.
For example, I was looking at the take example, and NDFrame.take calls BlockManager.take, and the first thing that does is also consolidate inplace.
So in such a case, of course the consolidate_inplace in NDFrame.take can be removed as a code clean-up, and it also won't matter performance-wise.
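
To illustrate the reasoning with a toy model (this is not pandas code; `ToyManager` and the two `frame_take_*` helpers are hypothetical names invented for the sketch): when the inner `take` already consolidates as its first step, an extra consolidation in the caller does no additional work and does not change the result.

```python
class ToyManager:
    """Hypothetical stand-in for a block manager (not pandas code)."""

    def __init__(self, blocks):
        # Each block is a list of columns; each column is a list of values.
        self.blocks = [list(block) for block in blocks]
        self.merges = 0  # how many times real consolidation work happened

    def consolidate_inplace(self):
        # Merging is only needed when more than one block exists.
        if len(self.blocks) > 1:
            self.blocks = [[col for block in self.blocks for col in block]]
            self.merges += 1

    def take(self, indices):
        # Mirrors the description above: consolidate first, then select rows.
        self.consolidate_inplace()
        return [[col[i] for i in indices] for col in self.blocks[0]]


def frame_take_with_outer_consolidate(mgr, indices):
    mgr.consolidate_inplace()  # the outer call under discussion
    return mgr.take(indices)


def frame_take_without_outer_consolidate(mgr, indices):
    return mgr.take(indices)


# Three single-column blocks, i.e. a "non-consolidated" toy frame.
columns = [[[1, 2, 3]], [[4, 5, 6]], [[7, 8, 9]]]
mgr_a = ToyManager(columns)
mgr_b = ToyManager(columns)

result_a = frame_take_with_outer_consolidate(mgr_a, [2, 0])
result_b = frame_take_without_outer_consolidate(mgr_b, [2, 0])
print(result_a == result_b, mgr_a.merges, mgr_b.merges)  # True 1 1
```

Both variants merge the blocks exactly once, which is why dropping the outer call is a pure clean-up in this kind of case. It says nothing about functions whose inner call does not consolidate.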

@jbrockmendel
Member

So in such a case, of course the consolidate_inplace in NDFrame.take can be removed as a code clean-up, and it also won't matter performance-wise.

Let's un-revert that here and whittle down the places that merit double-checking

@jbrockmendel
Member

I'm not seeing any consistent pattern in asv results

@jorisvandenbossche
Member Author

Let's un-revert that here and whittle down the places that merit double-checking

Done

@jorisvandenbossche
Member Author

@jbrockmendel what's the status here? (I re-reverted the ones discussed above)

Do we continue with this PR reverting those, or do you have time to check those cases?

@jbrockmendel
Member

Do we continue with this PR reverting those, or do you have time to check those cases?

Thanks for following up. I've gotten my hardware issues resolved, will run a round of asvs on this.

@jorisvandenbossche
Member Author

As mentioned before (#34407 (comment)) and discussed elsewhere, I am not sure that asv will give any useful information for those changes.

@jbrockmendel
Member

As mentioned before (#34407 (comment)) and discussed elsewhere, I am not sure that asv will give any useful information for those changes.

Neither am I. But if they do show something, that will help us whittle down the places that need more manual attention.

@jorisvandenbossche
Member Author

But what I want to say is: even if they don't show anything, each case still needs manual attention, as the asv's (AFAIK) don't include non-consolidated data.

For example, a small timing for xs (on master, so with the automatic consolidation removed):

In [2]: df = pd.DataFrame(index=list(range(10000)))

In [3]: for i in range(10):
   ...:     df[i] = np.random.randn(10000)
   ...:

In [6]: df2 = df._consolidate()

In [11]: %timeit df.xs(0)
72 µs ± 2.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit df2.xs(0)
49.8 µs ± 547 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Whether we care about this difference, I am not fully sure (that is to be discussed). But it clearly has an impact.
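
The same comparison can be written as a standalone script. This is only a sketch: `make_frames` and `best_time` are hypothetical helper names, and `_consolidate()` and `_mgr.nblocks` are private pandas attributes that may behave differently across versions. It builds one frame column by column (leaving it non-consolidated), makes a consolidated copy, and times the same call on both.

```python
import timeit

import numpy as np
import pandas as pd


def make_frames(n_rows=10_000, n_cols=10):
    """Return (non_consolidated, consolidated) frames holding the same data."""
    frag = pd.DataFrame(index=list(range(n_rows)))
    for i in range(n_cols):
        # Column-by-column insertion is expected to leave one block per column.
        frag[i] = np.random.randn(n_rows)
    # _consolidate() is the private helper used in the session above.
    return frag, frag._consolidate()


def best_time(func, number=200, repeat=5):
    """Best observed per-call time in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


frag, cons = make_frames()
# Assumption: _mgr.nblocks is private API and may change between versions.
nblocks = (frag._mgr.nblocks, cons._mgr.nblocks)
same_result = frag.xs(0).equals(cons.xs(0))

t_frag = best_time(lambda: frag.xs(0))
t_cons = best_time(lambda: cons.xs(0))
print(f"blocks (non-consolidated, consolidated): {nblocks}")
print(f"xs non-consolidated: {t_frag * 1e6:.1f} µs")
print(f"xs consolidated:     {t_cons * 1e6:.1f} µs")
```

Swapping `xs(0)` for any other method touched by #34389 gives a per-function check of the kind asked for earlier in the thread. Absolute timings will vary by machine and pandas version.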

@jbrockmendel
Member

But what I want to say is: I haven't had my caffeine yet, and running asvs now doesn't preclude doing %timeits later

@jbrockmendel
Member

Full asv run results below. A lot of these look like they may be the result of not being rebased on master, so I'm going to do that and re-run.

       before           after         ratio
     [bbb89cad]       [1228510e]
     <master>         <revert-34389-consolidate-less-1>
+         773±2μs      2.66±0.02ms     3.44  timeseries.DatetimeIndex.time_normalize('repeated')
+      77.2±0.9μs        256±0.7μs     3.31  timeseries.SortIndex.time_sort_index(True)
+         983±5μs      3.15±0.01ms     3.21  timeseries.DatetimeAccessor.time_dt_accessor_normalize('UTC')
+         984±4μs      3.15±0.01ms     3.20  timeseries.DatetimeAccessor.time_dt_accessor_normalize(tzutc())
+        988±10μs      3.12±0.02ms     3.16  timeseries.DatetimeAccessor.time_dt_accessor_normalize(None)
+        776±10μs      2.28±0.01ms     2.94  indexing.DataFrameNumericIndexing.time_bool_indexer
+         778±1μs         2.08±0ms     2.67  timeseries.DatetimeIndex.time_normalize('tz_naive')
+         332±9ns          750±4ns     2.26  tslibs.period.PeriodProperties.time_property('min', 'quarter')
+         348±5ns          761±2ns     2.19  tslibs.period.PeriodProperties.time_property('min', 'qyear')
+         355±6ns          767±4ns     2.16  tslibs.period.PeriodProperties.time_property('M', 'quarter')
+       235±0.3μs        498±0.4μs     2.12  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int16Engine'>, <class 'numpy.int16'>), 'monotonic_incr')
+         381±4ns          806±6ns     2.12  tslibs.period.PeriodProperties.time_property('min', 'month')
+        369±10ns          778±7ns     2.11  tslibs.period.PeriodProperties.time_property('M', 'qyear')
+         383±1ns          808±2ns     2.11  tslibs.period.PeriodProperties.time_property('min', 'day')
+         383±5ns          807±2ns     2.11  tslibs.period.PeriodProperties.time_property('min', 'minute')
+         383±7ns          805±4ns     2.10  tslibs.period.PeriodProperties.time_property('min', 'hour')
+         384±5ns          806±2ns     2.10  tslibs.period.PeriodProperties.time_property('min', 'second')
+         396±1ns          816±5ns     2.06  tslibs.period.PeriodProperties.time_property('min', 'dayofyear')
+         398±5ns          816±4ns     2.05  tslibs.period.PeriodProperties.time_property('min', 'year')
+        435±10ns          884±6ns     2.03  tslibs.period.PeriodProperties.time_property('min', 'daysinmonth')
+         400±1ns          812±9ns     2.03  tslibs.period.PeriodProperties.time_property('M', 'month')
+         401±5ns          811±3ns     2.02  tslibs.period.PeriodProperties.time_property('M', 'minute')
+         404±8ns          817±3ns     2.02  tslibs.period.PeriodProperties.time_property('M', 'day')
+         402±3ns          810±3ns     2.01  tslibs.period.PeriodProperties.time_property('min', 'dayofweek')
+         405±9ns          811±4ns     2.00  tslibs.period.PeriodProperties.time_property('M', 'hour')
+         419±4ns          836±4ns     1.99  tslibs.period.PeriodProperties.time_property('M', 'dayofyear')
+         457±6ns          909±3ns     1.99  tslibs.period.PeriodProperties.time_property('M', 'daysinmonth')
+         406±7ns          806±7ns     1.99  tslibs.period.PeriodProperties.time_property('M', 'second')
+         416±4ns          825±7ns     1.99  tslibs.period.PeriodProperties.time_property('M', 'year')
+         418±6ns          830±1ns     1.99  tslibs.period.PeriodProperties.time_property('M', 'dayofweek')
+        440±10ns          866±4ns     1.97  tslibs.period.PeriodProperties.time_property('min', 'is_leap_year')
+        449±20ns          877±5ns     1.95  tslibs.period.PeriodProperties.time_property('min', 'week')
+        469±20ns          892±3ns     1.90  tslibs.period.PeriodProperties.time_property('M', 'week')
+         465±6ns         873±10ns     1.88  tslibs.period.PeriodProperties.time_property('M', 'is_leap_year')
+     1.19±0.01μs      1.84±0.03μs     1.55  tslibs.offsets.OnOffset.time_on_offset(<DateOffset: days=2, months=2>)
+         325±1μs          493±2μs     1.52  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
+        25.3±2ms         37.6±1ms     1.49  gil.ParallelDatetimeFields.time_datetime_field_normalize
+        11.0±1μs       16.3±0.2μs     1.49  tslibs.period.PeriodUnaryMethods.time_asfreq('min')
+      27.6±0.2ms       40.8±0.5ms     1.48  timeseries.ToDatetimeFormat.time_different_offset
+     7.03±0.01ms       10.3±0.1ms     1.46  groupby.CountMultiInt.time_multi_int_nunique
+       129±0.4μs        188±0.6μs     1.46  timeseries.DatetimeIndex.time_normalize('dst')
+      10.4±0.9μs       15.0±0.2μs     1.45  tslibs.period.PeriodUnaryMethods.time_now('M')
+     5.91±0.02ms      8.42±0.04ms     1.42  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<SemiMonthEnd: day_of_month=15>)
+     5.94±0.04ms      8.45±0.01ms     1.42  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<SemiMonthEnd: day_of_month=15>)
+     5.76±0.02ms      8.16±0.05ms     1.42  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<SemiMonthBegin: day_of_month=15>)
+         924±3μs         1.29±0ms     1.40  series_methods.NanOps.time_func('sum', 1000000, 'int8')
+     5.81±0.04ms      8.11±0.01ms     1.40  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<SemiMonthBegin: day_of_month=15>)
+     5.44±0.04ms      7.58±0.03ms     1.39  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<BusinessDay>)
+     5.42±0.02ms      7.53±0.01ms     1.39  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<BusinessDay>)
+        11.9±1μs       16.2±0.1μs     1.36  tslibs.period.PeriodUnaryMethods.time_asfreq('M')
+      1.30±0.01s       1.75±0.02s     1.35  groupby.Apply.time_copy_function_multi_col
+      26.2±0.1ms      34.4±0.07ms     1.31  timeseries.DatetimeIndex.time_normalize('tz_aware')
+     3.02±0.03μs      3.90±0.06μs     1.29  period.Indexing.time_get_loc
+     2.81±0.03ms      3.62±0.01ms     1.29  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'int', 'min')
+     2.83±0.02ms      3.64±0.03ms     1.29  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'int', 'max')
+     1.28±0.01ms      1.62±0.02ms     1.27  series_methods.NanOps.time_func('prod', 1000000, 'int8')
+      14.9±0.4μs       18.3±0.2μs     1.23  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<DateOffset: days=2, months=2>)
+     38.3±0.09μs       47.2±0.5μs     1.23  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<CustomBusinessDay>)
+      2.05±0.2μs       2.52±0.5μs     1.23  index_cached_properties.IndexCache.time_values('IntervalIndex')
+      15.1±0.4μs       18.4±0.1μs     1.22  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<DateOffset: days=2, months=2>)
+      35.5±0.2μs       43.5±0.1μs     1.22  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<CustomBusinessDay>)
+      15.4±0.5μs       18.8±0.3μs     1.22  tslibs.offsets.OffestDatetimeArithmetic.time_add(<DateOffset: days=2, months=2>)
+      13.4±0.4μs       16.4±0.3μs     1.22  period.Indexing.time_series_loc
+      19.9±0.1μs       24.0±0.4μs     1.21  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessYearBegin: month=1>)
+      19.6±0.1μs       23.6±0.8μs     1.21  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<YearBegin: month=1>)
+      21.1±0.5μs       25.4±0.1μs     1.20  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessYearEnd: month=12>)
+      19.8±0.1μs       23.8±0.1μs     1.20  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessYearEnd: month=12>)
+       144±0.2μs          173±2μs     1.20  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<CustomBusinessMonthBegin>)
+     1.44±0.05ms      1.72±0.09ms     1.20  period.PeriodIndexConstructor.time_from_pydatetime('D', True)
+     1.45±0.04ms       1.73±0.1ms     1.20  period.PeriodIndexConstructor.time_from_pydatetime('D', False)
+       521±0.9ms          624±2ms     1.20  groupby.Apply.time_copy_overhead_single_col
+     20.3±0.09μs       24.3±0.1μs     1.20  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<SemiMonthBegin: day_of_month=15>)
+       175±0.7μs          209±4μs     1.20  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<CustomBusinessMonthBegin>)
+     3.53±0.02ms       4.22±0.5ms     1.20  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'max')
+      56.3±0.7ms       67.3±0.8ms     1.19  series_methods.SeriesConstructor.time_constructor('dict')
+     21.4±0.08μs       25.5±0.3μs     1.19  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<YearEnd: month=12>)
+         128±1μs          152±2μs     1.19  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<CustomBusinessMonthEnd>)
+      20.2±0.2μs       24.1±0.3μs     1.19  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessYearBegin: month=1>)
+      19.7±0.1μs      23.5±0.07μs     1.19  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessYearEnd: month=12>)
+         124±1μs          148±4μs     1.19  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<CustomBusinessMonthEnd>)
+         160±1μs          190±2μs     1.19  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<CustomBusinessMonthBegin>)
+      18.6±0.1μs       22.0±0.3μs     1.19  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<MonthBegin>)
+     19.7±0.09μs       23.4±0.5μs     1.19  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<YearEnd: month=12>)
+      19.1±0.1μs       22.6±0.1μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessMonthEnd>)
+     21.2±0.09μs       25.1±0.3μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessYearBegin: month=1>)
+      19.1±0.2μs       22.5±0.3μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessMonthEnd>)
+      20.1±0.2μs       23.8±0.2μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<SemiMonthEnd: day_of_month=15>)
+       167±0.7μs          198±1μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<CustomBusinessMonthEnd>)
+      16.9±0.1μs       20.0±0.2μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<SemiMonthBegin: day_of_month=15>)
+      19.6±0.3μs      23.1±0.07μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<QuarterBegin: startingMonth=3>)
+      21.4±0.1μs      25.3±0.07μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<SemiMonthBegin: day_of_month=15>)
+       142±0.1μs          167±2μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<CustomBusinessMonthBegin>)
+      18.8±0.1μs       22.2±0.2μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<MonthEnd>)
+     18.5±0.09μs      21.7±0.08μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<MonthBegin>)
+      16.7±0.1μs       19.7±0.2μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<YearBegin: month=1>)
+        37.1±1μs       43.6±0.1μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<DateOffset: days=2, months=2>)
+      17.7±0.2μs       20.8±0.4μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessYearBegin: month=1>)
+     20.3±0.08μs       23.9±0.1μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<SemiMonthBegin: day_of_month=15>)
+      16.9±0.3μs       19.9±0.1μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessYearEnd: month=12>)
+      16.7±0.1μs       19.7±0.2μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessMonthBegin>)
+     19.0±0.09μs      22.3±0.09μs     1.18  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessMonthBegin>)
+     21.1±0.03μs       24.8±0.6μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<YearBegin: month=1>)
+       162±0.5μs          190±3μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<CustomBusinessMonthBegin>)
+      16.7±0.1μs       19.6±0.2μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<YearEnd: month=12>)
+      20.1±0.1μs       23.6±0.3μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<YearEnd: month=12>)
+      17.2±0.1μs      20.1±0.05μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<YearEnd: month=12>)
+     16.7±0.09μs      19.6±0.09μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<YearBegin: month=1>)
+     17.4±0.08μs       20.3±0.1μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<YearBegin: month=1>)
+     28.2±0.09μs       32.9±0.3μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<CustomBusinessDay>)
+      20.4±0.2μs       23.8±0.2μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<SemiMonthEnd: day_of_month=15>)
+      17.1±0.3μs      20.0±0.09μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessYearBegin: month=1>)
+         109±1μs          127±2μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<CustomBusinessMonthEnd>)
+      20.1±0.1μs       23.5±0.3μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessQuarterBegin: startingMonth=3>)
+      17.5±0.1μs       20.5±0.3μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessYearEnd: month=12>)
+      19.8±0.1μs       23.1±0.2μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<YearBegin: month=1>)
+       162±0.6μs          189±2μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<CustomBusinessMonthBegin>)
+     4.19±0.04μs       4.89±0.1μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<Day>)
+      16.9±0.1μs       19.7±0.1μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessMonthEnd>)
+      17.6±0.1μs      20.6±0.06μs     1.17  tslibs.offsets.OffestDatetimeArithmetic.time_add(<SemiMonthEnd: day_of_month=15>)
+      19.8±0.2μs       23.1±0.2μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessMonthBegin>)
+     5.28±0.02μs      6.14±0.03μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<Day>)
+     5.22±0.04ms      6.08±0.08ms     1.16  tslibs.offsets.OnOffset.time_on_offset(<CustomBusinessMonthEnd>)
+     16.7±0.05μs       19.4±0.1μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<MonthBegin>)
+      4.21±0.3μs       4.89±0.5μs     1.16  index_cached_properties.IndexCache.time_shape('IntervalIndex')
+     28.4±0.08μs       33.1±0.6μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<CustomBusinessDay>)
+     5.56±0.03ms       6.45±0.1ms     1.16  tslibs.offsets.OnOffset.time_on_offset(<CustomBusinessMonthBegin>)
+     16.6±0.06μs       19.3±0.4μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<YearEnd: month=12>)
+      21.4±0.2μs       24.9±0.3μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessQuarterEnd: startingMonth=3>)
+     5.39±0.02μs      6.25±0.03μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<Day>)
+     21.4±0.07μs       24.8±0.1μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<SemiMonthEnd: day_of_month=15>)
+      19.9±0.1μs       23.1±0.1μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessQuarterBegin: startingMonth=3>)
+      19.7±0.1μs      22.9±0.06μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<MonthEnd>)
+      17.2±0.2μs       19.9±0.3μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<SemiMonthEnd: day_of_month=15>)
+      1.57±0.02s       1.81±0.03s     1.16  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<CustomBusinessMonthBegin>)
+      1.58±0.01s       1.83±0.02s     1.16  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<CustomBusinessMonthBegin>)
+         109±1μs          126±1μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_add(<CustomBusinessMonthEnd>)
+      17.0±0.2μs       19.6±0.3μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<SemiMonthBegin: day_of_month=15>)
+     19.4±0.08μs       22.4±0.2μs     1.16  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessDay>)
+      18.2±0.2μs       21.0±0.1μs     1.15  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+      20.1±0.2μs       23.3±0.2μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessQuarterEnd: startingMonth=3>)
+      17.2±0.1μs       19.9±0.3μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessYearBegin: month=1>)
+      17.2±0.1μs      19.8±0.06μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessQuarterEnd: startingMonth=3>)
+      20.2±0.2μs       23.2±0.1μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessQuarterEnd: startingMonth=3>)
+         1.07±0s       1.24±0.02s     1.15  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<CustomBusinessMonthEnd>)
+      16.6±0.1μs      19.2±0.05μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<MonthEnd>)
+     17.0±0.05μs       19.6±0.5μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessDay>)
+     20.4±0.08μs       23.5±0.1μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<QuarterEnd: startingMonth=3>)
+     17.5±0.08μs      20.1±0.09μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add(<QuarterEnd: startingMonth=3>)
+      4.53±0.1μs      5.21±0.07μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add(<Day>)
+      17.6±0.1μs       20.3±0.4μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessQuarterBegin: startingMonth=3>)
+         1.09±0s       1.26±0.03s     1.15  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<CustomBusinessMonthEnd>)
+     17.0±0.04μs       19.6±0.2μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessYearEnd: month=12>)
+      16.9±0.1μs       19.4±0.3μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessMonthBegin>)
+      17.1±0.1μs       19.6±0.1μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessQuarterBegin: startingMonth=3>)
+      16.6±0.2μs       19.0±0.1μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessDay>)
+      19.8±0.1μs       22.7±0.2μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessMonthEnd>)
+      16.9±0.1μs       19.3±0.1μs     1.15  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<QuarterBegin: startingMonth=3>)
+     19.2±0.07μs       21.9±0.2μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<BusinessMonthBegin>)
+      20.7±0.1μs       23.7±0.1μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessDay>)
+     17.6±0.09μs       20.1±0.2μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessMonthEnd>)
+     17.1±0.05μs       19.6±0.1μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<QuarterEnd: startingMonth=3>)
+       108±0.6μs          124±3μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<CustomBusinessMonthEnd>)
+     17.1±0.07μs      19.5±0.09μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<QuarterEnd: startingMonth=3>)
+      19.9±0.1μs       22.8±0.2μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<QuarterBegin: startingMonth=3>)
+     4.33±0.05μs      4.94±0.07μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<Day>)
+     7.05±0.05ms      8.03±0.05ms     1.14  io.hdf.HDFStoreDataFrame.time_query_store_table
+      16.9±0.1μs       19.2±0.2μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<QuarterBegin: startingMonth=3>)
+      16.5±0.3μs       18.7±0.4μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessDay>)
+     19.5±0.08μs       22.2±0.2μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<BusinessDay>)
+     17.3±0.09μs       19.7±0.4μs     1.14  tslibs.offsets.OffestDatetimeArithmetic.time_add(<MonthEnd>)
+     10.7±0.03ms      12.2±0.08ms     1.14  groupby.MultiColumn.time_col_select_numpy_sum
+      21.5±0.1μs       24.4±0.2μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<QuarterEnd: startingMonth=3>)
+     17.1±0.08μs       19.4±0.1μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<BusinessMonthEnd>)
+      17.2±0.1μs       19.5±0.5μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<SemiMonthEnd: day_of_month=15>)
+      17.9±0.2μs       20.2±0.1μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add(<SemiMonthBegin: day_of_month=15>)
+     20.4±0.09μs       23.1±0.2μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add_10(<QuarterEnd: startingMonth=3>)
+      16.6±0.1μs       18.7±0.2μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply(<MonthBegin>)
+      19.8±0.3μs      22.4±0.09μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<MonthBegin>)
+      17.2±0.2μs       19.4±0.2μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessQuarterEnd: startingMonth=3>)
+     17.2±0.06μs       19.4±0.1μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<BusinessQuarterBegin: startingMonth=3>)
+         301±6ms          341±2ms     1.13  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<CustomBusinessDay>)
+      21.4±0.1μs       24.2±0.1μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<QuarterBegin: startingMonth=3>)
+     17.3±0.08μs      19.6±0.09μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add(<QuarterBegin: startingMonth=3>)
+         965±5μs         1.09±0ms     1.13  arithmetic.ApplyIndex.time_apply_index(<DateOffset: days=2, months=2>)
+     21.5±0.06μs       24.3±0.1μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<BusinessQuarterBegin: startingMonth=3>)
+      19.1±0.1μs       21.6±0.2μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_subtract(<MonthEnd>)
+      17.4±0.2μs       19.7±0.3μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessMonthBegin>)
+      17.0±0.1μs       19.2±0.1μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_apply_np_dt64(<MonthEnd>)
+     17.9±0.05μs       20.2±0.1μs     1.13  tslibs.offsets.OffestDatetimeArithmetic.time_add(<BusinessQuarterEnd: startingMonth=3>)
+     6.13±0.02μs      6.87±0.05μs     1.12  tslibs.offsets.OffestDatetimeArithmetic.time_subtract_10(<Day>)
+      5.90±0.2μs      6.59±0.02μs     1.12  indexing.NonNumericSeriesIndexing.time_getitem_scalar('period', 'non_monotonic')
+     10.7±0.05ms       12.0±0.1ms     1.12  stat_ops.Rank.time_rank('Series', False)
+      20.5±0.2μs       22.9±0.2μs     1.12  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.Float64Index'>, 'unique_monotonic_inc')
+         305±6ms          339±9ms     1.11  arithmetic.OffsetArrayArithmetic.time_add_series_offset(<CustomBusinessDay>)
+     17.4±0.08μs      19.4±0.09μs     1.11  tslibs.offsets.OffestDatetimeArithmetic.time_add(<MonthBegin>)
+     10.8±0.03ms      12.0±0.06ms     1.11  stat_ops.Rank.time_rank('Series', True)
+        2.26±0ms       2.51±0.4ms     1.11  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'sum')
+      29.6±0.3μs      32.9±0.06μs     1.11  tslibs.offsets.OffestDatetimeArithmetic.time_add(<CustomBusinessDay>)
+        1.09±0ms         1.21±0ms     1.11  arithmetic.OffsetArrayArithmetic.time_add_dti_offset(<DateOffset: days=2, months=2>)
+     2.26±0.01ms       2.49±0.4ms     1.10  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'int', 'sum')
+     3.31±0.02ms      3.65±0.03ms     1.10  rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'float', 'max')
+     3.39±0.01ms       3.74±0.5ms     1.10  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'int', 'std')
-         278±2ms          253±2ms     0.91  io.json.ReadJSONLines.time_read_json_lines_concat('datetime')
-        873±40ns         793±20ns     0.91  index_cached_properties.IndexCache.time_is_monotonic('RangeIndex')
-      6.20±0.2μs       5.62±0.3μs     0.91  index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
-      3.47±0.2ms      3.13±0.04ms     0.90  rolling.Apply.time_rolling('Series', 300, 'int', <function Apply.<lambda> at 0x7f62bd249d08>, True)
-     6.88±0.03ms      6.21±0.06ms     0.90  frame_methods.Apply.time_apply_pass_thru
-     1.41±0.05μs      1.26±0.04μs     0.90  index_cached_properties.IndexCache.time_inferred_type('DatetimeIndex')
-     1.07±0.08μs         950±30ns     0.89  index_cached_properties.IndexCache.time_is_monotonic_decreasing('Int64Index')
-        515±10ns         459±20ns     0.89  index_cached_properties.IndexCache.time_is_all_dates('Int64Index')
-     1.06±0.01ms          937±4μs     0.89  frame_methods.Quantile.time_frame_quantile(1)
-        81.4±1μs       72.1±0.3μs     0.89  tslibs.timestamp.TimestampOps.time_ceil(None)
-        83.1±3μs       73.5±0.6μs     0.88  tslibs.period.PeriodUnaryMethods.time_to_timestamp('M')
-        80.6±2μs       71.3±0.6μs     0.88  tslibs.timestamp.TimestampOps.time_floor(None)
-     1.13±0.08μs      1.00±0.06μs     0.88  index_cached_properties.IndexCache.time_is_all_dates('Float64Index')
-        83.9±3μs       74.1±0.3μs     0.88  tslibs.period.PeriodProperties.time_property('M', 'start_time')
-        83.8±4μs       73.6±0.2μs     0.88  tslibs.period.PeriodUnaryMethods.time_to_timestamp('min')
-        85.5±3μs         75.0±1μs     0.88  tslibs.period.PeriodProperties.time_property('min', 'start_time')
-        71.3±2μs       62.3±0.6μs     0.87  dtypes.Dtypes.time_pandas_dtype('period[D]')
-      1.68±0.02s          1.47±0s     0.87  groupby.GroupByMethods.time_dtype_as_field('float', 'describe', 'direct')
-        722±40ns         630±20ns     0.87  index_cached_properties.IndexCache.time_is_monotonic_increasing('Int64Index')
-      1.69±0.01s          1.47±0s     0.87  groupby.GroupByMethods.time_dtype_as_field('float', 'describe', 'transformation')
-      2.39±0.02s       2.08±0.01s     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'describe', 'direct')
-      3.70±0.03s          3.21±0s     0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'describe', 'transformation')
-      2.38±0.02s          2.07±0s     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'describe', 'transformation')
-      3.74±0.2μs      3.24±0.06μs     0.87  index_cached_properties.IndexCache.time_shape('DatetimeIndex')
-            198M             171M     0.87  io.json.ReadJSONLines.peakmem_read_json_lines_concat('datetime')
-            198M             171M     0.87  io.json.ReadJSONLines.peakmem_read_json_lines_concat('int')
-      1.62±0.01s          1.41±0s     0.87  groupby.GroupByMethods.time_dtype_as_field('int', 'describe', 'transformation')
-      4.03±0.2μs       3.49±0.1μs     0.87  index_cached_properties.IndexCache.time_engine('UInt64Index')
-      3.71±0.03s       3.21±0.01s     0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'describe', 'direct')
-        517±20ns         447±10ns     0.86  index_cached_properties.IndexCache.time_inferred_type('Int64Index')
-      1.62±0.01s          1.40±0s     0.86  groupby.GroupByMethods.time_dtype_as_field('int', 'describe', 'direct')
-     1.58±0.06μs      1.36±0.03μs     0.86  index_cached_properties.IndexCache.time_is_all_dates('MultiIndex')
-     2.13±0.08μs       1.80±0.1μs     0.85  index_cached_properties.IndexCache.time_shape('UInt64Index')
-     1.22±0.05ms         974±20μs     0.80  arithmetic.MixedFrameWithSeriesAxis.time_frame_op_with_series_axis0('le')
-         540±4μs          414±2μs     0.77  frame_methods.Quantile.time_frame_quantile(0)
-      1.57±0.2μs      1.18±0.05μs     0.76  index_cached_properties.IndexCache.time_values('UInt64Index')
-      1.80±0.3μs      1.29±0.04μs     0.72  index_cached_properties.IndexCache.time_values('DatetimeIndex')
-         146±4μs       92.5±0.3μs     0.63  tslibs.period.PeriodProperties.time_property('M', 'end_time')
-        13.2±1μs      8.00±0.04μs     0.60  tslibs.timestamp.TimestampOps.time_normalize(<UTC>)
-      13.1±0.6μs       7.88±0.1μs     0.60  tslibs.timestamp.TimestampOps.time_normalize(None)
-      13.5±0.8μs      8.02±0.04μs     0.59  tslibs.timestamp.TimestampOps.time_normalize(tzutc())
-         161±4μs       81.4±0.7μs     0.51  tslibs.period.PeriodProperties.time_property('min', 'end_time')

@jreback jreback removed this from the 1.1 milestone Jul 9, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.1 milestone Jul 31, 2020
@jbrockmendel
Member

@jorisvandenbossche is this still active?

@simonjayhawkins
Member

I milestoned this for 1.1.1 since it reverts a PR that caused a regression (#35488). #35578 has been merged (and backported) to fix #35488, so I am removing the milestone.

@simonjayhawkins simonjayhawkins removed this from the 1.1.1 milestone Aug 11, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.4 milestone Sep 28, 2020
@jreback
Contributor

jreback commented Oct 14, 2020

I am not sure it makes sense to push this w/o a clear patch backed by tests, as I am not sure what exactly this is fixing.

@jreback jreback modified the milestones: 1.1.4, 1.2 Oct 26, 2020
@jreback
Contributor

jreback commented Oct 26, 2020

moving this off 1.1.4 as there are no really clear metrics on how to evaluate this.

@jreback
Contributor

jreback commented Nov 18, 2020

not sure what the point of the PR is any longer.

@jreback jreback removed this from the 1.2 milestone Nov 18, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1.5 milestone Nov 25, 2020
@jorisvandenbossche
Member Author

not sure what the point of the PR is any longer.

There are still open regressions related to this (#36495).
(will take a look again in a few days)

@jreback
Contributor

jreback commented Nov 25, 2020

not sure what the point of the PR is any longer.

There are still open regressions related to this (#36495).
(will take a look again in a few days)

maybe so but we still don't have any testing. so -1 on including this on 1.1.5 at this point.

@jorisvandenbossche jorisvandenbossche changed the title from Revert "CLN: _consolidate_inplace less" to REGR: revert "CLN: _consolidate_inplace less" / fix regression in fillna() Nov 26, 2020
@jorisvandenbossche
Member Author

but we still don't have any testing

I added a test for the regression reported in #36495, which is fixed by this

)
df_nonconsol = df.pivot("i1", "i2")
result = df_nonconsol.fillna(0)
assert result.isna().sum().sum() == 0
Member

there's a test in #36668 if you wanted a more explicit assertion
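
For readers who want to reproduce the reported behaviour, here is a minimal sketch in the spirit of the test fragment above. The column values are illustrative assumptions (the full test body is not shown in this thread), and `pivot` is called with keyword arguments so the snippet also runs on newer pandas versions. On a pandas build affected by #36495, the final assertions would be expected to fail.

```python
import numpy as np
import pandas as pd

# Illustrative input (assumed values): two key columns and two value columns.
df = pd.DataFrame(
    {
        "i1": [1, 2, 3],
        "i2": [1, 2, 3],
        "i3": [1, 2, 3],
        "f1": [1.0, 2.0, 3.0],
    }
)

# Pivoting without listing the value columns spreads "i3" and "f1" over
# a 3x3 grid each; every off-diagonal cell is missing.
pivoted = df.pivot(index="i1", columns="i2")
assert pivoted.isna().to_numpy().any()

# fillna(0) should replace every missing cell.
result = pivoted.fillna(0)
assert not result.isna().to_numpy().any()

# A more explicit check: each value column becomes a diagonal block.
diag = np.diag([1.0, 2.0, 3.0])
expected_values = np.hstack([diag, diag])
assert np.array_equal(result.to_numpy(), expected_values)
```

Comparing the filled values against an explicit expected array catches partially filled results that a bare NaN count could also catch, but it additionally verifies that the non-missing values were left untouched.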

@jbrockmendel
Member

I added a test for the regression reported in #36495, which is fixed by this

great. there are four consolidations this adds. is any one in particular responsible for fixing this bug?

@jorisvandenbossche
Member Author

I suppose it is the consolidation in fillna that specifically fixes the reported regression.

I could do a PR with specifically only that change, but as mentioned above (#34407 (comment)), I think it can't hurt to change the other cases as well. Since the original PR #34389, we have now already reverted parts of that change in 3 other PRs that were fixing regressions, and this is a 4th.
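
To make "consolidation" concrete: a frame built column by column typically holds several same-dtype blocks internally, and consolidating merges them. The sketch below relies on the private members `_mgr.nblocks` and `_consolidate()`, which are internal details that may change or disappear between pandas versions. It is an illustration under those assumptions, not code taken from this PR.

```python
import numpy as np
import pandas as pd

# Adding float columns one at a time usually leaves one block per column.
df = pd.DataFrame({"a": [1.0, np.nan]})
df["b"] = [np.nan, 2.0]
df["c"] = [3.0, np.nan]

# Private API (assumption: available in the installed pandas version).
n_before = df._mgr.nblocks
consolidated = df._consolidate()
n_after = consolidated._mgr.nblocks
print("blocks before:", n_before, "after:", n_after)

# Whatever the internal layout, fillna should give the same answer.
filled_nonconsol = df.fillna(0)
filled_consol = consolidated.fillna(0)
pd.testing.assert_frame_equal(filled_nonconsol, filled_consol)
```

This is the kind of comparison suggested earlier in the thread: run the changed method on the same data with and without consolidation and check that the results agree.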

@jreback jreback merged commit 27989a6 into master Nov 26, 2020
@jreback
Contributor

jreback commented Nov 26, 2020

thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the revert-34389-consolidate-less-1 branch November 26, 2020 19:08
@simonjayhawkins
Member

@meeseeksdev backport 1.1.x

simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request Nov 27, 2020
…solidate_inplace less" / fix regression in fillna()
simonjayhawkins added a commit that referenced this pull request Nov 27, 2020
…nplace less" / fix regression in fillna() (#38115)

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jbrockmendel
Member

AFAICT the consolidation in quantile is never reached. am i missing something?

@jorisvandenbossche
Member Author

The default of the keyword is True, so it should be reached? (since we never specify this keyword anywhere at the moment)

Labels
Clean Internals Related to non-user accessible pandas implementation
Development

Successfully merging this pull request may close these issues.

REGR: fillna not filling NaNs after pivot without explicitly listing pivot values
4 participants