REF: Remove rolling window fixed algorithms #36567

Merged
merged 32 commits into from
Oct 9, 2020

mroeschke
Member

@mroeschke mroeschke commented Sep 23, 2020

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Pros:

Cons:

  • A performance hit
       before           after         ratio
     [d9722efe]       [198ab9e8]
     <clean/rolling_aggregations^2>       <clean/rolling_aggregations>
+     1.27±0.02ms         4.05±1ms     3.19  rolling.Quantile.time_quantile('Series', 10, 'float', 1, 'linear')
+     1.27±0.01ms       3.67±0.7ms     2.88  rolling.Quantile.time_quantile('Series', 10, 'float', 1, 'nearest')
+     1.27±0.01ms       3.22±0.3ms     2.54  rolling.Quantile.time_quantile('Series', 10, 'float', 1, 'higher')
+     1.27±0.01ms      2.99±0.08ms     2.36  rolling.Quantile.time_quantile('Series', 10, 'float', 1, 'midpoint')
+        1.29±0ms      2.92±0.01ms     2.26  rolling.Quantile.time_quantile('Series', 10, 'float', 0, 'nearest')
+        1.29±0ms      2.91±0.01ms     2.26  rolling.Quantile.time_quantile('Series', 10, 'float', 0, 'midpoint')
+     1.29±0.08ms      2.91±0.01ms     2.25  rolling.Methods.time_rolling('Series', 10, 'float', 'max')
+     1.29±0.01ms         2.91±0ms     2.25  rolling.Quantile.time_quantile('Series', 10, 'float', 0, 'higher')
+     1.29±0.01ms      2.90±0.02ms     2.24  rolling.Quantile.time_quantile('Series', 10, 'float', 0, 'linear')
+     1.30±0.02ms      2.91±0.04ms     2.24  rolling.Quantile.time_quantile('Series', 10, 'float', 0, 'lower')
+     1.30±0.03ms      2.90±0.01ms     2.23  rolling.Quantile.time_quantile('Series', 10, 'float', 1, 'lower')
+     1.32±0.08ms      2.93±0.02ms     2.22  rolling.Methods.time_rolling('Series', 10, 'float', 'min')
+     1.33±0.03ms         2.95±0ms     2.21  rolling.Methods.time_rolling('Series', 10, 'int', 'max')
+     1.44±0.01ms      3.11±0.03ms     2.16  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'lower')
+     1.37±0.05ms      2.96±0.01ms     2.16  rolling.Methods.time_rolling('Series', 10, 'int', 'min')
+     1.44±0.01ms      3.10±0.02ms     2.15  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'midpoint')
+     1.44±0.01ms      3.07±0.01ms     2.14  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'higher')
+        1.44±0ms      3.08±0.01ms     2.14  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'nearest')
+     1.45±0.01ms      3.07±0.01ms     2.13  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'max')
+     1.46±0.01ms      3.07±0.01ms     2.10  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 1, 'linear')
+        1.47±0ms      3.09±0.02ms     2.10  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 0, 'linear')
+     1.48±0.02ms      3.10±0.02ms     2.10  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 0, 'higher')
+        1.47±0ms         3.08±0ms     2.09  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'min')
+     1.47±0.01ms      3.07±0.01ms     2.09  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 0, 'midpoint')
+      1.48±0.2ms      3.10±0.02ms     2.09  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 0, 'nearest')
+     1.47±0.04ms      3.05±0.02ms     2.08  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'linear')
+     1.52±0.01ms      3.15±0.02ms     2.07  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'max')
+     1.54±0.01ms      3.17±0.03ms     2.06  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'min')
+        1.49±0ms      3.05±0.06ms     2.04  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'nearest')
+        1.49±0ms      3.03±0.03ms     2.03  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'lower')
+        1.49±0ms      2.98±0.03ms     1.99  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'midpoint')
+     1.40±0.01ms      2.75±0.01ms     1.97  rolling.ExpandingMethods.time_expanding('Series', 'float', 'max')
+        1.49±0ms      2.93±0.02ms     1.96  rolling.Quantile.time_quantile('Series', 1000, 'float', 1, 'higher')
+     1.44±0.01ms      2.81±0.02ms     1.96  rolling.ExpandingMethods.time_expanding('Series', 'int', 'max')
+      1.59±0.2ms      3.11±0.03ms     1.95  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 0, 'lower')
+         930±3μs      1.80±0.01ms     1.94  rolling.Quantile.time_quantile('Series', 10, 'int', 0, 'higher')
+        1.52±0ms      2.94±0.02ms     1.93  rolling.Quantile.time_quantile('Series', 1000, 'float', 0, 'linear')
+      1.51±0.2ms      2.92±0.01ms     1.93  rolling.Methods.time_rolling('Series', 1000, 'float', 'max')
+         933±4μs      1.80±0.01ms     1.93  rolling.Quantile.time_quantile('Series', 10, 'int', 0, 'nearest')
+         931±4μs      1.79±0.01ms     1.93  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'nearest')
+     1.69±0.01ms      3.26±0.01ms     1.93  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 1, 'linear')
+         933±4μs      1.80±0.01ms     1.93  rolling.Quantile.time_quantile('Series', 10, 'int', 0, 'midpoint')
+      1.52±0.2ms      2.93±0.01ms     1.93  rolling.Methods.time_rolling('Series', 1000, 'float', 'min')
+        937±10μs      1.80±0.01ms     1.92  rolling.Quantile.time_quantile('Series', 10, 'int', 0, 'lower')
+        1.56±0ms      2.99±0.05ms     1.92  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'max')
+        940±30μs      1.80±0.01ms     1.91  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'midpoint')
+     1.56±0.02ms      2.98±0.01ms     1.91  rolling.Methods.time_rolling('Series', 1000, 'int', 'max')
+         935±3μs      1.79±0.01ms     1.91  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'higher')
+         934±7μs      1.78±0.01ms     1.91  rolling.Quantile.time_quantile('Series', 10, 'int', 0, 'linear')
+         931±3μs      1.78±0.01ms     1.91  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'lower')
+     1.45±0.01ms         2.76±0ms     1.91  rolling.ExpandingMethods.time_expanding('Series', 'float', 'min')
+        1.63±0ms      3.11±0.01ms     1.90  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 0, 'nearest')
+         937±4μs      1.78±0.01ms     1.90  rolling.Quantile.time_quantile('Series', 10, 'int', 1, 'linear')
+         961±2μs      1.82±0.04ms     1.90  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'midpoint')
+     1.49±0.02ms      2.82±0.01ms     1.89  rolling.ExpandingMethods.time_expanding('Series', 'int', 'min')
+         964±9μs      1.82±0.02ms     1.88  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'nearest')
+         963±7μs      1.81±0.05ms     1.88  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'linear')
+         962±8μs      1.81±0.01ms     1.88  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'higher')
+         967±3μs      1.81±0.01ms     1.88  rolling.Quantile.time_quantile('Series', 1000, 'int', 0, 'lower')
+         966±3μs      1.81±0.02ms     1.87  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'linear')
+         964±1μs      1.80±0.02ms     1.87  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'nearest')
+     1.66±0.03ms      3.10±0.01ms     1.87  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 1, 'nearest')
+         964±4μs         1.80±0ms     1.87  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'higher')
+        1.12±0ms      2.08±0.01ms     1.87  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'nearest')
+     1.67±0.01ms      3.10±0.01ms     1.86  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 1, 'midpoint')
+     1.63±0.02ms      3.02±0.03ms     1.86  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'max')
+     1.68±0.01ms      3.11±0.01ms     1.86  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 1, 'lower')
+         969±3μs      1.80±0.02ms     1.85  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'midpoint')
+     1.68±0.01ms      3.11±0.02ms     1.85  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 1, 'higher')
+     1.61±0.09ms      2.97±0.01ms     1.85  rolling.Methods.time_rolling('Series', 1000, 'int', 'min')
+         970±7μs      1.79±0.01ms     1.84  rolling.Quantile.time_quantile('Series', 1000, 'int', 1, 'lower')
+     1.72±0.01ms       3.16±0.1ms     1.84  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 0, 'linear')
+     1.71±0.02ms      3.16±0.07ms     1.84  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'max')
+     1.70±0.08ms      3.13±0.02ms     1.84  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'min')
+     1.70±0.07ms      3.11±0.01ms     1.83  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 0, 'lower')
+     1.71±0.01ms      3.13±0.02ms     1.83  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 0, 'higher')
+     1.65±0.01ms      3.00±0.02ms     1.82  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'min')
+     1.15±0.01ms      2.09±0.01ms     1.82  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'linear')
+     1.71±0.02ms      3.11±0.01ms     1.82  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'max')
+        1.71±0ms      3.11±0.01ms     1.82  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 0, 'midpoint')
+     1.62±0.01ms      2.94±0.03ms     1.82  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'min')
+        1.12±0ms      2.02±0.05ms     1.80  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'higher')
+     1.76±0.08ms      3.15±0.01ms     1.80  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'min')
+     1.12±0.01ms      1.99±0.03ms     1.78  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'nearest')
+     1.12±0.01ms      1.99±0.02ms     1.78  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'lower')
+        1.14±0ms      2.03±0.03ms     1.78  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'higher')
+        1.12±0ms      1.99±0.01ms     1.78  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'higher')
+        1.15±0ms      2.03±0.05ms     1.77  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'linear')
+     1.13±0.01ms      2.00±0.05ms     1.77  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'nearest')
+        1.12±0ms      1.98±0.04ms     1.77  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'lower')
+        1.12±0ms      1.97±0.01ms     1.76  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'midpoint')
+     1.13±0.01ms      1.99±0.01ms     1.76  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'midpoint')
+        1.12±0ms      1.97±0.01ms     1.76  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 1, 'linear')
+     1.14±0.01ms      1.98±0.01ms     1.74  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'midpoint')
+     1.14±0.02ms         1.97±0ms     1.74  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'lower')
+     1.14±0.01ms         1.97±0ms     1.73  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'higher')
+     1.14±0.01ms      1.97±0.01ms     1.73  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 0, 'midpoint')
+        1.14±0ms      1.97±0.01ms     1.72  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'nearest')
+     1.14±0.01ms      1.97±0.01ms     1.72  rolling.Quantile.time_quantile('DataFrame', 1000, 'int', 1, 'lower')
+     1.46±0.07ms      2.48±0.01ms     1.70  rolling.Methods.time_rolling('Series', 10, 'float', 'std')
+     1.21±0.09ms      2.00±0.02ms     1.66  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'linear')
+     1.54±0.03ms      2.54±0.01ms     1.65  rolling.Methods.time_rolling('Series', 1000, 'int', 'std')
+     1.62±0.04ms      2.66±0.01ms     1.65  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'std')
+     1.68±0.01ms      2.75±0.02ms     1.64  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'std')
+      1.56±0.1ms      2.53±0.01ms     1.63  rolling.Methods.time_rolling('Series', 10, 'int', 'std')
+     1.66±0.03ms      2.67±0.01ms     1.61  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'std')
+     1.71±0.03ms      2.73±0.01ms     1.59  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'std')
+        913±40μs         1.42±0ms     1.56  rolling.Methods.time_rolling('Series', 10, 'float', 'mean')
+      1.62±0.2ms      2.49±0.01ms     1.54  rolling.Methods.time_rolling('Series', 1000, 'float', 'std')
+       933±100μs      1.43±0.01ms     1.53  rolling.Methods.time_rolling('Series', 1000, 'float', 'mean')
+        980±30μs      1.48±0.01ms     1.51  rolling.Methods.time_rolling('Series', 10, 'int', 'mean')
+     1.07±0.01ms      1.60±0.01ms     1.50  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'mean')
+        985±60μs      1.47±0.01ms     1.50  rolling.Methods.time_rolling('Series', 1000, 'int', 'mean')
+        1.21±0ms      1.81±0.01ms     1.49  rolling.ExpandingMethods.time_expanding('Series', 'float', 'kurt')
+        1.43±0ms      2.11±0.01ms     1.47  rolling.Methods.time_rolling('Series', 1000, 'int', 'kurt')
+        1.14±0ms      1.68±0.01ms     1.47  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'mean')
+        1.26±0ms         1.86±0ms     1.47  rolling.ExpandingMethods.time_expanding('Series', 'int', 'kurt')
+     1.40±0.08ms      2.06±0.01ms     1.47  rolling.Methods.time_rolling('Series', 10, 'float', 'kurt')
+     1.14±0.01ms         1.67±0ms     1.46  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'mean')
+     1.12±0.01ms      1.63±0.01ms     1.45  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'mean')
+     1.55±0.01ms      2.24±0.01ms     1.44  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'kurt')
+         846±4μs         1.22±0ms     1.44  rolling.ExpandingMethods.time_expanding('Series', 'float', 'mean')
+     1.38±0.01ms      1.98±0.01ms     1.43  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'kurt')
+        1.62±0ms      2.30±0.01ms     1.42  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'kurt')
+     1.27±0.02ms      1.80±0.02ms     1.41  rolling.Methods.time_rolling('Series', 10, 'float', 'skew')
+     1.57±0.02ms      2.22±0.06ms     1.41  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'kurt')
+     1.32±0.08ms      1.86±0.01ms     1.41  rolling.Methods.time_rolling('Series', 10, 'int', 'skew')
+     1.44±0.01ms      2.03±0.01ms     1.41  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'kurt')
+         895±3μs         1.26±0ms     1.40  rolling.ExpandingMethods.time_expanding('Series', 'int', 'mean')
+     1.32±0.06ms         1.84±0ms     1.40  rolling.Methods.time_rolling('Series', 1000, 'int', 'skew')
+     1.64±0.04ms      2.29±0.01ms     1.40  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'kurt')
+     1.42±0.01ms         1.97±0ms     1.39  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'skew')
+        900±50μs      1.25±0.02ms     1.38  rolling.Methods.time_rolling('Series', 10, 'float', 'sum')
+     2.12±0.07ms      2.92±0.01ms     1.38  rolling.Quantile.time_quantile('Series', 1000, 'float', 0, 'lower')
+     1.49±0.01ms         2.04±0ms     1.37  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'skew')
+     1.45±0.04ms      1.98±0.03ms     1.37  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'skew')
+     1.51±0.02ms      2.05±0.01ms     1.36  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'skew')
+      1.51±0.2ms      2.05±0.01ms     1.36  rolling.Methods.time_rolling('Series', 1000, 'float', 'kurt')
+     1.02±0.01ms      1.38±0.01ms     1.36  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'mean')
+     1.08±0.01ms      1.46±0.02ms     1.36  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'mean')
+        962±60μs         1.28±0ms     1.34  rolling.Methods.time_rolling('Series', 10, 'int', 'sum')
+         841±5μs      1.12±0.01ms     1.33  rolling.ExpandingMethods.time_expanding('Series', 'float', 'sum')
+     1.26±0.01ms      1.66±0.02ms     1.33  rolling.ExpandingMethods.time_expanding('Series', 'int', 'skew')
+        969±80μs      1.28±0.01ms     1.32  rolling.Methods.time_rolling('Series', 1000, 'int', 'sum')
+     1.20±0.02ms      1.58±0.01ms     1.31  rolling.ExpandingMethods.time_expanding('Series', 'float', 'skew')
+     1.14±0.01ms      1.48±0.01ms     1.30  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'sum')
+         899±5μs         1.17±0ms     1.30  rolling.ExpandingMethods.time_expanding('Series', 'int', 'sum')
+     1.08±0.04ms      1.40±0.01ms     1.30  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'sum')
+     1.09±0.04ms      1.41±0.01ms     1.29  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'sum')
+        1.36±0ms      1.75±0.01ms     1.29  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'skew')
+     1.15±0.06ms      1.47±0.01ms     1.28  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'sum')
+        1.01±0ms         1.28±0ms     1.27  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'sum')
+        1.43±0ms      1.81±0.01ms     1.27  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'skew')
+     1.22±0.03ms      1.54±0.01ms     1.26  rolling.Methods.time_rolling('Series', 10, 'float', 'count')
+        1.10±0ms      1.38±0.01ms     1.26  rolling.ExpandingMethods.time_expanding('Series', 'int', 'count')
+     1.07±0.01ms         1.35±0ms     1.26  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'sum')
+        1.36±0ms      1.71±0.02ms     1.26  rolling.Methods.time_rolling('DataFrame', 10, 'int', 'count')
+     1.40±0.01ms      1.74±0.01ms     1.24  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'count')
+     1.14±0.01ms         1.40±0ms     1.23  rolling.ExpandingMethods.time_expanding('Series', 'float', 'count')
+      1.22±0.1ms         1.49±0ms     1.22  rolling.Methods.time_rolling('Series', 10, 'int', 'count')
+     1.40±0.04ms      1.70±0.01ms     1.21  rolling.Methods.time_rolling('DataFrame', 1000, 'int', 'count')
+     1.43±0.09ms      1.73±0.01ms     1.21  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'count')
+        1.29±0ms      1.56±0.01ms     1.21  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'count')
+     1.33±0.01ms      1.60±0.01ms     1.20  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'count')
+        1.38±0ms      1.66±0.01ms     1.20  rolling.ExpandingMethods.time_expanding('Series', 'float', 'std')
+     1.44±0.01ms      1.72±0.01ms     1.19  rolling.ExpandingMethods.time_expanding('Series', 'int', 'std')
+     1.56±0.01ms         1.84±0ms     1.18  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'std')
+     1.62±0.01ms      1.89±0.01ms     1.17  rolling.ExpandingMethods.time_expanding('DataFrame', 'int', 'std')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@mroeschke mroeschke changed the title WIP: REF: Remove rolling window fixed algorithms REF: Remove rolling window fixed algorithms Sep 25, 2020
@@ -767,7 +767,7 @@ def test_rolling_numerical_too_large_numbers():
     ds[2] = -9e33
     result = ds.rolling(5).mean()
     expected = pd.Series(
-        [np.nan, np.nan, np.nan, np.nan, -1.8e33, -1.8e33, -1.8e33, 0.0, 6.0, 7.0],
+        [np.nan, np.nan, np.nan, np.nan, -1.8e33, -1.8e33, -1.8e33, 5.0, 6.0, 7.0],
Member Author

It was thought that the 0 was expected due to numerical precision, but by using the variable algorithm instead of the fixed algorithm we get the correct value

xref #11645 (comment)
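
A minimal pure-Python sketch of how the value at that position can come out as either 0.0 or 5.0. It assumes that both code paths keep a running sum with Kahan-compensated add and remove steps, and that they differ only in whether the entering value is added before or after the leaving value is removed. The names `kahan_add`, `rolling_mean`, and `remove_first` are hypothetical and are not pandas internals; the actual Cython implementations may differ in detail.

```python
import math


def kahan_add(total, comp, val):
    """Apply one Kahan-compensated update; return the new (total, comp)."""
    y = val - comp
    t = total + y
    comp = (t - total) - y
    return t, comp


def rolling_mean(values, window, remove_first):
    """Sliding mean over a fixed-size window using a running sum.

    remove_first=True drops the leaving value before adding the new one
    (assumed here to model the variable-window code path);
    remove_first=False adds first (assumed to model the removed fixed path).
    """
    total = comp_add = comp_rem = 0.0
    out = []
    for i, val in enumerate(values):
        leaving = values[i - window] if i >= window else None
        if remove_first and leaving is not None:
            total, comp_rem = kahan_add(total, comp_rem, -leaving)
        total, comp_add = kahan_add(total, comp_add, val)
        if not remove_first and leaving is not None:
            total, comp_rem = kahan_add(total, comp_rem, -leaving)
        out.append(total / window if i >= window - 1 else math.nan)
    return out


data = [float(x) for x in range(10)]
data[2] = -9e33  # same outlier as in the test above

add_first = rolling_mean(data, 5, remove_first=False)
rem_first = rolling_mean(data, 5, remove_first=True)
print(add_first[7:])  # [0.0, 6.0, 7.0]
print(rem_first[7:])  # [5.0, 6.0, 7.0]
```

In this model, removing the outlier first brings the running sum back to a small magnitude, so the compensation term that accumulated while the outlier was in the window is applied when the next value is added. Adding first applies that compensation while the sum is still about -9e33, where it is rounded away.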

@mroeschke mroeschke added Refactor Internal refactoring of code Window rolling, ewma, expanding labels Sep 25, 2020
@mroeschke mroeschke added this to the 1.2 milestone Sep 25, 2020
@jreback
Contributor

jreback commented Sep 26, 2020

so all of your benchmarks on quantile are a bit misleading; they are actually measuring the perf diff in min/max (which is what percentile 0 or 1 does)
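
To illustrate the point, here is a small pure-Python sketch showing that a linearly interpolated quantile at 0 or 1 reduces to the window minimum or maximum. `window_quantile` and `rolling` are hypothetical helpers written for this example; they say nothing about how pandas dispatches these cases internally.

```python
def window_quantile(window, q):
    """Quantile of one window using linear interpolation between order statistics."""
    ordered = sorted(window)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def rolling(values, size, func):
    """Apply func to every full window of the given size."""
    return [func(values[i - size + 1 : i + 1]) for i in range(size - 1, len(values))]


data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
q0 = rolling(data, 3, lambda w: window_quantile(w, 0))
q1 = rolling(data, 3, lambda w: window_quantile(w, 1))

print(q0 == rolling(data, 3, min))  # True
print(q1 == rolling(data, 3, max))  # True
```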

@@ -414,7 +356,7 @@ cdef inline float64_t calc_var(int64_t minp, int ddof, float64_t nobs,
             result = 0
         else:
             result = ssqdm_x / (nobs - <float64_t>ddof)
-            if result < 0:
+            if result < 1e-15:
Contributor

is this new? ok but can you add a comment

Member Author

Sure, turns out var/std doesn't use Kahan Summation so using the variable algorithm has a small numerical imprecision
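
The effect of the changed guard can be sketched in Python. This is a hypothetical mirror of the hunk above, not the Cython source: the `nobs`/`minp`/`ddof` checks around the visible lines are assumptions, and `tol` is a name introduced here for the `1e-15` literal.

```python
import math


def calc_var(minp, ddof, nobs, ssqdm_x, tol=1e-15):
    """Variance from a running sum of squared deviations (ssqdm_x).

    Results below `tol` are clamped to 0 so that tiny positive or negative
    residue from uncompensated add/remove updates does not surface.
    """
    if nobs >= minp and nobs > ddof:  # assumed guard, not shown in the hunk
        if nobs == 1:
            return 0.0
        result = ssqdm_x / (nobs - ddof)
        if result < tol:  # previously `result < 0`
            return 0.0
        return result
    return math.nan


print(calc_var(1, 1, 5, -4e-16))  # 0.0, negative residue clamped (old and new guard)
print(calc_var(1, 1, 5, 2e-15))   # 0.0, tiny positive residue clamped (new guard only)
print(calc_var(1, 1, 5, 8.0))     # 2.0, ordinary values pass through
```

The trade-off in this sketch is that a genuine variance smaller than the tolerance is also reported as 0.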

@jreback
Contributor

jreback commented Oct 2, 2020

let's file an issue as a follow-on to see if we can improve min/max rolling with variable.

@jreback
Contributor

jreback commented Oct 2, 2020

any comments @TomAugspurger

Contributor

@jreback jreback left a comment

if you can add a release note, e.g. slight performance decrease in min/max fixed algos.

@jreback
Contributor

jreback commented Oct 2, 2020

do these changes help / hurt #36132? (maybe worth pulling in and xfailing these tests?) though that can certainly be a follow-on

@mroeschke
Member Author

mroeschke commented Oct 2, 2020

Yeah I was thinking #36132 could be simplified once this PR is merged in. Probably best on a follow-on

@mroeschke mroeschke mentioned this pull request Oct 4, 2020
32 tasks
@mroeschke
Member Author

All green

@jreback
Contributor

jreback commented Oct 5, 2020

cc @pandas-dev/pandas-core if any comments.

@jreback
Contributor

jreback commented Oct 9, 2020

can you merge master and ping on green.

@mroeschke
Member Author

@jreback green

@jreback jreback merged commit 846cff9 into pandas-dev:master Oct 9, 2020
@jreback
Contributor

jreback commented Oct 9, 2020

thanks @mroeschke very nice

@mroeschke mroeschke deleted the clean/rolling_aggregations branch October 9, 2020 23:34
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
* Fix rolling test result, removed fixed algorithms, some rolling apply with center tests still failing

* Rename function and move offset to correct location

* Implement center in terms of indexers

* Get all the tests to pass

* Remove center from _apply as no longer needed

* Add better typing

* Deal with numeric precision

* Remove code related to center now being implemented in the fixed indexer

* Add note regarding precision issues

* Note performance hit

* Remove tilde

* Change usage of get_cython_func_type

* Remove self.center from count's _apply

Co-authored-by: Matt Roeschke <mroeschke@housecanary.com>
Labels
Refactor Internal refactoring of code Window rolling, ewma, expanding
3 participants