PERF: improved clip performance #16364

jreback · 2017-05-16T02:33:08Z

closes #15400
In [1]: np.random.seed(1234)

In [2]: s = pd.Series(np.random.randn(50))

master

In [3]: %timeit s.clip(0, 1)
1.65 ms ± 48.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

PR

In [3]: %timeit s.clip(0, 1)
124 µs ± 2.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This is probably as good as we can do for now: we still have two where ops (numpy does this in a single loop), plus a mask check and fill, and the final construction.

Still, about 13x faster (1.65 ms vs. 124 µs).
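To make the "2 where ops" remark concrete, here is a minimal sketch (not the code from this PR) that clips a raw array with one np.where pass per bound and compares the result with numpy's own np.clip. The variable names are illustrative only.

```python
import numpy as np

np.random.seed(1234)
values = np.random.randn(50)
lower, upper = 0, 1

# Two passes over the data: one np.where per bound.
two_pass = np.where(values >= upper, upper, values)
two_pass = np.where(two_pass <= lower, lower, two_pass)

# numpy's clip applies both bounds in a single call.
one_pass = np.clip(values, lower, upper)

# Both approaches should agree element-wise.
same = np.array_equal(two_pass, one_pass)
```

The two-pass form allocates an intermediate array per bound, which is the remaining overhead the description refers to.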


codecov · 2017-05-16T10:47:31Z

Codecov Report

Merging #16364 into master will decrease coverage by 0.01%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           master   #16364      +/-   ##
==========================================
- Coverage   90.38%   90.36%   -0.02%     
==========================================
  Files         161      161              
  Lines       50916    50933      +17     
==========================================
+ Hits        46021    46028       +7     
- Misses       4895     4905      +10

Flag	Coverage Δ
#multiple	`88.14% <94.11%> (ø)`	⬆️
#single	`40.21% <5.88%> (-0.12%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`91.96% <94.11%> (+0.01%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.68% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d92f06a...62843f8.

Additional test cases for pandas-dev#16364 when upper and / or lower is nan.

jorisvandenbossche

There is a slight change in behaviour: the new implementation does not preserve heterogeneous data types (e.g. int/float).
This shouldn't hold back these perf improvements, but we might consider keeping it for 0.21 (this is also not a regression).
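A small sketch of why working on the raw values loses heterogeneous dtypes, assuming a frame with one int and one float column (the column names are made up for illustration). It only demonstrates the upcast in DataFrame.values, not the exact code path in this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed int/float columns.
df = pd.DataFrame({"ints": [1, 2, 3], "floats": [0.5, 1.5, 2.5]})

# Per-column dtypes are tracked separately on the frame...
per_column = df.dtypes

# ...but .values returns one 2-D ndarray, upcast to a common dtype.
raw = df.values

# Any np.where on that array therefore produces floats for every column.
clipped = np.where(raw >= 2, 2, raw)
```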

jorisvandenbossche · 2017-05-16T21:06:18Z

pandas/core/generic.py

+        result = self.values
+        mask = isnull(result)
+        if upper is not None:
+            result = np.where(result >= upper, upper, result)


I think this needs a with np.errstate block, as we are working with a raw array:

In [8]: pd.Series([0, np.nan, 2]).clip(0, 1)
/home/joris/scipy/pandas/pandas/core/generic.py:4117: RuntimeWarning: invalid value encountered in greater_equal
  result = np.where(result >= upper, upper, result)
/home/joris/scipy/pandas/pandas/core/generic.py:4119: RuntimeWarning: invalid value encountered in less_equal
  result = np.where(result <= lower, lower, result)
Out[8]:
0    0.0
1    NaN
2    1.0
dtype: float64
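One way the suggestion could look, as a standalone sketch rather than the actual follow-up patch: the comparisons are wrapped in np.errstate so that NaN values do not trigger "invalid value" warnings. The function name clip_with_scalar and the choice of invalid="ignore" are assumptions made for this example.

```python
import numpy as np


def clip_with_scalar(values, lower=None, upper=None):
    """Clip a float array to scalar bounds, keeping NaN positions as NaN."""
    values = np.asarray(values, dtype="float64")
    mask = np.isnan(values)
    result = values.copy()

    # Comparisons against NaN may warn on some numpy versions; silence them.
    with np.errstate(invalid="ignore"):
        if upper is not None:
            result = np.where(result >= upper, upper, result)
        if lower is not None:
            result = np.where(result <= lower, lower, result)

    # Mask check and fill: restore missing values explicitly.
    if mask.any():
        result[mask] = np.nan
    return result


out = clip_with_scalar([0, np.nan, 2], lower=0, upper=1)
```

Here out holds 0.0, NaN and 1.0, matching the Out[8] values shown above.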

jorisvandenbossche · 2017-05-16T21:06:48Z

pandas/core/generic.py

+    def _clip_with_scalar(self, lower, upper):
+
+        if ((lower is not None and np.any(isnull(lower))) or
+                (upper is not None and np.any(isnull(upper)))):


Are the np.any calls needed here, given that lower/upper are already confirmed to be scalars?
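A quick illustration of the point behind the question, assuming the bounds really are scalars at this stage (the snippet does not confirm that): for a scalar, wrapping the null check in np.any gives the same answer as the bare check, and only array-like input needs the reduction.

```python
import numpy as np
import pandas as pd

# For scalar bounds, np.any adds nothing to the null check.
scalar_bounds = [np.nan, None, 0, 1.5]
agree = [
    bool(pd.isnull(b)) == bool(np.any(pd.isnull(b)))
    for b in scalar_bounds
]

# For an array-like bound, pd.isnull returns an array of flags,
# so a reduction such as np.any is needed to get one boolean.
array_flags = pd.isnull(np.array([0.0, np.nan]))
any_null = bool(np.any(array_flags))
```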

jorisvandenbossche · 2017-05-16T21:18:41Z

new implementation does not preserve heterogeneous data types

In principle we could add a check for that (in the if statement that decides whether to take this path), but I'm not sure that is worth it.

jreback · 2017-05-16T22:03:45Z

Yeah, this is a limitation of the current methodology.

Let me see what I can do.



jreback added the Performance (memory or execution speed performance) label May 16, 2017

jreback added this to the 0.20.2 milestone May 16, 2017

PERF: improved clip performance

62843f8

closes pandas-dev#15400

jreback force-pushed the clip branch from 6efa1c8 to 62843f8 on May 16, 2017 10:14

jreback merged commit 42e2a87 into pandas-dev:master May 16, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request May 16, 2017

TST: Add test for clip-na

b546752

Additional test cases for pandas-dev#16364 when upper and / or lower is nan.

TomAugspurger mentioned this pull request May 16, 2017

TST: Add test for clip-na #16369

Merged

jorisvandenbossche reviewed May 16, 2017


jreback pushed a commit that referenced this pull request May 16, 2017

TST: Add test for clip-na (#16369)

9c8337a

Additional test cases for #16364 when upper and / or lower is nan.

jreback added a commit to jreback/pandas that referenced this pull request May 16, 2017

TST: followup to pandas-dev#16364, catch errstate warnings

ffbb0b5

jreback added a commit that referenced this pull request May 17, 2017

TST: followup to #16364, catch errstate warnings (#16373)

e97865e

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017

PERF: improved clip performance (pandas-dev#16364)

a4730d5

closes pandas-dev#15400

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017

TST: Add test for clip-na (pandas-dev#16369)

6b05e16

Additional test cases for pandas-dev#16364 when upper and / or lower is nan.

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017

TST: followup to pandas-dev#16364, catch errstate warnings (pandas-de…

04ab907

…v#16373)

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request May 29, 2017

PERF: improved clip performance (pandas-dev#16364)

41d90dc

closes pandas-dev#15400 (cherry picked from commit 42e2a87)

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request May 29, 2017

TST: followup to pandas-dev#16364, catch errstate warnings (pandas-de…

a495669

…v#16373) (cherry picked from commit e97865e)

TomAugspurger added Backported and removed Needs Backport labels May 30, 2017

TomAugspurger pushed a commit that referenced this pull request May 30, 2017

PERF: improved clip performance (#16364)

f16141f

closes #15400 (cherry picked from commit 42e2a87)

TomAugspurger pushed a commit that referenced this pull request May 30, 2017

TST: followup to #16364, catch errstate warnings (#16373)

fef4136

(cherry picked from commit e97865e)

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017

PERF: improved clip performance (pandas-dev#16364)

4c6b1c9

closes pandas-dev#15400

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017

TST: Add test for clip-na (pandas-dev#16369)

15f33e0

Additional test cases for pandas-dev#16364 when upper and / or lower is nan.

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017

TST: followup to pandas-dev#16364, catch errstate warnings (pandas-de…

3667eb3

…v#16373)
