PERF: Sparse IntIndex.make_union / Numeric ops #13036

sinhrks · 2016-04-30T01:21:19Z

tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

Replace repeated list.append with np.union1d in IntIndex.make_union. make_union is used in numeric ops.

NOTE: It is also possible to fix IntIndex.intersect to use np.intersect1d, but it doesn't increase the performance (because the length of the result is smaller).
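As a rough illustration of the change described above, the sketch below contrasts a merge loop built on list.append with a single np.union1d call. It is not the Cython code from pandas/src/sparse.pyx; the function names union_with_append and union_with_numpy are hypothetical, and the inputs are assumed to be sorted, unique integer arrays as an IntIndex would hold.

```python
import numpy as np


def union_with_append(x, y):
    """Two-pointer merge of two sorted, unique int arrays via list.append.

    Hypothetical stand-in for the append-based approach; not the pandas source.
    """
    out = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            out.append(x[i])
            i += 1
            j += 1
        elif x[i] < y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    # One of the inputs is exhausted; the remainder of the other is already sorted.
    out.extend(x[i:])
    out.extend(y[j:])
    return np.array(out, dtype=np.int32)


def union_with_numpy(x, y):
    """Same result in one vectorized call: sorted, unique union of both arrays."""
    return np.union1d(x, y)


x = np.array([0, 2, 5, 9], dtype=np.int32)
y = np.array([1, 2, 9, 10], dtype=np.int32)
assert np.array_equal(union_with_append(x, y), union_with_numpy(x, y))
```

Both functions return the same indices; the difference is only where the work happens (per-element Python-level appends versus NumPy's compiled routines).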

The microbenchmark below assumes that 90% of each array is sparse (NaN).

import numpy as np
import pandas as pd

np.random.seed(1)
N = 1000000
a = np.array([np.nan] * N)
b = np.array([np.nan] * N)

# Use integer division so the sample size is an int on Python 3 as well
indexer_a = np.unique(np.random.randint(0, N, N // 10))
indexer_b = np.unique(np.random.randint(0, N, N // 10))
a[indexer_a] = np.random.randint(0, 100, len(indexer_a))
b[indexer_b] = np.random.randint(0, 100, len(indexer_b))

sa = pd.SparseArray(a)
sb = pd.SparseArray(b)

On current master

%timeit sa.sp_index.make_union(sb.sp_index)
10 loops, best of 3: 52.7 ms per loop

%timeit sa + sb
10 loops, best of 3: 47.8 ms per loop

After this PR

%timeit sa.sp_index.make_union(sb.sp_index)
100 loops, best of 3: 11.6 ms per loop

%timeit sa + sb
100 loops, best of 3: 15.3 ms per loop
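For reference, the speedup implied by the timings above can be worked out with a few lines of Python. The numbers are copied from the %timeit output in this comment; the dictionary layout and names are only for illustration.

```python
# (before, after) timings in milliseconds, copied from the %timeit output above
timings_ms = {
    "make_union": (52.7, 11.6),
    "sa + sb": (47.8, 15.3),
}

# Speedup factor = time before / time after
speedups = {name: before / after for name, (before, after) in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
# make_union: 4.5x faster
# sa + sb: 3.1x faster
```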

codecov-io · 2016-04-30T05:28:21Z

Current coverage is 84.06%

Merging #13036 into master will decrease coverage by -0.00%

@@             master     #13036   diff @@
==========================================
  Files           136        136          
  Lines         50005      49994    -11   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42040      42027    -13   
- Misses         7965       7967     +2   
  Partials          0          0

2 files (not in diff) in pandas/tseries were modified.
- Misses +1
- Hits -7
2 files (not in diff) in pandas/io were modified.
- Misses +1
- Hits -1
1 file (not in diff) in pandas was modified.
- Hits -5

Powered by Codecov. Last updated by e9ef522...33104f5

jreback · 2016-04-30T16:53:46Z

this looks fine for 0.18.1. do you want to add some sparse benchmarks?

sinhrks · 2016-04-30T20:39:43Z

OK, added whatsnew and bench.

   before     after       ratio
  [286782da] [a48f05db]
-   18.15ms     8.20ms      0.45  sparse.sparse_arithmetic.time_sparse_addition_10percent
-   18.17ms     8.08ms      0.44  sparse.sparse_arithmetic.time_sparse_division_10percent
-   17.97ms     7.73ms      0.43  sparse.sparse_arithmetic.time_sparse_addition_10percent_zero
-   18.47ms     7.84ms      0.42  sparse.sparse_arithmetic.time_sparse_division_10percent_zero
-    2.06ms   810.82μs      0.39  sparse.sparse_arithmetic.time_sparse_division_1percent
-    2.05ms   804.12μs      0.39  sparse.sparse_arithmetic.time_sparse_addition_1percent

jreback · 2016-04-30T20:42:39Z

pandas/src/sparse.pyx


        # if is one already, returns self
        y = y_.to_int_index()

        if self.length != y.length:
-            raise Exception('Indices must reference same underlying length')
-


Not that I am recommending it, because the numpy code is so much shorter, but this could have been much faster if the array were pre-allocated (to a max length) and then sliced at the end, rather than built by appending to a list.

Yeah, I will take the suggested approach when I fix the other places that use list.append.
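The pre-allocation idea from the review comment can be sketched in plain Python/NumPy as follows. This is only an illustration under stated assumptions: the function name union_preallocated is hypothetical, the inputs are assumed to be sorted and unique, and the performance benefit would only be expected in a compiled (Cython) loop, not in this interpreted version.

```python
import numpy as np


def union_preallocated(x, y):
    """Merge two sorted, unique int arrays into a pre-allocated buffer.

    The union can never be longer than len(x) + len(y), so that is the
    maximum length to allocate; the unused tail is sliced off at the end.
    """
    out = np.empty(len(x) + len(y), dtype=np.int32)
    n = 0  # number of slots filled so far
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            out[n] = x[i]
            i += 1
            j += 1
        elif x[i] < y[j]:
            out[n] = x[i]
            i += 1
        else:
            out[n] = y[j]
            j += 1
        n += 1
    # Copy whatever remains of either input (at most one is non-empty).
    rest = len(x) - i
    out[n:n + rest] = x[i:]
    n += rest
    rest = len(y) - j
    out[n:n + rest] = y[j:]
    n += rest
    return out[:n]  # slice down to the filled portion


x = np.array([0, 2, 5, 9], dtype=np.int32)
y = np.array([1, 2, 9, 10], dtype=np.int32)
assert np.array_equal(union_preallocated(x, y), np.union1d(x, y))
```

The trade-off is the one noted in the comment: the loop avoids repeated list growth, but the np.union1d version is far shorter.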

jreback · 2016-04-30T20:42:55Z

ok ping on green.

sinhrks · 2016-04-30T23:16:45Z

Thanks, now green.

jreback · 2016-04-30T23:23:12Z

thanks @sinhrks

sinhrks added the Indexing, Performance, Numeric Operations, and Sparse labels Apr 30, 2016

sinhrks added this to the 0.18.2 milestone Apr 30, 2016

sinhrks force-pushed the sparse_make_union branch from 33104f5 to 8903da7 Compare April 30, 2016 03:50

sinhrks force-pushed the sparse_make_union branch 2 times, most recently from a48f05d to d0034fe Compare April 30, 2016 20:31

PERF: Sparse IntIndex.make_union

b1cf4b5

sinhrks force-pushed the sparse_make_union branch from d0034fe to b1cf4b5 Compare April 30, 2016 20:38

sinhrks modified the milestones: 0.18.1, 0.18.2 Apr 30, 2016

jreback reviewed Apr 30, 2016
jreback closed this in 3ff5af0 Apr 30, 2016

sinhrks deleted the sparse_make_union branch April 30, 2016 23:23

sinhrks mentioned this pull request May 4, 2016

PERF: Sparse ops speedup #13082

Closed

3 tasks
