
ENH: Use views in set_id_type_array_py #664

Merged: 1 commit into enthought:master from larsoner:views on Jun 29, 2018

Conversation

larsoner
Collaborator

Using this snippet (modified to create the output with empty(..., int) so that the C version works correctly):

from tvtk.array_handler import set_id_type_array_py
from tvtk.array_ext import set_id_type_array
import numpy as np

n = 100000   # number of cells
cs = 10      # ids per cell
a = np.arange(cs*n)
a.shape = n, cs
b = np.empty(n*(cs+1), int)  # output must be an integer array

%timeit set_id_type_array(a, b)      # compiled (Cython) version
%timeit set_id_type_array_py(a, b)   # pure-Python version

On master the Python version is roughly 2x slower:

785 µs ± 6.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.4 ms ± 8.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

On this PR (which uses only views, so no actual data is copied and the extra work is essentially free), the Python version is only ~25% slower:

777 µs ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
961 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So the two are very close in speed, which makes me think compiling might never be worth the hassle.

FWIW, in a previous version of this (@prabhuramachandran, your original snippet) the output was created as b = np.empty(n*(cs+1)) without specifying the type as int. The Python code at least gave the correct/expected output (integers stored in the float array b), whereas the C code gave garbage. But I doubt this is really done or useful in practice. There should maybe also be an assert about the dtype of out_array at some point.
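For reference, here is a minimal sketch of what the view-based approach could look like, including the suggested dtype check. The function name, the assertion, and the assumption that the output follows the standard VTK cell-array layout ([cell_size, id0, ..., id_{cs-1}] per cell) are illustrative, not the exact code in this PR:

import numpy as np

def set_id_type_array_view(id_array, out_array):
    # Hypothetical view-based variant of set_id_type_array_py.
    # Assumes id_array has shape (n, cs) and out_array is a flat, contiguous
    # integer array of length n * (cs + 1) in VTK cell-array layout.
    assert np.issubdtype(out_array.dtype, np.integer), \
        "out_array should hold integers (vtkIdType)"
    n, cs = id_array.shape
    # Reshaping a contiguous array returns a view, so no data is copied.
    out = out_array.reshape(n, cs + 1)
    out[:, 0] = cs           # first entry of each cell: number of ids
    out[:, 1:] = id_array    # remaining entries: the ids themselves

Because the reshape is just a view, the two slice assignments are the only real work.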

@codecov-io

codecov-io commented Jun 29, 2018

Codecov Report

Merging #664 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #664      +/-   ##
==========================================
+ Coverage   50.42%   50.43%   +<.01%     
==========================================
  Files         257      257              
  Lines       23370    23371       +1     
  Branches     3187     3186       -1     
==========================================
+ Hits        11785    11786       +1     
  Misses      10828    10828              
  Partials      757      757
Impacted Files Coverage Δ
tvtk/array_handler.py 80.11% <100%> (+0.05%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02bc71d...9f1da55.

@prabhuramachandran
Member

Ahh yes, this is very nice! It tells me that I should not benchmark things late at night with little sleep. Since compilation is now optional anyway, it should be OK to just leave it as it is. My original benchmarks were very different and did the right thing; this was something I ran last night. Thank you for the extra pair of eyes and a nicer implementation.

@prabhuramachandran prabhuramachandran merged commit a33fae1 into enthought:master Jun 29, 2018
@larsoner
Collaborator Author

On my machine I still get close to 50% slowdown and with smaller arrays even 2x.

But the call time will correspondingly shrink when the array size gets smaller. Will this actually affect real-world use cases? I wonder if, at this point, the need-to-compile complexity adds burden without any real-world benefit for users.

@larsoner larsoner deleted the views branch June 29, 2018 11:33
@prabhuramachandran
Member

The issue is not with smaller data but with larger data, where a 50% slowdown means noticeably more time taken. It does affect "real-world" use cases with a large number of cells. They are not relevant for the majority of casual users, but they matter for actual cases with larger data. So given that there is a small increase in performance and that everything works, I think this is fine. There is also the possibility of other code (at Enthought, for example) where this may be used, and I don't want to change that.

@larsoner
Collaborator Author

larsoner commented Jun 29, 2018 via email

@prabhuramachandran
Member

It doesn't on my machine. Here are the numbers, showing that when N is about a million or more the Python version is about 60% slower.

[figure: py/cy timing ratio vs. N]

Here is the code from a notebook:

from tvtk.array_handler import set_id_type_array_py
from tvtk.array_ext import set_id_type_array
import numpy as np

def get_data(n, cs=10):
    # Build an (n, cs) id array and a flat output array of length n*(cs+1).
    a = np.arange(cs*n)
    a.shape = n, cs
    b = np.empty(n*(cs+1), dtype=int)
    return a, b

# Time both versions over a range of sizes; -o captures the TimeitResult.
t1, t2 = [], []
sizes = 10**np.arange(1, 8)
for n in sizes:
    a, b = get_data(n)
    r1 = %timeit -o set_id_type_array(a, b)
    t1.append(r1)
    r2 = %timeit -o set_id_type_array_py(a, b)
    t2.append(r2)

# Plot the ratio of best pure-Python time to best Cython time vs. N.
%matplotlib inline
import matplotlib.pyplot as plt
fac = np.array([x.best for x in t2])/np.array([x.best for x in t1])
plt.semilogx(sizes, fac)
plt.xlabel('N'); plt.ylabel('py/cy')

@larsoner
Collaborator Author

Ahh, so it is. Yet another lesson for me about not making (bad) assumptions about optimization!

@prabhuramachandran
Member

prabhuramachandran commented Jun 29, 2018

No worries, the %timeit -o trick was very handy here.
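For anyone who has not used it: in IPython/Jupyter, the -o flag makes %timeit return a TimeitResult object rather than only printing, which is what lets the notebook above compare the best times programmatically. A tiny illustration, with a placeholder workload:

import numpy as np

# IPython/Jupyter only: -o returns a TimeitResult instead of just printing.
res = %timeit -o np.sort(np.random.rand(100000))
print(res.best, res.average, res.stdev)  # fastest run, mean, std. dev. (seconds)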
