performance regression for record array access in numpy 1.10.1 #6467

Closed
beckermr opened this Issue Oct 13, 2015 · 35 comments

@beckermr

It appears that accessing numpy record arrays by field name is significantly slower in numpy 1.10.1. I have put a simple test below that illustrates the issue. (I am aware that this particular example is much better accomplished by other means; the point is that field access is slow, not that this is a representative problem.)

The test script is

#!/usr/bin/env python
import time
import sys
import numpy as np

def test(N=100000, verbose=False):
    # one-element record array; every d['col'] access goes through field lookup
    d = np.zeros(1, dtype=[('col', 'f8')])

    t0 = time.time()
    for i in xrange(N):
        d['col'] += i
    t0 = time.time() - t0

    if verbose:
        print 'numpy version:', np.version.version
        print 'time: %g' % t0

if __name__ == "__main__":
    if len(sys.argv) > 1:
        N = int(sys.argv[1])
        test(N=N, verbose=True)
    else:
        test(verbose=True)

Here are the running times for different versions of numpy:

numpy version: 1.9.3
time: 0.262786

numpy version: 1.10.1
time: 3.57254

@esheldon has reproduced the relative timing differences on Linux, in addition to my tests, which were on a Mac.

I profiled the code for v1.10.1 and found this

         3200006 function calls (3000006 primitive calls) in 4.521 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   200000    1.386    0.000    1.883    0.000 _internal.py:372(_check_field_overlap)
600000/400000    0.751    0.000    0.906    0.000 _internal.py:337(_get_all_field_offsets)
        1    0.566    0.566    4.521    4.521 numpy_test.py:7(test)
   200000    0.401    0.000    3.189    0.000 _internal.py:425(_getfield_is_safe)
   200000    0.350    0.000    3.955    0.000 _internal.py:287(_index_fields)
   200000    0.323    0.000    3.513    0.000 {method 'getfield' of 'numpy.ndarray' objects}
   400000    0.279    0.000    0.279    0.000 {range}
   400000    0.155    0.000    0.155    0.000 {method 'update' of 'set' objects}
   400000    0.106    0.000    0.106    0.000 {method 'append' of 'list' objects}
   200000    0.093    0.000    0.093    0.000 {isinstance}
   200000    0.062    0.000    0.062    0.000 {method 'difference' of 'set' objects}
   200000    0.048    0.000    0.048    0.000 {method 'extend' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
        1    0.000    0.000    4.521    4.521 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 {time.time}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

It appears that new error-checking code added at the Python level is significantly degrading performance.
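
As a stopgap until a fix lands, one workaround (an illustrative sketch, not something suggested in the thread) is to index the field once and reuse the resulting view: d['col'] is a view into d's buffer, so in-place updates through it still modify the record array, while the per-access safety checks run only a single time.

import numpy as np

d = np.zeros(1, dtype=[('col', 'f8')])

col = d['col']            # field lookup (and its Python-level checks) happens once
for i in range(100000):
    col += i              # in-place add writes straight into d's buffer

print(col)                # reflects all the updates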

@esheldon

esheldon Oct 13, 2015

I see this in 1.10.0 as well, with Python 2.7.10.

@seberg

Member

seberg commented Oct 13, 2015

I am not surprised, to be honest, but I hoped it was not very bad. @ahaldane, can you have a look at it? This must be due to gh-5636.

@beckermr, is this a real-world problem for you, or just an observation? Out of curiosity, I frankly did not expect field access to be very relevant to overall program speed. What does your "real world" example do that makes it depend so strongly on fast field access?
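
To make the size dependence concrete, here is a rough sketch (illustrative, not from the thread) that times the same number of field accesses on a one-row and on a million-row record array; if the regression is a fixed per-access cost, it should dominate the one-row timing and be a small fraction of the million-row timing.

import time
import numpy as np

def time_field_access(n_rows, n_accesses=1000):
    d = np.zeros(n_rows, dtype=[('col', 'f8')])
    t0 = time.time()
    for _ in range(n_accesses):
        d['col'] += 1      # field lookup plus in-place add on every iteration
    return time.time() - t0

# On an affected numpy the one-row case is dominated by the Python-level
# checks; for 10**6 rows the elementwise add itself dominates.
print(time_field_access(1))
print(time_field_access(10**6))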

@beckermr

beckermr Oct 13, 2015

This is definitely a real-world problem. We found it while trying to diagnose performance issues in a FITS image reader maintained by @esheldon here: https://github.com/esheldon/fitsio. This reader is used by a non-trivial segment of the astronomy community. See issue esheldon/fitsio#58 for the thread (also referenced above). Furthermore, @esheldon found that the fitsio test suite (which performs typical operations done with FITS files) has a similar performance degradation. @esheldon might be able to comment as well.

@seberg

Member

seberg commented Oct 13, 2015

OK, maybe it is possible to pin down the problem more exactly? For example, I could imagine that the real problem is access to many, many void scalars and not so much arrays, which I think may have changed as well (but also fixed some bugs...), since for a larger array the overhead should just not matter.
It is just that, if I recall correctly, these changes, while making things slower, did also fix quite a few things. And maybe we can fix the real-world speed issues without trying to go back to the old speeds everywhere.

@esheldon

esheldon Oct 13, 2015

In fitsio, all tabular data is read into arrays with fields, so field access
is inherent to all operations.

The test suite slows down even more than the example given above. Normally
the tests run quickly, in about 800 ms, but on 1.10 they run in 1209 s,
a factor of roughly 1500 slower. This is prohibitively slow.

@juliantaylor

Contributor

juliantaylor commented Oct 13, 2015

mh, Python in the indexing code path; I guess we learned nothing from the great mmap regression of 2012 ;)
http://yarikoptic.github.io/numpy-vbench/vb_vb_indexing.html#mmap-slicing
It is also unfortunate that our benchmark suite does not include a record array case.

@charris

Member

charris commented Oct 13, 2015

It's certainly an excellent argument for faster releases. Apparently, no one checks development or beta branches ;)

@juliantaylor

Contributor

juliantaylor commented Oct 13, 2015

gh-5548 is the problematic one

@ahaldane

Member

ahaldane commented Oct 13, 2015

I haven't checked carefully yet, but I think I've already foreseen this problem and fixed it in #6208.

I wasn't sure if it was a real problem, so I didn't advertise the PR too much. But if it's affecting people in the real world, maybe it's worth trying to get it merged soon.

@esheldon

esheldon Oct 13, 2015

@ahaldane Yes, large dtypes are common for this code, for example reading from tables with many columns. My test suite, which slowed down by a factor of 1500, in particular has very large dtypes to test reading and writing a large variety of data types.

@seberg

Member

seberg commented Oct 13, 2015

Oh, right, I bet 6208 is exactly what is needed, sorry it stalled :(

@danielsf

danielsf Oct 13, 2015

I would just like to add a +1 to the "this is a real-world problem" camp. I work on the software development team for the Large Synoptic Survey Telescope. We have a bunch of code that queries sqlite databases, returns recarrays, and then manipulates those recarrays using their field names. The upgrade to numpy 1.10 slowed down at least one of our unit tests by an order of magnitude. The test in question used a 21-element dtype. We would really appreciate having a fix soon. Thanks.

@seberg

Member

seberg commented Oct 13, 2015

Don't worry, it is definitely going to be in a 1.10.2 release (if anyone has time to review the open pull request gh-6208 be my guest). Won't argue yet about whether or not we need a very quick 1.10.2 release because of this, some other similar regression may pop up in the next week or two....

@beckermr

beckermr Oct 13, 2015

Thanks @seberg and @ahaldane! Looking forward to the fix!

@JohnLonginotto

JohnLonginotto Oct 18, 2015

Oh, I thought I was the only one - phew :) Thank you all for preparing a fix 👍

@charris added this to the 1.10.2 release milestone Oct 18, 2015

@JohnLonginotto

JohnLonginotto Oct 18, 2015

I should mention that in my code there isn't a particularly large dtype, just six uint32s.

code: http://paste.ofcode.org/WbHmraTuDtHMcN65rda7tS
profile on 1.10.1: http://i.imgur.com/w0hdUY3.png

It takes 17.4 seconds to run on NumPy 1.9.2 and 65.4 seconds on NumPy 1.10.1, on a 1/10,000 sample of my data. So on the full set that means nearly an extra week of processing :P

@dkirkby referenced this issue in dkirkby/bossdata Oct 18, 2015: Slow creation of spAll full db #106 (Closed)

@dkirkby

dkirkby commented Oct 18, 2015

I have a similar real-world use case to what @danielsf described above in dkirkby/bossdata#106 and am also seeing a huge slowdown. Looking forward to 1.10.2!

@embray

Contributor

embray commented Oct 22, 2015

Unfortunately Astropy revealed a regression related to this that does not appear to be fixed by #6208. Here is a simple way to reproduce:

In [1]: from numpy import zeros, dtype

In [2]: dt = dtype([('PRODUCT_ID', 'S63'), ('COLLECTION_ID', 'S4', (1,)), ('BUNDLE_ID', 'S3'), ('CODE_SVN_REVISION', 'S4'), ('ANC_SVN_REVISION', 'S4'), ('PRODUCT_CREATION_DATE', 'S33'), ('OBSERVATION_TYPE', 'S3'), ('MISSION_PHASE', 'S5'), ('TARGET_NAME', 'S4'), ('ORBIT_SEGMENT', '<i2'), ('ORBIT_NUMBER', '<i2'), ('SOLAR_LONGITUDE', '<f4'), ('GRATING_SELECT', 'S6'), ('KEYHOLE_SELECT', 'S7'), ('BIN_PATTERN_INDEX', 'S18'), ('CADENCE', '<f8'), ('INT_TIME', '<f8'), ('DUTY_CYCLE', '<f8'), ('CHANNEL', 'S3'), ('WAVELENGTH', '<f8', (1024, 1024)), ('WAVELENGTH_WIDTH', '<f8', (1024, 1024)), ('KERNELS', 'S32', (8,))])

In [3]: arr = zeros(1, dtype=dt)

In [4]: arr.dtype = arr.dtype.newbyteorder('>')

Yes, it's a particularly ugly dtype, but I won't question it. The last line above takes several seconds to run and ballooned my Python process to as high as 2.6 GB (before settling back down to 1.6 GB).

Most of the problem here seems to come from the two 1024x1024 fields, so you could probably narrow the problem down to that.
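
Narrowing it down as suggested, a minimal reproducer might look like the sketch below (illustrative and untested against that exact build; it keeps only one of the large subarray fields).

import numpy as np

# A single large subarray field should be enough to trigger the slow path.
dt = np.dtype([('WAVELENGTH', '<f8', (1024, 1024))])
arr = np.zeros(1, dtype=dt)
arr.dtype = arr.dtype.newbyteorder('>')   # slow and memory-hungry on 1.10.1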

@mhvk

Contributor

mhvk commented Oct 22, 2015

From astropy/astropy#4259 (comment): it would seem
that one culprit is _get_all_field_offsets in numpy/core/_internal.py, in particular this stanza

        if dtype.shape:
            sub_offsets = _get_all_field_offsets(dtype.base, base_offset)
            count = 1
            for dim in dtype.shape:
                count *= dim
            fields.extend((typ, off + dtype.base.itemsize*j)
                           for j in range(count) for (typ, off) in sub_offsets)

A quick check shows that with @embray's dtype, one indeed lands here, with dtype=dtype(('>f8', (1024, 1024))) and count=1048576 (1024**2). Thus the generator gets called millions of times...
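
To see the scale of that expansion in isolation, here is a small standalone sketch that mimics what the quoted stanza does for one subarray field (the helper name is illustrative, not the actual numpy function).

import numpy as np

def expand_subarray_offsets(dt, base_offset=0):
    # One (type, offset) pair per subarray element; this is the part that
    # blows up to ~10**6 entries for a (1024, 1024) field.
    count = int(np.prod(dt.shape)) if dt.shape else 1
    return [(dt.base, base_offset + dt.base.itemsize * j) for j in range(count)]

offsets = expand_subarray_offsets(np.dtype(('f8', (1024, 1024))))
print(len(offsets))   # 1048576 pairs for a single field, rebuilt on every check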

@ahaldane

Member

ahaldane commented Oct 22, 2015

I see. #6208 didn't actually speed up the safety checks, it only avoided them in some cases. It looks like your cases are still a problem. I have ideas to speed up the checks, but it might take time to think through, and it looks like with the current strategy there will always be dtypes which are slow.

I think it will be better to revert the view safety checks for 1.10 so I can think through them more carefully.

@embray

Contributor

embray commented Oct 22, 2015

@ahaldane Thanks--I would see if I can help out but I'm about to leave for vacation.

@embray

Contributor

embray commented Oct 22, 2015

@ahaldane Also, if you don't want to remove the checks entirely (which I agree are valuable) a flag to disable them would be fine too.

@charris

Member

charris commented Oct 22, 2015

@ahaldane Do you want to revert this in both master and 1.10.x?

@charris

Member

charris commented Oct 23, 2015

@ahaldane Could you make a list of PRs to revert?

@charris

Member

charris commented Oct 23, 2015

There are a ton of fixes between #5548 and the present. It might be easier to just put together a PR that reverts the relevant parts. That is, not an official git revert commit but rather just fixing up the files using input from the earlier version.

@ahaldane

Member

ahaldane commented Oct 23, 2015

I've already done the reversion locally, no problem. I want to think one more time before actually merging the reversion - maybe something like simply disabling the view safety checks would be easier. I don't have much time today but I'll do it this weekend.

@charris

Member

charris commented Oct 23, 2015

That should be soon enough, I'm shooting for a 1.10.2rc1 a week from now.

@charris

Member

charris commented Oct 27, 2015

The proposed fix has been merged into both master and maintenance/1.10.x. It would be helpful if anyone having problems could give it a whirl and report success or failure.

@mhvk

Contributor

mhvk commented Oct 27, 2015

The example listed above (#6467 (comment)) by @embray is indeed solved. With that, astropy/astropy#4259 is also solved (and is now closed). Thanks!

@beckermr

beckermr Oct 27, 2015

Looks good to me. These are the timing numbers for the test I sent at the top.

numpy version: 1.9.3
time: 0.277699

numpy version: 1.11.0.dev0+522a0f7
time: 0.223134

@charris

Member

charris commented Oct 27, 2015

OK, I'm going to close this now. Feel free to reopen or post another issue if the problem is not fixed for you.

@megies

Contributor

megies commented Nov 27, 2015

Just stumbled over this too. The test suite of one of our submodules at obspy/obspy slowed down by a factor of 10 due to this on numpy 1.10.1.

Can confirm that current maintenance/1.10.x and master are back to normal.

FYI @krischer @QuLogic

@MaxNoe referenced this issue in cta-observatory/dragonboard_testbench Jan 14, 2016: Reading is very slow with 1.9 < numpy < 1.10.2 #6 (Closed)

@sbailey referenced this issue in desihub/desitarget Sep 22, 2016: Python 2->3 Upgrade #60 (Closed)

@tdpetrou

tdpetrou Oct 27, 2017

I know this is old, but I am still getting terrible performance for structured arrays. I have similar results to the example from this SO question.

>>> n = 1000000
>>> dict_homo = {'a': np.zeros(n), 'b': np.zeros(n)}
>>> %timeit dict_homo['a'] += 1
462 µs ± 33.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> np_homo = np.zeros(n, dtype=[('a', np.double), ('b', np.double)])
>>> %timeit np_homo['a'] + 1
2.47 ms ± 24.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I also have a much smaller structured array with 50k rows of mixed numeric and object data types, and it shows this same ~6x slowdown.

@mhvk

Contributor

mhvk commented Oct 27, 2017

I can't confirm such a large slow-down, especially after ensuring both operations are not in-place:

In [3]: n=1000000

In [4]: dict_homo = {'a': np.zeros(n), 'b': np.zeros(n)}

In [5]: %timeit dict_homo['a']+1
1000 loops, best of 3: 808 µs per loop

In [6]: np_homo = np.zeros(n, dtype=[('a', np.double), ('b', np.double)])

In [7]: %timeit np_homo['a'] + 1
1000 loops, best of 3: 964 µs per loop

In [4]: %timeit dict_homo['a']+=1
The slowest run took 5.71 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 530 µs per loop

In [5]: np_homo = np.zeros(n, dtype=[('a', np.double), ('b', np.double)])

In [6]: %timeit np_homo['a'] + 1
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 974 µs per loop

Note also that using the record array means that the numbers are not contiguous, so some slowdown is expected.

I tested on numpy 1.13.1. If this is an issue on a later version, please open a separate issue!
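
For reference, the non-contiguity mentioned above is easy to see from the field view's strides (a quick illustrative check, not part of the original comment).

import numpy as np

np_homo = np.zeros(1000000, dtype=[('a', np.double), ('b', np.double)])
a = np_homo['a']
# The two-field records are 16 bytes wide, so the 'a' view steps over every
# other 8-byte slot instead of reading one contiguous block.
print(a.strides, a.flags['C_CONTIGUOUS'])   # (16,) False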

@tdpetrou

tdpetrou Oct 27, 2017

Thanks for the response @mhvk. I'm on 1.13.3.
