Shallow copy is slow, seems to spend a lot of time in repr #1694

ivirshup · 2022-09-13T22:02:34Z

Version of Awkward Array

1.10.0rc3

Description and code to reproduce

During the PR adding awkward array support into AnnData, I noticed the test suite was taking much longer than usual. Trying to track down the culprit, I found the line where we make shallow copies of an awkward array. Under this largely seemed to be code related to making repr strings.

Checking if this is just us:

Making a minimal reproducible example:

import awkward._v2 as ak
import numpy as np
from copy import copy, deepcopy

counts = np.random.randint(1, 1000, size=(10000))
data = np.random.randint(0, 100, size=counts.sum())
awk = ak.unflatten(data, counts)


%%prun -s cumtime
for _ in range(1000):
    copy(awk)

         3544003 function calls (3382003 primitive calls) in 3.619 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.619    3.619 {built-in method builtins.exec}
        1    0.002    0.002    3.619    3.619 <string>:1(<module>)
     1000    0.003    0.000    3.617    0.004 copy.py:66(copy)
     5000    0.041    0.000    3.264    0.001 _util.py:210(__init__)
     1000    0.002    0.000    3.078    0.003 {method '__reduce_ex__' of 'object' objects}
     1000    0.003    0.000    3.076    0.003 highlevel.py:1377(__getstate__)
28000/24000    0.056    0.000    2.881    0.000 _util.py:138(format_argument)
     4000    0.044    0.000    2.026    0.001 highlevel.py:1177(_repr)
     4000    0.005    0.000    1.699    0.000 _prettyprint.py:209(valuestr)
78000/26000    0.186    0.000    1.697    0.000 highlevel.py:1113(__setattr__)
40000/4000    0.182    0.000    1.692    0.000 _prettyprint.py:69(valuestr_horiz)
    12000    0.028    0.000    1.595    0.000 highlevel.py:187(__init__)
     1000    0.004    0.000    1.548    0.002 ak_to_buffers.py:9(to_buffers)
     1000    0.004    0.000    1.525    0.002 ak_packed.py:8(packed)
     2000    0.007    0.000    1.493    0.001 ak_to_layout.py:11(to_layout)
    78000    1.398    0.000    1.398    0.000 {built-in method builtins.dir}
    36000    0.051    0.000    1.318    0.000 _prettyprint.py:41(get_at)
    10000    0.017    0.000    1.070    0.000 _util.py:770(wrap)

I would expect making a shallow copy of an awkward array to be quite fast. I also wouldn't expect pretty printing methods to be called.

The text was updated successfully, but these errors were encountered:

jpivarski · 2022-09-13T22:55:19Z

There's a long story here:

We will be handling CuPy (GPU) arrays as well as NumPy, and doing CuPy operations efficiently means that they need to be dispatched asynchronously (GPU backend, eagerness, and errors #1321).
But if a GPU function raises an error, the stack trace at the time the error is caught will not point to where the error actually happens, which would seriously confuse people when they're debugging.
So for this reason, and because users have found errors that point to Awkward's internals to be unhelpful anyway, error messages report the ak.* function that the error occurred in (even if, in the GPU case, that function has already formally returned).

That is, error messages that used to look like this:

>>> ak.Array([[1, 2, 3], [], [4, 5]])[:, 0]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jpivarski/irishep/awkward/awkward/highlevel.py", line 991, in __getitem__
    tmp = ak._util.wrap(self.layout[where], self._behavior)
ValueError: in ListOffsetArray64 attempting to get 0, index out of range

(https://github.com/scikit-hep/awkward-1.0/blob/1.10.0rc3/src/cpu-kernels/awkward_NumpyArray_getitem_next_at.cpp#L21)

now look like this:

>>> ak._v2.Array([[1, 2, 3], [], [4, 5]])[:, 0]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/highlevel.py", line 959, in __getitem__
    out = self._layout[where]
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/content.py", line 488, in __getitem__
    return self._getitem(where)
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/content.py", line 527, in _getitem
    out = next._getitem_next(nextwhere[0], nextwhere[1:], None)
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/regulararray.py", line 426, in _getitem_next
    nextcontent._getitem_next(nexthead, nexttail, advanced),
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/listarray.py", line 585, in _getitem_next
    self._handle_error(
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/content.py", line 225, in _handle_error
    raise ak._v2._util.indexerror(self, slicer, message)
IndexError: while attempting to slice (from <stdin>, line 1)

    <Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>

with

    (:, 0)

at inner ListArray of length 3, using sub-slice 0.

Error details: index out of range while attempting to get index 0 (in compiled code: https://github.com/scikit-hep/awkward-1.0/blob/1.10.0rc3/src/cpu-kernels/awkward_NumpyArray_getitem_next_at.cpp#L21).

This means putting error context information in a list, and the right context gets attached to a GPU function when it fails asynchronously.
We don't want that list of error contexts to contain references to all the array arguments to each function, since the memory use would explode.
The pretty-printed repr of the arrays uses far less memory, but the trade-off is memory use of storing the array versus string formatting computation time, even if there are no errors to report.
We decided in favor of making string representations of array arguments, since memory issues are more insidious (harder to debug) and ak.* functions aren't supposed to be called large numbers of times anyway—that is, the number of high-level ak.* calls should be ≪ the number of entries in an array.

But...

What's going on here shouldn't be related to that. Calling __copy__ on an ak.Array isn't a high-level ak.* function. It's a low-level, public operation, like layout manipulations, which should not be suffering from this penalty.

So what's going on here? It looks like our ak._v2.Array.__copy__ isn't even implemented:

awkward/src/awkward/_v2/highlevel.py

Lines 1417 to 1427 in 03dd965

    
               self.layout = layout 
        
               self.behavior = behavior 
        
               def __copy__(self): 
        
                   return Array(self._layout, behavior=self._behavior) 
        
               def __deepcopy__(self, memo): 
        
                   return Array(self._layout.deep_copy(), behavior=self._behavior) 
        
           def __bool__(self): 
        
               if len(self) == 1:

Note that the indentation of __copy__ and __deepcopy__ is not at class-level, but is nested within the function above it (__setstate__).

There are a lot of expensive things in your profile trace, not just the string formatting. It's also calling ak.packed, which is a high-level function, and that's unnecessarily collapsing your array for storage, because it's part of pickling (__reduce_ex__). Maybe the copy module pickles and unpickles an object to copy it when it can't find a __copy__ method? (I would have thought it would do that for deepcopy, not copy.)

So to start with, I can fix the indentation of __copy__ and __deepcopy__. This implementation does not trigger the error-handling routines (since it never touches any buffers, it can't raise an error asynchronously). But it seems pretty disturbing to me that "forgetting to implement __copy__" would take this detour. Maybe we should make the error-handling even more complicated and choose the "reference array, make error message only when there's an error" case when nplike is NumPy, or otherwise an eager evaluation, rather than a delayed one?

jpivarski · 2022-09-13T23:04:48Z

I think that #1695 will directly address your issue, but the system is too close to the brink of this happening, if only missing a __copy__ method would have triggered the error-handling routines that you see.

Can you check it? I'll include it in 1.10.0rc4.

agoose77 · 2022-09-14T09:55:38Z

@jpivarski we will fix this class of error with e.g. #1657 to add both type hints and NotImplementedError base implementations for these kinds of methods.

ivirshup · 2022-09-14T11:29:19Z

Can you check it? I'll include it in 1.10.0rc4.

I can try. Do PRs here generate builds I can use?

ivirshup · 2022-09-14T11:31:53Z

Maybe the copy module pickles and unpickles an object to copy it when it can't find a copy method? (I would have thought it would do that for deepcopy, not copy.)

I don't think this is happening (or at least not exactly) since I wasn't seeing significant allocation during the shallow copy. Definitely not a full copy's worth

ivirshup · 2022-09-14T12:33:41Z

It took a few tries to install, but I think I got it working. Copy speeds are much faster now!

However, would you expect copy(awk) to be doing much more than copy(awk.layout)? copy(awk.layout) seems ~20x faster.

In [3]: %%prun -s cumtime
   ...: for _ in range(1000):
   ...:     copy(awk)
   ...: 
         43003 function calls (37003 primitive calls) in 0.133 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.133    0.133 {built-in method builtins.exec}
        1    0.001    0.001    0.133    0.133 <string>:1(<module>)
     1000    0.001    0.000    0.132    0.000 copy.py:66(copy)
     1000    0.001    0.000    0.130    0.000 highlevel.py:1421(__copy__)
     1000    0.002    0.000    0.129    0.000 highlevel.py:188(__init__)
6000/2000    0.014    0.000    0.125    0.000 highlevel.py:1114(__setattr__)
     6000    0.103    0.000    0.103    0.000 {built-in method builtins.dir}
     1000    0.002    0.000    0.045    0.000 highlevel.py:335(behavior)
     1000    0.001    0.000    0.041    0.000 highlevel.py:306(layout)
     1000    0.002    0.000    0.005    0.000 _util.py:479(arrayclass)
4000/2000    0.002    0.000    0.003    0.000 content.py:1203(purelist_parameter)
     2000    0.001    0.000    0.002    0.000 listoffsetform.py:126(purelist_parameter)
     3000    0.001    0.000    0.001    0.000 _util.py:82(isstr)
     6000    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
     3000    0.001    0.000    0.001    0.000 {method 'startswith' of 'str' objects}
     1000    0.000    0.000    0.000    0.000 _util.py:449(__init__)
     2000    0.000    0.000    0.000    0.000 numpyform.py:179(purelist_parameter)
     1000    0.000    0.000    0.000    0.000 content.py:72(parameter)
     1000    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 
In [4]: %%prun -s cumtime
   ...: for _ in range(1000):
   ...:     copy(awk.layout)
   ...: 
         14003 function calls in 0.007 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 {built-in method builtins.exec}
        1    0.001    0.001    0.007    0.007 <string>:1(<module>)
     1000    0.001    0.000    0.006    0.000 copy.py:66(copy)
     1000    0.000    0.000    0.004    0.000 listoffsetarray.py:33(__copy__)
     1000    0.001    0.000    0.004    0.000 listoffsetarray.py:17(copy)
     1000    0.002    0.000    0.003    0.000 listoffsetarray.py:44(__init__)
     1000    0.001    0.000    0.001    0.000 content.py:29(_init)
     3000    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
     1000    0.000    0.000    0.000    0.000 index.py:102(length)
     1000    0.000    0.000    0.000    0.000 highlevel.py:262(layout)
     1000    0.000    0.000    0.000    0.000 index.py:84(nplike)
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
     1000    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

agoose77 · 2022-09-14T13:24:53Z

@ivirshup predominantly this is caused by a x in dir(...) check; we'll clean that up.

jpivarski · 2022-09-14T13:26:21Z

The high-level ak.Array.__copy__ doesn't do much:

awkward/src/awkward/_v2/highlevel.py

Lines 1421 to 1422 in b5545a8

    
           def __copy__(self): 
        
               return Array(self._layout, behavior=self._behavior)

compared to a typical Content.__copy__ (each subclass is different, this is ListOffsetArray):

awkward/src/awkward/_v2/contents/listoffsetarray.py

Lines 17 to 34 in b5545a8

    
           def copy( 
        
               self, 
        
               offsets=unset, 
        
               content=unset, 
        
               identifier=unset, 
        
               parameters=unset, 
        
               nplike=unset, 
        
           ): 
        
               return ListOffsetArray( 
        
                   self._offsets if offsets is unset else offsets, 
        
                   self._content if content is unset else content, 
        
                   self._identifier if identifier is unset else identifier, 
        
                   self._parameters if parameters is unset else parameters, 
        
                   self._nplike if nplike is unset else nplike, 
        
               ) 
        
           def __copy__(self): 
        
               return self.copy()

But the ak.Array.__copy__ does go through the ak.Array constructor. It could be more lightweight by bypassing that:

    def __copy__(self): 
        cls = type(self)
        out = cls.__new__(cls)
        out.__dict__.update(self.__dict__)
        return out

It would skip the usual checks that ensure that self._layout is a layout, sets the __class__ if there is a behavior overload, etc. These things should be pretty lightweight already, but in the particular case of a copy, we know exactly what to construct and could skip these checks.

jpivarski · 2022-09-14T13:27:52Z

@ivirshup predominantly this is caused by a x in dir(...) check; we'll clean that up.

Is the x in dir(...) invoked when looking at underscored attributes, like self._layout? Maybe skipping that would shave off more milliseconds, not just here in copy, but everywhere.

ivirshup added the performance Works, but not fast enough or uses too much memory label Sep 13, 2022

jpivarski linked a pull request Sep 13, 2022 that will close this issue

fix: ensure that __copy__ and __deepcopy__ are enabled. #1695

Merged

jpivarski closed this as completed in #1695 Sep 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shallow copy is slow, seems to spend a lot of time in repr #1694

Shallow copy is slow, seems to spend a lot of time in repr #1694

ivirshup commented Sep 13, 2022

jpivarski commented Sep 13, 2022

jpivarski commented Sep 13, 2022

agoose77 commented Sep 14, 2022

ivirshup commented Sep 14, 2022

ivirshup commented Sep 14, 2022

ivirshup commented Sep 14, 2022 •

edited

agoose77 commented Sep 14, 2022

jpivarski commented Sep 14, 2022

jpivarski commented Sep 14, 2022

Shallow copy is slow, seems to spend a lot of time in repr #1694

Shallow copy is slow, seems to spend a lot of time in repr #1694

Comments

ivirshup commented Sep 13, 2022

Version of Awkward Array

Description and code to reproduce

jpivarski commented Sep 13, 2022

But...

jpivarski commented Sep 13, 2022

agoose77 commented Sep 14, 2022

ivirshup commented Sep 14, 2022

ivirshup commented Sep 14, 2022

ivirshup commented Sep 14, 2022 • edited

agoose77 commented Sep 14, 2022

jpivarski commented Sep 14, 2022

jpivarski commented Sep 14, 2022

ivirshup commented Sep 14, 2022 •

edited