Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shallow copy is slow, seems to spend a lot of time in repr #1694

Closed
ivirshup opened this issue Sep 13, 2022 · 9 comments · Fixed by #1695
Closed

Shallow copy is slow, seems to spend a lot of time in repr #1694

ivirshup opened this issue Sep 13, 2022 · 9 comments · Fixed by #1695
Labels
performance Works, but not fast enough or uses too much memory

Comments

@ivirshup
Copy link

Version of Awkward Array

1.10.0rc3

Description and code to reproduce

During the PR adding awkward array support into AnnData, I noticed the test suite was taking much longer than usual. Trying to track down the culprit, I found the line where we make shallow copies of an awkward array. Under this largely seemed to be code related to making repr strings.

Checking if this is just us:

Making a minimal reproducible example:

import awkward._v2 as ak
import numpy as np
from copy import copy, deepcopy

counts = np.random.randint(1, 1000, size=(10000))
data = np.random.randint(0, 100, size=counts.sum())
awk = ak.unflatten(data, counts)


%%prun -s cumtime
for _ in range(1000):
    copy(awk)
         3544003 function calls (3382003 primitive calls) in 3.619 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.619    3.619 {built-in method builtins.exec}
        1    0.002    0.002    3.619    3.619 <string>:1(<module>)
     1000    0.003    0.000    3.617    0.004 copy.py:66(copy)
     5000    0.041    0.000    3.264    0.001 _util.py:210(__init__)
     1000    0.002    0.000    3.078    0.003 {method '__reduce_ex__' of 'object' objects}
     1000    0.003    0.000    3.076    0.003 highlevel.py:1377(__getstate__)
28000/24000    0.056    0.000    2.881    0.000 _util.py:138(format_argument)
     4000    0.044    0.000    2.026    0.001 highlevel.py:1177(_repr)
     4000    0.005    0.000    1.699    0.000 _prettyprint.py:209(valuestr)
78000/26000    0.186    0.000    1.697    0.000 highlevel.py:1113(__setattr__)
40000/4000    0.182    0.000    1.692    0.000 _prettyprint.py:69(valuestr_horiz)
    12000    0.028    0.000    1.595    0.000 highlevel.py:187(__init__)
     1000    0.004    0.000    1.548    0.002 ak_to_buffers.py:9(to_buffers)
     1000    0.004    0.000    1.525    0.002 ak_packed.py:8(packed)
     2000    0.007    0.000    1.493    0.001 ak_to_layout.py:11(to_layout)
    78000    1.398    0.000    1.398    0.000 {built-in method builtins.dir}
    36000    0.051    0.000    1.318    0.000 _prettyprint.py:41(get_at)
    10000    0.017    0.000    1.070    0.000 _util.py:770(wrap)

I would expect making a shallow copy of an awkward array to be quite fast. I also wouldn't expect pretty printing methods to be called.

@ivirshup ivirshup added the performance Works, but not fast enough or uses too much memory label Sep 13, 2022
@jpivarski
Copy link
Member

There's a long story here:

  1. We will be handling CuPy (GPU) arrays as well as NumPy, and doing CuPy operations efficiently means that they need to be dispatched asynchronously (GPU backend, eagerness, and errors #1321).
  2. But if a GPU function raises an error, the stack trace at the time the error is caught will not point to where the error actually happens, which would seriously confuse people when they're debugging.
  3. So for this reason, and because users have found errors that point to Awkward's internals to be unhelpful anyway, error messages report the ak.* function that the error occurred in (even if, in the GPU case, that function has already formally returned).

That is, error messages that used to look like this:

>>> ak.Array([[1, 2, 3], [], [4, 5]])[:, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jpivarski/irishep/awkward/awkward/highlevel.py", line 991, in __getitem__
    tmp = ak._util.wrap(self.layout[where], self._behavior)
ValueError: in ListOffsetArray64 attempting to get 0, index out of range

(https://github.com/scikit-hep/awkward-1.0/blob/1.10.0rc3/src/cpu-kernels/awkward_NumpyArray_getitem_next_at.cpp#L21)

now look like this:

>>> ak._v2.Array([[1, 2, 3], [], [4, 5]])[:, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/highlevel.py", line 959, in __getitem__
    out = self._layout[where]
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/content.py", line 488, in __getitem__
    return self._getitem(where)
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/content.py", line 527, in _getitem
    out = next._getitem_next(nextwhere[0], nextwhere[1:], None)
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/regulararray.py", line 426, in _getitem_next
    nextcontent._getitem_next(nexthead, nexttail, advanced),
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/listarray.py", line 585, in _getitem_next
    self._handle_error(
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/contents/content.py", line 225, in _handle_error
    raise ak._v2._util.indexerror(self, slicer, message)
IndexError: while attempting to slice (from <stdin>, line 1)

    <Array [[1, 2, 3], [], [4, 5]] type='3 * var * int64'>

with

    (:, 0)

at inner ListArray of length 3, using sub-slice 0.

Error details: index out of range while attempting to get index 0 (in compiled code: https://github.com/scikit-hep/awkward-1.0/blob/1.10.0rc3/src/cpu-kernels/awkward_NumpyArray_getitem_next_at.cpp#L21).
  1. This means putting error context information in a list, and the right context gets attached to a GPU function when it fails asynchronously.
  2. We don't want that list of error contexts to contain references to all the array arguments to each function, since the memory use would explode.
  3. The pretty-printed repr of the arrays uses far less memory, but the trade-off is memory use of storing the array versus string formatting computation time, even if there are no errors to report.
  4. We decided in favor of making string representations of array arguments, since memory issues are more insidious (harder to debug) and ak.* functions aren't supposed to be called large numbers of times anyway—that is, the number of high-level ak.* calls should be ≪ the number of entries in an array.

But...

What's going on here shouldn't be related to that. Calling __copy__ on an ak.Array isn't a high-level ak.* function. It's a low-level, public operation, like layout manipulations, which should not be suffering from this penalty.

So what's going on here? It looks like our ak._v2.Array.__copy__ isn't even implemented:

self.layout = layout
self.behavior = behavior
def __copy__(self):
return Array(self._layout, behavior=self._behavior)
def __deepcopy__(self, memo):
return Array(self._layout.deep_copy(), behavior=self._behavior)
def __bool__(self):
if len(self) == 1:

Note that the indentation of __copy__ and __deepcopy__ is not at class-level, but is nested within the function above it (__setstate__).

There are a lot of expensive things in your profile trace, not just the string formatting. It's also calling ak.packed, which is a high-level function, and that's unnecessarily collapsing your array for storage, because it's part of pickling (__reduce_ex__). Maybe the copy module pickles and unpickles an object to copy it when it can't find a __copy__ method? (I would have thought it would do that for deepcopy, not copy.)

So to start with, I can fix the indentation of __copy__ and __deepcopy__. This implementation does not trigger the error-handling routines (since it never touches any buffers, it can't raise an error asynchronously). But it seems pretty disturbing to me that "forgetting to implement __copy__" would take this detour. Maybe we should make the error-handling even more complicated and choose the "reference array, make error message only when there's an error" case when nplike is NumPy, or otherwise an eager evaluation, rather than a delayed one?

@jpivarski jpivarski linked a pull request Sep 13, 2022 that will close this issue
@jpivarski
Copy link
Member

I think that #1695 will directly address your issue, but the system is too close to the brink of this happening, if only missing a __copy__ method would have triggered the error-handling routines that you see.

Can you check it? I'll include it in 1.10.0rc4.

@agoose77
Copy link
Collaborator

@jpivarski we will fix this class of error with e.g. #1657 to add both type hints and NotImplementedError base implementations for these kinds of methods.

@ivirshup
Copy link
Author

Can you check it? I'll include it in 1.10.0rc4.

I can try. Do PRs here generate builds I can use?

@ivirshup
Copy link
Author

Maybe the copy module pickles and unpickles an object to copy it when it can't find a copy method? (I would have thought it would do that for deepcopy, not copy.)

I don't think this is happening (or at least not exactly) since I wasn't seeing significant allocation during the shallow copy. Definitely not a full copy's worth

@ivirshup
Copy link
Author

ivirshup commented Sep 14, 2022

It took a few tries to install, but I think I got it working. Copy speeds are much faster now!

However, would you expect copy(awk) to be doing much more than copy(awk.layout)? copy(awk.layout) seems ~20x faster.

In [3]: %%prun -s cumtime
   ...: for _ in range(1000):
   ...:     copy(awk)
   ...: 
         43003 function calls (37003 primitive calls) in 0.133 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.133    0.133 {built-in method builtins.exec}
        1    0.001    0.001    0.133    0.133 <string>:1(<module>)
     1000    0.001    0.000    0.132    0.000 copy.py:66(copy)
     1000    0.001    0.000    0.130    0.000 highlevel.py:1421(__copy__)
     1000    0.002    0.000    0.129    0.000 highlevel.py:188(__init__)
6000/2000    0.014    0.000    0.125    0.000 highlevel.py:1114(__setattr__)
     6000    0.103    0.000    0.103    0.000 {built-in method builtins.dir}
     1000    0.002    0.000    0.045    0.000 highlevel.py:335(behavior)
     1000    0.001    0.000    0.041    0.000 highlevel.py:306(layout)
     1000    0.002    0.000    0.005    0.000 _util.py:479(arrayclass)
4000/2000    0.002    0.000    0.003    0.000 content.py:1203(purelist_parameter)
     2000    0.001    0.000    0.002    0.000 listoffsetform.py:126(purelist_parameter)
     3000    0.001    0.000    0.001    0.000 _util.py:82(isstr)
     6000    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
     3000    0.001    0.000    0.001    0.000 {method 'startswith' of 'str' objects}
     1000    0.000    0.000    0.000    0.000 _util.py:449(__init__)
     2000    0.000    0.000    0.000    0.000 numpyform.py:179(purelist_parameter)
     1000    0.000    0.000    0.000    0.000 content.py:72(parameter)
     1000    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 
In [4]: %%prun -s cumtime
   ...: for _ in range(1000):
   ...:     copy(awk.layout)
   ...: 
         14003 function calls in 0.007 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 {built-in method builtins.exec}
        1    0.001    0.001    0.007    0.007 <string>:1(<module>)
     1000    0.001    0.000    0.006    0.000 copy.py:66(copy)
     1000    0.000    0.000    0.004    0.000 listoffsetarray.py:33(__copy__)
     1000    0.001    0.000    0.004    0.000 listoffsetarray.py:17(copy)
     1000    0.002    0.000    0.003    0.000 listoffsetarray.py:44(__init__)
     1000    0.001    0.000    0.001    0.000 content.py:29(_init)
     3000    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
     1000    0.000    0.000    0.000    0.000 index.py:102(length)
     1000    0.000    0.000    0.000    0.000 highlevel.py:262(layout)
     1000    0.000    0.000    0.000    0.000 index.py:84(nplike)
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
     1000    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

@agoose77
Copy link
Collaborator

@ivirshup predominantly this is caused by a x in dir(...) check; we'll clean that up.

@jpivarski
Copy link
Member

The high-level ak.Array.__copy__ doesn't do much:

def __copy__(self):
return Array(self._layout, behavior=self._behavior)

compared to a typical Content.__copy__ (each subclass is different, this is ListOffsetArray):

def copy(
self,
offsets=unset,
content=unset,
identifier=unset,
parameters=unset,
nplike=unset,
):
return ListOffsetArray(
self._offsets if offsets is unset else offsets,
self._content if content is unset else content,
self._identifier if identifier is unset else identifier,
self._parameters if parameters is unset else parameters,
self._nplike if nplike is unset else nplike,
)
def __copy__(self):
return self.copy()

But the ak.Array.__copy__ does go through the ak.Array constructor. It could be more lightweight by bypassing that:

    def __copy__(self): 
        cls = type(self)
        out = cls.__new__(cls)
        out.__dict__.update(self.__dict__)
        return out

It would skip the usual checks that ensure that self._layout is a layout, sets the __class__ if there is a behavior overload, etc. These things should be pretty lightweight already, but in the particular case of a copy, we know exactly what to construct and could skip these checks.

@jpivarski
Copy link
Member

@ivirshup predominantly this is caused by a x in dir(...) check; we'll clean that up.

Is the x in dir(...) invoked when looking at underscored attributes, like self._layout? Maybe skipping that would shave off more milliseconds, not just here in copy, but everywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Works, but not fast enough or uses too much memory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants