BUG: Incorrect results from sys.getsizeof() for multi-dimensional arrays #20707

@pya

Description

Describe the issue:

While `sys.getsizeof()` seems to work correctly for one-dimensional arrays, it gives, in my opinion, incorrect results for multi-dimensional arrays.

import sys
import numpy as np
a = np.arange(10).reshape(2, 5)

Bytes as reported by NumPy:

>>> a.nbytes
80

Size as Python sees it:

>>> sys.getsizeof(a)
120

The number of bytes is the same for the one-dimensional version:

>>> a.flatten().nbytes
80

But `sys.getsizeof()` gives a different result:

>>> sys.getsizeof(a.flatten())
184

Just to make sure it is not the "fault" of `flatten()`:

>>> sys.getsizeof(np.arange(10))
184
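
For comparison, `flatten()` always returns a copy, and the copy does own its buffer, which would explain why its reported size includes the 80 data bytes:

>>> a.flatten().flags.owndata  # flatten() copies; the copy owns its data
True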

There seems to be a consistent 104-byte overhead from `sys.getsizeof()`:

for size in [10, 100, 1_000, 10_000]:
    arr = np.arange(size)
    diff = sys.getsizeof(arr) - arr.nbytes
    print(f'{size: 6d}: {diff}')

Output:

    10: 104
   100: 104
  1000: 104
 10000: 104
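
If the ownership reading is right, those 104 bytes should simply be the size of the `ndarray` object itself, independent of the data (the exact value will vary across platforms and NumPy versions):

>>> sys.getsizeof(np.empty(0))  # owning array with zero data bytes
104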

This holds for every numeric dtype (the loop picks up all `np.number` subclasses in the `numpy` namespace):

obj_sizes = set()
count = 0
for name, obj in vars(np).items():
    if type(obj) is type and np.number in obj.mro():
        for size in [10, 100, 1_000, 10_000]:
            arr = np.arange(size, dtype=obj)
            diff = sys.getsizeof(arr) - arr.nbytes
            obj_sizes.add(diff)
        count += 1
obj_sizes, count

({104}, 52)

This looks different for a 2-D array:

obj_sizes = set()
count = 0
size = 10_000
for name, obj in vars(np).items():
    if type(obj) is type and np.number in obj.mro():
        arr = np.arange(size, dtype=obj).reshape(100, 100)
        diff = sys.getsizeof(arr) - arr.nbytes
        obj_sizes.add(diff)
        count += 1
print(f'found sizes: {obj_sizes} for {count} dtypes')

Output:

found sizes: {-9880, -19880, -319880, -159880, -79880, -39880} for 52 dtypes
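
The negative values fit the same hypothesis: for these reshaped views `sys.getsizeof()` seems to return only a fixed object size (120 in the first example above), so each difference is roughly 120 - arr.nbytes, which scales with the dtype's item size (e.g. 120 - 10_000 = -9880 for int8). A sketch of that check, assuming the view hypothesis holds:

for obj in (np.int8, np.int64, np.complex128):
    arr = np.arange(10_000, dtype=obj).reshape(100, 100)
    # If views report a constant object size, each diff should equal
    # that constant minus arr.nbytes.
    print(obj.__name__, sys.getsizeof(arr), arr.nbytes,
          sys.getsizeof(arr) - arr.nbytes)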

Reproduce the code example:

import sys
import numpy as np

a = np.arange(10).reshape(2, 5)

print('Array:')
print(a)
print('Bytes as reported by NumPy:', a.nbytes)

print('Size as Python sees it:', sys.getsizeof(a))

print('The number of bytes is the same for the one-dimensional version:', a.flatten().nbytes)
print('But `sys.getsizeof()` gives a different result:', sys.getsizeof(a.flatten()))
print('Just to make sure it is not the "fault" of `flatten()`:', sys.getsizeof(np.arange(10)))


print('There seems to be a consistent 104-byte overhead from `sys.getsizeof()`:')
for size in [10, 100, 1_000, 10_000]:
    arr = np.arange(size)
    diff = sys.getsizeof(arr) - arr.nbytes
    print(f'{size: 6d}: {diff}')


print('This holds for every numeric dtype:')
obj_sizes = set()
count = 0
for name, obj in vars(np).items():
    if type(obj) is type and np.number in obj.mro():
        for size in [10, 100, 1_000, 10_000]:
            arr = np.arange(size, dtype=obj)
            diff = sys.getsizeof(arr) - arr.nbytes
            obj_sizes.add(diff)
        count += 1
print(f'found sizes: {obj_sizes} for {count} dtypes')

print('This looks different for a 2-D array:')
obj_sizes = set()
count = 0
size = 10_000
for name, obj in vars(np).items():
    if type(obj) is type and np.number in obj.mro():
        arr = np.arange(size, dtype=obj).reshape(100, 100)
        diff = sys.getsizeof(arr) - arr.nbytes
        obj_sizes.add(diff)
        count += 1
print(f'found sizes: {obj_sizes} for {count} dtypes')

Error message:

See above.

NumPy/Python version information:

>>> sys.version
'3.10.1 | packaged by conda-forge | (main, Dec 22 2021, 01:39:14) [Clang 11.1.0 ]'
>>> np.__version__
'1.21.5'
