Describe the issue:
While `sys.getsizeof()` seems to work correctly for one-dimensional arrays, it gives, in my opinion, incorrect results for multi-dimensional arrays.
import sys
import numpy as np
a = np.arange(10).reshape(2, 5)
Bytes as reported by NumPy:
>>> a.nbytes
80
Size as Python sees it:
>>> sys.getsizeof(a)
120
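The 120 bytes do not appear to include the 80-byte buffer at all. One plausible reason, easy to check with standard `ndarray` attributes: `reshape()` returns a view, so `a` does not own its data:
>>> a.flags.owndata
False
>>> a.base is not None
True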
The number of bytes is the same for the one-dimensional version:
>>> a.flatten().nbytes
80
But `sys.getsizeof()` gives a different result:
>>> sys.getsizeof(a.flatten())
184
Just to make sure it is not the "fault" of `flatten()`:
>>> sys.getsizeof(np.arange(10))
184
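Both of these arrays own their data (`flatten()` always copies), unlike the reshaped array above, which would explain why the buffer is counted here but not there. A quick check:
>>> np.arange(10).flags.owndata
True
>>> a.flatten().flags.owndata
True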
There seems to be a consistent 104-byte overhead in `sys.getsizeof()`:
for size in [10, 100, 1_000, 10_000]:
    arr = np.arange(size)
    diff = sys.getsizeof(arr) - arr.nbytes
    print(f'{size: 6d}: {diff}')
Output:
    10: 104
   100: 104
  1000: 104
 10000: 104
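That constant looks like pure per-object bookkeeping (array header plus shape/strides storage) rather than anything data-dependent, in which case it should grow with the number of dimensions but not with the element count. A small sketch to probe that assumption, using `np.zeros()` so that every array owns its buffer:
for ndim in range(1, 4):
    # owning array, so the buffer is included; subtracting it isolates the overhead
    arr = np.zeros((10,) * ndim)
    print(ndim, sys.getsizeof(arr) - arr.nbytes)
If that reading is right, the overhead should step up by a fixed 16 bytes per added dimension (an 8-byte shape entry plus an 8-byte stride entry on a 64-bit build), which would also match the 120 - 104 = 16 gap between the two-dimensional and one-dimensional numbers above.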
This holds true for all numeric dtypes:
obj_sizes = set()
count = 0
for name, obj in vars(np).items():
    if type(obj) is type and np.number in obj.mro():
        for size in [10, 100, 1_000, 10_000]:
            arr = np.arange(size, dtype=obj)
            diff = sys.getsizeof(arr) - arr.nbytes
            obj_sizes.add(diff)
        count += 1
obj_sizes, count
({104}, 52)
This looks different for a 2d array:
obj_sizes = set()
count = 0
size = 10_000
for name, obj in vars(np).items():
    if type(obj) is type and np.number in obj.mro():
        arr = np.arange(size, dtype=obj).reshape(100, 100)
        diff = sys.getsizeof(arr) - arr.nbytes
        obj_sizes.add(diff)
        count += 1
print(f'found sizes: {obj_sizes} for {count} dtypes')
found sizes: {-9880, -19880, -319880, -159880, -79880, -39880} for 52 dtypes
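These values all fit one pattern if `sys.getsizeof()` simply skips the buffer for arrays that do not own their data: `reshape()` returns a view, so the reported size would be just the two-dimensional object overhead (120 bytes, as in the first example), and the difference becomes 120 - 10_000 * itemsize. That reproduces every member of the set for itemsizes 1, 2, 4, 8, 16 and 32. A minimal check with a single dtype:
>>> arr = np.arange(10_000, dtype=np.int64).reshape(100, 100)
>>> arr.flags.owndata
False
>>> sys.getsizeof(arr) - arr.nbytes  # 120 - 10_000 * 8, if the buffer is skipped
-79880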
Reproduce the code example:
import sys
import numpy as np

a = np.arange(10).reshape(2, 5)
print('Array:')
print(a)
print('Bytes as reported by NumPy:', a.nbytes)
print('Size as Python sees it:', sys.getsizeof(a))
print('The number of bytes is the same for the one-dimensional version:', a.flatten().nbytes)
print('But `sys.getsizeof()` gives a different result:', sys.getsizeof(a.flatten()))
print('Just to make sure it is not the "fault" of `flatten()`:', sys.getsizeof(np.arange(10)))
print('There seems to be a consistent 104-byte overhead in `sys.getsizeof()`:')
for size in [10, 100, 1_000, 10_000]:
    arr = np.arange(size)
    diff = sys.getsizeof(arr) - arr.nbytes
    print(f'{size: 6d}: {diff}')
print('This holds true for all numeric dtypes:')
obj_sizes = set()
count = 0
for name, obj in vars(np).items():
    if type(obj) is type and np.number in obj.mro():
        for size in [10, 100, 1_000, 10_000]:
            arr = np.arange(size, dtype=obj)
            diff = sys.getsizeof(arr) - arr.nbytes
            obj_sizes.add(diff)
        count += 1
print(f'found sizes: {obj_sizes} for {count} dtypes')
print('This looks different for a 2d array:')
obj_sizes = set()
count = 0
size = 10_000
for name, obj in vars(np).items():
    if type(obj) is type and np.number in obj.mro():
        arr = np.arange(size, dtype=obj).reshape(100, 100)
        diff = sys.getsizeof(arr) - arr.nbytes
        obj_sizes.add(diff)
        count += 1
print(f'found sizes: {obj_sizes} for {count} dtypes')
Error message:
See above.
NumPy/Python version information:
>>> sys.version
'3.10.1 | packaged by conda-forge | (main, Dec 22 2021, 01:39:14) [Clang 11.1.0 ]'
>>> np.__version__
'1.21.5'