### Informal speed test: saving and loading NumPy arrays with NumPy vs. Pickle

Note that other processes on the system affect the measurements.

##### Conclusion:

For small to medium arrays, roughly (400, 400, 3) to (1080, 1080, 3), Pickle seemed slightly faster, although the measurement noise is relatively large, so I don't strongly trust these measurements.

For larger arrays, around (10000, 10000, 3), np.save / np.load was consistently somewhat faster.

(I ran this on a Mid-2012 13-inch MacBook Pro with an SSD.)

**WARNING**: an array of shape (10_000, 10_000, 3) with dtype uint8 takes ~300 MB on disk. If left unmodified, the code below creates two files of this size.
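
As a rough sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes one byte per element (as with np.uint8) and counts only the raw array data; both file formats add a small amount of metadata on top. The helper name `array_nbytes` is made up for this illustration.

```python
def array_nbytes(shape, itemsize=1):
    """Raw data size in bytes for an array of the given shape.

    itemsize=1 is an assumption that matches np.uint8 (one byte per element).
    """
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * itemsize


for length in (400, 1080, 10_000):
    shape = (length, length, 3)
    print(shape, f"{array_nbytes(shape) / 1e6:.2f} MB")
```

For an array that already exists in memory, `arr.nbytes` reports the same number.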

In [1]:
import numpy as np
import pickle
import sys
import os
print('np.version.version:', np.version.version)
print('pickle.format_version:', pickle.format_version)
print('sys.version:', sys.version)

np.version.version: 1.18.1
pickle.format_version: 4.0
sys.version: 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]


In [2]:
FNAME = 'numpy_array_save_test'

def generate_array(size):
    # randint's upper bound is exclusive, so 2**8 covers the full uint8 range 0..255
    return np.random.randint(0, 2**8, size=size, dtype=np.uint8)

def numpy_save(arr):
    np.save(f'{FNAME}.npy', arr)

def numpy_load():
    with open(f"{FNAME}.npy", 'rb') as f:
        arr = np.load(f)
    return arr
    
def pickle_save(arr):
    with open(f"{FNAME}.pickle", 'wb') as f:
        pickle.dump(arr, f)

def pickle_load():
    with open(f"{FNAME}.pickle", 'rb') as f:
        arr = pickle.load(f)
    return arr
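
Before comparing speed, it may be worth confirming that both methods return the same data. The following is a minimal sketch of such a round-trip check, not part of the original measurements. The helper name `roundtrip_matches` and the file names are hypothetical, and it writes to a temporary directory so that it does not touch the files used by the benchmark.

```python
import os
import pickle
import tempfile

import numpy as np


def roundtrip_matches(arr):
    """Save arr with np.save and pickle.dump, reload both, and compare to the original."""
    with tempfile.TemporaryDirectory() as tmp:
        npy_path = os.path.join(tmp, "check.npy")        # hypothetical file name
        pkl_path = os.path.join(tmp, "check.pickle")     # hypothetical file name

        np.save(npy_path, arr)
        with open(pkl_path, "wb") as f:
            pickle.dump(arr, f)

        from_npy = np.load(npy_path)
        with open(pkl_path, "rb") as f:
            from_pickle = pickle.load(f)

    same_values = np.array_equal(arr, from_npy) and np.array_equal(arr, from_pickle)
    same_dtype = arr.dtype == from_npy.dtype == from_pickle.dtype
    return same_values and same_dtype


small = np.random.randint(0, 2**8, size=(40, 40, 3), dtype=np.uint8)
print(roundtrip_matches(small))
```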

In [3]:
ARRAY_LENGTH = [400, 1080, 10_000]

for array_length in ARRAY_LENGTH:
    
    array_size = (array_length, array_length, 3)
    arr = generate_array(array_size)

    print('\nArray size:', array_size)

    print("\nnumpy_save(arr):")
    %timeit numpy_save(arr)
    
    print("\npickle_save(arr):")
    %timeit pickle_save(arr)

    print("\nnumpy_load():")
    %timeit numpy_load()
    
    print("\npickle_load():")
    %timeit pickle_load()


Array size: (400, 400, 3)

numpy_save(arr):
3.21 ms ± 404 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

pickle_save(arr):
2.65 ms ± 812 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

numpy_load():
496 µs ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

pickle_load():
148 µs ± 3.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Array size: (1080, 1080, 3)

numpy_save(arr):
22.8 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

pickle_save(arr):
11.5 ms ± 6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

numpy_load():
1.21 ms ± 13.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

pickle_load():
1.19 ms ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Array size: (10000, 10000, 3)

numpy_save(arr):
1.66 s ± 431 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

pickle_save(arr):
2.46 s ± 601 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

numpy_load():
189 ms ± 5.56 ms per

In [4]:
# Delete the files created by this notebook
os.remove(f'{FNAME}.npy') 
os.remove(f"{FNAME}.pickle") 
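
Since other processes on the system affect the measurements, one option is to look at the fastest of several repeats rather than the mean. The sketch below shows how that could be done with the standard-library `timeit` module, outside IPython. It is an assumption-laden illustration, not the method used for the numbers above: the helper `best_time`, the repeat counts, and the file names are all made up here, and only the two save functions are timed to keep it short.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np


def best_time(func, repeat=5, number=3):
    """Return the fastest per-call time in seconds over several repeats."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number


arr = np.random.randint(0, 2**8, size=(400, 400, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "timing.npy")        # hypothetical file name
    pkl_path = os.path.join(tmp, "timing.pickle")     # hypothetical file name

    def numpy_save():
        np.save(npy_path, arr)

    def pickle_save():
        with open(pkl_path, "wb") as f:
            pickle.dump(arr, f)

    results = {
        "numpy_save": best_time(numpy_save),
        "pickle_save": best_time(pickle_save),
    }

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

The minimum is less sensitive to interference from other processes than the mean, though it still says nothing about disk caching effects.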