## Benchmarking NumPy memory maps

In [1]:
import numpy as np
import os

## Data

In [2]:
output_folder = '/Users/Edu/data/yass-benchmarks'

In [3]:
wide_data = np.random.rand(50, 1000000)
long_data = np.random.rand(1000000, 50)

In [4]:
path_to_wide = os.path.join(output_folder, 'wide.bin')
wide_data.tofile(path_to_wide)

path_to_long = os.path.join(output_folder, 'long.bin')
long_data.tofile(path_to_long)
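
`ndarray.tofile` writes only the raw bytes in C order, with no shape or dtype header, so whoever reads the file has to supply both. A minimal sketch of that round trip follows; the temporary directory, file name, and small array are stand-ins for illustration, not the benchmark's paths or sizes.

```python
import os
import tempfile

import numpy as np

# Small deterministic stand-in; the benchmark uses much larger random arrays.
data = np.arange(24, dtype='float64').reshape(4, 6)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.bin')  # hypothetical file name
    data.tofile(path)  # raw float64 bytes, C order, no metadata

    # The file size is exactly n_elements * itemsize.
    size_matches = os.path.getsize(path) == data.size * data.itemsize

    # Shape and dtype must be passed explicitly when mapping the file.
    mapped = np.memmap(path, dtype='float64', mode='r', shape=(4, 6))
    round_trip_ok = np.array_equal(mapped, data)

    # Mapping with the transposed shape reinterprets the bytes;
    # it does not transpose the data.
    reinterpreted = np.memmap(path, dtype='float64', mode='r', shape=(6, 4))
    is_transpose = np.array_equal(reinterpreted, data.T)

    # Release the maps before the temporary directory is removed.
    del mapped, reinterpreted

print(size_matches, round_trip_ok, is_transpose)
```

Passing `shape=` to `np.memmap` is equivalent to mapping the flat file and calling `.reshape(...)`, but it keeps the intended layout next to the file path.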

## Read

In [7]:
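# Note: long.bin holds a (1000000, 50) array and wide.bin a (50, 1000000) array,
# both in C order. The reshapes below reinterpret those bytes under the other
# shape; they do not transpose the data.
long_map = np.memmap(path_to_long, dtype='float64').reshape((50, 1000000))
print(long_map.shape)

wide_map = np.memmap(path_to_wide, dtype='float64').reshape((1000000, 50))
print(wide_map.shape)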

(50, 1000000)
(1000000, 50)


In [24]:
%%timeit
long_map[:, 500000:600000]

The slowest run took 9.22 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.63 µs per loop


In [25]:
%%timeit
long_data[:, 500000:600000]

The slowest run took 10.36 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.6 µs per loop


In [27]:
%%timeit
wide_map[500000:600000, :]

The slowest run took 8.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.56 µs per loop


In [28]:
%%timeit
wide_data[500000:600000, :]

The slowest run took 11.90 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.63 µs per loop


In [29]:
%%timeit
long_map[20:30, :]

The slowest run took 7.99 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.66 µs per loop


In [30]:
%%timeit
long_data[20:30, :]

The slowest run took 7.65 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.62 µs per loop


In [31]:
%%timeit
wide_map[:, 20:30]

The slowest run took 6.92 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.62 µs per loop


In [32]:
%%timeit
wide_data[:, 20:30]

The slowest run took 7.93 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.53 µs per loop


In [8]:
%%timeit
wide_map[500000:600000, range(50)]

10 loops, best of 3: 33.9 ms per loop


In [9]:
%%timeit
wide_map[500000:600000, :]

The slowest run took 7.00 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.69 µs per loop
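
The microsecond timings above are nearly identical for the memory map and the in-memory array. That is consistent with basic slicing building only a view, without reading anything from disk. The `range(50)` index is an integer-array (fancy) index, which copies the selected elements and therefore pays for the read. The sketch below checks this on a small temporary file (file name and sizes are illustrative assumptions) and shows one way to force a read when the goal is to time I/O.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.bin')  # hypothetical file name
    np.arange(1000 * 50, dtype='float64').tofile(path)

    mapped = np.memmap(path, dtype='float64', mode='r', shape=(1000, 50))

    # Basic slicing: a view onto the map, no data is copied.
    view = mapped[500:600, :]
    view_shares_memory = np.shares_memory(view, mapped)

    # Integer-array indexing: the selected elements are copied into a new array.
    fancy = mapped[500:600, list(range(50))]
    fancy_shares_memory = np.shares_memory(fancy, mapped)

    # To time the actual read, materialize the slice explicitly.
    loaded = np.array(view)
    same_values = np.array_equal(loaded, fancy)

    # Release the map before the temporary directory is removed.
    del view, fancy, mapped

print(view_shares_memory, fancy_shares_memory, same_values)
```

Timing `np.array(long_map[:, 500000:600000])` instead of the bare slice would measure the read itself; even then the operating system's page cache can make repeated runs faster than the first.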


## Write

In [17]:
path_to_write_long = os.path.join(output_folder, 'big_long.bin')
path_to_write_wide = os.path.join(output_folder, 'big_wide.bin')

In [18]:
# Each map is created at the path that matches its name.
write_wide = np.memmap(path_to_write_wide, dtype='int64', mode='w+', shape=(500, 2000000))
write_long = np.memmap(path_to_write_long, dtype='int64', mode='w+', shape=(2000000, 500))

In [13]:
%%timeit
write_long[:, 0] = write_long[:, 0] + 1

10 loops, best of 3: 19.6 ms per loop


In [14]:
%%timeit
write_long[0, :] = write_long[0, :] + 1

The slowest run took 9.99 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.6 µs per loop


In [15]:
%%timeit
write_wide[0, :] = write_wide[0, :] + 1

1000 loops, best of 3: 788 µs per loop


In [16]:
%%timeit
write_wide[:, 0] = write_wide[:, 0] + 1

The slowest run took 11.37 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 12.6 µs per loop
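
The gap between row and column updates is consistent with C-order layout: the elements of one row are adjacent in the file, while the elements of one column are a full row apart. The sketch below inspects the strides of a writable map and flushes an in-place update; the file name and shape are illustrative assumptions, much smaller than the benchmark's.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example_write.bin')  # hypothetical file name

    # mode='w+' creates (or overwrites) the file, filled with zeros.
    out = np.memmap(path, dtype='int64', mode='w+', shape=(2000, 500))

    # Bytes to step to the next row and to the next column in C order.
    row_stride, col_stride = out.strides

    out[0, :] += 1  # contiguous: 500 adjacent int64 values
    out[:, 0] += 1  # strided: 2000 values, each row_stride bytes apart

    out.flush()  # push pending changes to the file

    corner = int(out[0, 0])
    total = int(out.sum())

    # Release the map before the temporary directory is removed.
    del out

print(row_stride, col_stride, corner, total)
```

A column update touches one value per row, so on the `(2000000, 500)` map it spans the whole file, whereas a row update stays within a single contiguous run of 500 values.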
