The ordering of axes fed to `tensordot` can have a massive (order-of-magnitude) impact on its efficiency, depending on the memory layout of the array(s) being summed:
>>> import numpy as np
>>> x = np.random.rand(100, 100, 100)
>>> %%timeit
... np.tensordot(x, x, axes=((0, 1, 2), (0, 1, 2)))
151 µs ± 6.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %%timeit
... np.tensordot(x, x, axes=((1, 2, 0), (1, 2, 0)))
7.9 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Moving `x`'s axes swaps the timings:
>>> xt = np.moveaxis(x, -1, 0)
>>> %%timeit
... np.tensordot(xt, xt, axes=((0, 1, 2), (0, 1, 2)))
10.8 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %%timeit
... np.tensordot(xt, xt, axes=((1, 2, 0), (1, 2, 0)))
146 µs ± 4.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
As suggested by @eric-wieser, `tensordot` would benefit from ordering the contracted axes by memory contiguity, to guard against these massive slowdowns.
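A minimal sketch of the idea as a user-level wrapper (the name `tensordot_sorted` and the stride-based heuristic are mine, not an existing NumPy API): since `tensordot` pairs `a_axes[i]` with `b_axes[i]`, the pairs can be permuted jointly without changing the result, so we can reorder them so that the first array's contracted axes run from largest stride to smallest (the layout of the fast cases above):

```python
import numpy as np

def tensordot_sorted(a, b, axes):
    # Hypothetical wrapper: jointly permute the contracted axis pairs so
    # that a's axes appear in descending-stride order (most contiguous
    # axis last). For a C-contiguous a this reproduces the fast
    # ((0, 1, 2), (0, 1, 2)) ordering from the timings above.
    a_axes, b_axes = axes
    order = sorted(range(len(a_axes)),
                   key=lambda i: -abs(a.strides[a_axes[i]]))
    a_sorted = tuple(a_axes[i] for i in order)
    b_sorted = tuple(b_axes[i] for i in order)
    return np.tensordot(a, b, axes=(a_sorted, b_sorted))
```

Because only the pairing order changes, the wrapper returns the same value as a direct `np.tensordot` call with the original axes; a proper fix would apply the same reordering inside `tensordot` itself.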