A need for contiguity-based axis-order optimization in tensordot #11940

Open
@rsokl

Description

The ordering of the axes passed to tensordot can have a massive (order-of-magnitude) impact on its performance, depending on the memory layout of the array(s) being contracted:

>>> import numpy as np
>>> x = np.random.rand(100, 100, 100)
>>> %%timeit
... np.tensordot(x, x, axes=((0, 1, 2), (0, 1, 2)))  
151 µs ± 6.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %%timeit
... np.tensordot(x, x, axes=((1, 2, 0), (1, 2, 0))) 
7.9 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Moving x's last axis to the front swaps the timings:

>>> xt = np.moveaxis(x, -1, 0)
>>> %%timeit
... np.tensordot(xt, xt, axes=((0, 1, 2), (0, 1, 2)))  
10.8 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %%timeit
... np.tensordot(xt, xt, axes=((1, 2, 0), (1, 2, 0))) 
146 µs ± 4.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
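The gap is explained by tensordot's strategy of transposing the contracted axes into position and reshaping both operands to 2-D before calling np.dot: when the requested axis order matches the array's memory layout the reshape is a free view, otherwise it must copy every element first. A minimal sketch of that effect, checked with np.shares_memory:

```python
import numpy as np

x = np.random.rand(100, 100, 100)  # C-contiguous by default

# Fast axis order: transpose(0, 1, 2) is a no-op on a C-contiguous
# array, so the 2-D reshape tensordot performs internally is a view.
fast = x.transpose(0, 1, 2).reshape(-1, 1)
print(np.shares_memory(x, fast))   # True -> no data was copied

# Slow axis order: transpose(1, 2, 0) yields a non-contiguous view,
# so the reshape has to copy all 10**6 elements before np.dot runs.
slow = x.transpose(1, 2, 0).reshape(-1, 1)
print(np.shares_memory(x, slow))   # False -> a full copy was made
```

The timing difference above is dominated by that hidden copy, not by the dot product itself.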

As suggested by @eric-wieser, tensordot would benefit from ordering the contraction axes based on memory contiguity, to guard against these massive slowdowns.
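Since the contraction sums over *pairs* of axes, the pairs can be reordered freely without changing the result. A hedged sketch of what such an optimization might look like (the wrapper name and the stride-sorting heuristic are illustrative, not an actual proposal from this issue; it only considers the first operand's layout, so the two operands' layouts could still conflict):

```python
import numpy as np

def contiguity_aware_tensordot(a, b, axes):
    """Hypothetical wrapper: reorder the paired contraction axes so the
    axes of `a` are visited in decreasing-stride (memory) order, which
    keeps tensordot's internal transpose/reshape copy-free more often.
    Only the tuple-of-sequences form of `axes` is handled here.
    """
    axes_a, axes_b = (list(ax) for ax in axes)
    # Sort the axis *pairs* by the stride of the corresponding axis of
    # `a`, largest first, so the contracted block matches a's layout.
    order = sorted(range(len(axes_a)),
                   key=lambda i: -abs(a.strides[axes_a[i]]))
    axes_a = [axes_a[i] for i in order]
    axes_b = [axes_b[i] for i in order]
    return np.tensordot(a, b, axes=(axes_a, axes_b))

x = np.random.rand(100, 100, 100)
# Both orderings now take the fast path and give the same scalar result:
r1 = contiguity_aware_tensordot(x, x, axes=((0, 1, 2), (0, 1, 2)))
r2 = contiguity_aware_tensordot(x, x, axes=((1, 2, 0), (1, 2, 0)))
```

With this reordering, `axes=((1, 2, 0), (1, 2, 0))` is rewritten to `((0, 1, 2), (0, 1, 2))` for a C-contiguous first operand, landing in the ~150 µs case from the timings above.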
