Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Improve performance of tril_indices and triu_indices #18176

Merged
merged 4 commits into from
Jan 25, 2021

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Jan 15, 2021

np.nonzero seems slow compared to other methods. Use np.indices and np.broadcast_to to speed it up.

Closes #18153.

Performance comparison:

import numpy as np


def tril_indices(n, k=0, m=None):
    tri = np.tri(n, m, k=k, dtype=bool)

    return tuple(np.broadcast_to(inds, tri.shape)[tri]
                 for inds in np.indices(tri.shape, sparse=True))


def triu_indices(n, k=0, m=None):
    tri = ~np.tri(n, m, k=k - 1, dtype=bool)

    return tuple(np.broadcast_to(inds, tri.shape)[tri]
                 for inds in np.indices(tri.shape, sparse=True))

# Original numpy implementation with nonzero:
%timeit np.tril_indices(500)
2.14 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# New implementation:
%timeit tril_indices(500)
684 µs ± 60.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

nonzero is slow. Use np.indices and np.broadcast_to to speed it up.
@Qiyu8
Copy link
Member

Qiyu8 commented Jan 18, 2021

you should compare new implements in tril_indices and triu_indices with nonzero, Can you write a benchmark in bench_core.py?

@Illviljan
Copy link
Contributor Author

@Qiyu8 Added tests to bench_core.py. I don't know why it crashes though, any ideas?

This one seems related: #18183

@Qiyu8
Copy link
Member

Qiyu8 commented Jan 19, 2021

I don't think #18183 is related, can you add your crash logs here? From the CI perspective, The only failure is TestMatmul.test_vector_matrix_values.

@Illviljan
Copy link
Contributor Author

Oh sorry I didn't mean that the pull request was related to the CI issue. But that the solution in #18183 might make nonzero on par with my solution.

I don't get how TestMatmul.test_vector_matrix_values can crash when I add some performance tests. I'll restart and see if it's just some CI hickup.

@Illviljan Illviljan closed this Jan 19, 2021
@Illviljan Illviljan reopened this Jan 19, 2021
@Qiyu8
Copy link
Member

Qiyu8 commented Jan 20, 2021

Can you provide the benchmark result by running python runtests.py --bench-compare master bench_core.Core?

@Illviljan
Copy link
Contributor Author

@Qiyu8 I can't figure out how to do it. Do you want to run it yourself or act IT-support?

Anaconda prompt:

(base) C:\Users\J.W\Documents\GitHub\numpy>python runtests.py --bench-compare master bench_core.Core
Traceback (most recent call last):
  File "C:\Users\J.W\Documents\GitHub\numpy\numpy\__init__.py", line 126, in <module>
    from numpy.__config__ import show as show_config
ModuleNotFoundError: No module named 'numpy.__config__'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "runtests.py", line 642, in <module>
    main(argv=sys.argv[1:])
  File "runtests.py", line 172, in main
    _temp = __import__(PROJECT_MODULE)
  File "C:\Users\J.W\Documents\GitHub\numpy\numpy\__init__.py", line 131, in <module>
    raise ImportError(msg) from e
ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there.

Ok, go back one level then:

(base) C:\Users\J.W\Documents\GitHub\numpy\numpy>cd C:\Users\J.W\Documents\GitHub

(base) C:\Users\J.W\Documents\GitHub>python runtests.py --bench-compare master bench_core.Core
python: can't open file 'runtests.py': [Errno 2] No such file or directory

@Qiyu8
Copy link
Member

Qiyu8 commented Jan 21, 2021

you can read the benchmark guide here.

@Illviljan
Copy link
Contributor Author

@Qiyu8 It's still not working for me, got a little further though:

  • python runtests.py --bench -Works
  • python runtests.py --bench-compare master bench_core - does not work.
(base) C:\Users\J.W\Documents\GitHub\numpy>python runtests.py --bench-compare master bench_core
Traceback (most recent call last):
  File "C:\Users\J.W\Documents\GitHub\numpy\numpy\__init__.py", line 126, in <module>
    from numpy.__config__ import show as show_config
ModuleNotFoundError: No module named 'numpy.__config__'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "runtests.py", line 642, in <module>
    main(argv=sys.argv[1:])
  File "runtests.py", line 172, in main
    _temp = __import__(PROJECT_MODULE)
  File "C:\Users\J.W\Documents\GitHub\numpy\numpy\__init__.py", line 131, in <module>
    raise ImportError(msg) from e
ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there.

Also virtualenv doesn't seem to be the recommended way anymore:

Note

If you are using Python 3.3 or newer, the venv module is the preferred way to create and manage virtual environments. venv is included in the Python standard library and requires no additional installation. If you are using venv, you may skip this section.

https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#installing-virtualenv

@Illviljan
Copy link
Contributor Author

@Qiyu8 I never got --bench-compare to work but here are results using --bench instead. About the same ~2x improvement as I saw using timeit.

Illviljan-faster_tril_indices:

(base) C:\Users\J.W\Documents\GitHub\numpy>python runtests.py --bench bench_core.Core
Building, see build.log...
Build OK
· No executable found for python 3.7
· Discovering benchmarks
· Running 28 total benchmarks (1 commits * 1 environments * 28 benchmarks)
[  0.00%] ·· Benchmarking existing-pyc__users_j.w_anaconda3_python.exe
[  1.79%] ··· Running (bench_core.Core.time_arange_100--)............................
[ 51.79%] ··· bench_core.Core.time_arange_100                       1.06±0.04μs
[ 53.57%] ··· bench_core.Core.time_array_1                             479±40ns
[ 55.36%] ··· bench_core.Core.time_array_empty                       1.55±0.2μs
[ 57.14%] ··· bench_core.Core.time_array_float64_l1000                 54.7±2μs
[ 58.93%] ··· bench_core.Core.time_array_float_l1000                   68.3±5μs
[ 60.71%] ··· bench_core.Core.time_array_float_l1000_dtype             48.6±3μs
[ 62.50%] ··· bench_core.Core.time_array_int_l1000                     75.3±5μs
[ 64.29%] ··· bench_core.Core.time_array_l                          3.61±0.07μs
[ 66.07%] ··· bench_core.Core.time_array_l1                         1.35±0.06μs
[ 67.86%] ··· bench_core.Core.time_array_l100                        11.4±0.3μs
[ 69.64%] ··· bench_core.Core.time_array_l_view                      3.69±0.1μs
[ 71.43%] ··· bench_core.Core.time_diag_l100                         21.1±0.8μs
[ 73.21%] ··· bench_core.Core.time_diagflat_l100                     32.4±0.7μs
[ 75.00%] ··· bench_core.Core.time_diagflat_l50_l50                    34.4±2μs
[ 76.79%] ··· bench_core.Core.time_dstack_l                          11.0±0.4μs
[ 78.57%] ··· bench_core.Core.time_empty_100                        1.09±0.03μs
[ 80.36%] ··· bench_core.Core.time_eye_100                           8.07±0.1μs
[ 82.14%] ··· bench_core.Core.time_eye_3000                          5.39±0.3ms
[ 83.93%] ··· bench_core.Core.time_hstack_l                          8.24±0.5μs
[ 85.71%] ··· bench_core.Core.time_identity_100                      10.1±0.4μs
[ 87.50%] ··· bench_core.Core.time_identity_3000                     5.49±0.3ms
[ 89.29%] ··· bench_core.Core.time_ones_100                          3.93±0.2μs
[ 91.07%] ··· bench_core.Core.time_tril_indices_500                  1.11±0.1ms
[ 92.86%] ··· bench_core.Core.time_tril_l10x10                         21.3±1μs
[ 94.64%] ··· bench_core.Core.time_triu_indices_500                 1.20±0.05ms
[ 96.43%] ··· bench_core.Core.time_triu_l10x10                         20.8±1μs
[ 98.21%] ··· bench_core.Core.time_vstack_l                          9.83±0.3μs
[100.00%] ··· bench_core.Core.time_zeros_100                           990±40ns

master:

(base) C:\Users\J.W\Documents\GitHub\numpy>python runtests.py --bench bench_core.Core
Building, see build.log...
Build OK
· No executable found for python 3.7
· Discovering benchmarks
· Running 28 total benchmarks (1 commits * 1 environments * 28 benchmarks)
[  0.00%] ·· Benchmarking existing-pyc__users_j.w_anaconda3_python.exe
[  1.79%] ··· Running (bench_core.Core.time_arange_100--)............................
[ 51.79%] ··· bench_core.Core.time_arange_100                        1.18±0.1μs
[ 53.57%] ··· bench_core.Core.time_array_1                             445±10ns
[ 55.36%] ··· bench_core.Core.time_array_empty                      1.39±0.03μs
[ 57.14%] ··· bench_core.Core.time_array_float64_l1000                 68.3±6μs
[ 58.93%] ··· bench_core.Core.time_array_float_l1000                   86.8±9μs
[ 60.71%] ··· bench_core.Core.time_array_float_l1000_dtype             43.8±2μs
[ 62.50%] ··· bench_core.Core.time_array_int_l1000                     73.8±4μs
[ 64.29%] ··· bench_core.Core.time_array_l                           3.15±0.1μs
[ 66.07%] ··· bench_core.Core.time_array_l1                         1.53±0.08μs
[ 67.86%] ··· bench_core.Core.time_array_l100                        11.3±0.8μs
[ 69.64%] ··· bench_core.Core.time_array_l_view                      3.69±0.3μs
[ 71.43%] ··· bench_core.Core.time_diag_l100                         21.4±0.5μs
[ 73.21%] ··· bench_core.Core.time_diagflat_l100                       31.9±1μs
[ 75.00%] ··· bench_core.Core.time_diagflat_l50_l50                    33.6±3μs
[ 76.79%] ··· bench_core.Core.time_dstack_l                          10.2±0.3μs
[ 78.57%] ··· bench_core.Core.time_empty_100                           923±10ns
[ 80.36%] ··· bench_core.Core.time_eye_100                           8.22±0.3μs
[ 82.14%] ··· bench_core.Core.time_eye_3000                          5.23±0.2ms
[ 83.93%] ··· bench_core.Core.time_hstack_l                          8.20±0.2μs
[ 85.71%] ··· bench_core.Core.time_identity_100                      9.65±0.3μs
[ 87.50%] ··· bench_core.Core.time_identity_3000                     5.31±0.2ms
[ 89.29%] ··· bench_core.Core.time_ones_100                          3.96±0.2μs
[ 91.07%] ··· bench_core.Core.time_tril_indices_500                  2.24±0.1ms
[ 92.86%] ··· bench_core.Core.time_tril_l10x10                         20.2±1μs
[ 94.64%] ··· bench_core.Core.time_triu_indices_500                  2.64±0.2ms
[ 96.43%] ··· bench_core.Core.time_triu_l10x10                       20.5±0.5μs
[ 98.21%] ··· bench_core.Core.time_vstack_l                          9.74±0.3μs
[100.00%] ··· bench_core.Core.time_zeros_100                           995±20ns

@Qiyu8
Copy link
Member

Qiyu8 commented Jan 25, 2021

The performance improvements looks good to me.

Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one small styling suggestion.

numpy/lib/twodim_base.py Show resolved Hide resolved
@mattip mattip merged commit d9dee6a into numpy:master Jan 25, 2021
@mattip
Copy link
Member

mattip commented Jan 25, 2021

Thanks @Illviljan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance of tril_indices and triu_indices
4 participants