Skip to content

Conversation

zoj613
Copy link
Contributor

@zoj613 zoj613 commented Aug 4, 2019

This PR addresses issue # #14193.

As per discussion had with @charris and @bashtage, I replaced the current slow method using SVD for factorization with the faster eigen decomposition method. The SVD method can still be accessible for legacy reasons via the keyword argument mode='legacy'. A much faster but less robust method using Cholesky factorization can be used via mode='fast'. I also added the option for the user to explicitely provide the factor matrix so it doesn't get recomputed in situations that require repeated calls to the function.

@charris charris added 01 - Enhancement component: numpy.random 09 - Backport-Candidate PRs tagged should be backported 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes labels Aug 4, 2019
@charris charris added this to the 1.17.1 release milestone Aug 4, 2019
@charris
Copy link
Member

charris commented Aug 4, 2019

I have mixed feeling about putting a backport candidate tag on this, but I expect very few folks are using the new rng at this point.

@zoj613 Enhancements like this need a post on the mailing list for discussion and a release note. The release not process has changed, so I'll let @seberg explain it.

@bashtage I haven't checked, but zipf was another distribution that could be vastly improved over the legacy version for a close to 1.

@zoj613
Copy link
Contributor Author

zoj613 commented Aug 4, 2019

@charris I tried using the mailing list given as numpy-devel by https://numpy.org/devdocs/dev/index.html but it doesn't seem to exist anymore. Which one should I use between NumPy-Discussion and Numpy-svn?

@charris
Copy link
Member

charris commented Aug 4, 2019

@zoj613 Oh, that needs fixing. The link is to a list of python.org mailing lists :) Just use numpy-discussion@python.org. You may want to subscribe to follow the discussion.

@seberg
Copy link
Member

seberg commented Aug 5, 2019

I want to move things a bit around. So if this is still settling, maybe hold off on the release notes for a bit. Right now, you would add an 14197.improvement.rst file (actually its improvements at the moment, but I wish to change it) into the changelog folder. I will move that to doc/release/upcoming_changes if everyone is fine with that.

It should then be formatted as:

``my_new_feature`` option for `my_favorite_function`
----------------------------------------------------
The ``my_new_feature`` option is now available for `my_favorite_function`.
To use it, write ``np.my_favorite_function(..., my_new_feature=True)``.

(but if you are quick, just go ahead and I will fix it up/move things around as necessary, the small issue is that right now towncrier does not really support the way we write our release notes).

@charris
Copy link
Member

charris commented Aug 6, 2019

I went ahead and posted to the list.

@zoj613
Copy link
Contributor Author

zoj613 commented Aug 6, 2019

I went ahead and posted to the list.

Thanks a lot. sorry about that. I haven't had the chance to push the required changes until now.

@seberg
Copy link
Member

seberg commented Aug 6, 2019

I think you are currently editing mtrand.pyx, but these changes should be in generator.pyx to mainly affect the new API. We could expose it to the old API, but in that case the default should probably be "legacy" (although, maybe we should not, give everyone a reason to switch over).

@zoj613
Copy link
Contributor Author

zoj613 commented Aug 6, 2019

I am getting errors while running the tests on macOS after merging the latest changes from the master branch. I get a ERROR: refguide or doctests have errors with exception ValueError: shapes (2,2) and (9,2) not aligned: 2 (dim 1) != 9 (dim 0). Can someone help me with this error?

@zoj613
Copy link
Contributor Author

zoj613 commented Aug 6, 2019

I think you are currently editing mtrand.pyx, but these changes should be in generator.pyx to mainly affect the new API. We could expose it to the old API, but in that case the default should probably be "legacy" (although, maybe we should not, give everyone a reason to switch over).

should I then leave the function in mtrand.pyx as it was before the PR?

@seberg
Copy link
Member

seberg commented Aug 6, 2019

@zoj613, probably, although you could leave the change for now and just change the mode default to "legacy". I am not sure which option I prefer, both will do.

@zoj613 zoj613 changed the title Multivariate normal speedups ENH: Multivariate normal speedups Aug 6, 2019
@eric-wieser
Copy link
Member

eric-wieser commented Aug 7, 2019

I'm not really sure I like including mode='factor' with the other modes - all the others produce the same distribution via a different routine, but mode='factor' takes a fundamentally different set of arguments.

So I think I prefer a spelling closer to the earlier revision,

  • multivariate_normal(mean, cov, factor=<factor>)
  • multivariate_normal(mean, cov, factor_method='svd')

Another option would be

  • multivariate_normal(mean, cov, factor_method='svd')
  • multivariate_normal.from_factor(mean, cov_factor)

(precedent: itertools.chain(...) vs itertools.chain.from_iterable(...))

@ilayn
Copy link
Contributor

ilayn commented Aug 7, 2019

If the covariance matrix is numerically semidefinite, cholesky will trip up. But isn't this breaking backwards compatibility where SVD was working but cholesky will fail?

@rkern
Copy link
Member

rkern commented Aug 7, 2019

I think you are currently editing mtrand.pyx, but these changes should be in generator.pyx to mainly affect the new API. We could expose it to the old API, but in that case the default should probably be "legacy" (although, maybe we should not, give everyone a reason to switch over).

With the release of 1.17.0, mtrand.pyx is now frozen. No new features, per NEP 19.

@zoj613
Copy link
Contributor Author

zoj613 commented Oct 16, 2019

@seberg
Copy link
Member

seberg commented Oct 16, 2019

Don't worry about that one, it is a heisenbug, nobody quite knows where it comes from... I suspsect a heisenbug within OpenBLAS or so (maybe in the thread scheduling, but the last time I looked I saw things that were not real).

@zoj613
Copy link
Contributor Author

zoj613 commented Oct 16, 2019

Don't worry about that one, it is a heisenbug, nobody quite knows where it comes from... I suspsect a heisenbug within OpenBLAS or so (maybe in the thread scheduling, but the last time I looked I saw things that were not real).

It seemed to pop up after I used the pytest.mark.parametrize function in the tests. Before that all checks passed fine. That function seems to be using

@pytest.mark.parametrize('func', (np.dot, np.matmul))
 @pytest.mark.parametrize('dtype', 'ifdFD')

Maybe it is a bug in pytest?

@seberg
Copy link
Member

seberg commented Oct 16, 2019

No, the RuntimeWarning about an invalid value is a bug that occurs randomly, it has nothing to do with your changes.

Copy link
Contributor

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general notes on correctness.

if method == 'cholesky':
x = np.dot(x, l)
else:
x = np.dot(x, u * np.sqrt(s))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same applies here, should use np.conj(u.T).

@zoj613 zoj613 requested a review from eric-wieser October 18, 2019 19:06
Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

…eigh

Co-Authored-By: Eric Wieser <wieser.eric@gmail.com>
@mattip
Copy link
Member

mattip commented Nov 2, 2019

There is a merge conflict that needs to be resolved before merging

@zoj613
Copy link
Contributor Author

zoj613 commented Nov 3, 2019

There is a merge conflict that needs to be resolved before merging

I think the generator.pyx file was renamed to _generator.pyx in a recent merged commit. Will a merge with master or rebase fix this?

@eric-wieser
Copy link
Member

Either ought to work fine

@zoj613
Copy link
Contributor Author

zoj613 commented Nov 3, 2019

One of the Travis CI builds failed on python 3.6 with environment variables PPC64_LE=1 Atlas=None

Successfully installed numpy-1.18.0.dev0+c09016f

Cleaning up...

Removed build tracker '/tmp/pip-req-tracker-sujk1xwe'

+grep -v _configtest log

+grep -vE 'ld returned 1|no previously-included files matching|manifest_maker: standard file '\''-c'\'''

+grep -E 'warning\>'

+tee warnings

      warning: no files found matching 'Doc/*'

      warning: no files found matching '*.pyx' under directory 'Cython/Debugger/Tests'

      warning: no files found matching '*.pxd' under directory 'Cython/Debugger/Tests'

      warning: no files found matching '*.pxd' under directory 'Cython/Utility'

      warning: no files found matching 'pyximport/README'

    warning: no files found matching 'Doc/*'

    warning: no files found matching '*.pyx' under directory 'Cython/Debugger/Tests'

    warning: no files found matching '*.pxd' under directory 'Cython/Debugger/Tests'

    warning: no files found matching '*.pxd' under directory 'Cython/Utility'

    warning: no files found matching 'pyximport/README'

+'[' '' '!=' None ']'

++wc -l

+[[ 10 -lt 1 ]]

The command "./tools/travis-test.sh" exited with 1.

Done. Your build exited with 1.

@mattip
Copy link
Member

mattip commented Nov 3, 2019

travis run on ppc is failing since cython outputs a warning:
"warning: no files found matching ..."
This is cython 0.29.14 on Python 3.6.7. I wonder what changed? This PR did not change anything having to do with the build.

@zoj613
Copy link
Contributor Author

zoj613 commented Nov 3, 2019

travis run on ppc is failing since cython outputs a warning:
"warning: no files found matching ..."
This is cython 0.29.14 on Python 3.6.7. I wonder what changed? This PR did not change anything having to do with the build.

I also now get this error while building numpy to run tests locally:

Traceback (most recent call last):
  File "/home/zoj/numpy/build/testenv/lib/python3.6/site-packages/numpy/core/__init__.py", line 24, in <module>
    from . import multiarray
  File "/home/zoj/numpy/build/testenv/lib/python3.6/site-packages/numpy/core/multiarray.py", line 14, in <module>
    from . import overrides
  File "/home/zoj/numpy/build/testenv/lib/python3.6/site-packages/numpy/core/overrides.py", line 7, in <module>
    from numpy.core._multiarray_umath import (
ImportError: /home/zoj/numpy/build/testenv/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so: undefined symbol: cblas_sgemm

@mattip
Copy link
Member

mattip commented Nov 3, 2019

what happens when you clean up your build environment git clean -xfd?

@mattip
Copy link
Member

mattip commented Nov 3, 2019

The warning is coming from building a new cython version 0.29.14 from the source package. It seems the CI may have run during a package upload, when the binary wheels were not available.

@mattip mattip merged commit 04229de into numpy:master Nov 3, 2019
@mattip
Copy link
Member

mattip commented Nov 3, 2019

Thanks @zoj613 and @eric-wieser for the review.

@zoj613
Copy link
Contributor Author

zoj613 commented Nov 3, 2019

git clean -xfd

It runs just fine. I already tried it twice before posting the error here. Here is the output:

Removing build.log
Removing build/
Removing cythonize.dat
Removing numpy.egg-info/
Removing numpy/__config__.py
Removing numpy/__pycache__/
Removing numpy/_build_utils/__pycache__/
Removing numpy/compat/__pycache__/
Removing numpy/core/__pycache__/
Removing numpy/core/_multiarray_tests.cpython-36m-x86_64-linux-gnu.so
Removing numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so
Removing numpy/core/_operand_flag_tests.cpython-36m-x86_64-linux-gnu.so
Removing numpy/core/_rational_tests.cpython-36m-x86_64-linux-gnu.so
Removing numpy/core/_struct_ufunc_tests.cpython-36m-x86_64-linux-gnu.so
Removing numpy/core/_umath_tests.cpython-36m-x86_64-linux-gnu.so
Removing numpy/core/code_generators/__pycache__/
Removing numpy/core/include/numpy/__multiarray_api.c
Removing numpy/core/include/numpy/__multiarray_api.h
Removing numpy/core/include/numpy/__ufunc_api.c
Removing numpy/core/include/numpy/__ufunc_api.h
Removing numpy/core/include/numpy/__umath_generated.c
Removing numpy/core/include/numpy/_numpyconfig.h
Removing numpy/core/include/numpy/config.h
Removing numpy/core/include/numpy/multiarray_api.txt
Removing numpy/core/include/numpy/ufunc_api.txt
Removing numpy/core/lib/
Removing numpy/core/src/common/npy_binsearch.h
Removing numpy/core/src/common/npy_partition.h
Removing numpy/core/src/common/npy_sort.h
Removing numpy/core/src/common/templ_common.h
Removing numpy/core/src/multiarray/_multiarray_tests.c
Removing numpy/core/src/multiarray/arraytypes.c
Removing numpy/core/src/multiarray/einsum.c
Removing numpy/core/src/multiarray/lowlevel_strided_loops.c
Removing numpy/core/src/multiarray/nditer_templ.c
Removing numpy/core/src/multiarray/scalartypes.c
Removing numpy/core/src/npymath/ieee754.c
Removing numpy/core/src/npymath/npy_math_complex.c
Removing numpy/core/src/npymath/npy_math_internal.h
Removing numpy/core/src/npysort/binsearch.c
Removing numpy/core/src/npysort/heapsort.c
Removing numpy/core/src/npysort/mergesort.c
Removing numpy/core/src/npysort/quicksort.c
Removing numpy/core/src/npysort/radixsort.c
Removing numpy/core/src/npysort/selection.c
Removing numpy/core/src/npysort/timsort.c
Removing numpy/core/src/umath/_operand_flag_tests.c
Removing numpy/core/src/umath/_rational_tests.c
Removing numpy/core/src/umath/_struct_ufunc_tests.c
Removing numpy/core/src/umath/_umath_tests.c
Removing numpy/core/src/umath/clip.c
Removing numpy/core/src/umath/clip.h
Removing numpy/core/src/umath/funcs.inc
Removing numpy/core/src/umath/loops.c
Removing numpy/core/src/umath/loops.h
Removing numpy/core/src/umath/matmul.c
Removing numpy/core/src/umath/matmul.h
Removing numpy/core/src/umath/scalarmath.c
Removing numpy/core/src/umath/simd.inc
Removing numpy/distutils/__config__.py
Removing numpy/distutils/__pycache__/
Removing numpy/distutils/command/__pycache__/
Removing numpy/distutils/fcompiler/__pycache__/
Removing numpy/f2py/__pycache__/
Removing numpy/fft/__pycache__/
Removing numpy/fft/_pocketfft_internal.cpython-36m-x86_64-linux-gnu.so
Removing numpy/lib/__pycache__/
Removing numpy/linalg/__pycache__/
Removing numpy/linalg/_umath_linalg.cpython-36m-x86_64-linux-gnu.so
Removing numpy/linalg/lapack_lite.cpython-36m-x86_64-linux-gnu.so
Removing numpy/linalg/umath_linalg.c
Removing numpy/ma/__pycache__/
Removing numpy/matrixlib/__pycache__/
Removing numpy/polynomial/__pycache__/
Removing numpy/random/__pycache__/
Removing numpy/random/_bit_generator.c
Removing numpy/random/_bit_generator.cpython-36m-x86_64-linux-gnu.so
Removing numpy/random/_bounded_integers.c
Removing numpy/random/_bounded_integers.cpython-36m-x86_64-linux-gnu.so
Removing numpy/random/_common.c
Removing numpy/random/_common.cpython-36m-x86_64-linux-gnu.so
Removing numpy/random/_generator.c
Removing numpy/random/_generator.cpython-36m-x86_64-linux-gnu.so
Removing numpy/random/_mt19937.c
Removing numpy/random/_mt19937.cpython-36m-x86_64-linux-gnu.so
Removing numpy/random/_pcg64.c
Removing numpy/random/_pcg64.cpython-36m-x86_64-linux-gnu.so
Removing numpy/random/_philox.c
Removing numpy/random/_philox.cpython-36m-x86_64-linux-gnu.so
Removing numpy/random/_sfc64.c
Removing numpy/random/_sfc64.cpython-36m-x86_64-linux-gnu.so
Removing numpy/random/mtrand.c
Removing numpy/random/mtrand.cpython-36m-x86_64-linux-gnu.so
Removing numpy/testing/__pycache__/
Removing numpy/version.py
Removing tools/npy_tempita/__pycache__/

@mattip
Copy link
Member

mattip commented Nov 3, 2019

It ran fine for me locally

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.