Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: linalg: implement higher-order SVD (HOSVD) for M-D tensors #14284

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

JanLuca
Copy link

@JanLuca JanLuca commented Jun 23, 2021

Reference issue

Since the pull request #12592 is open for quite some time due to license questions, I took the time to reimplement the HOSVD algorithm myself so it can be included in scipy.

What does this implement/fix?

Based on the publication by Lieven De Lathauwer et al. [1] the higher-order SVD is implemented.

[1] https://doi.org/10.1137/S0895479896305696

@JanLuca JanLuca marked this pull request as draft June 23, 2021 14:08
@JanLuca JanLuca force-pushed the hosvd branch 2 times, most recently from 0145b15 to f6f1b10 Compare June 23, 2021 15:31
@JanLuca JanLuca marked this pull request as ready for review June 23, 2021 15:32
@JanLuca JanLuca changed the title WIP: Implement higher-order SVD (HOSVD) for M-D tensors Implement higher-order SVD (HOSVD) for M-D tensors Jun 23, 2021
Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be a great addition, thanks!

I wonder if it would not be better to just have it under SVD itself so that we just have one function. We would just have to check the dimension and redirect to the corresponding private method.

@JanLuca
Copy link
Author

JanLuca commented Jun 23, 2021

The problem is that numpy has already defined for its svd method what happens if the argument has dimension > 2, cf. [1]. I would not suggest to implement a different behavior in scipy to avoid confusion.

[1] https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html

@tupui
Copy link
Member

tupui commented Jun 23, 2021

I see, I will let the other SVD residents decide. Thanks for the info.

@tupui tupui added scipy.linalg enhancement A new feature or improvement labels Jun 23, 2021
@rkern
Copy link
Member

rkern commented Jun 23, 2021

+1 for a separate hosvd() function.

Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a pass on the structure, it's a good start. I will let the other comment on the maths as I am not a specialist on the subject and not really a user either.

scipy/linalg/decomp_svd.py Outdated Show resolved Hide resolved
scipy/linalg/decomp_svd.py Outdated Show resolved Hide resolved
scipy/linalg/decomp_svd.py Outdated Show resolved Hide resolved
scipy/linalg/decomp_svd.py Outdated Show resolved Hide resolved
scipy/linalg/decomp_svd.py Outdated Show resolved Hide resolved
scipy/linalg/decomp_svd.py Outdated Show resolved Hide resolved
scipy/linalg/tests/test_decomp.py Outdated Show resolved Hide resolved
@tupui
Copy link
Member

tupui commented Jun 23, 2021

Also optional but nice to have, consider adding some typing. We slowly want to add these across the library.

@ilayn
Copy link
Member

ilayn commented Jun 24, 2021

I would suggest a name like tensor_svd or something. It's too close to hankel sv's and also difficult to relate to the abbreviation SVD.

@ilayn
Copy link
Member

ilayn commented Jun 24, 2021

After another coffee, to be perfectly honest, tensor word starts to sound like jargon after a while since NumPy takes pride in calling everything "ndarray', I'm not even sure I believe in mentioning tensors here. ML people use it to sound fancy but they really mean nDarrays (I'm looking at you tensorflow).

Before anyone starts complaining that (a tensor) != (a matrix), I'd like to remind that also (a matrix) != (2d array of numbers) either if we are pedantic at that level before we go down that rabbit hole. Not to mention that this is strictly a numerical context.

@tupui
Copy link
Member

tupui commented Jun 24, 2021

After another coffee, to be perfectly honest, tensor word starts to sound like jargon after a while since NumPy takes pride in calling everything "ndarray', I'm not even sure I believe in mentioning tensors here. ML people use it to sound fancy but they really mean nDarrays (I'm looking at you tensorflow).

Before anyone starts complaining that (a tensor) != (a matrix), I'd like to remind that also (a matrix) != (2d array of numbers) either if we are pedantic at that level before we go down that rabbit hole. Not to mention that this is strictly a numerical context.

That's why I suggested to not talk about these at all and use the current svd (and we could have a param to trigger the high order).
Otherwize, what about spelling it out: high_order_svd ? Because I am not a fan of mangling things like hosvd. Also, who knows what this shortening could mean in the future 😉

@rkern
Copy link
Member

rkern commented Jun 24, 2021

This operation actually does treat the input as a proper tensor, I believe.

@JanLuca
Copy link
Author

JanLuca commented Jun 25, 2021

Also optional but nice to have, consider adding some typing. We slowly want to add these across the library.

That is nice, I really like the idea to have a typed library at some point :)

About the function name: I have no strong opinion about this. Personally I would go with something like higher_order_svd.

@ilayn
Copy link
Member

ilayn commented Jun 25, 2021

Also optional but nice to have, consider adding some typing. We slowly want to add these across the library.

Do we actually? This is something I really want to be discussed extensively. So far it has been coming from NumPy side and a bit one-sided. I for one don't see the need for the extra complication. In fact for many users, not bothering with types is the reason why we started using Python and why so far it gained quite some traction.

@ilayn
Copy link
Member

ilayn commented Jun 25, 2021

For the naming, I think the ship has sailed to do anything about tensor usage so probably tensor_svd would be the most recognizable thing following the suite tensordot.

I don't have any objection to with or without underscore to be honest.

@JanLuca JanLuca force-pushed the hosvd branch 3 times, most recently from dd6f8c7 to 11ef523 Compare June 25, 2021 13:25
@JanLuca
Copy link
Author

JanLuca commented Jun 25, 2021

Except for the discussion on the function name, I think I implemented all suggestions and would be fine from my side for another review :)

@tupui
Copy link
Member

tupui commented Jun 27, 2021

Also optional but nice to have, consider adding some typing. We slowly want to add these across the library.

Do we actually? This is something I really want to be discussed extensively. So far it has been coming from NumPy side and a bit one-sided. I for one don't see the need for the extra complication. In fact for many users, not bothering with types is the reason why we started using Python and why so far it gained quite some traction.

@ilayn Yes we do, cf the last 2 community meetings minutes and the several recent PRs adding it. Also we have introduced mypy in the CI.

The whole Python community is actually adding types all around (cf all the other big libraries and the last pools from SO, PyCharm and talks at PyCon). I agree that we should not complicate the story for the end users. But here, we add a bit of complexity on our side for so much convenience when a final user is using SciPy in a modern IDE. In PyCharm it's day and night. So to me, it's really worth the small trouble. I mean, we ask extensive validation on the inputs, adding types would be the easiest part of the whole job.

We should have soon some guidelines (cc @BvB93) on how to properly add types. There are already common bricks in scipy.lib._util.

@ilayn
Copy link
Member

ilayn commented Jun 27, 2021

I don't agree. The python community is definitely not going towards the typing direction. There is a huge resistance from the regular users. The IDE argument doesn't make a strong argument as most scientific audience doesn't use an IDE. To be perfectly honest, to me this is pretty much a few core users advocating and adding without any discussion or nothing I am aware of. Other than complicating the signatures beyond recognition I have absolutely no use case for types in Python and mind you I am working with professionals. For most of them, absence of types is what makes them come over to Python from other languages. The type annotations are at most wishful arguments for a bit sophisticated functions and callbacks and primarily geared towards IDE users who wants better coding experience. If you folks want to go ahead without discussion fine but at least say it so, such that we don't get surprised by all the typing.

Adding mypy is one thing, converting all function signatures to ones with type annotations is another. I don't need to go David Baezley all the way, but if you want typed language benefits you should code in one.

@tupui
Copy link
Member

tupui commented Aug 3, 2021

@ilayn I believe this was ready. What do you think?

@tupui
Copy link
Member

tupui commented Aug 21, 2021

@mdhaber since you are working on SVD right now. Is this looking good for you?

Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with HOSVD, but there are a few things:

  • I'm re-running CI so I can see the rendered docs
  • Typically I would include input validation tests. It looks like you are relying on _asarray_validated and scipy.linalg.svd to validate all three arguments, which might be OK, but it's something to consider checking.
  • The tests all look the same, so I would prefer that they be parametrized for easier maintenance and so that the differences are more clear. More importantly though, is that I'm not sure if there is anything different in what the tests check when full_tensor is True or False. Presumably this option changes the output, so it would be good to check for that.
  • Are there any properties of the output that can be tested other than assert_allclose(A, A_exact)? I think we can assume that the individual U matrices are orthogonal, as they come from scipy.linalg.svd, but is there are there any other properties that are specific to this higher order SVD?
  • Has there been an email to the mailing list? Should check that others consider this to be within scope before merging. It sounds appropriate when you the terms "higher order svd" and "nd-array svd" are used, but - as you addressed in the discussion a bit, I think - it sounds less appropriate when the documentation talks about tensors. Then again, the paper has 4000+ citations, which is pretty convincing.

@mdhaber mdhaber closed this Aug 24, 2021
@mdhaber mdhaber reopened this Aug 24, 2021
scipy/linalg/decomp_svd.py Outdated Show resolved Hide resolved
scipy/linalg/decomp_svd.py Outdated Show resolved Hide resolved
@mdhaber
Copy link
Contributor

mdhaber commented Aug 24, 2021

Close-reopen didn't re-build docs, apparently, so I made a minor commit (to identify default value of check_finite) to make them build

@tupui
Copy link
Member

tupui commented Aug 24, 2021

Thanks for having a look @mdhaber 😃

  • Has there been an email to the mailing list? Should check that others consider this to be within scope before merging. It sounds appropriate when you the terms "higher order svd" and "nd-array svd" are used, but - as you addressed in the discussion a bit, I think - it sounds less appropriate when the documentation talks about tensors. Then again, the paper has 4000+ citations, which is pretty convincing.

Yes there was: https://mail.python.org/pipermail/scipy-dev/2021-June/024904.html (do we have a search for the dev mailing list? I used the date here)

@rgommers
Copy link
Member

Yes there was: https://mail.python.org/pipermail/scipy-dev/2021-June/024904.html (do we have a search for the dev mailing list? I used the date here)

No we don't. We'll fix that sometime in the next year is my guess, by moving off Mailman.

What I do is archive interesting threads, so I have search inside my email client.

@mdhaber
Copy link
Contributor

mdhaber commented Aug 24, 2021

Rendered documentation looks good. Re: mailing list - OK, thanks!

@j-bowhay j-bowhay added this to the 1.10.0 milestone Oct 7, 2022
@tylerjereddy
Copy link
Contributor

tylerjereddy commented Nov 26, 2022

It sounds like this PR may be quite useful, but it has also been inactive for a year and we don't normally hold up releases for enhancements like this. What I'm going to do is hope that this moves forward soon, but bump the milestone up a bit so that it isn't on my "release checklist" before branching in 10 days. If things pick up again and you get it in on time, of course adjust the milestone back after merging.

@tylerjereddy tylerjereddy modified the milestones: 1.10.0, 1.11.0 Nov 26, 2022
@h-vetinari h-vetinari modified the milestones: 1.11.0, 1.12.0 May 20, 2023
@tylerjereddy
Copy link
Contributor

I'm hoping to branch for SciPy 1.12.0 on December 6th--any chance this will get some momentum again before then? Otherwise, my decision to bump the milestone again would be likely without more input.

@tylerjereddy tylerjereddy modified the milestones: 1.12.0, 1.13.0 Dec 4, 2023
@tylerjereddy
Copy link
Contributor

No activity here since I last bumped the milestone, so I see no reason not to bump again before branching ~Sunday 17th for 1.13.0. If the PR comes back to life and is deemed safe to merge at the last minute, that's fine, just adjust the milestone back if it goes in.

@tylerjereddy tylerjereddy modified the milestones: 1.13.0, 1.14.0 Mar 12, 2024
@lucascolley lucascolley changed the title Implement higher-order SVD (HOSVD) for M-D tensors ENH: linalg: implement higher-order SVD (HOSVD) for M-D tensors Mar 14, 2024
@tylerjereddy tylerjereddy modified the milestones: 1.14.0, 1.15.0 May 26, 2024
@tylerjereddy
Copy link
Contributor

bumping milestone again per comments above, apologies!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.linalg
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet