-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: linalg: implement higher-order SVD (HOSVD) for M-D tensors #14284
base: main
Are you sure you want to change the base?
Conversation
0145b15
to
f6f1b10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be a great addition, thanks!
I wonder if it would not be better to just have it under SVD
itself so that we just have one function. We would just have to check the dimension and redirect to the corresponding private method.
The problem is that numpy has already defined for its svd method what happens if the argument has dimension > 2, cf. [1]. I would not suggest to implement a different behavior in scipy to avoid confusion. [1] https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html |
I see, I will let the other SVD residents decide. Thanks for the info. |
+1 for a separate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a pass on the structure, it's a good start. I will let the other comment on the maths as I am not a specialist on the subject and not really a user either.
Also optional but nice to have, consider adding some typing. We slowly want to add these across the library. |
I would suggest a name like tensor_svd or something. It's too close to hankel sv's and also difficult to relate to the abbreviation SVD. |
After another coffee, to be perfectly honest, tensor word starts to sound like jargon after a while since NumPy takes pride in calling everything "ndarray', I'm not even sure I believe in mentioning tensors here. ML people use it to sound fancy but they really mean nDarrays (I'm looking at you tensorflow). Before anyone starts complaining that (a tensor) != (a matrix), I'd like to remind that also (a matrix) != (2d array of numbers) either if we are pedantic at that level before we go down that rabbit hole. Not to mention that this is strictly a numerical context. |
That's why I suggested to not talk about these at all and use the current |
This operation actually does treat the input as a proper tensor, I believe. |
That is nice, I really like the idea to have a typed library at some point :) About the function name: I have no strong opinion about this. Personally I would go with something like |
Do we actually? This is something I really want to be discussed extensively. So far it has been coming from NumPy side and a bit one-sided. I for one don't see the need for the extra complication. In fact for many users, not bothering with types is the reason why we started using Python and why so far it gained quite some traction. |
For the naming, I think the ship has sailed to do anything about tensor usage so probably I don't have any objection to with or without underscore to be honest. |
dd6f8c7
to
11ef523
Compare
Except for the discussion on the function name, I think I implemented all suggestions and would be fine from my side for another review :) |
@ilayn Yes we do, cf the last 2 community meetings minutes and the several recent PRs adding it. Also we have introduced mypy in the CI. The whole Python community is actually adding types all around (cf all the other big libraries and the last pools from SO, PyCharm and talks at PyCon). I agree that we should not complicate the story for the end users. But here, we add a bit of complexity on our side for so much convenience when a final user is using SciPy in a modern IDE. In PyCharm it's day and night. So to me, it's really worth the small trouble. I mean, we ask extensive validation on the inputs, adding types would be the easiest part of the whole job. We should have soon some guidelines (cc @BvB93) on how to properly add types. There are already common bricks in |
I don't agree. The python community is definitely not going towards the typing direction. There is a huge resistance from the regular users. The IDE argument doesn't make a strong argument as most scientific audience doesn't use an IDE. To be perfectly honest, to me this is pretty much a few core users advocating and adding without any discussion or nothing I am aware of. Other than complicating the signatures beyond recognition I have absolutely no use case for types in Python and mind you I am working with professionals. For most of them, absence of types is what makes them come over to Python from other languages. The type annotations are at most wishful arguments for a bit sophisticated functions and callbacks and primarily geared towards IDE users who wants better coding experience. If you folks want to go ahead without discussion fine but at least say it so, such that we don't get surprised by all the typing. Adding mypy is one thing, converting all function signatures to ones with type annotations is another. I don't need to go David Baezley all the way, but if you want typed language benefits you should code in one. |
@ilayn I believe this was ready. What do you think? |
@mdhaber since you are working on SVD right now. Is this looking good for you? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with HOSVD, but there are a few things:
- I'm re-running CI so I can see the rendered docs
- Typically I would include input validation tests. It looks like you are relying on
_asarray_validated
andscipy.linalg.svd
to validate all three arguments, which might be OK, but it's something to consider checking. - The tests all look the same, so I would prefer that they be
parametrize
d for easier maintenance and so that the differences are more clear. More importantly though, is that I'm not sure if there is anything different in what the tests check whenfull_tensor
is True or False. Presumably this option changes the output, so it would be good to check for that. - Are there any properties of the output that can be tested other than
assert_allclose(A, A_exact)
? I think we can assume that the individualU
matrices are orthogonal, as they come fromscipy.linalg.svd
, but is there are there any other properties that are specific to this higher order SVD? - Has there been an email to the mailing list? Should check that others consider this to be within scope before merging. It sounds appropriate when you the terms "higher order svd" and "nd-array svd" are used, but - as you addressed in the discussion a bit, I think - it sounds less appropriate when the documentation talks about tensors. Then again, the paper has 4000+ citations, which is pretty convincing.
[skip azp] [skip actions]
Close-reopen didn't re-build docs, apparently, so I made a minor commit (to identify default value of |
Thanks for having a look @mdhaber 😃
Yes there was: https://mail.python.org/pipermail/scipy-dev/2021-June/024904.html (do we have a search for the dev mailing list? I used the date here) |
No we don't. We'll fix that sometime in the next year is my guess, by moving off Mailman. What I do is archive interesting threads, so I have search inside my email client. |
Rendered documentation looks good. Re: mailing list - OK, thanks! |
It sounds like this PR may be quite useful, but it has also been inactive for a year and we don't normally hold up releases for enhancements like this. What I'm going to do is hope that this moves forward soon, but bump the milestone up a bit so that it isn't on my "release checklist" before branching in 10 days. If things pick up again and you get it in on time, of course adjust the milestone back after merging. |
I'm hoping to branch for SciPy |
No activity here since I last bumped the milestone, so I see no reason not to bump again before branching ~Sunday 17th for |
bumping milestone again per comments above, apologies! |
Reference issue
Since the pull request #12592 is open for quite some time due to license questions, I took the time to reimplement the HOSVD algorithm myself so it can be included in scipy.
What does this implement/fix?
Based on the publication by Lieven De Lathauwer et al. [1] the higher-order SVD is implemented.
[1] https://doi.org/10.1137/S0895479896305696