Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistent shape of matrix returned by scipy2scipy_clipped #2066

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

psorianom
Copy link

@psorianom psorianom commented May 25, 2018

Fixed inconsistent shape of matrix returned by scipy2scipy_clipped, as described in issue #2065
Solved by explicitly setting the second dimension of the new matrix to that of the input matrix.

Fixes #2065.

@menshikh-iv
Copy link
Contributor

hi @psorianom, can you please add tests for this function (at least a case that you described in issue)?

@psorianom
Copy link
Author

hey @menshikh-iv , thanks for checking this out. Sure, I will do it as soon as possible.

@piskvorky piskvorky changed the title Bug inconsistent shape of matrix returned by scipy2scipy_clipped Fixes #2065 Fix inconsistent shape of matrix returned by scipy2scipy_clipped Aug 3, 2018
@piskvorky
Copy link
Owner

piskvorky commented Aug 3, 2018

The entire function looks strange. Its documentation doesn't match the code comments:

Return a scipy.sparse vector/matrix consisting of 'topn' elements of the greatest magnitude (absolute value).
versus
# Sort and clip each row vector first. (so it's actually topn per row?)
versus
# use np.argpartition/argsort and only form tuples that are actually returned. (??)

@menshikh-iv do we have good unit tests here? The code looks dodgy.

@menshikh-iv
Copy link
Contributor

@psorianom
Copy link
Author

psorianom commented Oct 15, 2018

The entire function looks strange. Its documentation doesn't match the code comments:

Return a scipy.sparse vector/matrix consisting of 'topn' elements of the greatest magnitude (absolute value).
versus
# Sort and clip each row vector first. (so it's actually topn per row?)
versus
# use np.argpartition/argsort and only form tuples that are actually returned. (??)

Hi all, sorry for the delay.

Regarding the comments, yes, it is per row, so if the input is an array of n rows, the output will have n rows. This differs from full2sparse_clipped which takes row by row (see in SimilarityABC interface: return [matutils.full2sparse_clipped(v, self.num_best) for v in result])

All in all, I realize that the bug I created is not actually a bug per se, i.e., the code does what it should do according to its description (return the 'topn' elements of the greatest magnitude) without regards to the original size of the matrix.

In my use case, I do need the original size of the matrix, as I use it later for other calculations. So maybe adding a flag to full2sparse_clipped and scipy2scipy_clipped such as "keep_size" would do the trick ? What do you think ?

@menshikh-iv do we have good unit tests here? The code looks dodgy.

@menshikh-iv
Copy link
Contributor

@psorianom hello, sorry for delay

In my use case, I do need the original size of the matrix, as I use it later for other calculations. So maybe adding a flag to full2sparse_clipped and scipy2scipy_clipped such as "keep_size" would do the trick ?

maybe, sounds compatible 👍

@piskvorky
Copy link
Owner

piskvorky commented Jan 17, 2019

If so, it needs a clear description + motivation. I still find the docstrings confusing (and contradictory to code), and adding extra parameters won't help.

@mpenkov mpenkov added this to Needs triage in PR triage Nov 3, 2019
@mpenkov mpenkov added this to In progress in gensim-4.0 via automation Jun 10, 2020
@mpenkov mpenkov removed this from In progress in gensim-4.0 Feb 25, 2021
@mpenkov mpenkov added this to Include in 4.1 in PR triage 2021-06 Jun 22, 2021
@mpenkov mpenkov moved this from Include in 4.1 to Triage in PR triage 2021-06 Jun 22, 2021
@mpenkov mpenkov added the stale Waiting for author to complete contribution, no recent effort label Jun 22, 2021
@mpenkov mpenkov moved this from Unsorted to Handle after next release in PR triage 2021-06 Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Waiting for author to complete contribution, no recent effort
Projects
PR triage
  
Needs triage
PR triage 2021-06
Handle after next release
Development

Successfully merging this pull request may close these issues.

scipy2scipy_clipped may return a matrix with a different shape to that of the input matrix
4 participants