Handling NaNs in NMF #25229

Open
OlenaBugaiova opened this issue Dec 23, 2022 · 7 comments · May be fixed by #8474


@OlenaBugaiova

Describe the workflow you want to enable

Motivation:
Sparse matrices are very common in recommender system problems, and matrix factorization is one of the most popular approaches for this task. Recommender systems, however, often have sparse data with a large number of missing values.
Problem:
In principle, non-negative matrix factorization can work with sparse matrices and optimize the solution based only on the values that are present. In the scikit-learn implementation, however, input validation does not allow the fit_transform method of NMF to accept sparse matrices with NaN values.
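
A minimal reproduction of the validation failure described above, assuming a released scikit-learn version (one that does not include any NaN support for NMF). The toy matrix and the `rejected` flag are illustrative only:

```python
import numpy as np
from sklearn.decomposition import NMF

# Small dense matrix with two missing entries (illustrative data).
X = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

try:
    NMF(n_components=2, init="random", random_state=0).fit_transform(X)
    rejected = False
except ValueError as exc:
    # Input validation refuses NaN before any factorization happens.
    rejected = True
    print(f"ValueError: {exc}")
```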

Describe your proposed solution

Could this be made configurable, for example by passing force_all_finite='allow_nan' to the _validate_data call in the fit_transform method of NMF?

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@OlenaBugaiova added the Needs Triage and New Feature labels Dec 23, 2022
@glemaitre
Member

Letting the matrix through validation might not be sufficient, because a sparse matrix needs to be handled with care and might require the use of specific methods from scipy.sparse.

Ping @jeremiedbb, who knows the internal implementation better and can tell whether this would be more or less feasible.

@glemaitre added the Needs Decision label and removed the Needs Triage label Dec 30, 2022
@jeremiedbb
Member

jeremiedbb commented Dec 30, 2022

The issue seems to be about handling NaN rather than supporting sparse matrices (which are already supported). Let me rename the issue.

Regarding NaN handling, there is PR #8474. It would be interesting to revive this PR, but we need to take a look at the literature to check whether the PR implements a common approach.

@jeremiedbb changed the title from "Accepting sparse matrixes in non-negative matrix factorization" to "Handling NaNs in NMF" Dec 30, 2022
@OlenaBugaiova
Author

That makes sense, thanks for looking at this request. I agree, it is about handling missing values rather than sparse matrices.
I see there was work going on in PR #8474.

@TomDLT linked a pull request Jan 5, 2023 that will close this issue
3 tasks
@TomDLT
Member

TomDLT commented Jan 9, 2023

I revived the PR adding support for missing values in NMF (#8474). Note that for now, the PR only adds support for missing values in dense datasets. Tell me if you have any questions.
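
For readers unfamiliar with the idea, here is a rough NumPy sketch of one common way to fit NMF on a dense matrix with missing values: multiplicative updates on the Frobenius loss, restricted to observed entries through a mask. This is an assumption about the general technique, not the code in PR #8474; the function name `masked_nmf` and its parameters are hypothetical.

```python
import numpy as np

def masked_nmf(X, n_components, n_iter=500, eps=1e-10, seed=0):
    """Factorize X ~= W @ H using only the observed (non-NaN) entries."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)             # True where a value is observed
    X0 = np.where(mask, X, 0.0)     # zero-fill so NaN never enters the arithmetic
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components)) + eps
    H = rng.random((n_components, n_features)) + eps
    for _ in range(n_iter):
        # Masked reconstruction: missing entries contribute nothing to the update.
        WH = (W @ H) * mask
        H *= (W.T @ X0) / (W.T @ WH + eps)
        WH = (W @ H) * mask
        W *= (X0 @ H.T) / (WH @ H.T + eps)
    return W, H

# Illustrative data: a rank-3 non-negative matrix with about 20% of entries hidden.
rng = np.random.default_rng(42)
X_true = rng.random((20, 3)) @ rng.random((3, 10))
X = X_true.copy()
X[rng.random(X.shape) < 0.2] = np.nan

W, H = masked_nmf(X, n_components=3)
X_hat = W @ H   # dense reconstruction, including estimates for the missing entries
```

Because the product `W @ H` is defined everywhere, the reconstruction doubles as an imputation of the missing entries.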

@OlenaBugaiova
Author

That's wonderful! I am trying it locally.
You added a test checking that the NMF imputation is better than SimpleImputer+NMF. That's why I was looking for it.

Please let me know if I can help with writing more tests or documentation.

@OlenaBugaiova
Author

I tested this functionality locally on data with a lot of NaNs. It worked perfectly.
Notes:

  1. The NMF 'cd' solver doesn't handle missing values.
  2. The init method cannot be 'nndsvd', 'nndsvda', or 'nndsvdar' when the data has missing values.
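
The two notes above translate into estimator settings roughly like the following. This is a sketch: `solver="mu"` and `init="random"` are existing `NMF` parameters, but fitting on NaN data is assumed to work only on a build that includes PR #8474. A released scikit-learn is expected to raise `ValueError` during validation, which the example catches. The toy matrix is illustrative only.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [np.nan, 8.0, 9.0],
              [2.0, 3.0, 4.0]])

# Avoid solver="cd" and the SVD-based inits ("nndsvd", "nndsvda", "nndsvdar").
model = NMF(n_components=2, solver="mu", init="random",
            random_state=0, max_iter=500)

try:
    W = model.fit_transform(X)   # assumed to succeed only with the PR applied
except ValueError:
    W = None                     # released versions reject NaN at validation
```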

Closing this feature request

[Image: nmf_on_data_with_nan]

@OlenaBugaiova
Author

After the PR is merged, it would be nice to add NMF to this page: https://scikit-learn.org/stable/modules/impute.html

@adrinjalali reopened this Apr 17, 2024