Handling NaNs in NMF #25229
Comments
Letting the matrix pass validation might not be sufficient, because a sparse matrix needs to be handled with care and might require the use of specific methods. Ping @jeremiedbb, who knows the internal implementation better and can say whether this would be more or less feasible.
The issue seems to be about handling NaN rather than supporting sparse matrices (which is already the case). Let me rename the issue. Regarding NaN handling, there is this PR #8474. It would be interesting to revive this PR, but we need to take a look at the literature to check whether the PR implements a common approach.
It makes sense, thanks for looking at this request. I agree, it is about handling missing values rather than sparse matrices.
I revived the PR adding support for missing values in NMF (#8474). Note that for now, the PR only adds support for missing values in dense datasets. Tell me if you have any questions.
That's wonderful! I am trying it locally. Please let me know if I can help with writing more tests or documentation.
It would be nice to add NMF to this page after merging https://scikit-learn.org/stable/modules/impute.html |
Describe the workflow you want to enable
Motivation:
Sparse matrices are very common in recommender system problems, and matrix factorization is one of the most popular approaches for this task. However, recommender systems often have sparse data with a large number of missing values.
Problem:
In principle, non-negative matrix factorization can work with sparse matrices and optimize the solution based only on the present values. In the scikit-learn implementation, however, input validation does not allow the fit_transform method of NMF to accept sparse matrices with NaN values.
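To make "optimize the solution based only on the present values" concrete, here is a minimal NumPy sketch of masked (weighted) NMF with multiplicative updates for the Frobenius loss, on a dense array where NaN marks a missing entry. This is not scikit-learn's implementation and not necessarily what PR #8474 does; `masked_nmf` and `masked_error` are hypothetical names used only for illustration.

```python
import numpy as np


def masked_nmf(X, n_components, n_iter=200, eps=1e-10, seed=0):
    """Factorize X ~= W @ H using only the observed (non-NaN) entries.

    Hypothetical helper for illustration; not part of scikit-learn.
    Observed entries of X are assumed to be non-negative.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                # True where a value is present
    X_filled = np.where(mask, X, 0.0)  # missing entries contribute nothing
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components)) + eps
    H = rng.random((n_components, n_features)) + eps
    for _ in range(n_iter):
        # Multiplicative updates restricted to the observed entries.
        WH = mask * (W @ H)
        W *= (X_filled @ H.T) / (WH @ H.T + eps)
        WH = mask * (W @ H)
        H *= (W.T @ X_filled) / (W.T @ WH + eps)
    return W, H


def masked_error(X, W, H):
    """Frobenius reconstruction error over the observed entries only."""
    X = np.asarray(X, dtype=float)
    diff = np.where(~np.isnan(X), X - W @ H, 0.0)
    return np.sqrt((diff ** 2).sum())


# Toy ratings matrix with missing entries marked as NaN.
X = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 1.0],
    [np.nan, 1.0, 5.0],
])
W, H = masked_nmf(X, n_components=2)
X_hat = W @ H  # dense reconstruction, including the missing cells
```

The mask plays the role of a 0/1 weight matrix, so missing cells influence neither the numerator nor the denominator of the updates, while `W @ H` still yields a prediction for them.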
Describe your proposed solution
In the fit_transform method there is code:
X = self._validate_data(
    X, accept_sparse=("csr", "csc"), dtype=[np.float64, np.float32]
)
Could you make this configurable, so that force_all_finite='allow-nan' can be passed to the _validate_data call in the fit_transform method of NMF?
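For reference, the behaviour being asked for amounts to "accept NaN, still reject infinity". Below is a rough NumPy-only sketch of that check for dense input. It is an assumption-laden stand-in, not scikit-learn's validation code; `check_allow_nan` is a hypothetical name, and the non-negativity check is included only because NMF requires non-negative data.

```python
import numpy as np


def check_allow_nan(X, dtype=np.float64):
    """Hypothetical validation that tolerates NaN but rejects infinity.

    Illustrative only; this is not scikit-learn's _validate_data.
    """
    X = np.asarray(X, dtype=dtype)
    if X.ndim != 2:
        raise ValueError("Expected a 2D array.")
    if np.isinf(X).any():
        raise ValueError("Input X contains infinity.")
    # NMF needs non-negative data; only the observed entries are checked.
    if (X[~np.isnan(X)] < 0).any():
        raise ValueError("Negative values in data passed to NMF.")
    return X


# NaN entries pass through untouched.
X_ok = check_allow_nan([[1.0, np.nan], [0.5, 2.0]])

# Infinite entries are still rejected.
try:
    check_allow_nan([[1.0, np.inf], [0.5, 2.0]])
    rejected_inf = False
except ValueError:
    rejected_inf = True
```

Relaxing validation in this way only lets the data in; as noted in the comments, the solver itself must also ignore the missing entries for the result to be meaningful.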
Describe alternatives you've considered, if relevant
No response
Additional context
No response