Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH add dtype preservation to FactorAnalysis #24321

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

svenstehle
Copy link
Contributor

@svenstehle svenstehle commented Sep 2, 2022

Reference Issues/PRs

In scope of #11000
Continuing work of @thibsej already started in #13303

What does this implement/fix? Explain your changes.

Added dtype preservation to FactorAnalysis

Any other comments?

The test passes, but did we also check the numerical stability? PR #13303 added a test for this, is this obsolete with our new check_transformer_preserve_dtypes?

  • No - we still need to check for numerical stability.
sklearn/tests/test_common.py::test_estimators[FactorAnalysis()-check_transformer_preserve_dtypes] PASSED

@glemaitre glemaitre changed the title [ENH] add dtype preservation to FactorAnalysis ENH add dtype preservation to FactorAnalysis Nov 4, 2022
@glemaitre glemaitre self-requested a review December 29, 2022 10:48
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was quite complex to make a review. Instead, I pushed the changes directly into the branch. So what changed is the following:

  • improve the test to check the dtype of the fitted attribute and public function
  • check the equivalence of the transform function between 32 and 64 bits

The latter point is difficult. Doing it I saw 2 other things to change:

  • a bug where the stopping criterion was not the absolute value
  • a SMALL value that was hard coded and actually too small for 32 bits. So I used eps instead depending of the input type.

@glemaitre
Copy link
Member

We can now check if the tests are passing on all platforms.

@glemaitre glemaitre added Waiting for Second Reviewer First reviewer is done, need a second one! and removed Waiting for Reviewer labels Dec 29, 2022
@jeremiedbb
Copy link
Member

1e-1 is really high tolerance. It's almost like having no test at all. I'm not sure we should even add such a test. To me there are 2 possibilities:

  • We care that the results should really be close between float32 and float64. In that case we can't say that this estimator preserves dtype and we force conversion to float64.
  • We consider that preserving dtype essentially means that all computations are done without conversion and it's not so bad that the results between float32 and float64 are not close (up to a reasonable tol).

This PR in its current state assumes the 2nd option. I could live with that, but I'd like the opinion of other devs before merging.

@jeremiedbb jeremiedbb added Needs Decision Requires decision and removed Waiting for Second Reviewer First reviewer is done, need a second one! labels Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants