Implement SplineTransformer.inverse_transform #28551

Open
ogrisel opened this issue Feb 28, 2024 · 1 comment
Labels
Needs Decision - Include Feature, New Feature

Comments

@ogrisel
Member

ogrisel commented Feb 28, 2024

Describe the workflow you want to enable

I think it should be possible to implement a new method inverse_transform such that:

import numpy as np
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
X_train = rng.normal(size=(42, 5))
X_test = rng.normal(size=(43, 5))

st = SplineTransformer().fit(X_train)
np.testing.assert_allclose(X_test, st.inverse_transform(st.transform(X_test)))

Describe your proposed solution

There might be several mathematical ways to define such a transform, in particular when it is passed an X_fake_transformed that contains real numbers that do not actually result from a spline expansion. For instance when:

  • (X_fake_transformed < 0).any()
  • (X_fake_transformed > 1).any()
  • X_fake_transformed.sum(axis=1) != np.ones(n_samples).

or when all values of a given row are non-zero at once...

One possible way would be to decode based on X_fake_transformed.argmax(axis=1) and then use the relative strength of neighboring spline activations to resolve ambiguities.

Describe alternatives you've considered, if relevant

The main alternative is to not implement this. The main question is probably: why try to implement this in the first place?

Possible use cases:

  • fit a GMM model on spline transformed data (to get a more axis-aligned inductive prior), generate samples in the GMM latent space and then recode those samples back into the original space.

  • fit PCA with a small rank on spline encoded data and then reconstruct the projected data.

  • fit k-means in spline space and recode the learned centroids back in the original feature space for inspection.

  • idem for NMF or dictionary learning components.

Additional context

If #28043 gets merged, missing values support should also be included when using the 'indicator' strategy.

ogrisel added the New Feature, Needs Triage, and Needs Decision - Include Feature labels and removed the Needs Triage label on Feb 28, 2024
@lorentzenchr
Member

While I like the interest in splines, I don’t think an inverse spline method is useful in practice, and it would be difficult to implement!
