Inverse Transform? #44

lolz0r · 2018-02-18T03:00:48Z

Hi!
Cool project! Is there any plan to implement an inverse op: embedding -> data? Something like

http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.inverse_transform

Thank you!

lmcinnes · 2018-02-18T03:52:06Z

I have given the issue some thought and have at least a theoretical, back-of-the-envelope sketch of how to do it. I have no immediate plans to implement it, since a number of other items currently have priority. If you would be interested in implementing it yourself, please email me and I can try to outline what would be involved.

danilsson · 2018-06-16T11:31:33Z

Hi!
I would be interested in giving this a look. I would like to try to use this for anomaly detection using the standard transform -> inverse transform reconstruction error.
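For readers following along, the reconstruction-error idea can be sketched roughly as below. This is only an illustration under stated assumptions: `LinearReducer` is a hypothetical NumPy stand-in (plain PCA via SVD) so the snippet runs without extra dependencies, and the 99th-percentile threshold is an arbitrary choice, not something proposed in this thread.

```python
import numpy as np


class LinearReducer:
    """Hypothetical stand-in (PCA via SVD) exposing transform / inverse_transform."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        return (X - self.mean_) @ self.components_.T

    def inverse_transform(self, Z):
        return Z @ self.components_ + self.mean_


def reconstruction_error(reducer, X):
    """Per-sample mean squared error after a transform -> inverse_transform round trip."""
    X_hat = reducer.inverse_transform(reducer.transform(X))
    return np.mean((X - X_hat) ** 2, axis=1)


rng = np.random.default_rng(0)

# Training data that lies close to a 2-D plane inside a 10-D space.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X_train = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

reducer = LinearReducer(n_components=2).fit(X_train)
train_err = reconstruction_error(reducer, X_train)

# Assumed rule: flag anything worse than the 99th percentile of training error.
threshold = np.percentile(train_err, 99)

# Points far from the plane reconstruct poorly and get flagged.
X_odd = 5.0 * rng.normal(size=(5, 10))
odd_err = reconstruction_error(reducer, X_odd)
is_anomaly = odd_err > threshold
```

Any fitted object that offers both `transform` and `inverse_transform` could be passed to `reconstruction_error` in place of the stand-in.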

lmcinnes · 2018-06-16T15:24:04Z

Thanks @danielnilssonjj . Email me at leland.mcinnes@gmail.com and I can try to sketch the process for you.

kaijfox · 2019-04-22T22:12:31Z

Wondering if there has been any progress on this @danielnilssonjj @lmcinnes ?

lmcinnes · 2019-04-22T22:33:24Z

Yes, if you check the 0.4dev branch you'll find an implementation. It isn't the fastest, but it should work.

ahsanMah · 2019-10-09T18:13:14Z

Hey guys! I seem to be getting an error when I try to run the inverse_transform on data where the sample size is less than the number of dimensions. Is this expected behavior?

Really appreciate all the great work btw! :)

lmcinnes · 2019-10-09T18:52:22Z

Thanks for highlighting this. It is good to see people exercising this code. I'll have to dig in and find out what is going on here. I can't promise a quick resolution, but it is definitely important to get this working.

ncuxomun · 2020-02-24T19:32:32Z

Hi, I was wondering if the "inverse_transform" function is disabled.
I was going through the exercise but I can't complete it due to the following error:
AttributeError: 'UMAP' object has no attribute 'inverse_transform'

lmcinnes · 2020-02-24T20:14:21Z

You need the latest version, which is currently in pre-release. That means it is not the default on PyPI, but it can be accessed using the --pre flag. Thus you can do

pip install --pre umap-learn

to install that version.

paudom · 2020-03-10T08:42:53Z

Hi, I was wondering if the problem stated by @ahsanMah is solved. I'm trying to apply the "inverse_transform" but I'm encountering the same problem:

The data I'm working with has 53 samples of 7168 features, i.e. shape (53, 7168). With UMAP I am able to project the data into 2D space. Now I want to map new samples from the 2D space back into the data space, so I'm creating an array of shape (# new samples, 2), but the same error appears when calling "inverse_transform". I think this is the same problem that @ahsanMah described.

I'm using the version "umap-learn==0.4.0rc1".

Thanks in advance, I really appreciate all the work done!

lmcinnes · 2020-03-10T12:33:32Z

The current master branch has the problem resolved, but no release to PyPI has the fix yet. You can clone from github and install from that and it should work.

vetrovav · 2020-03-11T02:56:23Z

Hi all,
It seems that I am experiencing the same issue with indices as @ahsanMah and @paudom. It would be great to get this working for our research. Thanks so much in advance.

paudom · 2020-03-11T10:41:46Z

Hi, thanks for the quick response!

I have downloaded the current master branch and installed UMAP manually as the readme suggests (installing the dependencies and then using setup.py), but the error keeps appearing. Is there something that I'm missing?

Thanks in advance!

vetrovav · 2020-03-12T04:18:39Z

I guess the issue, which occurs when the number of dimensions in the input data is larger than the number of samples, is due to the fact that the min_vertices variable is the number of dimensions in the original data. However, the second dimension of the indices array is the number of samples in the original data. Therefore, this error happens when min_vertices is larger than indices.shape[-1]. I would be more than happy to help, but I am not sure what is needed to fix it.
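The shape mismatch described here can be reproduced in isolation with plain NumPy. The sketch below is not the library's code and not the fix that was later committed: `select_vertices` and `select_vertices_capped` are hypothetical helpers, and capping at the column count is just one assumed way to avoid the error. Only the names `indices` and `min_vertices` and the shapes come from this thread.

```python
import numpy as np


def select_vertices(indices, min_vertices):
    """Pick the first `min_vertices` columns by explicit index (hypothetical helper)."""
    cols = np.arange(min_vertices)
    # Raises IndexError when min_vertices > indices.shape[-1].
    return indices[:, cols]


def select_vertices_capped(indices, min_vertices):
    """Same selection, but never ask for more columns than exist (assumed workaround)."""
    k = min(min_vertices, indices.shape[-1])
    return indices[:, np.arange(k)]


n_samples, n_features, n_queries = 53, 7168, 3

# One row per query point, one column per training sample.
indices = np.zeros((n_queries, n_samples), dtype=int)

try:
    select_vertices(indices, n_features)
    failed = False
except IndexError:
    failed = True  # 7168 requested columns, only 53 available

capped = select_vertices_capped(indices, n_features)  # shape (3, 53)
```

With more samples than features the two helpers behave identically, which matches the observation that the problem only shows up when the sample size is smaller than the number of dimensions.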

lmcinnes · 2020-03-12T13:10:30Z

Okay, I'll try to take a look and see if I can figure out the right fix here.

lmcinnes · 2020-03-12T15:26:52Z

I believe I have a potential fix. I haven't built a reproducer yet, so I can't test right now. If you would like to try with the current master and see if this resolves the issue I would appreciate it.

paudom · 2020-03-12T15:49:54Z

Hi, trying the current master branch and installing it manually seems to work.

I was able to go from data with shape (53, 7168) to 2D space, and then map 3 new samples from the 2D space back into the input space, with an output shape of (3, 7168).

Thanks a lot for looking into it, I appreciate it! Keep up the amazing work! :)

lmcinnes · 2020-03-12T15:56:15Z

Thanks -- hopefully others can also verify that it is working for them now.

paudom · 2020-03-25T11:32:15Z

Hi @lmcinnes, I would like to know:

If a data point is projected into 2D space using UMAP and inverse_transform is then applied to that same point, should it return the original data point? Or am I misinterpreting how the inverse transform should work?

I'm trying to recover datapoints once UMAP has been applied and I haven't been able to do it.
Thanks in advance.

lmcinnes · 2020-03-26T01:19:48Z

The inverse transform, like the transform, is stochastic in nature. That means that in practice you can't be guaranteed to get the same point back. Ideally you will get a point very close to it.

edebonneuil · 2021-07-17T11:21:46Z

I followed up on @paudom's idea of applying the inverse UMAP after UMAP in order to measure the reconstruction error, and to see what "%var explained" this reconstruction error would correspond to if it were a PCA (https://stats.stackexchange.com/questions/184603/in-pca-what-is-the-connection-between-explained-variance-and-squared-error). The aim is to estimate how much of "the whole picture" one grasps when looking at a 2D or 3D UMAP projection. For example, below 80% "var explained" one would need to be cautious when relying on a UMAP result.

As you indicate @lmcinnes, this reconstruction error includes a margin due to the stochastic nature of the inverse UMAP. My question is: is there a way to correct the reconstruction error for that margin, as a means of judging how representative of the data a UMAP result is?

PS: as a partial answer, I investigated the behavior of the UMAP/inverse-UMAP reconstruction error, starting with columns of uniform random numbers (a case where little dimension reduction is possible, as there is no major pattern). It shed some light:

- As long as there are enough rows, e.g. 2000, the reconstruction error essentially depends on the number of columns in the initial data and the number of dimensions onto which they are projected. OK, intuitive.
- The reconstruction error with a 3D UMAP is lower than with a 2D UMAP, as expected (less dimension reduction). With more dimensions, however, it falls in between instead of decreasing further: at this stage, the implemented inverse UMAP appears not to be optimized for more than 3 dimensions (the behavior is unlikely to come from UMAP itself). No big deal, but good to know for what follows.
- In the very special case of no dimension reduction in 2D, the implemented inverse UMAP appears not to be optimized at this stage (or UMAP itself, but that is unlikely): it gives a reconstruction error of the same order as when starting with 3 columns, instead of a much lower one. No big deal, but good to know for what follows. Of note, the 3D inverse UMAP does not show this behavior.
- Taking the above into account, for a low number of columns in the initial data it seems adequate to offset the reconstruction error (quadratic, averaged over all data) by approximately -0.1: this gives a null reconstruction error in the no-dimension-reduction case, as one would have with a PCA. Under these assumptions, with 2 to 4 columns in the initial data, UMAP is more representative of complex (random) data than PCA. Intuitive, but nice to somewhat prove it. It also shows that a few random columns can be represented relatively well with one column fewer ("%var explained" > 80%) but not with two fewer: random data is hard to compress because it lacks any major pattern. OK, intuitive.
- However, offsetting the mean reconstruction error by -0.1 is likely not enough when UMAP has to digest more than 4 or 5 columns. This gives the impression that UMAP is then much worse than PCA, but it might simply reflect a poor correction for the stochastic error of the inverse UMAP. I did not find a practical or theoretical basis for estimating how to correct the reconstruction error with many dimensions. Views? Ideas?
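The link between reconstruction error and "%var explained" used above can be written as 1 - SSE/SST, where SSE is the squared reconstruction error and SST is the total squared deviation from the column means. The sketch below checks that identity on a plain PCA computed with NumPy; `variance_explained_equivalent` is a hypothetical helper name, the data is synthetic, and no offset for the stochastic error of the inverse UMAP is applied.

```python
import numpy as np


def variance_explained_equivalent(X, X_hat):
    """Fraction of total variance recovered by a reconstruction X_hat of X."""
    sse = np.sum((X - X_hat) ** 2)            # squared reconstruction error
    sst = np.sum((X - X.mean(axis=0)) ** 2)   # total squared deviation from the mean
    return 1.0 - sse / sst


rng = np.random.default_rng(0)

# Synthetic data: 4 columns with different spreads.
X = rng.normal(size=(500, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
mean = X.mean(axis=0)
Xc = X - mean

# PCA via SVD, keeping k components, then mapping back to the data space.
_, s, vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_hat = Xc @ vt[:k].T @ vt[:k] + mean

# Classical explained-variance ratio from the singular values.
pca_ratio = np.sum(s[:k] ** 2) / np.sum(s ** 2)

# The same number recovered from the reconstruction error alone.
equiv = variance_explained_equivalent(X, X_hat)
```

For a UMAP round trip, `X_hat` would instead be the output of inverse_transform applied to the embedding of `X`. The resulting value is not bounded below by zero: it goes negative whenever the reconstruction is worse than simply predicting the column means.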

lmcinnes added a commit that referenced this issue Mar 12, 2020

Potential fix for #44

d8501a5

edebonneuil mentioned this issue Jul 17, 2021

How does UMAP estimates/evaluates variance? #122

Closed
