Aligned UMAP: clarification on 'overlapping' points through time? #984

Open
drscotthawley opened this issue Mar 18, 2023 · 2 comments

drscotthawley commented Mar 18, 2023

Hi! Thanks very much for developing these wonderful tools. I've used UMAP for a little while and now I'm very excited to try out Aligned UMAP. The documentation provides an example that is not merely "contrived"; it's hard to see how to adapt it to actual time-dependent data.

In the example, we have 10 digits and time evolution is put into 10 time steps (the coincidence of the two different 10's really slowed me down lol), but... somehow you make it so that data points are shared between time steps? I don't see how to adapt that to the (common?) case in which all the data points change at each time step.

It's still not clear to me how we're supposed to build a set of "overlapping" points. (Are we expected to insert "glue frames" in between our time slices, for which we grab half the points from the previous time and half from the next time?)

My current application is that I have 6 different classes of points, with 360 examples for each class, for which there are vectors that are 64 dimensional and evolve over 512 time steps. I'm ok with downsampling the 512 to, lol, maybe 8 or 16 for starters. But... still the "make them overlap" isn't clear to me -- they all change.

Could someone please clarify? I'd be happy to contribute to the documentation... once I understand how this is supposed to work.

It is generally the case that the indices of points that are supposed to align will persist from time step to time step. Is there a mode whereby we can make use of that?

Thanks.

Owner

lmcinnes commented Mar 20, 2023 via email

Author

drscotthawley commented Mar 22, 2023

Thank you so much for your reply! I was unaware of the congressional voting example so I'm looking at that too.

I wanted to wait to write back until I'd tried to implement your suggestions:

So in your example above, if I understand correctly, after implementing the time-overlapping as you describe, I might end up with a "slices" array with shape like (254, 5, ...), where the 254 is what we get from stepping across 512 with a stride of 2 and a window length of 5. The "..." would in my case be the additional 360 data points of 64 dimensions each, giving (254, 5, 360, 64). (I'll do just one class instead of the 6 I mentioned at first.)
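As a quick check of the window count above, here is a minimal sketch in plain Python. The helper name `window_starts` is made up for illustration and is not part of umap; it assumes windows are only kept when they fit entirely inside the 512 time steps.

```python
def window_starts(n_steps, window, stride):
    """Return the first time index of every full window (hypothetical helper)."""
    # A window starting at s covers steps s .. s + window - 1, so the last
    # valid start is n_steps - window.
    return list(range(0, n_steps - window + 1, stride))


starts = window_starts(n_steps=512, window=5, stride=2)
n_windows = len(starts)  # 254 windows, matching the leading dimension above
last_start = starts[-1]  # 506, so the final window covers steps 506..510
```

With these settings the very last time step (index 511) does not fall into any window, which may or may not matter for a given application.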

So slices.shape == (254, 5, 360, 64). Then the relation_dict variables would be

relation_dict = {i+2:i for i in range(len(slices[0]))}
relation_dicts = [relation_dict.copy() for i in range(len(slices) - 1)]

I'll give that a shot!

...Darn. When I try running

aligned_mapper = umap.AlignedUMAP().fit(slices, relations=relation_dicts)

I get ValueError: Found array with dim 3. None expected <= 2.
It doesn't seem to want to receive arrays with 3 or more dimensions, but isn't that what we naturally get with the overlapping time slices?
Thanks for your time!
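One possible reading of the error, offered as an assumption rather than a confirmed answer from the maintainers: each slice handed to `fit` has to be a 2D array of shape (n_samples, n_features), so a window of 5 time steps would need to be stacked into a single (5 * 360, 64) array. The sketch below shows that stacking on a small stand-in array, together with a relation dict that indexes rows of the stacked slices. The helper names `make_windows` and `window_relation` are hypothetical, and the key-to-value direction (row in the earlier slice mapped to row in the later slice) follows the pattern of the `{i+2:i ...}` dict above.

```python
import numpy as np


def make_windows(data, window=5, stride=2):
    """Split (n_steps, n_points, n_dims) data into overlapping 2D slices.

    Each slice stacks `window` consecutive time steps into one array of
    shape (window * n_points, n_dims). Hypothetical helper, not umap API.
    """
    n_steps, n_points, n_dims = data.shape
    starts = range(0, n_steps - window + 1, stride)
    return [data[s:s + window].reshape(window * n_points, n_dims) for s in starts]


def window_relation(n_points, window=5, stride=2):
    """Map rows of one stacked window to the same rows in the next window."""
    shift = stride * n_points                  # rows that drop off the front
    n_shared = (window - stride) * n_points    # rows both windows contain
    return {i + shift: i for i in range(n_shared)}


# Small stand-in for the (512, 360, 64) array described in the post.
rng = np.random.default_rng(0)
data = rng.normal(size=(12, 4, 3))

slices = make_windows(data)  # list of 2D arrays, one per window
relation_dicts = [window_relation(n_points=4) for _ in range(len(slices) - 1)]

# These would then be passed as in the call above:
# umap.AlignedUMAP().fit(slices, relations=relation_dicts)
```

Because consecutive windows share three of their five time steps, the related rows are literally the same vectors, which is the kind of overlap the relation dicts are meant to describe.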

PS-
A different strategy I can imagine would be to pass in the full (512, 360, 64) array -- or perhaps downsample in time a bit first, maybe (16, 360, 64), and then use the "identity" relations

relation_dict = {i:i for i in range(len(slices[0]))}
relation_dicts = [relation_dict.copy() for i in range(len(slices) - 1)]

(i.e. no overlapping at all, but telling it that points are supposed to match across time steps?)
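A minimal sketch of how the inputs for this identity strategy could be prepared, in plain Python. The helper names `evenly_spaced_steps` and `identity_relations` are made up for illustration; the assumption is that every time step holds the same 360 points in the same order, so row i at one step corresponds to row i at the next.

```python
def evenly_spaced_steps(n_steps, n_keep):
    """Pick n_keep evenly spaced time indices out of n_steps (hypothetical helper)."""
    stride = n_steps // n_keep
    return list(range(0, n_steps, stride))[:n_keep]


def identity_relations(n_points, n_slices):
    """One {i: i} dict per pair of consecutive slices."""
    relation = {i: i for i in range(n_points)}
    return [dict(relation) for _ in range(n_slices - 1)]


steps = evenly_spaced_steps(n_steps=512, n_keep=16)
relation_dicts = identity_relations(n_points=360, n_slices=len(steps))

# With `data` of shape (512, 360, 64), the 2D slices would be:
# slices = [data[t] for t in steps]   # 16 arrays of shape (360, 64)
```

Starting with only a handful of time steps is one way to find out whether the approach works before committing to a long run.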

When I try that, the execution takes... a very long time. Still waiting to see if/when it completes.
