Aligned UMAP: clarification on 'overlapping' points through time? #984
Perhaps the more complex, but also more realistic
[example based on US congressional voting](https://umap-learn.readthedocs.io/en/latest/aligned_umap_politics_demo.html)
may help a little.
Generally the goal is to either have some specific identity that persists
over multiple timesteps (e.g. a given member of congress, who has different
voting in each year, but can be identified from one year to the next), or
to bin the data with an overlapping binning strategy. Without knowing more
about your specific data I can't say whether option one makes some sense
(perhaps it does?). To achieve the latter we can do a kind of downsampling
with overlapping bins. To give a concrete example, given 512 time steps, we
could create a sequence of datasets where the first dataset has data from
timesteps [0,1,2,3,4], the second dataset from timesteps [2,3,4,5,6], the
third dataset from timesteps [4,5,6,7,8], and so on. So now you have
overlapping data, because there are data points from timesteps 2, 3, and 4
that are in both the first dataset, and in the second dataset, and so on.
Does this help?
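The overlapping-bin idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's own recipe: the helper names `overlapping_windows` and `window_relations` are hypothetical, the window width of 5 and stride of 2 are taken from the example above, and each window's rows are assumed to be stacked timestep by timestep with the same point order inside every timestep.

```python
def overlapping_windows(n_timesteps, width=5, stride=2):
    """Timestep indices per window: [0..4], [2..6], [4..8], ... (hypothetical helper)."""
    return [list(range(start, start + width))
            for start in range(0, n_timesteps - width + 1, stride)]


def window_relations(n_points, width=5, stride=2):
    """Map row indices in one window to the rows of the same data in the next window.

    Assumes rows are ordered timestep-major: row = local_timestep * n_points + point.
    The last (width - stride) timesteps of a window reappear as the first
    (width - stride) timesteps of the following window.
    """
    overlap = width - stride
    return {(stride + k) * n_points + p: k * n_points + p
            for k in range(overlap)
            for p in range(n_points)}


windows = overlapping_windows(512)
relation = window_relations(n_points=6 * 360)

# Assumed usage, with `data` a hypothetical NumPy array of shape (512, 2160, 64):
#   slices = [data[w].reshape(-1, 64) for w in windows]
#   relations = [relation] * (len(slices) - 1)
#   mapper = umap.AlignedUMAP().fit(slices, relations=relations)
```

With these defaults the trailing timestep 511 does not fit into a full window and is left out; a different width or stride would change that. The number of windows is also large (one per stride), so downsampling the timesteps first, as discussed in the question, may be more practical.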
On Sat, Mar 18, 2023 at 6:03 PM Scott H. Hawley ***@***.***> wrote:
Hi! Thanks very much for developing these wonderful tools. I've used UMAP
for a little while and now I'm very excited to try out Aligned UMAP. The
instructions
<https://umap-learn.readthedocs.io/en/latest/aligned_umap_basic_usage.html>
provide an example which is not merely "contrived", it's... hard to see how
to use actual time-dependent data.
In the example, we have 10 digits and time evolution is put into 10 time
steps (the coincidence of the two different 10's really slowed me down
lol), but... somehow you make it so that data points are shared between
time steps? I don't see how to adapt that to (the very common?) case in
which all the data points change at each time step.
It's still not clear to me how we're supposed to build some sort of
"overlapping" amount of points.
My current application is that I have 6 different classes of points, with
360 examples for each class, for which there are vectors that are 64
dimensional and evolve over 512 time steps. I'm ok with downsampling the
512 to, lol, maybe 8 or 16 for starters.
Uh but... still the "make them overlap" isn't clear to me.
Could someone please clarify? I'd be happy to contribute to the
documentation,....once I understand how this is supposed to work.
It is generally the case that the *indices* of points that are supposed
to align will persist from time step to time step. Is there a mode whereby
we can make use of that?
Thanks.
Thank you so much for your reply! I was unaware of the congressional voting example so I'm looking at that too. I wanted to wait to write back until I'd tried to implement your suggestions:

So in your example above, if I understand correctly, after implementing the time-overlapping as you describe, I might end up with a "slices" array with shape like

So

    relation_dict = {i+2:i for i in range(len(slices[0]))}
    relation_dicts = [relation_dict.copy() for i in range(len(slices) - 1)]

I'll give that a shot!

...Darn. When I try running , I get

PS-

    relation_dict = {i:i for i in range(len(slices[0]))}
    relation_dicts = [relation_dict.copy() for i in range(len(slices) - 1)]

(i.e. no overlapping at all, but telling it that points are supposed to match across time steps?) When I try that, the execution takes... a very long time. Still waiting to see if/when it completes.
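One thing that may be worth checking before fitting is whether every relation dict only refers to rows that actually exist. For example, `{i+2: i for i in range(n)}` produces the keys `n` and `n+1`, which are not valid row indices in a slice of length `n`. Whether that is what caused the error above is not known. The helper below is a hypothetical sanity check, not part of umap-learn, and it assumes that each dict maps row indices in slice `i` to row indices in slice `i + 1`.

```python
def check_relations(slice_lengths, relation_dicts):
    """Return a list of problems found; an empty list means nothing looked wrong.

    slice_lengths: number of rows in each slice, in order.
    relation_dicts: one dict per consecutive pair of slices (hypothetical check).
    """
    problems = []
    expected = len(slice_lengths) - 1
    if len(relation_dicts) != expected:
        problems.append(("count", len(relation_dicts), expected))
    pairs = zip(slice_lengths, slice_lengths[1:])
    for i, (rel, (n_here, n_next)) in enumerate(zip(relation_dicts, pairs)):
        for key, value in rel.items():
            if not 0 <= key < n_here:
                problems.append((i, "key", key))    # row missing from slice i
            if not 0 <= value < n_next:
                problems.append((i, "value", value))  # row missing from slice i + 1
    return problems


lengths = [5, 5, 5]
shifted = [{i + 2: i for i in range(5)} for _ in range(2)]
identity = [{i: i for i in range(5)} for _ in range(2)]

shifted_problems = check_relations(lengths, shifted)
identity_problems = check_relations(lengths, identity)
```

In this toy case the shifted mapping is flagged for keys 5 and 6 in both dicts, while the identity mapping passes.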
Hi! Thanks very much for developing these wonderful tools. I've used UMAP for a little while and now I'm very excited to try out Aligned UMAP. The instructions provide an example which is not merely "contrived", it's... hard to see how to use actual time-dependent data.
In the example, we have 10 digits and time evolution is put into 10 time steps (the coincidence of the two different 10's really slowed me down lol), but... somehow you make it so that data points are shared between time steps? I don't see how to adapt that to the (common?) case in which all the data points change at each time step.
It's still not clear to me how we're supposed to build some sort of "overlapping" amount of points. (Are we expected to insert "glue frames" in between our time slices, for which we grab half the points from the previous time and half from the next time?)
My current application is that I have 6 different classes of points, with 360 examples for each class, for which there are vectors that are 64 dimensional and evolve over 512 time steps. I'm ok with downsampling the 512 to, lol, maybe 8 or 16 for starters. But... still the "make them overlap" isn't clear to me -- they all change.
Could someone please clarify? I'd be happy to contribute to the documentation,....once I understand how this is supposed to work.
It is generally the case that the indices of points that are supposed to align will persist from time step to time step. Is there a mode whereby we can make use of that?
Thanks.