Retrieving the original index of the samples in subset #18

hoangthienan95 · 2020-09-28T23:25:02Z

Hi Jacob,

Thank you for a very cool library. I have a quick question. Maybe I missed something, but right now there seems to be no way to retrieve the original indices of the samples in the subset, i.e. to identify which samples/row numbers the algorithm chose. This need comes up in two scenarios:

1. When I want to know whether I have sufficiently covered my data distribution based on a UMAP 2D embedding, I need the indices of the samples in the subset to merge them back with the UMAP representation.
2. When I have features [A, B, C] but only want to perform submodular optimization on features [A, B], and then want to know the values of feature C for all the samples in the subset.

Right now I don't see any way to extract the indices (NumPy array row numbers) of the chosen samples.

jmschrei · 2020-09-28T23:26:49Z

Howdy

The ranking attribute should return this for you.
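
For illustration, here is a minimal sketch of how a list of selected row indices could serve the second scenario above. It does not call the library. The `greedy_facility_location` helper, the `similarity` function, and the toy `rows` data are all hypothetical stand-ins, written in plain Python so the example is self-contained. It assumes that the `ranking` attribute of a fitted selector holds the original row indices in the order they were chosen, which is what the reply above suggests.

```python
# Hypothetical toy data: each row is (A, B, C). Only A and B drive the selection.
rows = [
    (0.0, 0.0, "c0"),
    (0.1, 0.0, "c1"),
    (5.0, 5.0, "c2"),
    (5.1, 5.0, "c3"),
    (9.0, 0.0, "c4"),
]
features_ab = [(a, b) for a, b, _ in rows]


def similarity(p, q):
    """Similarity that decreases with squared Euclidean distance."""
    d2 = sum((x - y) ** 2 for x, y in zip(p, q))
    return 1.0 / (1.0 + d2)


def greedy_facility_location(points, k):
    """Stand-in selector (NOT the library's implementation).

    Greedily picks k points and returns their original indices in the
    order chosen, playing the role of the `ranking` attribute.
    """
    n = len(points)
    best = [0.0] * n  # best similarity of each point to the current selection
    ranking = []
    for _ in range(k):
        top_gain, top_idx = -1.0, None
        for j in range(n):
            if j in ranking:
                continue
            # Marginal gain: how much adding point j improves coverage.
            gain = sum(
                max(similarity(points[i], points[j]) - best[i], 0.0)
                for i in range(n)
            )
            if gain > top_gain:
                top_gain, top_idx = gain, j
        ranking.append(top_idx)
        best = [max(best[i], similarity(points[i], points[top_idx])) for i in range(n)]
    return ranking


# With the library, these indices would come from the fitted selector's
# `ranking` attribute instead of this stand-in.
ranking = greedy_facility_location(features_ab, k=3)

# Scenario 2: look up feature C for the chosen rows.
selected_c = [rows[i][2] for i in ranking]

# Scenario 1: a boolean mask over all rows, e.g. to flag the selected
# points when merging with a separately computed 2D embedding.
selected_mask = [i in ranking for i in range(len(rows))]
```

Because the indices refer to rows of the original array, the same list can index any other array that shares that row order, such as the full feature matrix or an embedding computed separately.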

hoangthienan95 · 2020-09-28T23:56:40Z

Thanks! I felt like I missed something from the docs.

jmschrei closed this as completed Sep 28, 2020
