Re-factoring Categorical ROI #601

JudoWill · 2015-04-01T19:12:06Z

As part of the effort @aak65 is making to implement some Bio-glue features, he is going to need a more robust way to pass around ROIs and subsets based on categorical data than my previous approach, which was basically to convert everything to numbers and hope it all worked out.

This has all of the backing for the new ROI/Subset, but I can't quite figure out where to fold it into the Histogram & Scatter viewers. My ideal situation would be that after the user uses a range selector on an axis defined by a categorical component, the returned ROI is adjusted accordingly. Any ideas on where I would find that logic? @ChrisBeaumont or @astrofrog

ChrisBeaumont · 2015-04-01T21:40:45Z

glue/core/tests/test_roi.py

+                                      np.array([True, True, False]))

If I remember the problem correctly, the issue is that CategoricalComponents are internally represented as integer arrays, with a mapping from index to category. This mapping was potentially different for two Components, even if their categories overlapped / were the same. Because the ROI logic was using the underlying numerical arrays and not the categories, they in general wouldn't filter properly.

Do I have that right? If so, let's add some tests that set up multiple CategoricalComponents with different number->category mappings and ensure they filter correctly.

Correct. In the CategoricalRoi code proper I have some logic that accepts inputs of type list, np.array, pd.Series and CategoricalComponent and handles all of these variants transparently.

With the components I access their underlying .categorical_data to get the actual data instead of the digitized versions. This way it will be comparable across different Data objects, something that didn't really work in the previous iterations.
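To make the problem concrete, here is a minimal sketch in plain NumPy (not glue's actual classes). The `digitize` helper is hypothetical and only mimics the idea of storing categories as integer codes. It shows why a selection expressed in codes does not transfer between two datasets, while a selection expressed in labels does.

```python
import numpy as np


def digitize(labels):
    """Hypothetical helper: store labels as (categories, integer codes)."""
    categories, codes = np.unique(np.asarray(labels), return_inverse=True)
    return categories, codes


# Two datasets whose categories overlap but whose code mappings differ.
cats_a, codes_a = digitize(["a", "b", "c"])  # a->0, b->1, c->2
cats_b, codes_b = digitize(["b", "c", "d"])  # b->0, c->1, d->2

# The user selects "b" in the first dataset; as a code, that is 1.
code_for_b = list(cats_a).index("b")

# Reusing the code on the second dataset silently selects "c" instead.
wrong = codes_b == code_for_b

# Comparing the labels themselves gives the intended result.
right = np.isin(cats_b[codes_b], ["b"])
```

A test along the lines suggested above would build two such components and assert that the label-based mask, not the code-based one, is what the ROI produces.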

ChrisBeaumont · 2015-04-01T21:44:19Z

This is great! This behavior has been broken for too long.

Regarding integrating with the viewers: after a user draws an ROI on the screen, viewers typically call apply_roi (https://github.com/glue-viz/glue/blob/master/glue/clients/scatter_client.py#L254). These functions take an roi as input, build a subset from it, and then apply the subset. So ideally you would hook into this function and customize the ROI before building a subset state. Note that multiple different ROIs flow into this function, so you shouldn't assume (e.g.) a RangeROI.

JudoWill · 2015-04-01T23:09:28Z

Yeah, I saw that area, I figured there had to be something that returns an RoiSubsetState to be passed along to other areas ... but I guess that's done through the magic of callbacks.

From a UI standpoint, do you think I should only accept Range and Rectangular ROIs when one (or both) axes are Categorical? I can't quite picture how I would implement polygonal ROI with one categorical axis and one continuous axis.

astrofrog · 2015-04-02T11:33:35Z

I agree that polygonal selection probably doesn't make sense. In addition to rectangular/range selection it might be cool in future for categorical histograms to allow clicking on a single bin or multiple bins (pressing the shift or command key) to select multiple histogram bins.

JudoWill · 2015-04-02T12:53:54Z

Oooo, that would be really slick. Refactoring the PointRoi should work for that.

ChrisBeaumont · 2015-04-03T00:24:38Z

> In addition to rectangular/range selection it might be cool in future for categorical histograms to allow clicking on a single bin or multiple bins (pressing the shift or command key) to select multiple histogram bins.

This is actually the internal behavior of the range selector in the histogram (i.e. it expands to bin boundaries), but it would be good to actually expose that in the UI and let the user switch between smooth and snapped selection.
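As an illustration of what "expanding to bin boundaries" can mean, here is a small NumPy sketch. The function name `snap_to_bins` and its signature are assumptions made for this example; this is not the histogram client's code.

```python
import numpy as np


def snap_to_bins(lo, hi, edges):
    """Expand the range [lo, hi] outward to the nearest enclosing bin edges.

    `edges` is a sorted 1-D sequence of bin boundaries. Values outside the
    histogram are clipped to the first or last edge.
    """
    edges = np.asarray(edges, dtype=float)
    last = len(edges) - 1
    # Largest edge that is <= lo.
    lo_idx = np.clip(np.searchsorted(edges, lo, side="right") - 1, 0, last)
    # Smallest edge that is >= hi.
    hi_idx = np.clip(np.searchsorted(edges, hi, side="left"), 0, last)
    return float(edges[lo_idx]), float(edges[hi_idx])


edges = np.arange(5)  # bins [0,1), [1,2), [2,3), [3,4]
snapped = snap_to_bins(0.4, 2.3, edges)
```

A "smooth" selection would keep `(0.4, 2.3)` as drawn; the snapped variant covers every bin the drag touched.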

ChrisBeaumont · 2015-04-03T00:26:29Z

> Yeah, I saw that area, I figured there had to be something that returns an RoiSubsetState to be passed along to other areas ... but I guess that's done through the magic of callbacks.

We rely on the kind of ugly use of the EditSubsetMode singleton to actually decide how to apply subsets to the datasets. It's dumb :). The upshot is that in that function you should just have to modify the roi, build an otherwise-normal RoiSubsetState, and let EditSubsetMode apply it.

ChrisBeaumont · 2015-04-03T00:30:05Z

> From a UI standpoint, do you think I should only accept Range and Rectangular ROIs when one (or both) axes are Categorical? I can't quite picture how I would implement polygonal ROI with one categorical axis and one continuous axis.

Good question. Unfortunately we don't have good UI mechanisms that alert the user why a particular subset isn't valid. So it might be confusing to implement this (unless you want to flat-out disable the mouse modes that let users draw an unsupported ROI). You could imagine implementing a hybrid categorical/continuous ROI -- there are a couple of ways to define it, but one way to do it is to take the bounding-box of the shape, apply a categorical filter on one dimension, a range filter on the other, and "and" them together.
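One way to read the bounding-box idea is sketched below in plain NumPy. The function `hybrid_mask` and its arguments are hypothetical. It assumes the bounding box has already been reduced to a set of selected categories on one axis and a numeric range on the other.

```python
import numpy as np


def hybrid_mask(labels, values, selected_categories, lo, hi):
    """Combine a categorical filter and a range filter with a logical AND.

    labels: category label per point (the categorical axis)
    values: numeric value per point (the continuous axis)
    """
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    in_categories = np.isin(labels, list(selected_categories))
    in_range = (values >= lo) & (values <= hi)
    return in_categories & in_range


mask = hybrid_mask(
    labels=["a", "b", "a", "c"],
    values=[1.0, 2.0, 5.0, 2.5],
    selected_categories={"a", "c"},
    lo=0.0,
    hi=3.0,
)
```

Because only the bounding box is used, any non-rectangular shape is effectively treated as its enclosing rectangle.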

JudoWill · 2015-04-03T15:25:26Z

> bounding-box of the shape, apply a categorical filter on one dimension, a range filter on the other, and "and" them together.

This is what I intended to do. But it only works with a rectangular ROI; an arbitrarily constructed polygon (or circle) can't really be built this way. Is there a mechanism for "graying out" particular mouse modes? If so, I could just gray-out the ones that aren't logical when the user makes one of the axes a categorical one.

astrofrog · 2015-06-13T13:51:12Z

@JudoWill - thanks! could you rebase to make sure that Travis passes now? (there was an issue that should be fixed in master). Is this ready for a final review?

JudoWill · 2015-06-15T13:51:24Z

@astrofrog - I want to add a few more tests and add more docs to how the categorical system works.

astrofrog · 2015-06-15T15:01:18Z

@JudoWill - ok, thanks!

astrofrog · 2015-06-22T16:01:03Z

@JudoWill - thanks! There is one failure on Travis which looks genuine and is only being caught by older versions of Numpy (but is in fact a real problem) - when categories is None, I think that unique in:

self.categories = np.unique(categories)

should not be called. The only reason this works in recent versions of Numpy is because it returns:

In [4]: np.unique(None)
Out[4]: array([None], dtype=object)

which I don't think is what we want.
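A guard of the following shape would avoid calling `np.unique` on `None`. This is only a sketch with a hypothetical helper name, not the fix that was committed in this pull request.

```python
import numpy as np


def normalize_categories(categories):
    """Return sorted unique categories, or None when no categories are given."""
    if categories is None:
        # Skip np.unique entirely: its handling of None is not what we want.
        return None
    return np.unique(categories)


empty = normalize_categories(None)
filled = normalize_categories(["b", "a", "b"])
```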

JudoWill · 2015-06-22T16:39:54Z

Yup, definitely a bug. Fixed it and added a test that should make sure it doesn't pop up again.

astrofrog · 2015-06-23T13:32:43Z

This looks good!

@JudoWill - did you want to include any docs about this? (you said before: *I want to add ... more docs to how the categorical system works.*) Just thought I'd check before we go ahead and merge :)

JudoWill · 2015-06-23T13:42:36Z

@astrofrog You can merge it, I'm making another PR with a whole bunch of sphinx docs edits. I put the immediately important docs in the doc-strings already in the code.

astrofrog · 2015-06-23T14:00:35Z

Ok, sounds good! @ChrisBeaumont - does this look good to you too?

@JudoWill - when preparing the other pull request, can you also add two entries to the CHANGES.md file to mention the changes in this and your other pull request?

ChrisBeaumont · 2015-06-24T16:41:27Z

+1

As a followup, I think we should refactor logic branches in the viewers that look like:

if comp.categorical:
    state = CategoricalRoiSubsetState.from_range(comp, self.component, lo, hi)
else:
    ...

And instead add build_range_subset_state methods to the Component classes.
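The shape of that refactor can be sketched with stand-in classes. `NumericColumn` and `CategoricalColumn` below are hypothetical and are not glue's Component classes. For simplicity the method returns a boolean mask rather than a subset state, and the categorical case assumes categories sit at integer positions 0, 1, 2, ... along the axis.

```python
import numpy as np


class NumericColumn:
    """Stand-in for a numeric component."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def build_range_mask(self, lo, hi):
        return (self.data >= lo) & (self.data <= hi)


class CategoricalColumn:
    """Stand-in for a categorical component."""

    def __init__(self, labels):
        self.labels = np.asarray(labels)
        self.categories = np.unique(self.labels)

    def build_range_mask(self, lo, hi):
        # Translate the axis range into the set of categories it covers,
        # then filter on labels rather than on integer codes.
        positions = np.arange(len(self.categories))
        selected = self.categories[(positions >= lo) & (positions <= hi)]
        return np.isin(self.labels, selected)


def apply_range(column, lo, hi):
    # The caller no longer branches on the column type.
    return column.build_range_mask(lo, hi)


numeric = apply_range(NumericColumn([1, 5, 9]), 0, 6)
categorical = apply_range(CategoricalColumn(["b", "a", "c", "a"]), -0.5, 0.5)
```

With this layout the viewer code calls one method and each column type decides how a range maps onto its own data.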

ChrisBeaumont reviewed Apr 1, 2015

ChrisBeaumont mentioned this pull request Jun 17, 2015

Release 0.5? #614

Closed

JudoWill added 7 commits June 22, 2015 10:02

changes required to implement a CategoricalRoi

f565fc0

changes required to implement a categorical subset state

6da8196

changes required to implement categorical components from ScatterClient

bd803ef

changes required to implement the new CategoricalRoi on histogram cli…

480273b

…ents.

Fixing for python-3 comp ... no more .next() on generators?

89c40eb

Fixed some crazy branching logic

21924e3

fixed rebase issues and improved docstrings.

ce4070e

JudoWill force-pushed the categorical_roi branch from 9384003 to ce4070e Compare June 22, 2015 14:19

fixed empty categories issue

11810a5

astrofrog added the Ready for final review label Jun 22, 2015

ChrisBeaumont mentioned this pull request Jun 24, 2015

Refactor logic branches based on Component type -- use polymorphism instead #676

Closed

astrofrog added a commit that referenced this pull request Jun 27, 2015

Merge pull request #601 from JudoWill/categorical_roi

3f9215e

Re-factoring Categorical ROI

astrofrog merged commit 3f9215e into glue-viz:master Jun 27, 2015

astrofrog mentioned this pull request Aug 13, 2015

Bug in categorical ROI #718

Closed
