Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-factoring Categorical ROI #601

Merged
merged 8 commits into from Jun 27, 2015
Merged

Conversation

JudoWill
Copy link
Contributor

@JudoWill JudoWill commented Apr 1, 2015

As part of the effort @aak65 is doing implementing some Bio-glue features he's going to need a more robust ability to pass around ROIs and subsets that are based on categorical data in a way that's more robust then my approach before ... which was basically convert everything to numbers and hope it all worked out.

This has all of the backings for the new ROI/Subset but I can't quite figure out where to fold it into the Historgram & Scatter viewers. My ideal situation would be after the user uses a Range-selector on an axis defined by a categorical component it would adjust the returned ROI accordingly. Any ideas on where I would find that logic? @ChrisBeaumont or @astrofrog

np.array([True, True, False]))



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember the problem correctly, the issue is that CategoricalComponents are internally represented as integer arrays, with a mapping from index to category. This mapping was potentially different for two Components, even if their categories overlapped / were the same. Because the ROI logic was using the underlying numerical arrays and not the categories, they in general wouldn't filter properly.

Do I have that right? If so, let's add some tests that setup up multiple CategoricalCoponents with different number->category mappings, and ensure they filter correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. In the CategoricalRoi code proper I have some logic to deal with inputs of list, np.array, pd.Series and CategoricalComponentsand transparently deal with all variants.

With the components I access their underlying .categorical_data to get the actual data instead of the digitized versions. This way it will be comparable across different Data objects, something that didn't really work in the previous iterations.

@ChrisBeaumont
Copy link
Member

This is great! This behavior has been broken for too long.

Regarding integrating with the viewers: after a user draws an ROI on the screen viewers typically call apply_roi (https://github.com/glue-viz/glue/blob/master/glue/clients/scatter_client.py#L254). These functions take an roi as input, build a subset from it, and then apply the subset. So ideally you would hook into this function and customize the ROI before building a subset state. Note that multiple different ROIs flow into this function, so you shouldn't assume (eg) a RangeROI.

@JudoWill
Copy link
Contributor Author

JudoWill commented Apr 1, 2015

Yeah, I saw that area, I figured there had to be something that returns an RoiSubsetState to be passed along to other areas ... but I guess that's done through the magic of callbacks.

From a UI standpoint, do you think I should only accept Range and Rectangular ROIs when one (or both) axes are Categorical? I can't quite picture how I would implement polygonal ROI with one categorical axis and one continuous axis.

@astrofrog
Copy link
Member

I agree that polygonal selection probably doesn't make sense. In addition to rectangular/range selection it might be cool in future for categorical histograms to allow clicking on a single bin or multiple bins (pressing the shift or command key) to select multiple histogram bins.

@JudoWill
Copy link
Contributor Author

JudoWill commented Apr 2, 2015

Oooo, that would be really slick. Refactoring the PointRoi should work for
that.

On Thu, Apr 2, 2015, 7:33 AM Thomas Robitaille notifications@github.com
wrote:

I agree that polygonal selection probably doesn't make sense. In addition
to rectangular/range selection it might be cool in future for categorical
histograms to allow clicking on a single bin or multiple bins (pressing the
shift or command key) to select multiple histogram bins.


Reply to this email directly or view it on GitHub
#601 (comment).

@ChrisBeaumont
Copy link
Member

In addition to rectangular/range selection it might be cool in future for categorical histograms to allow clicking on a single bin or multiple bins (pressing the shift or command key) to select multiple histogram bins.

This is actually the internal behavior of the range selector in the histogram (ie it expands to bin boundaries), but it would be good to actually expose that in the UI and let the user shift between smooth/snapped selection

@ChrisBeaumont
Copy link
Member

Yeah, I saw that area, I figured there had to be something that returns an RoiSubsetState to be passed along to other areas ... but I guess that's done through the magic of callbacks.

We rely on the kind of ugly use of the EditSubsetMode singleton to actually decide how to apply subsets to the datasets. It's dumb :). The upshot is that in that function you should just have to modify the roi, build an otherwise-normal RoiSubsetState, and let EditSubsetMode apply it.

@ChrisBeaumont
Copy link
Member

From a UI standpoint, do you think I should only accept Range and Rectangular ROIs when one (or both) axes are Categorical? I can't quite picture how I would implement polygonal ROI with one categorical axis and one continuous axis.

Good question. Unfortunately we don't have good UI mechanisms that alert the user why a particular subset isn't valid. So it might be confusing to implement this (unless you want to flat-out disable the mouse modes that let users draw an unsupported ROI). You could imagine implementing a hybrid categorical/continuous ROI -- there are a couple of ways to define it, but one way to do it is to take the bounding-box of the shape, apply a categorical filter on one dimension, a range filter on the other, and "and" them together.

@JudoWill
Copy link
Contributor Author

JudoWill commented Apr 3, 2015

bounding-box of the shape, apply a categorical filter on one dimension, a range filter on the other, and "and" them together.

This is what I intended to do. But it only works with a rectangular ROI; an arbitrarily constructed polygon (or circle) can't really be built this way. Is there a mechanism for "graying out" particular mouse modes? If so, I could just gray-out the ones that aren't logical when the user makes one of the axes a categorical one.

@astrofrog
Copy link
Member

@JudoWill - thanks! could you rebase to make sure that Travis passes now? (there was an issue that should be fixed in master). Is this ready for a final review?

@JudoWill
Copy link
Contributor Author

@astrofrog - I want to add a few more tests and add more docs to how the categorical system works.

@astrofrog
Copy link
Member

@JudoWill - ok, thanks!

@ChrisBeaumont ChrisBeaumont mentioned this pull request Jun 17, 2015
@astrofrog
Copy link
Member

@JudoWill - thanks! There is one failure on Travis which looks genuine and is only being caught by older versions of Numpy (but is in fact a real problem) - when categories is None, I think that unique in:

self.categories = np.unique(categories)

should not be called. The only reason this works in recent versions of Numpy is because it returns:

In [4]: np.unique(None)
Out[4]: array([None], dtype=object)

which I don't think is what we want.

@JudoWill
Copy link
Contributor Author

Yup, definitely a bug. Fixed it and added a test that should make sure it doesn't pop up again.

@astrofrog
Copy link
Member

This looks good!

@JudoWill - did you want to include any docs about this? (you said before * I want to add ... more docs to how the categorical system works.*). Just thought I'd check before we go ahead and merge :)

@JudoWill
Copy link
Contributor Author

@astrofrog You can merge it, I'm making another PR with a whole bunch of sphinx docs edits. I put the immediately important docs in the doc-strings already in the code.

@astrofrog
Copy link
Member

Ok, sounds good! @ChrisBeaumont - does this look good to you too?

@JudoWill - when preparing the other pull request, can you also add two entries to the CHANGES.md file to mention the changes in this and your other pull request?

@ChrisBeaumont
Copy link
Member

+1

As a followup, I think we should refactor logic branches in the viewers that look like:

if comp.categorical:
    state = CategoricalRoiSubsetState.from_range(comp, self.component, lo, hi)
else:
  ...

And instead add build_range_subset_state methods to the Component classes.

astrofrog added a commit that referenced this pull request Jun 27, 2015
@astrofrog astrofrog merged commit 3f9215e into glue-viz:master Jun 27, 2015
@astrofrog astrofrog mentioned this pull request Aug 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants