Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to histogram #1784

Closed
wants to merge 33 commits into from

Conversation

Projects
None yet
4 participants
@timothydmorton
Copy link

commented Aug 4, 2017

Two updates to the histogram operation:

  • Implemented the change suggested in #1776. Caveat here is that I don't actually like the label being 'x Frequency' or 'y Frequency'; I suggest that when #1783 is fixed, the label should switch back to 'Frequency'.
  • Added a height_normed parameter. This allows histograms to normalize their frequencies to unity, which is useful when overlaying two histograms that have significantly different frequency magnitudes.
@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

The proposed changes sound good to me!

Two quick comments: it would be good if #1783 is fixed first so we don't have to update display tests (but improve the linking behavior). Secondly, I don't think 'height' prefix conveys much inheight_normed as I don't think it makes much sense to normalize across the kdim. I would prefer a simple name such as normalize which I think is self-explanatory enough.

@philippjfr

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

it would be good if #1783 is fixed first so we don't have to update display tests

I think that is a much more substantial amount of work and should not hold up this PR.

@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

I think that is a much more substantial amount of work and should not hold up this PR.

That is a good point. I'm just not sure about changing the display and then changing it back later.

@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

Maybe we can add an option to the histogram operation to control labelling with a plan on how it can solve our problems now with a plan for how we can deprecate it (or maybe it will still be useful?) once #1783 is merged?

@timothydmorton

This comment has been minimized.

Copy link
Author

commented Aug 4, 2017

@jlstevens I don't know about just calling it "normalize" because there's already a "normed" parameter that normalizes the total integral of the histogram. That seems like it could be confusing. But I'll leave the naming up to you all.

@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

That is a good point.

The problem I have looking at the docstrings for normed and height_normed is that I am now confused as to the difference and what each really means. That suggests better names and docstrings should be possible.

@timothydmorton

This comment has been minimized.

Copy link
Author

commented Aug 4, 2017

Does this clarify the difference? Attached is a screenshot of a height_normed histogram overlay. If this were just normed, then the 'star' distribution would be way taller than the galaxy distribution; this way, we can see both of them.
screenshot 2017-08-04 11 23 27

@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

Thanks... that does clarify things!

I am tempted to say there should be one normalization parameter with options such as None (or False), 'integral', 'max-height' as these are all different normalization options.

@philippjfr What do you think? Could we do this without breaking backwards compatibility?

@timothydmorton

This comment has been minimized.

Copy link
Author

commented Aug 4, 2017

I think people are used to passing normed=True to np.histogram, so it's probably not a good idea to change that behavior; maybe we could change our parameter to not having to be a bool, to accommodate other options?

Converted CHANGELOG to markdown (#1788)
* Fixed problems in rst links

* Used pandoc to convert CHANGELOG to Markdown

* Added CHANGELOG entry for 1.8.2
@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

Right. If we make normed an ObjectSelector it could be one of True, False, 'integral' and 'max-height'. We don't technically need 'integral' as an option but I do like an explicit alternative to True (which doesn't really say anything about what the normalization really is).

@timothydmorton

This comment has been minimized.

Copy link
Author

commented Aug 4, 2017

Looks like actually normed is deprecated in favor of density in numpy, so I'll correct for that too.

@timothydmorton

This comment has been minimized.

Copy link
Author

commented Aug 4, 2017

I think this does what we want now; I've just changed it so there's an option to pass 'height' or 'integral' to 'normed' in addition to a bool.

@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2017

Looks good! Though there seems to be failures in the unit tests. E.g:

======================================================================
ERROR: test_points_histogram_mean_weighted (tests.testoperation.OperationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/ioam/holoviews/tests/testoperation.py", line 101, in test_points_histogram_mean_weighted
    op_hist = histogram(points, num_bins=3, weight_dimension='y', mean_weighted=True)
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/param/parameterized.py", line 1888, in __new__
    return inst.__call__(*args,**params)
  File "/home/travis/build/ioam/holoviews/holoviews/core/operation.py", line 141, in __call__
    processed = self._process(element)
  File "/home/travis/build/ioam/holoviews/holoviews/operation/element.py", line 576, in _process
    label=view.label, **params)
TypeError: ParameterizedMetaclass object got multiple values for keyword argument 'vdims'
@philippjfr

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2017

I'd be happy to merge this as soon as that unit test has been resolved. I think it occurs when a weight_dimension is passed in which case vdims is passed twice.

@timothydmorton

This comment has been minimized.

Copy link
Author

commented Aug 9, 2017

OK, this should be good now... I corrected the bug and updated the tests to reflect the new dimension-naming scheme.

@philippjfr

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2017

Updated the test data and restarted the build, hopefully it will pass now and I can merge.

@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2017

Updated the test data

Are we going to keep 'X Frequency' labels then? Or go back to simply 'Frequency' in 2.0 once axis linking can be done properly? I'm happy with the new normed option but I think that maybe the label change could be optional and not on by default...

Getting started guide tweaks (#1664)
* Fixed installation page link in introduction

* Added link to reference gallery in the introduction

* Fixed remaining links in introduction guide

* Fixed typo in tabular datasets getting started guide
@timothydmorton

This comment has been minimized.

Copy link
Author

commented Aug 11, 2017

I'd be happy with an option to pass a custom name for the frequency dimension. And then passing custom names would be the solution to the linking issue until the underlying issue is fixed.

@philippjfr

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2017

Are we going to keep 'X Frequency' labels then?

I'd say so yes, the histogram operation is most frequently used for adjoined histograms where the y-label is hidden anyway. I think in most other cases it's still a sensible label and we can decide to revert once the normalization system supports independent normalization of x- and y-axes.

I'd also like to remove a workaround for the gridmatrix operation I had to introduce due to this issue. If you don't mind could you remove the .opts declaration on line 775 of holoviews/operation/element.py, i.e.:

opts = dict(axiswise=True, framewise=True)
el = p.diagonal_operation(element, dimension=d1.name,
                   bin_range=bin_range).opts(norm=opts)

should just become:

el = p.diagonal_operation(element, dimension=d1.name, bin_range=bin_range)

Still working on the fix for the other issue I mentioned.

@jlstevens

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2017

How about a label_format parameter that could be {dim} Frequency by default but could be simply 'Frequency' to go back to the old behavior?

I imagine people might have other words they might want to use for the value dimension other than 'frequency' so such a formatter might be useful regardless (instead of hard coding 'Frequency' into the dimension name).

@philippjfr

This comment has been minimized.

Copy link
Contributor

commented Aug 18, 2017

Okay, I'm going to make a branch of this PR and make some of the fixes to get the tests passing.

@timothydmorton

This comment has been minimized.

Copy link
Author

commented Aug 18, 2017

@philippjfr

This comment has been minimized.

Copy link
Contributor

commented Aug 18, 2017

No problem at all, really it's not your mess to clean up, since it's a pre-existing bug (that's revealed by your changes) causing the test failures.

@philippjfr

This comment has been minimized.

Copy link
Contributor

commented Aug 31, 2017

Okay, now that JupyterCon is over I can finally look at this. I'll make a copy of this branch and then reopen a PR.

@philippjfr philippjfr referenced this pull request Aug 31, 2017

Merged

Histogram improvements #1836

@philippjfr

This comment has been minimized.

Copy link
Contributor

commented Aug 31, 2017

Going to close this PR in favor of #1836, your commits will be preserved though @timothydmorton.

@philippjfr philippjfr closed this Aug 31, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.