Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support categorical axes on boost histograms #764

Merged
merged 124 commits into from Nov 2, 2022

Conversation

lobis
Copy link
Collaborator

@lobis lobis commented Oct 24, 2022

Closes #672.

The logic for setting the fLabels on the axis is pretty straightforward, however to achieve this we need to serialize THashList which is not currently implemented.

While working on this I found out a problem when serializing TList (which THashList inherits directly from), which is pointed out in #762 and is fixed by #763.

Another issue related to weighted histograms was also found and fixed at #774.

Summary of changes:

  • Implement THashList serialization and to_HashList factory method.
  • Add tests for THashList writing. Compare serialization (bytes) to TList as they should be the same except for class name.
  • The name (fName) of all axes is now xaxis, yaxis or zaxis regardless of user definition, as this field should not be modified by the user since it causes problems on histogram drawing. (The text of the axis is fTitle). Tests have been updated accordingly.
  • Add tests for histograms with categorical axes.

@lobis
Copy link
Collaborator Author

lobis commented Oct 25, 2022

At this point serialization appears to work okay but the axis is not being drawn in the histogram (drawn using root). This is probably due to some badly set option. I will compare the uproot generated histogram with one generated with root to see the differences.

import hist
import uproot
import numpy

cat_axis = hist.axis.IntCategory([1, 2, 3], label='Category')
h = hist.Hist(cat_axis)
h.fill(
    numpy.random.randint(1, 4, 1000)
)

with uproot.recreate("test.root") as f:
    f['h'] = h

image

@lobis lobis marked this pull request as ready for review November 2, 2022 17:42
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It all looks good and I approve. Thank you very much for the effort you put into this!

One thing now looks odd to me: you've removed the three instances of

    def to_boost(self, metadata=None, axis_metadata=None):
        return super().to_boost(metadata=metadata, axis_metadata=axis_metadata)

which bypass the no_inherit mechanism, which prevents normal inheritance because of the ROOT histogram inheritance tree. So how is TH1D getting a to_boost method?

I can see that it is because these tests pass:

https://github.com/lobis/uproot5/blob/c294f8f5ccd67ff1d75799074e2621736c5abaab/tests/test_0046-histograms-bh-hist.py#L2821-L2842

(I was about to add them, but they were already there.)

@jpivarski
Copy link
Member

Nevermind—I think I see it: no_inherit prevents Python classes from being applied to C++ objects in which the C++ class hierarchy has a superclass of that name, but it doesn't do anything to Python inheritance itself (which is good, because that would be confusing).

https://github.com/lobis/uproot5/blob/c294f8f5ccd67ff1d75799074e2621736c5abaab/src/uproot/deserialization.py#L78-L83

Tthe Python inheritance is

graph TB
    A((Histogram))-->B((TH1))
    A-->C((TH2))
    A-->D((TH3))
    A-->H((Profile))
    H-->E((TProfile))
    H-->F((TProfile2D))
    H-->G((TProfile3D))

and the C++ inheritance is

When we deserialize a TH3F, for instance, we see that it has a C++ superclass named TH3, and we have a registered Python behavior for TH3, so its methods and properties get mixed into the histogram Model class (Model_TH3F_v#). Since TH3 is set to "no_inherit" from TH1, we don't mix our TH1 methods and properties into that object. However, our Python TH3 is a Python subclass of Histogram, so it gets all of the Histogram methods, including to_boost.

So this PR makes complete sense to me. I'll squash-and-merge it.

Thanks again!

@lobis
Copy link
Collaborator Author

lobis commented Nov 2, 2022

I will merge (squash!) once the tests pass.

@jpivarski jpivarski enabled auto-merge (squash) November 2, 2022 19:58
@jpivarski jpivarski merged commit f701c24 into scikit-hep:main Nov 2, 2022
@jpivarski jpivarski removed the next-release Required for the next release label Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support boost_histogram/hist's categorical axes
3 participants