External package integration #4

Open
LovelyBuggies opened this issue Mar 15, 2020 · 15 comments

@LovelyBuggies
Collaborator

LovelyBuggies commented Mar 15, 2020

@henryiii We are going to add some shortcuts for analysis. Could you please specify which kinds of analysis are needed, and which tools or packages you think are appropriate?

  • GooFit provides maximum-likelihood fits for arbitrary functions. It seems good, but it targets GPU devices and might not be of common use.
  • Iminuit is the package most commonly used for likelihood fits of models to data. But it's a Python interface to C++ MINUIT2. We might expect a more Pythonic package.
  • Probfit helps us construct a complex fit. However, it's iminuit-based.
  • Zfit is a TensorFlow-based fitting package.

There are problems with the above fitting packages: they are GPU-oriented, C++ based, or reliant on external dependencies. We expect a less dependent, more pythonic solution for common use. I recommend Scipy. Scipy's optimize module gives us the flexibility to solve problems related to fitting and other data analysis (though it may not perform as well as more specialized solutions such as dedicated maximum-likelihood fitters).

In addition to this, it is not clear whether our shortcuts should include classification, regression, clustering, etc. (I did not find any questions on the channel.) If yes, scikit-learn could be a wonderful solution.

@LovelyBuggies LovelyBuggies changed the title from "Analysis shortcut" to "Analysis shortcuts" Mar 15, 2020
@LovelyBuggies
Collaborator Author

LovelyBuggies commented Mar 15, 2020

Scipy does not pull in heavy dependencies and could provide analysis methods other than fitting, such as integration ... (though I am not sure whether they are of use for HEP). The caveats are: 1) It might not be as specialized as GooFit... 2) Using a Scikit-HEP package might be more, umm... HEP-ecosystemic.

@lukasheinrich

lukasheinrich commented Mar 15, 2020

hi @LovelyBuggies if you would like to have a histogram-based statistics model, https://github.com/scikit-hep/pyhf might be interesting and only depends on scipy + numpy

@LovelyBuggies
Collaborator Author

@lukasheinrich Thanks for your suggestion, I will dive into pyhf and see whether it fits the functionality in hist.

@henryiii
Member

These are two separate issues: shortcuts for easy interaction, and adaptors/integration into other packages (which could also be called shortcuts). In general, we should be able to implement some or many of them without adding a dependency on the package, though we will have to be careful when we do.
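
One common way to offer an adaptor without making the target package a hard dependency is to import it only when the adaptor is called. The following is a minimal sketch under that assumption; `optional_import` and `to_json_text` are hypothetical names, not part of hist, and the stdlib `json` module merely stands in for a real optional package.

```python
import importlib


def optional_import(module_name, feature):
    """Import an optional package on first use (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that tells the user what to install.
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'; "
            "install it to use this adaptor."
        ) from err


def to_json_text(counts, edges):
    """Example adaptor. The stdlib json module stands in for an external
    package that would only be needed when this function is called."""
    json = optional_import("json", "to_json_text")
    return json.dumps({"counts": list(counts), "edges": list(edges)})
```

With this pattern, importing the main package never fails because an integration target is missing; only the specific adaptor raises, and it says which package to install.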

@henryiii henryiii changed the title from "Analysis shortcuts" to "External package integration" Mar 17, 2020
@henryiii
Member

I think we should focus on how to "feed" our histograms to these other packages. Maybe come up with a standard histogram API? Then boost-histogram (and maybe others, like Physt) could also support it.
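
To make the idea of a shared histogram API concrete, here is a rough sketch using `typing.Protocol`. The interface (`HistogramLike` with `counts()` and `edges()`) is purely an assumption for illustration; it is not an agreed standard, and neither boost-histogram nor Physt is claimed to implement it.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class HistogramLike(Protocol):
    """Hypothetical minimal 1D interface; not an existing standard."""

    def counts(self) -> Sequence[float]:
        """Bin contents."""
        ...

    def edges(self) -> Sequence[float]:
        """Bin edges, one more entry than counts()."""
        ...


class ListHistogram:
    """Toy histogram backed by plain lists, for illustration only."""

    def __init__(self, counts, edges):
        if len(edges) != len(counts) + 1:
            raise ValueError("need exactly one more edge than counts")
        self._counts = list(counts)
        self._edges = list(edges)

    def counts(self):
        return self._counts

    def edges(self):
        return self._edges


def total_entries(h: HistogramLike) -> float:
    """A 'client' that works with any object satisfying the protocol."""
    return sum(h.counts())
```

A consuming package written against such a protocol would not need to import any particular histogram library; any object providing the two methods would do.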

@henryiii henryiii added this to To do in Hist plans Mar 17, 2020
@lukasheinrich

One thing that might be important for all but the most simple clients is feeding a structured set of histograms. I started some work along those lines with @jpivarski with histbook and the idea of a "book" / nest-able structure of histograms would be useful. cc @matthewfeickert @kratsg

@LovelyBuggies
Collaborator Author

@lukasheinrich An initiative concerning 'nest' was put forward here.

@HDembinski
Member

HDembinski commented Mar 18, 2020

What exactly is the problem with iminuit's interface? What is not pythonic enough about it? iminuit has little in common with the interface of C++ MINUIT; it is pretty pythonic already.

@HDembinski
Member

Besides, if you like scipy.optimize.minimize, you may also like https://iminuit.readthedocs.io/en/latest/reference.html#iminuit.minimize

@HDembinski
Member

@lukasheinrich boost-histogram supports integer and category axes, which can be used to bundle histograms together. I use these axes to have a common histogram with signal, background, different data subsets, etc. What can histbook do that boost-histogram with these axes cannot do?

@HDembinski
Member

@LovelyBuggies I disagree with your initial list of "shortcomings". GPU support is not a problem, it is a feature. Any package that supports the GPU should also fall back to CPU computing when GPUs are not available, of course, like numba and jax.

I hope you got from my previous comment that we cannot replace iminuit with scipy.optimize.

"We expect a less dependent, more pythonic solution for common use."
Having well-justified dependencies is ok, if they can be loaded from PyPI and installed automatically. jax and jupyter are high-quality software and they depend on a gazillion of other packages.

@LovelyBuggies
Collaborator Author

@HDembinski Thanks for the correction! Looks like I misunderstood them: integrating iminuit into Hist is feasible and reasonable.

@lukasheinrich

@HDembinski yes, some of these axis types are perfectly suitable. Would 'jagged' data work as well? Consider this case: 2 phase space regions (one has [data, bkg] histograms with 10 bins, the other has [data, signal, bkg] histograms with 5 bins)

2 event categories    / \
                     /   \ 
   2 samples      / |   / | \    3 samples
                 /  |  /  |  \
     10 bins     |  |  |  |   |    5 bins
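
The layout sketched above can be written down as a nested mapping, which makes it easy to see why it does not fit one dense array indexed by (region, sample, bin). This is only an illustration: the region names are placeholders, and `shape_summary` and `is_rectangular` are hypothetical helpers, not functions from histbook or boost-histogram.

```python
# Jagged layout from the sketch above; region names are placeholders.
jagged = {
    "region_1": {"data": [0] * 10, "bkg": [0] * 10},
    "region_2": {"data": [0] * 5, "signal": [0] * 5, "bkg": [0] * 5},
}


def shape_summary(book):
    """Map each region to (number of samples, bins per sample)."""
    summary = {}
    for region, samples in book.items():
        bin_counts = {len(counts) for counts in samples.values()}
        if len(bin_counts) != 1:
            raise ValueError(f"samples in {region!r} disagree on binning")
        summary[region] = (len(samples), bin_counts.pop())
    return summary


def is_rectangular(book):
    """True if every region has the same sample names and bin count,
    i.e. one dense (region, sample, bin) array could hold everything."""
    layouts = {
        (frozenset(samples), len(next(iter(samples.values()))))
        for samples in book.values()
    }
    return len(layouts) == 1
```

For the case above, `shape_summary(jagged)` gives 2 samples of 10 bins for the first region and 3 samples of 5 bins for the second, so `is_rectangular(jagged)` is false; a single category axis over samples would need padding or separate histograms per region.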

@LovelyBuggies
Collaborator Author

LovelyBuggies commented Mar 24, 2020

@henryiii I made a few attempts and put together a new demo on this topic HERE :)

@LovelyBuggies
Collaborator Author

LovelyBuggies commented Mar 28, 2020

We can encapsulate the work in methods modeled on h.to_numpy(), e.g., h.to_aghast(), h.to_mplhep(), h.to_root(), etc.
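
As a rough sketch of what such export methods could look like, the toy class below offers a `to_numpy()` that returns `(counts, edges)` in the same order as `numpy.histogram`, here as plain lists so the example has no dependencies. `ToyHist` and `to_dict()` are hypothetical and only illustrate the pattern; they are not the hist API.

```python
class ToyHist:
    """Toy regular-binned 1D histogram (hypothetical; not the hist API)."""

    def __init__(self, bins, start, stop):
        self.bins, self.start, self.stop = bins, start, stop
        self.width = (stop - start) / bins
        self.counts = [0] * bins

    def fill(self, values):
        for x in values:
            if self.start <= x < self.stop:  # out-of-range values are dropped
                index = int((x - self.start) / self.width)
                self.counts[min(index, self.bins - 1)] += 1
        return self

    def edges(self):
        return [self.start + i * self.width for i in range(self.bins + 1)]

    def to_numpy(self):
        """Return (counts, edges), the order numpy.histogram uses."""
        return list(self.counts), self.edges()

    def to_dict(self):
        """Stand-in for an exporter to some external format."""
        counts, edges = self.to_numpy()
        return {"counts": counts, "edges": edges}


h = ToyHist(4, 0.0, 4.0).fill([0.5, 1.5, 1.7, 3.9, 4.0, -1.0])
counts, edges = h.to_numpy()
```

Each `to_*` method stays a thin wrapper over the same stored counts and edges, so adding another target format does not touch the filling logic.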
