
Advantages of differentiable analyses #8

Open
alexander-held opened this issue May 17, 2020 · 7 comments

@alexander-held
Member

One great aspect of a differentiable analysis workflow is that it allows the use of gradient-based methods to optimize the analysis. This might mean, for example, tuning a jet pT cut to minimize the uncertainty on a parameter of interest.

There are likely other interesting things that become possible with differentiable workflows. It is worth thinking about what might specifically be useful for HEP.

An example might be the following: an analysis observes a mis-modelling in the form of a slope in the ratio of data vs. model prediction as a function of jet pT. A question might be "is this related to the high lepton pT cut we apply?". This could be studied by scanning over lepton pT cuts and evaluating the data/model slope, or more efficiently by evaluating the gradient with respect to the lepton pT cut.

It would be especially interesting if it turns out that there are things that only become possible with a differentiable workflow (maybe when brute-force parameter scans would be computationally prohibitive).

@signorgelato

In that case, one can replace the Heaviside function (associated with the jet pT cut) with a smooth approximation and then use automatic differentiation provided everything is differentiable.
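The sigmoid relaxation can be sketched in a few lines. A minimal numpy illustration (the toy pT spectrum, slope parameter, and cut value are assumptions for demonstration; the gradient is written out analytically here, where an autodiff framework would derive it automatically):

```python
import numpy as np

def sigmoid(x):
    # clip the argument to avoid overflow in exp for objects far from the cut
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

def soft_cut_yield(jet_pt, cut, slope=10.0):
    """Smooth selection: each jet contributes a weight in (0, 1)
    instead of the hard 0/1 of the Heaviside step."""
    return sigmoid(slope * (jet_pt - cut)).sum()

def soft_cut_yield_grad(jet_pt, cut, slope=10.0):
    """Analytic derivative of the yield w.r.t. the cut value."""
    s = sigmoid(slope * (jet_pt - cut))
    return (-slope * s * (1.0 - s)).sum()

rng = np.random.default_rng(0)
jet_pt = rng.exponential(scale=50.0, size=1000)  # toy jet pT spectrum in GeV

cut = 25.0
y = soft_cut_yield(jet_pt, cut)
g = soft_cut_yield_grad(jet_pt, cut)

# cross-check the analytic gradient with a central finite difference
eps = 1e-4
fd = (soft_cut_yield(jet_pt, cut + eps) - soft_cut_yield(jet_pt, cut - eps)) / (2 * eps)
```

Raising the cut removes events, so the gradient is negative. As the slope parameter grows, the sigmoid approaches the hard Heaviside cut and the gradient concentrates around the threshold.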

@pablodecm

pablodecm commented May 17, 2020

Interesting idea, using gradients to obtain information for evaluating data/model issues.

I think other advantages of a differentiable analysis framework built on modern numerical autodiff libraries, not directly related to gradients, are much faster and more portable forward analysis passes and iteration (possibly even on a GPU or TPU), so more brute-force scans become possible, as well as more flexibility to construct arbitrary workflows.

Another advantage is potentially more accurate modelling of the effect of nuisance parameters: instead of interpolating histograms, the event weights can be interpolated directly, or the event features can be modified by the parameters (mentioned in #7 (comment)).
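A minimal sketch of such per-event weight interpolation (the toy observable and the ±10% shape-dependent variation are invented for illustration; in a real analysis the varied weights would come from systematic variations):

```python
import numpy as np

def interpolate_weights(w_nom, w_up, w_down, alpha):
    """Piecewise-linear per-event weight interpolation in the nuisance
    parameter alpha, where alpha = +/-1 reproduce the template variations."""
    return np.where(
        alpha >= 0,
        w_nom + alpha * (w_up - w_nom),
        w_nom - alpha * (w_down - w_nom),
    )

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(0.0, 1.0, n)   # toy event observable
w_nom = np.ones(n)
w_up = 1.0 + 0.1 * x          # toy shape-dependent up variation
w_down = 1.0 - 0.1 * x        # toy shape-dependent down variation

# histograms filled with interpolated weights are smooth in alpha,
# unlike bin-by-bin interpolation between pre-filled templates
bins = np.linspace(-3, 3, 7)
hist_nom, _ = np.histogram(x, bins=bins, weights=interpolate_weights(w_nom, w_up, w_down, 0.0))
hist_half, _ = np.histogram(x, bins=bins, weights=interpolate_weights(w_nom, w_up, w_down, 0.5))
```

The key difference to histogram interpolation is that the parameter acts before binning, so correlations between the nuisance parameter and any downstream selection are propagated per event.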

@signorgelato

Moreover, it then becomes possible to optimize the program in a portable, high-level, differentiable intermediate representation and deploy it onto different hardware backends such as GPUs and TPUs (see TVM). For reference, here are some old slides on deep learning compilers.

@GilesStrong
Member

> In that case, one can replace the Heaviside function (associated with the jet pT cut) with a smooth approximation and then use automatic differentiation provided everything is differentiable.

I might be missing something, @signorgelato, but how would a smooth selection cut work in practice? Presumably the outcome is Boolean, e.g. accept the jet or reject it. I guess one could float the position and scale of a sigmoid, throw a random number per jet, and accept the jet if the number is less than the value of the sigmoid at that jet's pT. Or perhaps accept all the jets but multiply the associated weights by the sigmoid values (though this could later cause problems with selections based on object multiplicity).

@alexander-held
Member Author

An object multiplicity selection might be replaced by something fuzzy, like an extra weight taken from a Gaussian centered on the desired multiplicity. The object multiplicity per event could itself be a weighted sum of objects, with smoothly varying weights.
This approach can introduce a lot of very small weights, which might lead to floating-point issues.
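A toy sketch of this fuzzy multiplicity idea (the slope, Gaussian width, thresholds, and lepton pT values are arbitrary illustrative choices, not from any analysis):

```python
import numpy as np

def sigmoid(x):
    # clip the argument to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

def soft_multiplicity(lepton_pt, pt_cut, slope=5.0):
    """Differentiable object count: a weighted sum of objects,
    where each weight varies smoothly with the pT cut."""
    return sigmoid(slope * (lepton_pt - pt_cut)).sum()

def soft_veto_weight(n_eff, target=0.0, width=0.5):
    """Fuzzy multiplicity selection: a Gaussian event weight centered
    on the desired multiplicity instead of a hard equality cut."""
    return np.exp(-0.5 * ((n_eff - target) / width) ** 2)

# toy event with two leptons; veto events with leptons above 10 GeV
lepton_pt = np.array([4.0, 25.0])
n_eff = soft_multiplicity(lepton_pt, pt_cut=10.0)
w_event = soft_veto_weight(n_eff, target=0.0)
```

Here one lepton sits well above the 10 GeV threshold, so the effective multiplicity is close to 1 and the Gaussian veto weight is strongly suppressed; shrinking the Gaussian width produces exactly the very small weights mentioned above.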

@signorgelato

Yeah, these are things we could potentially try. Is there a concrete example that you all have in mind that we could try now?

@GilesStrong
Member

Possibly. We have some MC samples (AMVA4NP's summer 2016 Delphes production: hh->bbtautau, hh->bbbb, QCD, and ttbar, O(1e7) events each). Generally hh searches work best with a selection that is as open as possible; any cuts on basic objects are made out of necessity (e.g. trigger thresholds). That said, one would normally cut on the Higgs masses (either mass windows or a 2D ellipse). However, the minimum cut is dictated by data/MC agreement, which we cannot test for these samples; generally I find that cutting isn't really necessary, and at least for the hh->bbtautau search the performance is already quite high.

Selection for hh->bbtautau normally uses a harsh veto on extra leptons, so perhaps this could be an interesting test for a differentiable multiplicity selection. This veto, though, is designed to reduce Drell-Yan contributions, so we'd need to generate DY samples to study this fully.

Another potential use could be b-jet assignment in hh->bbbb (matching reconstructed b-jets to Higgs candidates). Maybe @pablodecm could comment more on the feasibility of this study.
