
Advantages of differentiable analyses #8

Open
alexander-held opened this issue May 17, 2020 · 7 comments

@alexander-held
Member

One great aspect of a differentiable analysis workflow is that it allows the use of gradient-based methods to optimize the analysis. This might mean, for example, tuning a jet pT cut to minimize the uncertainty on a parameter of interest.

There are likely other interesting things that become possible with differentiable workflows. It is worth thinking about what might specifically be useful for HEP.

An example might be the following: an analysis observes a mis-modelling in the form of a slope in the ratio of data vs. model prediction as a function of jet pT. A question might be "is this related to the high lepton pT cut we apply?". This could be studied by scanning over lepton pT cuts and evaluating the data/model slope, or more efficiently by evaluating the gradient with respect to the lepton pT cut.

It would be especially interesting if it turns out that there are things that only become possible with a differentiable workflow (maybe when brute-force parameter scans would be computationally prohibitive).

@signorgelato

In that case, one can replace the Heaviside function (associated with the jet pT cut) with a smooth approximation and then use automatic differentiation provided everything is differentiable.
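The sigmoid relaxation can be sketched in a few lines. A minimal numpy illustration (the toy pT spectrum, slope parameter, and cut value are assumptions for demonstration; the gradient is written out analytically here, where an autodiff framework would derive it automatically):

```python
import numpy as np

def sigmoid(x):
    # clip the argument to avoid overflow in exp for objects far from the cut
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

def soft_cut_yield(jet_pt, cut, slope=10.0):
    """Smooth selection: each jet contributes a weight in (0, 1)
    instead of the hard 0/1 of the Heaviside step."""
    return sigmoid(slope * (jet_pt - cut)).sum()

def soft_cut_yield_grad(jet_pt, cut, slope=10.0):
    """Analytic derivative of the yield w.r.t. the cut value."""
    s = sigmoid(slope * (jet_pt - cut))
    return (-slope * s * (1.0 - s)).sum()

rng = np.random.default_rng(0)
jet_pt = rng.exponential(scale=50.0, size=1000)  # toy jet pT spectrum in GeV

cut = 25.0
y = soft_cut_yield(jet_pt, cut)
g = soft_cut_yield_grad(jet_pt, cut)

# cross-check the analytic gradient with a central finite difference
eps = 1e-4
fd = (soft_cut_yield(jet_pt, cut + eps) - soft_cut_yield(jet_pt, cut - eps)) / (2 * eps)
```

Raising the cut removes events, so the gradient is negative. As the slope parameter grows, the sigmoid approaches the hard Heaviside cut and the gradient concentrates around the threshold.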

@pablodecm

pablodecm commented May 17, 2020

Interesting idea, using gradients to obtain information for evaluating data/model issues.

I think other advantages of a differentiable analysis framework built on modern numerical autodiff libraries, not directly related to gradients, are much faster and more portable forward analysis passes and iteration (possibly even on a GPU or TPU), so more brute-force scans become possible, as well as more flexibility to construct arbitrary workflows.

Another advantage is potentially more accurate modelling of the effect of nuisance parameters: instead of interpolating histograms, the event weights can be interpolated directly, or the event features can be modified by the parameters (mentioned in #7 (comment)).
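A minimal sketch of such per-event weight interpolation (the toy observable and the ±10% shape-dependent variation are invented for illustration; in a real analysis the varied weights would come from systematic variations):

```python
import numpy as np

def interpolate_weights(w_nom, w_up, w_down, alpha):
    """Piecewise-linear per-event weight interpolation in the nuisance
    parameter alpha, where alpha = +/-1 reproduce the template variations."""
    return np.where(
        alpha >= 0,
        w_nom + alpha * (w_up - w_nom),
        w_nom - alpha * (w_down - w_nom),
    )

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(0.0, 1.0, n)   # toy event observable
w_nom = np.ones(n)
w_up = 1.0 + 0.1 * x          # toy shape-dependent up variation
w_down = 1.0 - 0.1 * x        # toy shape-dependent down variation

# histograms filled with interpolated weights are smooth in alpha,
# unlike bin-by-bin interpolation between pre-filled templates
bins = np.linspace(-3, 3, 7)
hist_nom, _ = np.histogram(x, bins=bins, weights=interpolate_weights(w_nom, w_up, w_down, 0.0))
hist_half, _ = np.histogram(x, bins=bins, weights=interpolate_weights(w_nom, w_up, w_down, 0.5))
```

The key difference to histogram interpolation is that the parameter acts before binning, so correlations between the nuisance parameter and any downstream selection are propagated per event.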

@signorgelato

Moreover, it then becomes possible to optimize the program in a portable, high-level, differentiable intermediate representation and deploy it onto different hardware backends such as GPUs and TPUs (see TVM). For reference, here are some old slides on deep learning compilers.

@GilesStrong
Member

> In that case, one can replace the Heaviside function (associated with the jet pT cut) with a smooth approximation and then use automatic differentiation provided everything is differentiable.

I might be missing something, @signorgelato, but how would a smooth selection cut work in practice? Presumably the outcome is Boolean, e.g. accept the jet or reject it. I guess one could float the position and scale of a sigmoid, throw a random number per jet, and accept the jet if the number is less than the value of the sigmoid at that jet's pT. Or perhaps accept all the jets but multiply the associated weights by the sigmoid values (though this could later cause problems with selections based on object multiplicity).

@alexander-held
Member Author

An object multiplicity selection might be replaced by something fuzzy, like an extra weight taken from a Gaussian centered on the desired multiplicity. The object multiplicity per event could itself be a weighted sum of objects, with smoothly varying weights.
This approach can introduce a lot of very small weights, which might lead to floating-point issues.
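A toy sketch of this fuzzy multiplicity idea (the slope, Gaussian width, thresholds, and lepton pT values are arbitrary illustrative choices, not from any analysis):

```python
import numpy as np

def sigmoid(x):
    # clip the argument to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

def soft_multiplicity(lepton_pt, pt_cut, slope=5.0):
    """Differentiable object count: a weighted sum of objects,
    where each weight varies smoothly with the pT cut."""
    return sigmoid(slope * (lepton_pt - pt_cut)).sum()

def soft_veto_weight(n_eff, target=0.0, width=0.5):
    """Fuzzy multiplicity selection: a Gaussian event weight centered
    on the desired multiplicity instead of a hard equality cut."""
    return np.exp(-0.5 * ((n_eff - target) / width) ** 2)

# toy event with two leptons; veto events with leptons above 10 GeV
lepton_pt = np.array([4.0, 25.0])
n_eff = soft_multiplicity(lepton_pt, pt_cut=10.0)
w_event = soft_veto_weight(n_eff, target=0.0)
```

Here one lepton sits well above the 10 GeV threshold, so the effective multiplicity is close to 1 and the Gaussian veto weight is strongly suppressed; shrinking the Gaussian width produces exactly the very small weights mentioned above.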

@signorgelato

Yeah, these are things we could potentially try. Is there a concrete example that you all have in mind that we could try now?

@GilesStrong
Member

Possibly. We have some MC samples (AMVA4NP's summer 2016 Delphes production: hh->bbtautau, hh->bbbb, QCD, and ttbar, O(1e7) events each). Generally hh searches work best with a selection that is as open as possible; any cuts on basic objects are made out of necessity (e.g. trigger thresholds). That said, one would normally cut on the Higgs masses (either mass windows or a 2D ellipse). However, the minimum cut is dictated by data/MC agreement, which we cannot test for these samples; generally I find that cutting isn't really necessary, and at least for the hh->bbtautau search the performance is already quite high.

Selection for hh->bbtautau normally uses a harsh veto on extra leptons, so perhaps this could be an interesting test for a differentiable multiplicity selection. This veto, though, is designed to reduce Drell-Yan contributions, so we'd need to generate DY samples to study this fully.

Another potential use could be b-jet assignment in hh->bbbb (matching reconstructed b-jets to Higgs candidates). Maybe @pablodecm could comment more on the feasibility of this study.
