
Possible to integrate CMS's Combine workflow? #344

Open · kratsg opened this issue Oct 25, 2018 · 22 comments
Labels: follow up, research, experimental stuff

kratsg (Contributor) commented Oct 25, 2018

Question

CMS uses a tool called Combine which is built on top of RooStats/RooFit.

This seems quite feasible: CMS workspaces are defined in plaintext files called datacards, so we could provide a datacard2json tool to translate a datacard into something usable by pyhf.
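
A minimal sketch (not an existing tool) of what such a datacard2json translation could produce for the simplest case: a single-bin counting datacard with one signal, one background, and a single lnN nuisance. The function name and yields below are placeholders, and real datacards carry much more structure.

```python
# Hypothetical sketch: build a pyhf workspace for a one-bin counting
# experiment, the kind of model the simplest Combine datacards describe.
import pyhf


def counting_card_to_pyhf(sig_rate, bkg_rate, observed, bkg_lnN=1.1):
    """Illustrative translation of a one-bin counting datacard."""
    spec = {
        "channels": [
            {
                "name": "bin1",
                "samples": [
                    {
                        "name": "signal",
                        "data": [sig_rate],
                        # Combine's signal strength r maps to a pyhf normfactor
                        "modifiers": [{"name": "r", "type": "normfactor", "data": None}],
                    },
                    {
                        "name": "background",
                        "data": [bkg_rate],
                        # a lnN of kappa roughly maps to a normsys with hi=kappa, lo=1/kappa
                        "modifiers": [
                            {
                                "name": "bkg_norm",
                                "type": "normsys",
                                "data": {"hi": bkg_lnN, "lo": 1.0 / bkg_lnN},
                            }
                        ],
                    },
                ],
            }
        ],
        "observations": [{"name": "bin1", "data": [observed]}],
        "measurements": [{"name": "meas", "config": {"poi": "r", "parameters": []}}],
        "version": "1.0.0",
    }
    return pyhf.Workspace(spec)


# purely illustrative numbers
workspace = counting_card_to_pyhf(sig_rate=5.0, bkg_rate=10.0, observed=12)
model = workspace.model()
data = workspace.data(model)
```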

kratsg added the follow up, research, and experimental stuff labels on Oct 25, 2018
matthewfeickert (Member) commented Oct 27, 2018

After having some discussions at the US LUA meeting I think that we might want to talk with Josh Bendavid (@bendavid) about this.

jonas-eschle commented

Since this is still an open issue, let me mention that on 25.6 there will be a meeting, including Josh and others, to talk about implementations of binned/template PDFs in order to move this niche of the community closer together.

matthewfeickert (Member) commented

Tagging @mattbellis and @benkrikler here, given the email conversation that Matt started (thanks Matt!) RE: what would need to be done to extend the HistFactory JSON v1.0.0 schema to allow translation of CMS Combine cards. @mattbellis provided a toy Combine card that we can start with. @benkrikler had started some discussion on this front at CHEP 2019, so his thoughts and input are very welcome here too.

kratsg self-assigned this on Jan 12, 2020
kratsg (Contributor, Author) commented Jan 12, 2020

I'll assign myself to this for now, since I have a small side project looking into Combine (part of my SUSY role in ATLAS) and want to explore some code for this.

matthewfeickert (Member) commented Dec 4, 2020

So it seems that CMS has added some rather complete tutorials that describe the Combine model (HT @kpedro88).

lukasheinrich (Contributor) commented

Together with #1188 it should be much more straightforward to build a Combine-like model.

kratsg (Contributor, Author) commented Mar 3, 2021

Paging @alexander-held.

alexander-held (Member) commented

I was curious about the possibility of converting datacards into pyhf workspaces and wrote a small utility https://github.com/alexander-held/datacard-to-pyhf. I do not know much about CMS Combine and the datacard format, so the implementation likely has a range of issues. The most glaring one is that it only supports single-bin channels (and no shape systematics) at the moment.
It runs fine with the toy example from above, resulting in a best-fit of

r =  0.9040 -0.2753 +0.3202

The paper reports 0.93 +0.26 −0.23 (stat.) +0.13 −0.09 (syst.) in the abstract. With another simple example, I do not see perfect agreement between the fit with Combine and pyhf (via MINUIT) either, so there are probably other differences to be understood.
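
For reference, a sketch of how a best-fit value and uncertainty could be extracted from a converted workspace with pyhf and iminuit. The workspace filename is a placeholder, and return_uncertainties gives symmetric (Hesse-style) errors, so reproducing the asymmetric interval quoted above would require a MINOS-style scan instead.

```python
# Sketch: maximum-likelihood fit of a converted workspace with the
# iminuit-backed optimizer; "workspace.json" is a placeholder path.
import json

import pyhf

pyhf.set_backend("numpy", "minuit")  # use iminuit for the minimization

with open("workspace.json") as f:
    workspace = pyhf.Workspace(json.load(f))

model = workspace.model()
data = workspace.data(model)

# best-fit parameter values with symmetric (Hesse) uncertainties
result = pyhf.infer.mle.fit(data, model, return_uncertainties=True)
poi_index = model.config.poi_index
bestfit, uncertainty = result[poi_index]
print(f"r = {bestfit:.4f} +/- {uncertainty:.4f}")
```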

lukasheinrich (Contributor) commented

Awesome, that's a great start. Taking on the simplest example and successively adding features was also pyhf's approach in general. Tagging @clelange as well.

nsmith- commented Mar 5, 2021

In case it's helpful, @andrzejnovak put together a conda recipe for Combine in cms-analysis/HiggsAnalysis-CombinedLimit#648.
Though it is still Python 2 :(

nucleosynthesis commented

Are you able to run the "standalone" version of Combine [1] (it works on a CernVM)? That might help better compare the expected results from Combine vs pyhf, since the tutorial cards, even the more advanced ones, are not identical to the real thing in the papers.

[1] http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/#standalone-version-of-combine

alexander-held (Member) commented

Could a standalone Docker image also be possible? Having no CVMFS dependence at all would be useful to allow running validations anywhere.

andrzejnovak (Member) commented

@alexander-held check the PR Nick linked

alexander-held (Member) commented

Having the conda version available is great! I view a ready-to-use Docker image as complementary to that (I guess with conda there is compilation involved?).

andrzejnovak (Member) commented

Sure, just wanted to point out that with the conda env you can build the image on the fly as well without having to access cvmfs when compiling stuff.

nucleosynthesis commented Apr 22, 2021

Any way to run standalone is fine. I'm not sure how well the conda-env version is synced with the main branch (102x vs 112x), but for this I don't think it matters too much. I just wasn't sure whether the comparison by @alexander-held was a direct comparison with a Combine run or not.

alexander-held (Member) commented Apr 22, 2021

@nucleosynthesis Yes, for the comparison with pyhf I was running Combine on lxplus. I saw small differences for these example datacards when comparing a result obtained with Combine to a result obtained by converting the model to pyhf and then minimizing the resulting HistFactory version of the model. The differences were small enough for me to be confident that the conversion is generally working, but slightly larger than what I would expect purely from slight differences in minimization. They probably come down to things like interpolation algorithm differences.

While writing this comment I noticed one discrepancy: my lnN treatment for a value 1.2 would use 0.8 for -1σ, but I should use 1/1.2. I will give this a try.
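
As a quick illustration of the convention being discussed: a Combine lnN with kappa = 1.2 scales the yield by kappa at +1σ and by 1/kappa at -1σ, which maps naturally onto a pyhf normsys modifier rather than a symmetric ±20% variation. The nuisance-parameter name below is hypothetical.

```python
# Illustrative check of the lnN convention discussed above: for kappa = 1.2
# the -1 sigma scaling is 1/kappa (~0.833), not 0.8.
kappa = 1.2
print(f"+1 sigma: {kappa}")           # 1.2
print(f"-1 sigma: {1 / kappa:.4f}")   # 0.8333, not 0.8

# In a pyhf spec this maps naturally onto a normsys modifier on the sample;
# the nuisance-parameter name here is hypothetical.
modifier = {
    "name": "bkg_lnN",
    "type": "normsys",
    "data": {"hi": kappa, "lo": 1 / kappa},
}
```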

Would you recommend the CombineHarvester Python API for datacard parsing? I remember looking at it last month but did not know how complete and up-to-date it is. I think the biggest challenge in creating the corresponding pyhf model for a given datacard is figuring out how to correctly parse the datacard format.

Is there a good place to ask technical questions about Combine model building details (as a non-CMS member)?

nucleosynthesis commented

CombineHarvester is probably overkill, though it should be up to date. A while ago we added a Python dumping option to the datacard parser, --dump-datacard, which prints to stdout an equivalent Python script that does the same thing as running text2workspace.py over the datacard. That might be helpful for seeing what is being mapped to what (see here: http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part2/settinguptheanalysis/#automatic-production-of-datacards-and-workspaces).

For discussions or questions to the Combine team, probably the easiest thing for now is to submit an issue here and add the "question" label.

alexander-held (Member) commented

I had not noticed --dump-datacard before, but this looks super useful. Thanks a lot!

matthewfeickert (Member) commented

Just noting here that @ajgilbert gave a session on Combine (:+1:) at the first hands-on workshop on publication of statistical models; the last 4 slides are relevant to pyhf/Combine interoperability and probability model preservation.

mattbellis commented

There's a CMS Top workshop taking place this week where Combine will be discussed. It is at a time that I can't attend, but I'm going to try to reach out to the speaker(s) to see if there's any interest in understanding how / if we can create examples comparing and contrasting Combine and pyhf.

Building some hypersimple comparison examples is on a very long to-do list of mine. :)

maxgalli commented

Hi! I don't know if this can be useful, but a while ago I started working on @alexander-held's repo with the intention of adding support for datacards from shape-based analyses. You can find it here, and the output can be tested e.g. with this datacard.

A few huge disclaimers:

  • The reason I started working on this was mostly to have a way to easily handle and visualize huge datacards like the ones I use for the Run 2 differential combination, and the JSON format is probably the best way to do it (text files like datacards clearly don't scale well when there are a lot of channels, samples, and modifiers). This means that the way I translate some aspects is not compatible with pyhf itself and would have to be changed: see e.g. up and down modifiers, where I introduce a dictionary called shape with entries up and down containing the bin values for each modifier, which is clearly incompatible with a pyhf analysis flow (see the sketch after this list).
  • As you can see, there are quite a few types of modifiers that I didn't even try to translate; in these cases I simply ignored them with a comment or translated them to something completely meaningless.
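
For comparison, a sketch of how per-bin up/down template variations are expressed natively in the pyhf schema, via a histosys modifier. The sample name, bin contents, and systematic name below are purely illustrative.

```python
# Sketch: a pyhf sample carrying a shape systematic as a histosys modifier,
# instead of a custom "shape" dictionary. All names and numbers are made up.
sample = {
    "name": "ggH",
    "data": [12.0, 30.5, 44.1],  # nominal bin contents
    "modifiers": [
        {
            "name": "JES",  # hypothetical shape systematic
            "type": "histosys",
            "data": {
                "hi_data": [12.8, 31.9, 45.0],  # +1 sigma template
                "lo_data": [11.3, 29.2, 43.4],  # -1 sigma template
            },
        }
    ],
}
```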
