
Possible bug in the handling of the normalization in the skimmers #273

Open
francescobrivio opened this issue Nov 22, 2022 · 16 comments

@francescobrivio
Contributor

While re-checking the sample normalization applied in the skimmers for the EFT studies of the non-resonant Run 2 analysis, I noticed a possible bug.
More precisely, in this line (which is used in the numerator when the samples are normalized)

theSmallTree.m_MC_weight = (isMC ? theBigTree.aMCatNLOweight * XS * stitchWeight * HHweight : 1) ;

the weight topPtReweight is missing.
This same weight is instead added in the denominator in these lines:
if (theBigTree.npu >= 0 && theBigTree.npu <= 99) // good PU weights
  EvtW = isMC ? (theBigTree.aMCatNLOweight * reweight.weight(PUReweight_MC,PUReweight_target,theBigTree.npu,PUreweightFile) * topPtReweight * HHweight * stitchWeight) : 1.0;
else if (theBigTree.npu >= 100) // use the last available bin for the PU weight
  EvtW = isMC ? (theBigTree.aMCatNLOweight * reweight.weight(PUReweight_MC,PUReweight_target,99, PUreweightFile) * topPtReweight * HHweight * stitchWeight) : 1.0;

The topPtReweight is computed in the gen-section of the skimmers and it's different from 1 only for TT samples.
Even though the branches containing the topPtReweight weight are never used later in the analysis, this weight is, as shown above, included in the denominator (totalEvents) but not in the numerator.

I quickly looked at the TTtopPtreweight_up branch of the skims (which is actually filled with topPtReweight, as shown here) for a skim file of the TT_semiLep sample and I can confirm the value is != 1:
[plot: TTtopPtreweight_up distribution from the TT_semiLep 2016 skim]

@portalesHEP @dzuolo @jonamotta this is most probably a bug that we missed in the past, but further checks (i.e. including re-skimming the TT samples with the addition of this weight in the numerator) are in order.

@portalesHEP
Contributor

Thanks for the heads up! I'm not entirely clear on why this weight is needed, and I found out that its value is hardcoded in the skimmer:

const double aTopRW = 0.0615;
const double bTopRW = -0.0005;
float SFTop1 = TMath::Exp(aTopRW+bTopRW*ptTop1);
float SFTop2 = TMath::Exp(aTopRW+bTopRW*ptTop2);
topPtReweight = TMath::Sqrt(SFTop1*SFTop2);

Do you know where these values come from, and whether we should simply remove them from the sum of weights or rather add them back to the event weight for TT events?

@dzuolo
Contributor

dzuolo commented Nov 23, 2022

Hi @portalesHEP! If we understood the twiki https://twiki.cern.ch/twiki/bin/viewauth/CMS/TopPtReweighting correctly, we are in category "Case 3.1: Analyses with SM tt as background (not in signal)" and, specifically: "In a control region which is signal-depleted and tt-enriched, one should check the data-MC agreement of the main distributions of the analysis, together with the top pT. If the agreement between the data and MC is within the available uncertainties (syst. and stat.) then the effect of top pT mismodeling can be considered covered by the existing uncertainties and no additional correction or uncertainty is needed." So we should remove this weight from the sum of weights and check the data/MC agreement in the inverted resolved2b0j category.

@jonamotta
Contributor

jonamotta commented Nov 23, 2022

To my knowledge, the reweighting was done according to what is written on this page.
Although I admit that, looking at it now, the hardcoded numbers coincide but the method itself does not...

@portalesHEP
Contributor

Ok, thanks for the pointers. Then I'd tend to agree that we should remove the weights before the next skimming round. I suppose this should not have had any critical impact on the non-resonant result, though (?), since a dedicated TT correction was extracted which should have absorbed any issue introduced by this bug.

@jonamotta
Contributor

I am not sure I agree with either of the points.

  1. I would say that the weights should be correctly re-introduced for the next skimming, after we have confirmed that it was an error on our side not to include it in the numerator (and after testing the difference in a non-resonant TT production).
  2. Even if the data-MC agreement was indeed good, I am not sure I agree that the effect of this scale factor is actually absorbed by our custom ttSF. The ttSF is a normalization factor computed on the mHH distribution, whereas this is a "shape weight" as a function of pT. Maybe we have been lucky, but this does not appear evident to me at first sight.

@dzuolo
Contributor

dzuolo commented Nov 23, 2022

The instructions on the twiki seem to indicate that we should first check the data/MC agreement in a tt-dominated control region and only then, if needed, compute a correction. So I would suggest doing this first.

@portalesHEP
Contributor

For (1): I think @dzuolo misquoted the twiki and we are in fact in the second-bullet situation: "In case significant discrepancies are observed, a dedicated top pT reweighting function should be derived from this control region and applied across the analysis while monitoring the agreement of other distributions as a result of this reweighting." Our 'fault' here would then be that the custom correction that was derived was indeed not pT-dependent, but that does not change the conclusion that the correction provided on the twiki is not to be used.

@jonamotta
Contributor

Having now read the TWiki more carefully, I agree with @portalesHEP that the weight should be removed completely, according to bullets 2 and 4 of case 3.1.

What we could do is move to a pT-dependent computation of the ttSFs for the resonant analysis (or at least test it and see if that is in any way better than the normalization one we already have).

@portalesHEP
Contributor

portalesHEP commented Nov 23, 2022

> What we could do is move to a pT-dependent computation of the ttSFs for the resonant analysis (or at least test it and see if that is in any way better than the normalization one we already have).

Agreed, but as @dzuolo said, before anything else we should check whether such weights are still needed.

@francescobrivio
Contributor Author

francescobrivio commented Nov 23, 2022

I think the way to move forward for the resonant analysis is:

  • remove both topPtReweight and ttSF and check the data/MC agreement in the inverted resolved2b0j category (including the distribution of the top_pt)
  • check with Top POG if the recommendations for this reweighting have changed for UL samples
  • only at that point decide if we want to apply the topPtReweight or not:
    • if yes, apply it, check again the distributions and re-compute the ttSFs on top of this
    • if no, the ttSFs should be re-computed in any case since the TT normalization is directly affected by this "bug"

For the EFT results based on the non-resonant analysis, I will check with the HH conveners to see if we want to update this (which would require a significant effort) or if we want to base the EFT results exactly on the same HIG-20-010 analysis.

@dzuolo
Contributor

dzuolo commented Mar 8, 2023

@portalesHEP @bfonta @kramerto @riga We need to decide how to proceed with this issue in the new ntuples: I would propose to remove topPtReweight from EvtW in the skimmers and then check the data/MC agreement in the inverted resolved2b0j. What do you think?

@portalesHEP
Contributor

Hi, I think we should keep it in the skimmer (but correct it), and just remove the -t option in the submission script (that would set the weight to its default value of 1, like for any non-top sample). That way, if we do realise later on that we have some reason to put it back, it'll be easier.

@dzuolo
Contributor

dzuolo commented Mar 8, 2023

I am not sure I understand your point, Louis. The weight is already stored in the skims, with the last value suggested by the POG, in the m_TTtopPtreweight_up branch. What I am saying is that we should remove it from the computation of the EvtW, which is the denominator of the normalization. I believe it should not be there if we do not also apply the weight in the numerator.

@portalesHEP
Contributor

I'm saying that instead of removing it from the denominator, we should add it to the numerator, but ensure that for now the weight is set to 1 (which should be done by removing the -t flag in the skim submission, iirc).

@dzuolo
Contributor

dzuolo commented Mar 8, 2023

Ah ok! Now I understand, this seems ok to me!

@kramerto
Contributor

kramerto commented Mar 8, 2023

That is also how we submitted the 2017 skims for now: without the -t option, so no top pT reweighting and no tt stitching (which shouldn't be needed anyway, since the samples we use have no overlap).
