
Possible bug in the handling of the normalization in the skimmers #273

Open
francescobrivio opened this issue Nov 22, 2022 · 16 comments

@francescobrivio
Contributor

While re-checking the sample normalization applied in the skimmers for the EFT studies of the non-resonant Run 2 analysis, I noticed a possible bug.
More precisely, in this line (which is used in the numerator when the samples are normalized)

theSmallTree.m_MC_weight = (isMC ? theBigTree.aMCatNLOweight * XS * stitchWeight * HHweight : 1) ;

the weight topPtReweight is missing.
This same weight is instead added in the denominator in these lines:
if (theBigTree.npu >= 0 && theBigTree.npu <= 99) // good PU weights
  EvtW = isMC ? (theBigTree.aMCatNLOweight * reweight.weight(PUReweight_MC,PUReweight_target,theBigTree.npu,PUreweightFile) * topPtReweight * HHweight * stitchWeight) : 1.0;
else if (theBigTree.npu >= 100) // use the last available bin for the PU weight
  EvtW = isMC ? (theBigTree.aMCatNLOweight * reweight.weight(PUReweight_MC,PUReweight_target,99, PUreweightFile) * topPtReweight * HHweight * stitchWeight) : 1.0;

The topPtReweight is computed in the gen-section of the skimmers and it's different from 1 only for TT samples.
Even though the branches containing the topPtReweight weight are never used later in the analysis, this weight is, as shown above, included in the denominator (totalEvents) but not in the numerator.

I quickly looked at the TTtopPtreweight_up branch of the skims (which is actually filled with topPtReweight, as shown here) for a skim file of the TT_semiLep sample and I can confirm the value is != 1:
[plot: TTtopPtreweight_up distribution from the TT_semiLep 2016 skim]

@portalesHEP @dzuolo @jonamotta this is most probably a bug that we missed in the past, but further checks (i.e. including re-skimming the TT samples with the addition of this weight in the numerator) are in order.

@portalesHEP
Contributor

Thanks for the heads up! I'm not entirely clear on why this weight is needed, and I found out that its value is hardcoded in the skimmer:

const double aTopRW = 0.0615;
const double bTopRW = -0.0005;
float SFTop1 = TMath::Exp(aTopRW+bTopRW*ptTop1);
float SFTop2 = TMath::Exp(aTopRW+bTopRW*ptTop2);
topPtReweight = TMath::Sqrt(SFTop1*SFTop2);

Do you know where these values come from, and whether we should simply remove them from the sum of weights or rather add them back to the event weight for TT events?

@dzuolo
Contributor

dzuolo commented Nov 23, 2022

Hi @portalesHEP! If we understood the twiki https://twiki.cern.ch/twiki/bin/viewauth/CMS/TopPtReweighting correctly, we are in category "Case 3.1: Analyses with SM tt as background (not in signal)" and, specifically: "In a control region which is signal-depleted and tt-enriched, one should check the data-MC agreement of the main distributions of the analysis, together with the top pT. If the agreement between the data and MC is within the available uncertainties (syst. and stat.) then the effect of top pT mismodeling can be considered covered by the existing uncertainties and no additional correction or uncertainty is needed." So we should remove this weight from the sum of weights and check the data/MC agreement in the inverted resolved2b0j category.

@jonamotta
Contributor

jonamotta commented Nov 23, 2022

To my knowledge, the reweighting was done according to what is written on this page.
Although I admit that, looking at it now, the hardcoded numbers coincide but the method itself does not...

@portalesHEP
Contributor

Ok, thanks for the pointers. Then I'd tend to agree that we should remove the weights before the next skimming round. I suppose this should not have had any critical impact on the non-resonant result, though (?), since a dedicated TT correction was extracted which should have absorbed any issue introduced by this bug.

@jonamotta
Contributor

I am not sure I agree with either of the points.

  1. I would say that the weights should be correctly re-introduced for the next skimming, after we have confirmed that it was an error on our side not to include it in the numerator (and after testing the difference in a non-resonant TT production).
  2. Even if the data-MC agreement was indeed good, I am not sure I agree that the effect of this scale factor is actually absorbed by our custom ttSF. The ttSF is a normalization factor computed on the mHH distribution, whereas this is a "shape weight" as a function of pT. Maybe we have been lucky, but this does not appear evident to me at first sight.

@dzuolo
Contributor

dzuolo commented Nov 23, 2022

The instructions on the twiki seem to indicate that we should first check the data/MC agreement in a tt-dominated control region and only then, if needed, compute a correction. So I would suggest doing this first.

@portalesHEP
Contributor

For (1): I think @dzuolo misquoted the twiki and we are in fact in the second-bullet situation: "In case significant discrepancies are observed, a dedicated top pT reweighting function should be derived from this control region and applied across the analysis while monitoring the agreement of other distributions as a result of this reweighting." Our 'fault' here would then be that the custom correction that was derived was indeed not pT-dependent, but that does not change the conclusion that the correction provided on the twiki is not to be used.

@jonamotta
Contributor

Having now read the TWiki more carefully, I agree with @portalesHEP that the weight should be removed completely, according to bullets 2 and 4 of case 3.1.

What we could do is move to a pT-dependent computation of the ttSFs for the resonant analysis (or at least test it and see if that is in any way better than the normalization one we already have).

@portalesHEP
Contributor

portalesHEP commented Nov 23, 2022

> What we could do is move to a pT-dependent computation of the ttSFs for the resonant analysis (or at least test it and see if that is in any way better than the normalization one we already have).

Agreed, but as @dzuolo said, before anything else we should check whether such weights are still needed.

@francescobrivio
Contributor Author

francescobrivio commented Nov 23, 2022

I think the way to move forward for the resonant analysis is:

  • remove both topPtReweight and ttSF and check the data/MC agreement in the inverted resolved2b0j category (including the distribution of the top_pt)
  • check with Top POG if the recommendations for this reweighting have changed for UL samples
  • only at that point decide if we want to apply the topPtReweight or not:
    • if yes, apply it, check again the distributions and re-compute the ttSFs on top of this
    • if no, the ttSFs should be re-computed in any case since the TT normalization is directly affected by this "bug"

For the EFT results based on the non-resonant analysis, I will check with the HH conveners to see if we want to update this (which would require a significant effort) or if we want to base the EFT results exactly on the same HIG-20-010 analysis.

@dzuolo
Contributor

dzuolo commented Mar 8, 2023

@portalesHEP @bfonta @kramerto @riga We need to decide how to proceed with this issue in the new ntuples: I would propose to remove topPtReweight from EvtW in the skimmers and then check the data/MC agreement in the inverted resolved2b0j. What do you think?

@portalesHEP
Contributor

Hi, I think we should keep it in the skimmer (but correct it), and just remove the -t option in the submission script (that would set the weight to its default value of 1, like for any non-top sample). That way, if we do realise later on that we have some reason to put it back, it'll be easier.

@dzuolo
Contributor

dzuolo commented Mar 8, 2023

I am not sure I understand your point, Louis. The weight is already stored in the skims, with the last value suggested by the POG, in the m_TTtopPtreweight_up branch. What I am saying is that we should remove it from the computation of the EvtW, which is the denominator of the normalization. I believe it should not be there if we do not also apply the weight in the numerator.

@portalesHEP
Contributor

I'm saying that instead of removing it from the denominator, we should add it to the numerator, but ensure that for now the weight is set to 1 (which should be done by removing the -t flag in the skim submission, iirc).

@dzuolo
Contributor

dzuolo commented Mar 8, 2023

Ah ok! Now I understand, this seems ok to me!

@kramerto
Contributor

kramerto commented Mar 8, 2023

That is also how we submitted the 2017 skims for now: without the -t option, so no top pT reweighting and no tt stitching (which shouldn't be needed anyway, since the samples we use have no overlap).
