In [None]:
import ROOT

### Exercise 1

1. **Question**: Further improve the plot with the pull distribution by also visualizing the post-fit uncertainty of the model. Figure out how to do this by reading the documentation of [RooAbsPdf::plotOn()](https://root.cern.ch/doc/master/classRooAbsPdf.html#aa0f2f98d89525302a06a1b7f1b0c2aa6).

1. **Answer**: This can be done with the `VisualizeError()` option, passing the fit result:
```python
model.plotOn(x_frame, Name="model", VisualizeError=fit_result)
```
Note that the uncertainty is relatively small because we have a large number of events.

### Exercise 2

2. **Question**: Look at the [rf203_ranges.py RooFit tutorial](https://root.cern/doc/master/rf203__ranges_8py.html) to learn how to restrict the fit to a subrange. Redo the convolved template fit to the $y$ variable, but restricted to the range from 3 to 7.

   Why does the uncertainty of the `resolution` parameter increase, even though we are not excluding that much signal and `resolution` doesn't affect the background?

2. **Answer**: Define a new subrange for $y$ with `y.setRange("subrange", 3, 7)` and call `fitTo` with the `Range="subrange"` keyword argument.

The uncertainty on `resolution` increases because the background-only sidebands also help to constrain the signal shape: the resolution must be small enough that the signal does not leak too far into the sidebands. Excluding the sidebands from the fit removes this constraint.

### Exercise 3

3. **Question**: Interpret the likelihood plot over `n_sig`. Why is the profile NLL always below the other plotted NLL?

3. **Answer**: From the likelihood plot, we can read off the best-fit parameter value at the minimum and the uncertainties at the points where the NLL is offset by 0.5 from the minimum (see yesterday's statistics lecture on parameter estimation).
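As a minimal sketch of reading an uncertainty off an NLL curve, the toy below uses a Gaussian NLL with known width in plain Python and finds where it rises by 0.5 above the minimum. The data values and the helpers `gauss_nll` and `find_crossing` are made up for illustration and are not part of the notebook's model.

```python
import math


def gauss_nll(mu, data, sigma=1.0):
    """Gaussian negative log-likelihood with known width, up to a constant."""
    return sum((x - mu) ** 2 for x in data) / (2.0 * sigma**2)


def find_crossing(f, lo, hi, target, tol=1e-10):
    """Bisection for f(m) == target on [lo, hi], assuming a single crossing."""
    f_lo = f(lo) - target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid) - target
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical toy measurements (not taken from the notebook).
data = [4.1, 5.3, 4.8, 5.9, 5.2, 4.6, 5.5, 4.9]

mu_hat = sum(data) / len(data)  # analytic minimum of this NLL
target = gauss_nll(mu_hat, data) + 0.5  # NLL level that defines the interval

mu_lo = find_crossing(lambda m: gauss_nll(m, data), mu_hat - 5.0, mu_hat, target)
mu_hi = find_crossing(lambda m: gauss_nll(m, data), mu_hat, mu_hat + 5.0, target)

print(f"best fit: {mu_hat:.4f}  -{mu_hat - mu_lo:.4f} / +{mu_hi - mu_hat:.4f}")
print(f"expected sigma / sqrt(n): {1.0 / math.sqrt(len(data)):.4f}")
```

For this parabolic NLL, the interval half-width agrees with the familiar $\sigma/\sqrt{n}$ uncertainty on the mean, which is why the 0.5 offset is the conventional choice.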

The red curve with the profile NLL is never above the NLL with the nuisance parameters fixed to their best-fit values, because for each value of the parameter of interest (POI), the nuisance parameters are optimized again. The likelihood can therefore only get higher, or equivalently the negative log-likelihood lower. The two curves touch at the global minimum, where the re-optimized nuisance parameters coincide with the best-fit values.

The uncertainties from the fixed likelihood can be interpreted as statistical uncertainties only, while the uncertainties from the profile likelihood also include the systematic uncertainties associated with the unknown nuisance parameters.
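The relation between the two curves can be sketched with a toy NLL in plain Python. This is an illustration under stated assumptions, not the notebook's model: `toy_nll` is a hypothetical correlated two-parameter parabola with unit widths and correlation `rho`, and `minimize_1d` is a simple ternary search standing in for the minimizer.

```python
def toy_nll(poi, nuis, rho=0.6):
    """Toy NLL: correlated 2D Gaussian with unit widths, minimum at (0, 0)."""
    return 0.5 * (poi**2 - 2.0 * rho * poi * nuis + nuis**2) / (1.0 - rho**2)


def minimize_1d(f, lo, hi, iterations=200):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def fixed_nll(poi):
    """Nuisance parameter frozen at its best-fit value (0 in this toy)."""
    return toy_nll(poi, 0.0)


def profile_nll(poi):
    """Nuisance parameter re-optimized for each value of the POI."""
    best_nuis = minimize_1d(lambda n: toy_nll(poi, n), -10.0, 10.0)
    return toy_nll(poi, best_nuis)


scan = [i * 0.25 for i in range(-8, 9)]
for poi in scan:
    print(f"poi={poi:+.2f}  fixed={fixed_nll(poi):.4f}  profile={profile_nll(poi):.4f}")
```

In this toy, the fixed curve reaches an offset of 0.5 at `poi = 0.8`, while the profile curve reaches it only at `poi = 1.0`. The wider profile interval reflects the extra uncertainty from the nuisance parameter.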

### Exercise 4

4. **Question**: In a fresh notebook, open the `RooWorkspace` we wrote to disk and create new toy data according to the 2D model. Re-fit the model to the new toy dataset.

4. **Answer**:

In [None]:
file = ROOT.TFile.Open("../notebooks/myworkspace.root", "READ")

In [None]:
ws = file.Get("myworkspace")

In [None]:
ws.Print()

In [None]:
data_xy = ws["model_xy"].generate([ws["x"], ws["y"]])

In [None]:
fit_result_xy = ws["model_xy"].fitTo(data_xy, PrintLevel=-1, Save=True)

In [None]:
fit_result_xy.Print()

### Exercise 5

5. **Question**: Which parameters are strongly (anti)correlated in the final 2D fit? Can you explain why?

5. **Answer**: If you print the correlation matrix of the fit result with `fit_result_xy.correlationMatrix().Print()`, you should see something like this:

```txt
     |      0    |      1    |      2    |      3    |      4    |      5    
-----------------------------------------------------------------------------
   0 |          1     0.05982    -0.08723   -0.008045      -0.116     -0.1308
   1 |    0.05982           1     -0.3631     -0.1182      -0.254      0.1301
   2 |   -0.08723     -0.3631           1      0.1724      0.3704     -0.1896
   3 |  -0.008045     -0.1182      0.1724           1     0.07469    -0.06765
   4 |     -0.116      -0.254      0.3704     0.07469           1     -0.1167
   5 |    -0.1308      0.1301     -0.1896    -0.06765     -0.1167           1
```
The strongest (anti-)correlations are between parameters 1 and 2, which are `n_bkg` and `n_sig` (about $-0.36$), and between parameters 2 and 4, which are `n_sig` and `sigma` (about $+0.37$).

These correlations make intuitive sense. The sum of `n_sig` and `n_bkg` is constrained by the total number of observed events, which is why they are anticorrelated: if `n_sig` were larger, `n_bkg` would need to decrease. The positive correlation between `n_sig` and the signal width for $x$ (`sigma`) is also understandable: if the signal distribution were wider, it would be less peaked and would need to be scaled up to still match the peak in the center.
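To pick out the strongest correlations without scanning the table by eye, the matrix can be ranked by absolute value. The sketch below copies the numbers from the printout above into a plain Python list. The helper `strongest_pairs` is a hypothetical name introduced here, not a RooFit function.

```python
# Correlation matrix copied from the printout above.
corr = [
    [1.0, 0.05982, -0.08723, -0.008045, -0.116, -0.1308],
    [0.05982, 1.0, -0.3631, -0.1182, -0.254, 0.1301],
    [-0.08723, -0.3631, 1.0, 0.1724, 0.3704, -0.1896],
    [-0.008045, -0.1182, 0.1724, 1.0, 0.07469, -0.06765],
    [-0.116, -0.254, 0.3704, 0.07469, 1.0, -0.1167],
    [-0.1308, 0.1301, -0.1896, -0.06765, -0.1167, 1.0],
]


def strongest_pairs(matrix, n_pairs=2):
    """Return the n_pairs off-diagonal entries with the largest |correlation|."""
    pairs = [
        (i, j, matrix[i][j])
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))  # upper triangle only
    ]
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[:n_pairs]


top = strongest_pairs(corr)
for i, j, value in top:
    print(f"parameters {i} and {j}: correlation {value:+.4f}")
```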