# What Detector Effects Break $\omega\pi^0$ Amplitude Analysis?
Initial input-output tests in $\omega\pi^0$ are showing bin-to-bin inconsistencies in the fit results of an amplitude analysis. The goal of this notebook is to determine where these inconsistencies originate so that we can address them. (Discussion of AmpTools efficiency calculation here)

Questions: 
1. **What detector effects are most detrimental to the amplitude analysis results?**
2. **Does the AmpTools efficiency calculation impact fit performance?**

## Setup
To test this, we conduct input-output tests using [$\omega\pi^0$ Monte Carlo signal and phase-space files](https://halldweb.jlab.org/wiki-private/index.php/Omega_Pi_Simulation_Samples_Version_3#Neutral_signal_versionsver3.1). We perform the amplitude analysis in independent bins of $\omega\pi^0$ invariant mass and compare the "output" fit results in each bin to the generated "input" values that a successful fit should converge to.

### Effect Options
Through the DSelector, we have access to a wide array of options for turning the effects of our detector simulation on or off. These are:
1. **Detector Acceptance:** applies the detector efficiency to events.
2. **Reconstruction:** the hits within the detector are used to reconstruct the tracks of the final-state particles. This effectively "smears" the particle 4-momenta with detector resolution effects and reconstruction errors.
3. **Misidentification & Combinatorics:** we "forget" the origin and exact identity of each particle and instead rely on the kinematic fitter and our cuts to determine them.
4. **Out-of-time photons (OOT $\gamma$):** includes accidental beam photons, whose contribution is estimated from out-of-time RF "sidebands" and statistically subtracted to isolate the photon that generated the event. This option can be combined with any of the above effects.

By performing an amplitude analysis on datasets produced with these options, we can determine which effect is most detrimental to the fit results.
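The out-of-time photon option relies on statistical sideband subtraction, which is usually implemented as a per-combo weight. The sketch below assumes a convention of weight $+1$ for in-time beam photons and $-1/N$ for out-of-time photons drawn from $N$ sideband beam bunches; the function name `accidental_weights`, the parameter `n_oot_bunches`, and the toy counts are hypothetical and not taken from this analysis.

```python
import numpy as np


def accidental_weights(is_in_time: np.ndarray, n_oot_bunches: int) -> np.ndarray:
    """Per-combo weights for RF sideband subtraction (assumed convention).

    In-time combos get weight +1. Each out-of-time combo samples the accidental
    background from one of `n_oot_bunches` sideband bunches, so it gets weight
    -1 / n_oot_bunches, which estimates the accidentals hiding under the in-time peak.
    """
    is_in_time = np.asarray(is_in_time, dtype=bool)
    return np.where(is_in_time, 1.0, -1.0 / n_oot_bunches)


# Toy example: 100 in-time combos (some of them accidentals) plus
# 160 out-of-time combos spread over 8 sideband bunches (20 per bunch).
is_in_time = np.concatenate([np.ones(100, dtype=bool), np.zeros(160, dtype=bool)])
weights = accidental_weights(is_in_time, n_oot_bunches=8)
signal_yield = weights.sum()  # 100 - 160/8 = 80
print(signal_yield)
```

Because the weights enter the fit directly, no events are thrown away; the subtraction happens statistically when the weighted likelihood is summed.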

### The Input
(make a table of consistent cuts being applied across all bins, -t, masses, E_beam, etc)

Input files are obtained by passing the Monte Carlo ROOT trees through a DSelector that applies a standard set of cuts on quantities like $\chi^2_{\text{kinfit}}$. Depending on which detector effects we include, some of these cuts are turned on or off. For example, if we don't include out-of-time photons, then we do not perform an RF sideband subtraction, since we know with certainty which photon generated the event. For each detector-effect configuration, we then have a data file spanning the entire $\omega\pi^0$ mass range. To obtain the files for the mass-independent fits, we copy the trees into each bin and apply cuts on $-t$, the beam photon energy $E_{\gamma_{\text{beam}}}$, and $M_{\omega\pi^0}$. Note that an extra cut is applied to the invariant mass of the $p\pi^0$ pair, requiring $M_{p\pi^0} > 1.4~\text{GeV}$. In real data this cut removes the $\Delta^+$ baryon contribution. We don't generate the $\Delta^+$ in our MC and so do not strictly *need* this cut, but past work has indicated that keeping it is not detrimental to the fit results.
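A minimal pandas sketch of the per-bin selection, assuming the events have been flattened into a DataFrame with hypothetical column names `m_omegapi`, `minus_t`, `e_beam`, and `m_ppi0`. The $-t$ and beam-energy ranges are placeholders, not the values used in this analysis; only the $M_{p\pi^0} > 1.4~\text{GeV}$ cut comes from the text above.

```python
import numpy as np
import pandas as pd


def select_bin(events: pd.DataFrame, m_low: float, m_high: float,
               t_range=(0.1, 0.5), e_beam_range=(8.2, 8.8),
               m_ppi0_min: float = 1.4) -> pd.DataFrame:
    """Return the events in one omega-pi0 mass bin (placeholder cut values)."""
    mask = (
        events["m_omegapi"].between(m_low, m_high, inclusive="left")  # [low, high)
        & events["minus_t"].between(*t_range)
        & events["e_beam"].between(*e_beam_range)
        & (events["m_ppi0"] > m_ppi0_min)  # Delta+ veto, as applied to real data
    )
    return events[mask]


# Toy events, uniformly distributed, just to exercise the selection
rng = np.random.default_rng(0)
n = 1000
events = pd.DataFrame({
    "m_omegapi": rng.uniform(1.0, 2.0, n),
    "minus_t": rng.uniform(0.0, 1.0, n),
    "e_beam": rng.uniform(8.0, 9.0, n),
    "m_ppi0": rng.uniform(1.0, 2.5, n),
})

edges = np.linspace(1.0, 2.0, 51)  # 50 bins, each 0.02 GeV wide
binned = {
    i: select_bin(events, lo, hi)
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:]))
}
```

Using a half-open interval for the mass cut keeps the bins disjoint, so no event is counted in two neighboring bins.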

The data files, now partitioned into bins of $\omega\pi^0$ mass, allow us to calculate the detector efficiency $\eta_{\text{data}}$ in each bin. We have a data file for the "thrown" case, in which no detector effects are applied, so dividing the number of events in any dataset by the thrown yield gives the efficiency of our detector:

$$\eta_{\text{data}} = \frac{\text{Events}_{~\text{detected}}}{\text{Events}_{~\text{gen}}}$$
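For reference, the sketch below evaluates the efficiency (detected over generated yield) for a single bin with two error estimates: the quadrature sum of relative Poisson errors, which is what `get_data_efficiency` in the Analysis section does and which treats the two yields as independent, and the binomial error $\sqrt{\eta(1-\eta)/N_{\text{gen}}}$, which accounts for the detected events being a subset of the generated ones. The helper name and the counts are hypothetical.

```python
import numpy as np


def efficiency_errors(n_detected: float, n_generated: float):
    """Efficiency and two error estimates for one bin (illustrative helper)."""
    eff = n_detected / n_generated
    # Independent Poisson errors: relative errors 1/sqrt(N) added in quadrature
    err_quad = eff * np.sqrt(1.0 / n_detected + 1.0 / n_generated)
    # Binomial error: detected events are a subset of the generated events
    err_binom = np.sqrt(eff * (1.0 - eff) / n_generated)
    return eff, err_quad, err_binom


eff, err_quad, err_binom = efficiency_errors(n_detected=2500, n_generated=10000)
print(eff, err_quad, err_binom)
```

The quadrature estimate is always the larger of the two for the same counts, so it is a conservative choice when comparing $\eta_{\text{data}}$ to the fit efficiency.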

With the data files ready for PWA, we arrive at a problem. The "true" production parameters used to generate the Monte Carlo come from a mass-dependent fit over the entire $\omega\pi^0$ mass range, but we need the parameter values in individual bins to determine how well each fit performed. To obtain these we do the following:
1. Create a config file that has all its amplitude parameters fixed to the generated values. Include the same fixed Breit-Wigner functions from the original mass-dependent fit in the config file as well.
2. Perform a "fit" in each $\omega\pi^0$ mass bin with this entirely fixed config file. The Breit-Wigner functions then weight the production parameters for that particular mass bin. For example, this ensures the $b_1(1235)$ is strongest in the bin between 1.22 and 1.24 GeV but is effectively zero by 2.0 GeV.
3. Lastly, the overall scale of the production parameters depends on the number of events in the original mass-dependent fit. We therefore introduce a single free `scale` factor that multiplies every amplitude in the config file, allowing the fit to adjust the overall normalization to match the number of events in each bin.
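Steps 2 and 3 can be illustrated with a toy model. This sketch assumes a single fixed-width relativistic Breit-Wigner for the $b_1(1235)$ with approximate PDG values ($m_0 = 1.2295$ GeV, $\Gamma_0 = 0.142$ GeV) evaluated at 0.02 GeV bin centers; the real mass-dependent fit uses its own lineshapes and parameters. It also assumes an extended-likelihood fit, at whose optimum the predicted total yield matches the observed yield, so a single amplitude `scale` $s$ (intensity $\propto s^2$) is fixed by $s = \sqrt{N_{\text{obs}}/N_{\text{pred}}}$. All function names are hypothetical.

```python
import numpy as np


def bw_intensity(m, m0=1.2295, gamma0=0.142):
    """|BW|^2 of a fixed-width relativistic Breit-Wigner (illustrative lineshape)."""
    bw = 1.0 / (m0**2 - m**2 - 1j * m0 * gamma0)
    return np.abs(bw) ** 2


edges = np.linspace(1.0, 2.0, 51)              # 0.02 GeV wide bins
centers = 0.5 * (edges[:-1] + edges[1:])
weights = bw_intensity(centers)                # per-bin weight applied to the b1 amplitude
peak_bin = centers[np.argmax(weights)]         # bin where the b1 is strongest
ratio_last_bin = weights[-1] / weights.max()   # relative b1 strength near 2.0 GeV


def scale_factor(n_observed, n_predicted_unscaled):
    """Amplitude scale s with s**2 * N_pred = N_obs (assumed extended-likelihood optimum)."""
    return np.sqrt(n_observed / n_predicted_unscaled)


print(peak_bin, ratio_last_bin, scale_factor(4000, 1000))
```

With these assumed parameters the $b_1$ weight peaks in the 1.22 to 1.24 GeV bin and drops to well under 1% of its peak value in the last bin, consistent with the behavior described in step 2.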

An additional advantage of running a fit like this is that all the coherent-sum calculations can be done with the AmpTools `FitResults` class on the output `.fit` files. We can then produce dataframes of the *true* values generated for any amplitude and directly evaluate fit performance against them.
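Once the fit and truth results share the same bin index and column names (for example `"1p"` with its error in `"1p_err"`, as in the dataframes used below), the pull $(\text{fit} - \text{truth})/\sigma_{\text{fit}}$ can be computed column by column. The helper `pulls` is hypothetical and the numbers are invented purely to show the calculation.

```python
import pandas as pd


def pulls(fit_df: pd.DataFrame, truth_df: pd.DataFrame, columns) -> pd.DataFrame:
    """Signed pull (fit - truth) / fit_error for each requested column."""
    return pd.DataFrame({
        col: (fit_df[col] - truth_df[col]) / fit_df[f"{col}_err"]
        for col in columns
    })


# Two toy mass bins
fit_df = pd.DataFrame({"1p": [105.0, 98.0], "1p_err": [5.0, 4.0],
                       "1m": [50.0, 61.0], "1m_err": [2.0, 5.0]})
truth_df = pd.DataFrame({"1p": [100.0, 100.0], "1m": [52.0, 56.0]})

result = pulls(fit_df, truth_df, ["1p", "1m"])
print(result)
```

For a well-behaved fit, the pulls across many bins should scatter around zero with unit width; bins with large $|\text{pull}|$ are the inconsistencies this notebook is trying to trace.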

### The Output
(explanation here, include that I'm grabbing the scale value from the truth fit in order to properly initialize it. Also need to discuss the efficiency we calculate from AmpTools)

## Analysis
As discussed, we have a wide array of combinations to choose from when simulating detector effects. This section goes through each combination of interest and produces plots of the fit results to look for relations between them and the efficiencies. We start by loading our packages and paths.

In [None]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from typing import List, Tuple

from pathlib import Path
import sys
parent_dir = str(Path().resolve().parents[2])
working_dir = f"{parent_dir}/analysis/input-output-tests/detector_effects/"
sys.path.insert(0, parent_dir)
import analysis.scripts.pwa_tools as pwa_tools

In addition to the pwa `Plotter` class, we need a few custom functions for obtaining and plotting the efficiencies, as well as any correlation between the efficiency and the amplitude pull distributions.

In [None]:
# load in default matplotlib style
plt.style.use("/w/halld-scshelf2101/kscheuer/neutralb1/analysis/scripts/pwa_plotter.mplstyle")

def get_fit_efficiency(fit_file: pd.DataFrame) -> Tuple[pd.Series, pd.Series]:
    # Get the efficiency and its error from a fit file
    eff = fit_file["detected_events"] / fit_file["generated_events"]
    eff_err = eff * np.sqrt(
        np.square(fit_file["detected_events_err"] / fit_file["detected_events"]) +
        np.square(fit_file["generated_events_err"] / fit_file["generated_events"])
    )
    return eff, eff_err

def get_data_efficiency(acc_file: pd.DataFrame, gen_file: pd.DataFrame) -> Tuple[pd.Series, pd.Series]:
    # Get the efficiency and its error from a data file
    eff = acc_file["bin_contents"] / gen_file["bin_contents"]
    eff_err = eff * np.sqrt(
        np.square(acc_file["bin_error"] / acc_file["bin_contents"]) +
        np.square(gen_file["bin_error"] / gen_file["bin_contents"])
    )
    return eff, eff_err

def plot_efficiency(fit_df, acc_df, gen_df):
    # Plot efficiency as function of omega-pi0 mass

    fit_eff, fit_eff_err = get_fit_efficiency(fit_df)
    data_eff, data_eff_err = get_data_efficiency(acc_df, gen_df)
    
    mass_bins = gen_df["mass_mean"]
    bin_width = (gen_df["mass_high_edge"] - gen_df["mass_low_edge"]).iloc[0]
    
    fig, ax = plt.subplots()
    ax.errorbar(
        x=mass_bins, y=data_eff, xerr=bin_width/2., yerr=data_eff_err,
        marker=".", linestyle="", color="black", label=r"$\eta_{~\text{data}}$"
    )
    ax.errorbar(
        x=mass_bins, y=fit_eff, xerr=bin_width/2., yerr=fit_eff_err, 
        marker=".", linestyle="", color="red", label=r"$\eta_{~\text{fit}}$"
    )
    ax.legend()
    ax.set_xlabel(r"$\omega\pi^0$ inv. mass $(GeV)$", loc="right")
    ax.set_ylabel("Efficiency", loc="top")

    plt.minorticks_on()
    plt.show()

def plot_jp_efficiency(fit_df, acc_df, gen_df, truth_df):
    # Plot difference of efficiencies as function of omega-pi0 mass with JP values
    # Essentially a copy of the mass_phase function in the pwa_tools module

    # grab the efficiencies and their errors
    fit_eff, fit_eff_err = get_fit_efficiency(fit_df)
    data_eff, data_eff_err = get_data_efficiency(acc_df, gen_df)
    
    # calculate the difference between the two efficiencies with their errors
    diff_eff = data_eff - fit_eff
    diff_eff_err = np.sqrt(
        np.square(data_eff_err) + np.square(fit_eff_err)
    )

    # grab the mass bins and widths
    mass_bins = gen_df["mass_mean"]
    bin_width = (gen_df["mass_high_edge"] - gen_df["mass_low_edge"]).iloc[0]
        
    fig, axs = plt.subplots(
        2, 1,
        sharex=True,
        gridspec_kw={"wspace": 0.0, "hspace": 0.07},
        height_ratios=[3, 1],
    )

    # ---AXS 0---
    # plot data points
    axs[0].errorbar(
        x=mass_bins, y=acc_df["bin_contents"], xerr=bin_width/2., yerr=acc_df["bin_error"],        
        fmt="k.", label="MC Events",
    )
    # plot fit result as gray histogram
    axs[0].bar(
        x=mass_bins, height=fit_df["detected_events"], width=bin_width,            
        color="0.1", alpha=0.15, label="Fit Result"
    )
    axs[0].errorbar(
        x=mass_bins, y=fit_df["detected_events"], yerr=fit_df["detected_events_err"],
        fmt=",", color="0.1", alpha=0.2, markersize=0,
    )

    # plot the JP values, hardcoded since we know what JP are present
    colors = matplotlib.colormaps["Dark2"].colors # use the same colors as the pwa Plotter
    axs[0].errorbar( # 1+
        x=mass_bins, y=fit_df["1p"], xerr=bin_width/2., yerr=fit_df["1p_err"],
        marker="o", markersize=6, linestyle="", color=colors[2], label=r"$1^{+}$"
    )
    axs[0].plot( # 1+ truth
        mass_bins, truth_df["1p"], 
        linestyle="-", marker="", color=colors[2],
    )
    axs[0].errorbar( # 1-
        x=mass_bins, y=fit_df["1m"], xerr=bin_width/2., yerr=fit_df["1m_err"],
        marker="s", markersize=6, linestyle="", color=colors[3], label=r"$1^{-}$"
    )
    axs[0].plot( # 1- truth
        mass_bins, truth_df["1m"], 
        linestyle="-", marker="", color=colors[3],
    )


    axs[0].set_ylabel(f"Events / {bin_width:.3f} GeV", loc="top")
    axs[0].set_ylim(bottom=0.0)

    # ---AXS 1---
    # plot the efficiency difference
    axs[1].errorbar(
        x=mass_bins, y=diff_eff, xerr=bin_width/2., yerr=diff_eff_err,
        marker=".", linestyle="", color="black"
    )
    axs[1].axhline(0, color="black", linestyle="-")

    axs[1].set_xlabel(r"$\omega\pi^0$ inv. mass $(GeV)$", loc="right")
    axs[1].set_ylabel(r"$\Delta \eta$", loc="center")

    axs[0].legend(loc="upper right")

    plt.minorticks_on()
    plt.show()

def plot_correlation(fit_df, acc_df, gen_df, truth_df, columns: List[str]) -> None:
    # Plot the correlation between the efficiency difference and the pull magnitude of the columns

    # check that the requested columns (and their errors) are present in the dataframes
    for col in columns:
        if col not in fit_df.columns or f"{col}_err" not in fit_df.columns:
            print(f"WARNING: Column {col} (or {col}_err) not present in the fit_df, exiting")
            return
        if col not in truth_df.columns:
            print(f"WARNING: Column {col} not present in the truth_df, exiting")
            return

    # calculate the difference between the two efficiencies
    fit_eff, _ = get_fit_efficiency(fit_df)
    data_eff, _ = get_data_efficiency(acc_df, gen_df)
    diff_eff = data_eff - fit_eff

    # the correlation coefficient is undefined when diff_eff is constant, so report 0 instead
    all_zero = diff_eff.eq(0).all()
    if all_zero:
        print("WARNING: All efficiency differences are zero, returning 0 correlation")

    # plot the efficiency difference against the pull magnitude of each column
    for col in columns:
        pull = ((fit_df[col] - truth_df[col]) / fit_df[f"{col}_err"]).abs()

        if pull.isna().any():
            print(f"WARNING: NaN values in pull, skipping column {col}")
            continue
        corr = 0.0 if all_zero else np.corrcoef(diff_eff, pull)[0, 1]

        plt.scatter(diff_eff, pull, marker=".", label=f"{pwa_tools.convert_amp_name(col)}: r={corr:.3f}")

    plt.xlabel(r"$\Delta \eta$", loc="right")
    plt.ylabel(r"$|~\text{pull}~|$", loc="top")
    plt.minorticks_on()
    plt.legend()
    plt.show()

Load the thrown data file, since it is common to all the sections below.

In [None]:
thrown_data = pd.read_csv(f"{working_dir}/thrown/data.csv")

### Ideal Detector (no effects)
TODO: Explain each of these plots, so future ones make more sense

In [None]:
thrown_fit = pd.read_csv(f"{working_dir}/thrown/fit.csv", index_col="index")
thrown_truth = pd.read_csv(f"{working_dir}/thrown/truth.csv", index_col="index")
# no accepted data since the thrown data is the "accepted" data

thrown_plotter = pwa_tools.Plotter(thrown_fit, thrown_data, truth_df=thrown_truth)

plot_efficiency(thrown_fit, thrown_data, thrown_data)
plot_jp_efficiency(thrown_fit, thrown_data, thrown_data, thrown_truth)
plot_correlation(thrown_fit, thrown_data, thrown_data, thrown_truth, ["1p", "1m"])

thrown_plotter.intensities()
thrown_plotter.intensities(True, True)

pos_refl_waves = [x for x in pwa_tools.get_coherent_sums(thrown_fit)["eJPmL"] if x[-1] != "D" and x[0] == "p"]
plot_correlation(thrown_fit, thrown_data, thrown_data, thrown_truth, pos_refl_waves)
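The `pos_refl_waves` filter relies on the `eJPmL` naming of the amplitudes. Assuming the first character encodes the reflectivity (`p`/`m`) and the last the orbital wave (`S`, `P`, `D`), it keeps the positive-reflectivity waves while dropping the $D$ waves. The amplitude names below are hypothetical and only illustrate the string logic.

```python
# Hypothetical amplitude names in an assumed eJPmL convention:
# e = reflectivity (p/m), J = spin, P = parity (p/m),
# m = spin projection (m/0/p), L = orbital wave (S/P/D)
amp_names = ["p1ppS", "p1p0S", "p1mpP", "p1ppD", "m1ppS", "m1mpP"]

# Keep positive reflectivity (first character "p") and drop D waves (last character "D")
pos_refl = [x for x in amp_names if x[-1] != "D" and x[0] == "p"]
print(pos_refl)  # ['p1ppS', 'p1p0S', 'p1mpP']
```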

### Detector Acceptance

In [None]:
acc_noaccid_data = pd.read_csv(f"{working_dir}/accept_noaccidental/data.csv")
acc_noaccid_fit = pd.read_csv(f"{working_dir}/accept_noaccidental/fit.csv", index_col="index")
acc_noaccid_truth = pd.read_csv(f"{working_dir}/accept_noaccidental/truth.csv", index_col="index")

acc_noaccid_plotter = pwa_tools.Plotter(acc_noaccid_fit, acc_noaccid_data, truth_df=acc_noaccid_truth)

plot_efficiency(acc_noaccid_fit, acc_noaccid_data, thrown_data)
plot_jp_efficiency(acc_noaccid_fit, acc_noaccid_data, thrown_data, acc_noaccid_truth)
plot_correlation(acc_noaccid_fit, acc_noaccid_data, thrown_data, acc_noaccid_truth, ["1p", "1m"])

acc_noaccid_plotter.intensities()
acc_noaccid_plotter.intensities(True, True)

pos_refl_waves = [x for x in pwa_tools.get_coherent_sums(acc_noaccid_fit)["eJPmL"] if x[-1] != "D" and x[0] == "p"]
plot_correlation(acc_noaccid_fit, acc_noaccid_data, thrown_data, acc_noaccid_truth, pos_refl_waves)

### Detector Acceptance & $\gamma_{\text{tag}}$
$\gamma_{\text{tag}}$ means the beam photon 4-vector is taken from the values measured by the tagger rather than from the generated values.

In [None]:
acc_data = pd.read_csv(f"{working_dir}/accept/data.csv")
acc_fit = pd.read_csv(f"{working_dir}/accept/fit.csv", index_col="index")
acc_truth = pd.read_csv(f"{working_dir}/accept/truth.csv", index_col="index")

acc_plotter = pwa_tools.Plotter(acc_fit, acc_data, truth_df=acc_truth)

plot_efficiency(acc_fit, acc_data, thrown_data)
plot_jp_efficiency(acc_fit, acc_data, thrown_data, acc_truth)
plot_correlation(acc_fit, acc_data, thrown_data, acc_truth, ["1p", "1m"])

acc_plotter.intensities()
acc_plotter.intensities(True, True)

pos_refl_waves = [x for x in pwa_tools.get_coherent_sums(acc_fit)["eJPmL"] if x[-1] != "D" and x[0] == "p"]
plot_correlation(acc_fit, acc_data, thrown_data, acc_truth, pos_refl_waves)

### All Effects *Except* $\gamma_{\text{tag}}$
This configuration includes detector acceptance, reconstruction resolution, and misidentification / combinatorics, but does *not* use any sideband subtraction for the beam photons.

In [None]:
noacc_fit = pd.read_csv(f"{working_dir}/noaccidental/fit.csv", index_col="index")
noacc_truth = pd.read_csv(f"{working_dir}/noaccidental/truth.csv", index_col="index")
noacc_data = pd.read_csv(f"{working_dir}/noaccidental/data.csv")

noacc_plotter = pwa_tools.Plotter(noacc_fit, noacc_data, truth_df=noacc_truth)

plot_efficiency(noacc_fit, noacc_data, thrown_data)
plot_jp_efficiency(noacc_fit, noacc_data, thrown_data, noacc_truth)
plot_correlation(noacc_fit, noacc_data, thrown_data, noacc_truth, ["1p", "1m"])
noacc_plotter.intensities()
noacc_plotter.intensities(True, True)

pos_refl_waves = [x for x in pwa_tools.get_coherent_sums(noacc_fit)["eJPmL"] if x[-1] != "D" and x[0] == "p"]
plot_correlation(noacc_fit, noacc_data, thrown_data, noacc_truth, pos_refl_waves)

### All Effects Applied

In [None]:
all_fit = pd.read_csv(f"{working_dir}/all_effects/fit.csv", index_col="index")
all_truth = pd.read_csv(f"{working_dir}/all_effects/truth.csv", index_col="index")
all_data = pd.read_csv(f"{working_dir}/all_effects/data.csv")

all_effects_plotter = pwa_tools.Plotter(all_fit, all_data, truth_df=all_truth)

plot_efficiency(all_fit, all_data, thrown_data)
plot_jp_efficiency(all_fit, all_data, thrown_data, all_truth)
plot_correlation(all_fit, all_data, thrown_data, all_truth, ["1p", "1m"])
all_effects_plotter.intensities()
all_effects_plotter.intensities(True, True)

pos_refl_waves = [x for x in pwa_tools.get_coherent_sums(all_fit)["eJPmL"] if x[-1] != "D" and x[0] == "p"]
plot_correlation(all_fit, all_data, thrown_data, all_truth, pos_refl_waves)