In [None]:
import subprocess
import neutralb1.utils

WORKSPACE_DIR = neutralb1.utils.get_workspace_dir()

git_hash = subprocess.check_output(['git', 'rev-parse', 'HEAD'], cwd=WORKSPACE_DIR).decode('utf-8').strip()
print(git_hash)

**Repository Version** 
This notebook was run at commit:
`4f209e3f781a818361126aac3d976f7fec2d7e52`

# Verifying the Projection of Moments with Signal MC
As found previously, the projected moments have some missing factors causing them not to match expectations. Since then, two major updates have occurred:
1. I've developed a direct fit to data using moments as AmpTools parameters, which should now provide a set of "true" values we can compare the projected moments to.
   <br>a. Has issues in extracting moments with $>1\%$ contribution, but this should be enough to track down factors
2. The old Python projection script has been replaced by a C++ version that also includes the necessary normalization integrals
   <br>a. The script will likely be updated over time, so check the commit hash to see which version to use.

This study will proceed as follows:
1. Generate Signal Monte Carlo (MC) according to a pseudo-realistic set of waves (no acceptance effects, i.e. *thrown*)
   <br>a. 35% polarization and in the PARA_0 orientation
2. Fit the MC with the same waveset, and obtain a fit result that should match the generated wave values
3. Project moments from the fit result to obtain a projected moment-set $H_{\text{proj}}$
4. Fit MC with the same number of moments, and obtain a fitted moment-set $H_{\text{fit}}$
5. Compare the fit and projected sets to investigate the missing factors.

## Setup

In [None]:
# load common libraries
import pandas as pd
import pathlib
import os, sys
import numpy as np
import matplotlib.pyplot as plt

# load neutralb1 libraries
import neutralb1.utils as utils
from neutralb1.analysis.result import ResultManager

utils.load_environment()

# load in useful directories as constants
CWD = pathlib.Path.cwd()
STUDY_DIR = f"{WORKSPACE_DIR}/studies/input-output-tests/verify-moment"
TRUTH_DIR = f"{STUDY_DIR}/data/amp_truth"
AMP_DIR = f"{STUDY_DIR}/data/amplitude_results"
MOMENT_DIR = f"{STUDY_DIR}/data/moment_results"

# set env variables for shell cells
os.environ["WORKSPACE_DIR"] = WORKSPACE_DIR
os.environ['STUDY_DIR'] = STUDY_DIR
os.environ['TRUTH_DIR'] = TRUTH_DIR
os.environ['AMP_DIR'] = AMP_DIR
os.environ['MOMENT_DIR'] = MOMENT_DIR


## Data Generation and Fits

### Generate
We'll use the same cfg file to both generate and fit the amplitude-based Monte Carlo. This is done in a single kinematic bin:
* $0.1 < -t < 0.2$
* $8.2 < E_\gamma < 8.8$
* $1.20 < M_{\omega\pi^0} < 1.22$

The data file produced by `gen_vec_ps` is used by the fits, but it contains only the 4-vectors and none of the histograms that we typically use for conversion to a csv file, so we must use the `gen_vec_ps_diagnostic.root` file to extract this information. That file has no information on $E_\gamma$, but leaving it empty is fine: we know the range we generated in, and the $E_\gamma$ value from the csv is not used.

In [None]:
%%bash
cat $STUDY_DIR/cfg_files/amplitudes.cfg

In [None]:
%%bash 
if [ -e "${STUDY_DIR}/data/root_files/data.root" ]; then
    echo "data exists, skipping generation."
else 
    echo "Generating data..."
    gen_vec_ps -c ${STUDY_DIR}/cfg_files/amplitudes.cfg\
        -o ${STUDY_DIR}/data/root_files/data.root\
        -l 1.20 -u 1.22\
        -n 50000\
        -a 8.2 -b 8.8\
        -tmin 0.1 -tmax 0.2
    if [ -e "${STUDY_DIR}/data/root_files/data.root" ]; then
        echo "Data generation successful."
    else
        echo "Data generation failed."
        exit 1
    fi
fi

In [None]:
%%bash
python $WORKSPACE_DIR/src/neutralb1/batch/convert_to_csv.py\
    -i $STUDY_DIR/data/root_files/gen_vec_ps_diagnostic.root -o $STUDY_DIR/data/csv_files/data.csv

### Fitting

#### Truth Generation
To compare our later amplitude results to the true values, we need to perform a "truth fit" by fixing the production coefficients in a `.cfg` file to the same values used to generate the MC. Since these values are sensitive to the total number of events, we multiply all the fixed amplitudes by a common `intensity_scale` factor to adjust them properly.
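As a rough illustration of why a common scale matters: for a coherent sum of the form $I = |\sum_i c_i A_i|^2$, multiplying every production coefficient by the same factor multiplies the intensity by that factor squared. The sketch below uses made-up coefficients and amplitude values (not those in `truth.cfg`), and the helper `intensity` is hypothetical.

```python
import numpy as np

def intensity(coeffs, amps):
    """Coherent-sum intensity |sum_i c_i * A_i|^2 at one kinematic point."""
    return np.abs(np.sum(coeffs * amps)) ** 2

# Hypothetical production coefficients and amplitude values (illustration only)
coeffs = np.array([1.0 + 0.5j, -0.3 + 0.2j, 0.1 - 0.4j])
amps = np.array([0.7 - 0.1j, 0.2 + 0.9j, -0.5 + 0.3j])

intensity_scale = 3.0  # assumed common factor applied to every coefficient

base = intensity(coeffs, amps)
scaled = intensity(intensity_scale * coeffs, amps)

# The intensity grows with the square of the common scale factor
ratio = scaled / base
print(ratio)
```

Under this assumption, matching a target event count requires a scale proportional to the square root of that count.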

In [None]:
%%bash
cat $STUDY_DIR/cfg_files/truth.cfg

Only a single fit needs to be performed, which we can easily do here. There's no need to produce angular distribution plots, since we know they will match by construction.

In [None]:
%%bash
cd $TRUTH_DIR

# symlink to data and phasespace files
ln -sf $STUDY_DIR/data/root_files/data.root ./data.root
ln -sf ${STUDY_DIR}/data/root_files/anglesOmegaPiPhaseSpace.root ./anglesOmegaPiPhaseSpace.root
ln -sf ${STUDY_DIR}/data/root_files/anglesOmegaPiPhaseSpaceAcc.root ./anglesOmegaPiPhaseSpaceAcc.root

if [ -e ./omegapi.fit ]; then
    echo "Truth fit already exists, skipping."
else
    echo "Running truth fit..."
    fit -c ${STUDY_DIR}/cfg_files/truth.cfg > truth_fit.log
    if [ -e ./omegapi.fit ]; then
        echo "Truth fit successful."
    else
        echo "Truth fit failed, see truth_fit.log."
        exit 1
    fi
fi

In [None]:
%%bash

cd $TRUTH_DIR
python $WORKSPACE_DIR/src/neutralb1/batch/convert_to_csv.py\
    -i omegapi.fit -o $STUDY_DIR/data/csv_files/truth.csv

#### Amplitudes
Amplitude fits require a GPU session due to their performance requirements.

In [None]:
%%bash
if [ -e "${AMP_DIR}/omegapi.fit" ]; then
    echo "Amplitude results exist, skipping fitting."
else
    echo "Run 'fit -c ${STUDY_DIR}/cfg_files/amplitudes.cfg -m 10000000 -r 50 > amplitude_fit.log' on an interactive GPU node to fit the data."
fi

Once the fits are complete, run `vecps_plotter` and `angle_plotter` to produce the angular distribution plots.

In [None]:
%%bash

cd ${AMP_DIR}

# create symlinks so the vecps_plotter can find the data/phasespace files
ln -sf ${STUDY_DIR}/data/root_files/data.root ./data.root
ln -sf ${STUDY_DIR}/data/root_files/anglesOmegaPiPhaseSpace.root ./anglesOmegaPiPhaseSpace.root
ln -sf ${STUDY_DIR}/data/root_files/anglesOmegaPiPhaseSpaceAcc.root ./anglesOmegaPiPhaseSpaceAcc.root

if [ -e ./vecps_plot.root ]; then
    echo "Plotter output already exists, skipping plotting."
else
    echo "Plotting results..."
    vecps_plotter ./omegapi.fit
    angle_plotter ./vecps_plot.root "Thrown MC" "" ${AMP_DIR} --gluex-style
fi

In [None]:
utils.display_pdf(f"{AMP_DIR}/fit.pdf")

Convert the fit output to csv files, once for the amplitude results and once (with `--moments`) for the projected moments.

In [None]:
%%bash

cd ${AMP_DIR}
python $WORKSPACE_DIR/src/neutralb1/batch/convert_to_csv.py\
    -i omegapi.fit -o  $STUDY_DIR/data/csv_files/amplitude_result.csv
python $WORKSPACE_DIR/src/neutralb1/batch/convert_to_csv.py\
    -i omegapi.fit -o $STUDY_DIR/data/csv_files/projected_moments.csv --moments

#### Moments

The moment fits follow the same process as the amplitude fits.

In [None]:
%%bash
cat ${STUDY_DIR}/cfg_files/moments.cfg

In [None]:
%%bash
cd $MOMENT_DIR
ln -sf ${STUDY_DIR}/data/root_files/data.root ./data.root
ln -sf ${STUDY_DIR}/data/root_files/anglesOmegaPiPhaseSpace.root ./anglesOmegaPiPhaseSpace.root
ln -sf ${STUDY_DIR}/data/root_files/anglesOmegaPiPhaseSpaceAcc.root ./anglesOmegaPiPhaseSpaceAcc.root

if [ -e "./omegapi.fit" ]; then
    echo "Moment results exist, skipping fitting."
else
    echo "Run 'fit -c ${STUDY_DIR}/cfg_files/moments.cfg -m 10000000 -r 50 > moment_fit.log' on an interactive GPU node to fit the data."
fi

In [None]:
%%bash

cd ${MOMENT_DIR}
if [ -e ./vecps_plot.root ]; then
    echo "Plotter output already exists, skipping plotting."
else
    echo "Plotting results..."
    vecps_plotter ./omegapi.fit
    angle_plotter ./vecps_plot.root "Thrown MC" "" ${MOMENT_DIR} --gluex-style
fi

In [None]:
utils.display_pdf(f"{MOMENT_DIR}/fit.pdf")

In [None]:
%%bash

# The error from the following command is expected: the vecPSMoment amplitude cannot be parsed into coherent sums.
# This is fine because the moments are stored as parameters, which do get extracted.
cd ${MOMENT_DIR}
python $WORKSPACE_DIR/src/neutralb1/batch/convert_to_csv.py\
    -i omegapi.fit -o  $STUDY_DIR/data/csv_files/moment_result.csv

## Analysis

In [None]:
# first load in our dataframes
data_df = pd.read_csv(f"{STUDY_DIR}/data/csv_files/data.csv")
fit_results_df = pd.read_csv(f"{STUDY_DIR}/data/csv_files/amplitude_result.csv")
truth_df = pd.read_csv(f"{STUDY_DIR}/data/csv_files/truth.csv")
projected_moments_df = pd.read_csv(f"{STUDY_DIR}/data/csv_files/projected_moments.csv")
fitted_moments_df = pd.read_csv(f"{STUDY_DIR}/data/csv_files/moment_result.csv")

### Checking Amplitude Results
We first want to make sure that the amplitude-based fit actually converged to the values we generated with before we project the moments out.

In [None]:
amplitude_results = ResultManager(fit_results_df, data_df, truth_df=truth_df)
amplitude_results.preprocess(linker_max_depth=2)

In [None]:
amp_columns = [l for sublist in utils.get_coherent_sums(amplitude_results.fit_df).values() for l in sublist]
phase_columns = list(set(utils.get_phase_differences(amplitude_results.fit_df).values()))

Print the percentage uncertainty of the results to make sure we don't have errors of the same order of magnitude as the values themselves. Only amplitudes with a relative uncertainty above 50% are printed; the negative-reflectivity amplitudes are quite small, so some of them may exceed this threshold.

In [None]:
for col in amp_columns:
    # relative uncertainty in percent; abs() so a negative value cannot hide a large error
    rel_err = abs(amplitude_results.fit_df[f"{col}_err"] / amplitude_results.fit_df[col]).iloc[0] * 100
    if rel_err > 50:
        print(f"{col}: {rel_err}")
for col in phase_columns:
    rel_err = abs(amplitude_results.fit_df[f"{col}_err"] / amplitude_results.fit_df[col]).iloc[0] * 100
    print(f"{col}: {rel_err}")

The standardized residuals for the amplitudes tell us whether any of the coherent sums lie far outside the range covered by their errors. They are not printed for the phases, whose errors are *very* underestimated, making their standardized residuals untrustworthy.

In [None]:
for col in amp_columns:
    std_res = ((amplitude_results.fit_df[col] - amplitude_results.truth_df[col]) / amplitude_results.fit_df[f"{col}_err"]).iloc[0]
    if abs(std_res) > 3:
        print(f"{col}: {std_res}")

### Comparing Projected to Fitted Moments
We don't have "truth" moments to compare to, so we move on to checking how the moments projected from the amplitudes, $H_{\text{proj}}$, compare to the moments obtained by fitting them directly to data with AmpTools, $H_{\text{fit}}$.

In [None]:
# Find columns starting with "H" in both dataframes
proj_cols = [col for col in projected_moments_df.columns if col.startswith("H")]
fit_cols = [col for col in fitted_moments_df.columns if col.startswith("H") and not col.endswith("_err")]
print(proj_cols)
print(fit_cols)

The projected moments are split into real and imaginary parts, whereas the fitted moments are forced to be one or the other. We expect the $H^0$ and $H^1$ moments to be purely real, and the $H^2$ moments to be purely imaginary. We'll check that the components expected to vanish are close to zero.

In [None]:
H0_imag = [c for c in proj_cols if "H0" in c and "_imag" in c]
H1_imag = [c for c in proj_cols if "H1" in c and "_imag" in c]
H2_real = [c for c in proj_cols if "H2" in c and "_real" in c]

# use absolute values so that large negative components are not missed
print(projected_moments_df[H0_imag + H1_imag + H2_real].abs().max().max())
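A printed maximum is easiest to judge relative to the size of the components that are kept. Below is a minimal sketch of such a check on a toy dataframe, assuming the `_real`/`_imag` column suffixes used above. The column names, values, tolerance, and the helper `max_relative_leak` are all hypothetical.

```python
import pandas as pd

def max_relative_leak(df, vanishing_cols, kept_cols):
    """Largest |vanishing component| divided by largest |kept component|."""
    leak = df[vanishing_cols].abs().max().max()
    scale = df[kept_cols].abs().max().max()
    return leak / scale

# Toy single-row dataframe with hypothetical moment columns
toy = pd.DataFrame({
    "H0_0000_real": [1.0], "H0_0000_imag": [1e-9],
    "H1_0000_real": [-0.4], "H1_0000_imag": [-2e-9],
    "H2_0011_real": [3e-9], "H2_0011_imag": [0.2],
})

vanishing = ["H0_0000_imag", "H1_0000_imag", "H2_0011_real"]
kept = ["H0_0000_real", "H1_0000_real", "H2_0011_imag"]

leak = max_relative_leak(toy, vanishing, kept)
tolerance = 1e-6  # assumed threshold; choose to suit the fit precision
print(f"relative leak = {leak:.2e}, ok = {leak < tolerance}")
```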

In [None]:
# Drop the "_imag" columns for H0 and H1, and the "_real" columns for H2
imag_cols_to_drop = [col for col in proj_cols if (col.startswith("H0") or col.startswith("H1")) and col.endswith("_imag")]
filtered_proj_moments_df = projected_moments_df.drop(columns=imag_cols_to_drop)
real_cols_to_drop = [col for col in proj_cols if col.startswith("H2") and col.endswith("_real")]
filtered_proj_moments_df = filtered_proj_moments_df.drop(columns=real_cols_to_drop)

# remove the real or imag suffix
filtered_proj_moments_df = filtered_proj_moments_df.rename(
    columns={col: col.replace("_real", "").replace("_imag", "") for col in filtered_proj_moments_df.columns}
)

In [None]:
# List any projected-moment columns missing from the fitted moments (expect an empty set)
print(set(filtered_proj_moments_df.columns) - set(fitted_moments_df.columns))

In [None]:
# obtain the ratios between the projected and fitted moments
ratio_df = pd.DataFrame(index=filtered_proj_moments_df.index)
moment_columns = [c for c in filtered_proj_moments_df.columns if c.startswith("H")]
ratios = []
ratio_errs = []

for col in moment_columns:
    ratio = filtered_proj_moments_df[col] / fitted_moments_df[col]
    # for now we'll just use the MINUIT errors from the fitted moments
    ratio_err = ratio * np.sqrt(np.square(fitted_moments_df[f"{col}_err"] / fitted_moments_df[col]))
    ratios.append(ratio)
    ratio_errs.append(ratio_err)

ratio_errs_df = pd.DataFrame(ratio_errs).T
ratio_errs_df.columns = moment_columns

# add the ratio errors to the ratio dataframe as a new row
ratio_df = pd.concat([ratio_df, pd.DataFrame(ratios).T], axis=1)
ratio_df = pd.concat([ratio_df, ratio_errs_df], axis=0, ignore_index=True)
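If uncertainties on the projected moments become available, both sets of errors can enter the ratio. Below is a minimal sketch assuming uncorrelated errors, which is a simplification since the projected and fitted moments come from the same MC sample. The function `ratio_with_error` and the numbers are hypothetical.

```python
import numpy as np

def ratio_with_error(num, num_err, den, den_err):
    """Ratio num/den with first-order, uncorrelated error propagation."""
    ratio = num / den
    rel_err = np.sqrt((num_err / num) ** 2 + (den_err / den) ** 2)
    return ratio, np.abs(ratio) * rel_err

# Hypothetical projected (num) and fitted (den) values of one moment
ratio, ratio_err = ratio_with_error(num=0.50, num_err=0.03, den=0.25, den_err=0.02)
print(ratio, ratio_err)
```

Setting `num_err=0` recovers the fitted-only error used in the cell above.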

In [None]:
plt.figure(figsize=(18, 5))
plt.errorbar(x=ratio_df.columns, y=ratio_df.iloc[0], yerr=abs(ratio_df.iloc[1]), marker='o', color="black")
plt.axhline(y=1, color='red', linestyle='--', label='Expected Ratio = 1')
plt.xticks(rotation=90)
plt.ylabel("Projected / Fitted Moment Ratio")
plt.xlabel("Moment")
plt.grid(True)
plt.tight_layout()
plt.show()