# Add correlations to independent fission yield covariance matrix

In [None]:
import numpy as np
import pandas as pd
import scipy.sparse as sps

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")

In [None]:
import sandy

## Retrieve nuclear data file

First, we define the incident neutron energy and the ZAM identifier of the fissioning nuclide that we want to analyze. For the chosen ZAM, in this case U-235, we retrieve the evaluated fission yield file from ENDF/B-VIII.0 using `get_endf6_file`:

In [None]:
e = 0.0253
zam = 922350
tape = sandy.get_endf6_file('endfb_80','nfpy', zam)

Then we can read the fission yield data stored in the ENDF6 file and extract the values of the fission yields and their associated uncertainties:

In [None]:
nfpy = sandy.Fy.from_endf6(tape)
nfpy.data.set_index(["MAT", "MT", "ZAM", "ZAP"]).head()

## Obtain covariance matrix  

Assuming that the uncertainties are equal to the standard deviations, we can build the diagonal covariance matrix as follows:

In [None]:
fy_stdev = nfpy.data.query(f"ZAM=={zam} & E=={e} & MT==454").set_index('ZAP').DFY
Vx_prior = sandy.CategoryCov.from_stdev(fy_stdev)
Vx_prior.data.head().T.head()
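
As a rough illustration of what a diagonal covariance matrix built from standard deviations looks like, the following sketch uses plain NumPy and pandas. The yield uncertainties are made-up values, and the code does not reproduce the internals of `sandy.CategoryCov.from_stdev`; it only shows the underlying idea (variances on the diagonal, zeros elsewhere).

```python
import numpy as np
import pandas as pd

# Hypothetical standard deviations for three fission products (illustrative values only)
stdev = pd.Series([0.02, 0.01, 0.005], index=[360940, 360950, 561480], name="DFY")

# Diagonal covariance matrix: variances on the diagonal, zero covariances elsewhere
cov_diag = pd.DataFrame(
    np.diag(stdev.values ** 2),
    index=stdev.index,
    columns=stdev.index,
)
print(cov_diag)
```

With no off-diagonal terms, this prior carries no correlation between fission products, which is what the GLS update described next is meant to add.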

We want to update the covariance matrix of the independent fission yields using the method of generalised least squares (GLS). GLS is an adjustment technique in which the information on some prior system parameters is improved by adding new knowledge, provided that relationships between the new data and the parameters are established. These relationships, or constraints, must be linearised in the form:

$$
y - y_a = S \cdot (\theta -\theta_a) 
$$

where $\theta$ are the parameters of the system, $\theta_a$ the prior estimates of $\theta$, $y$ the responses of the constraining equation, $y_a$ the responses of the constraining equation to the prior estimates $\theta_a$, and $S$ the sensitivity coefficients of the response $y - y_a$ to the parameters $\theta - \theta_a$.

It is assumed that no correlations exist between the prior and the new information. Further information $\eta$ can then be introduced to derive refined values of the parameters $\theta$, with all the available uncertainty information properly incorporated into the formalism. The updating process is the following:

$$
\theta - \theta_a = V_a \cdot S^T \cdot \left(S \cdot V_a \cdot S^T + V \right)^{-1} \cdot (\eta - y_a)
$$

$$
V_s = V_a - V_a \cdot S^T \cdot \left(S\cdot V_a \cdot S^T + V \right)^{-1} \cdot S \cdot V_a
$$

where $V_a$ is the covariance matrix of the prior estimates of the parameters $\theta$, $V$ is the covariance matrix of the introduced data $\eta$ that fit the constraining system, and $V_s$ is the updated covariance matrix of the system parameters $\theta$.

A more complete overview of the GLS technique followed here is given in https://doi.org/10.1016/j.anucene.2015.10.027
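
To make the update concrete, here is a minimal NumPy sketch of a GLS adjustment for three parameters constrained by their sum. All numbers are hypothetical, and the code is a toy illustration of the matrix algebra, not the SANDY implementation. It computes $\theta_s = \theta_a + V_a S^T (S V_a S^T + V)^{-1} (\eta - y_a)$ and $V_s = V_a - V_a S^T (S V_a S^T + V)^{-1} S V_a$.

```python
import numpy as np

# Hypothetical prior: three parameters with uncorrelated uncertainties
theta_a = np.array([0.030, 0.020, 0.010])
V_a = np.diag(np.array([0.003, 0.002, 0.001]) ** 2)

# Linear constraint: the response is the sum of the three parameters
S = np.array([[1.0, 1.0, 1.0]])

# Hypothetical new information on the response, with its own variance
eta = np.array([0.062])
V = np.array([[0.0005 ** 2]])

# Response computed from the prior estimates
y_a = S @ theta_a

# Gain matrix V_a S^T (S V_a S^T + V)^-1, shared by both update equations
gain = V_a @ S.T @ np.linalg.inv(S @ V_a @ S.T + V)

theta_s = theta_a + gain @ (eta - y_a)   # updated parameters
V_s = V_a - gain @ S @ V_a               # updated covariance matrix

print(theta_s)
print(V_s)
```

In this toy case the updated variances are smaller than the prior ones and the off-diagonal terms become negative: a sum constraint introduces anti-correlations between the parameters that contribute to the same response.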

To perform this updating process we exploit the relationship between the independent fission yields and the *chain fission yield* $ch(A)$, which in matrix form is:

$$
D^T \cdot IFY = Ch 
$$

The design matrix $S$ in this case is $D^T$, while the parameters $\theta$ are the independent fission yields $IFY$ and the response is the vector of chain fission yields $Ch$.

By using evaluated chain fission yields to modify independent fission yield data, we assume that we have better knowledge of the former. This is a consistent assumption, since the chain fission yields and their uncertainties are evaluated mostly directly from measurements, while the independent fission yields are not.

The extra information for the evaluation of $V$ is obtained from the report [Evaluation and Compilation of Fission Product Yields 1993](https://www-nds.iaea.org/endf349/la-ur-94-3106.pdf) (pages 18-29), hosted by the IAEA.

In [None]:
ch_info = sandy.fy.get_chain_yields()
e_ = 'thermal'
ch_info_std = ch_info.query(f"ZAM=={zam} & E =='{e_}'").set_index("A").DCHY
Vy_extra = sandy.CategoryCov.from_stdev(ch_info_std).data
Vy_extra.head().T.head()

The chain fission yield should not be confused with the *mass fission yield* $M(A)$: the two can differ by a few percent. In SANDY it is possible to perform the GLS update with the above constraint using either the chain yield or the mass yield, each with its own design matrix. Both options are shown in this notebook.

- ### First option: `mass yield`

With this option we assume that the chain fission yield is equal to the mass yield, i.e. we do not take delayed neutron emission into account. All the information needed to calculate the $D^T$ matrix is stored in the `sandy.Fy` object, so the sensitivity matrix can be obtained with the method `sandy.Fy.get_mass_yield_sensitivity()`.

In [None]:
S = nfpy.get_mass_yield_sensitivity()
cov_massyield = nfpy.gls_cov_update(zam, e, S, Vy_extra=Vy_extra)
cov_massyield.data.head().T.head()
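
For intuition, a mass-yield sensitivity matrix can be pictured as a 0/1 matrix that sums the independent yields of all products sharing the same mass number. The sketch below builds such a matrix by hand, assuming the ZAP convention `Z * 10000 + A * 10 + M`; the yields are made-up values and the construction is only an illustration, not what `get_mass_yield_sensitivity()` does internally.

```python
import numpy as np
import pandas as pd

# A few fission products identified by ZAP = Z * 10000 + A * 10 + M
zap = pd.Index([360940, 360950, 561480, 571480, 591480, 591481], name="ZAP")

# Mass number A of each product, extracted from the ZAP identifier
mass_number = (zap.values // 10) % 1000
masses = np.unique(mass_number)

# D^T has one row per mass number and one column per product:
# entry (A, ZAP) is 1 if the product has mass number A, 0 otherwise
D_T = pd.DataFrame(
    (masses[:, None] == mass_number[None, :]).astype(float),
    index=pd.Index(masses, name="A"),
    columns=zap,
)

# Hypothetical independent fission yields (illustrative values only)
ify = pd.Series([0.010, 0.020, 0.005, 0.004, 0.001, 0.0005], index=zap)

# Mass yields obtained as D^T . IFY
mass_yield = D_T @ ify
print(D_T)
print(mass_yield)
```

Each column contains exactly one non-zero entry, since every product contributes to a single mass number under this assumption.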

- ### Second option: `chain yield` 

With this option we evaluate the $D^T$ matrix that respects the constraint $\sum_{A_i=A}y(A_i) = ch(A)$. To do so, we also need the information stored in a `sandy.DecayData` instance, so the design matrix $S$ must be calculated with the method `sandy.DecayData.get_chain_yield_sensitivity()`.

In [None]:
tape_rdd = sandy.get_endf6_file('endfb_80', 'decay', 'all')

In [None]:
rdd = sandy.DecayData.from_endf6(tape_rdd)

In [None]:
S = rdd.get_chain_yield_sensitivity()
cov_chainyield = nfpy.gls_cov_update(zam, e, S, Vy_extra=Vy_extra)
cov_chainyield.data.head().T.head()

## Visual comparison of correlation matrices 

A visual comparison of the correlation matrices obtained with the GLS method is shown below. A few nuclides were selected among the ZAP numbers to make the added correlations easier to see in a spy plot.

In [None]:
zap = pd.Index([
    320790,
    320791,
    360950,
    360940,
    551490,
    561480,
    561490,
    571480,
    571490,
    581480,
    591480,
    591481,
    601480,])
nuclide = pd.Series(zap.values, index = zap, name="ZAP").to_frame()
nuclide['nuclide']= nuclide.ZAP.apply(sandy.zam.zam2latex)
nuclide_index = nuclide.nuclide.values
corr_my_plot = sandy.CategoryCov(cov_massyield.data.loc[zap, zap].set_index(nuclide_index).T.set_index(nuclide_index)).get_corr().data
corr_cy_plot = sandy.CategoryCov(cov_chainyield.data.loc[zap, zap].set_index(nuclide_index).T.set_index(nuclide_index)).get_corr().data
Vx_prior_corr_plot = sandy.CategoryCov(Vx_prior.data.loc[zap, zap].set_index(nuclide_index).T.set_index(nuclide_index)).get_corr().data
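
The cell above converts each covariance block into a correlation matrix before plotting. As a reminder of the standard definition, $\rho_{ij} = V_{ij} / (\sigma_i \sigma_j)$, here is a short NumPy sketch on a made-up 3x3 covariance matrix. It illustrates the formula only and is not meant to mirror how `get_corr()` is coded.

```python
import numpy as np

# Hypothetical covariance matrix with one correlated pair (illustrative values only)
cov = np.array([
    [4.0e-6, -1.0e-6, 0.0],
    [-1.0e-6, 1.0e-6, 0.0],
    [0.0, 0.0, 9.0e-6],
])

# Standard deviations are the square roots of the diagonal terms
std = np.sqrt(np.diag(cov))

# Correlation matrix: each covariance divided by the product of the two standard deviations
corr = cov / np.outer(std, std)
print(corr)
```

A spy plot of such a matrix marks every non-zero entry, so a purely diagonal prior shows only the diagonal, while an updated matrix also shows the off-diagonal terms.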

In [None]:
%%capture --no-display
fig, axes = plt.subplots(ncols=3, figsize=(15, 5), dpi=100)

# One panel per correlation matrix: prior, GLS with mass yield, GLS with chain yield
panels = [
    (Vx_prior_corr_plot, "From prior covariance matrix of IFY"),
    (corr_my_plot, "From GLS update covariance matrix of IFY with `mass yield`"),
    (corr_cy_plot, "From GLS update covariance matrix of IFY with `chain yield`"),
]
ticks = np.arange(len(nuclide_index))

for ax, (corr, title) in zip(axes, panels):
    ax.spy(corr)
    ax.set_title(title)
    # Fix the tick positions first, then attach one label to each position
    ax.set_xticks(ticks)
    ax.set_yticks(ticks)
    ax.set_xticklabels(nuclide_index, rotation=90)
    ax.set_yticklabels(nuclide_index)

fig.tight_layout()

Once the covariance matrix is obtained, we can proceed with the sampling procedure. In this notebook we use the updated covariance matrix obtained with the first option, but the steps are the same whichever covariance matrix is selected.

## Relative covariance matrix

We prefer to work with a relative covariance matrix to be consistent with the covariance matrices stored in the ENDF-6 format. SANDY can work with either relative or absolute covariance matrices thanks to the `relative` keyword argument of the `sandy.CategoryCov.sampling()` method. We now apply the *sandwich rule*, using a diagonal sensitivity matrix built from the best estimates of the fission yields, to obtain the updated relative covariance matrix.

In [None]:
conditions = {'ZAM': zam, "E": e, "MT": 454}
S = np.diag(nfpy._filters(conditions).data.FY)

In [None]:
idx = cov_massyield.data.index
cov_relative = sandy.CategoryCov(pd.DataFrame(cov_massyield.sandwich(S).data.values, index=idx, columns=idx))
cov_relative.data.head().T.head()
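
As a quick check of what the sandwich rule $S \cdot V \cdot S^T$ does when $S$ is diagonal, the NumPy sketch below uses a made-up 2x2 matrix. With $S = \mathrm{diag}(s)$, each element $V_{ij}$ is simply multiplied by $s_i s_j$. The values are hypothetical and the snippet does not call SANDY.

```python
import numpy as np

# Hypothetical covariance matrix and diagonal scaling factors (illustrative values only)
V = np.array([
    [4.0, 1.0],
    [1.0, 9.0],
])
s = np.array([0.5, 2.0])
S = np.diag(s)

# Sandwich rule: S V S^T
sandwich = S @ V @ S.T

# For a diagonal S this is an element-wise scaling by s_i * s_j
scaled = V * np.outer(s, s)
print(sandwich)
```

Because the scaling of element $(i, j)$ is $s_i s_j$, the choice of the diagonal entries determines whether the result is expressed in absolute or in relative terms.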

## Obtain perturbation coefficient

In [None]:
nsmp = 500 
coeff = cov_relative.sampling(nsmp, pdf='normal', relative=True, tolerance=0)
coeff.data.head()
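
Conceptually, sampling with `pdf='normal'` and `relative=True` amounts to drawing perturbation coefficients centred on 1 with the relative covariance matrix as their covariance. The sketch below reproduces that idea with NumPy's multivariate normal generator on a made-up 2x2 relative covariance matrix. It is an assumption-laden illustration: SANDY's own sampling may treat details such as small negative eigenvalues (see the `tolerance` argument) differently.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical relative covariance matrix: 10% and 5% relative standard deviations,
# with a negative correlation between the two parameters
rel_cov = np.array([
    [0.0100, -0.0040],
    [-0.0040, 0.0025],
])

nsmp = 5000

# Perturbation coefficients are ratios, so they are centred on 1
coeff = rng.multivariate_normal(mean=np.ones(2), cov=rel_cov, size=nsmp)

# Sample statistics should approach the input mean and covariance as nsmp grows
sample_mean = coeff.mean(axis=0)
sample_cov = np.cov(coeff, rowvar=False)
print(sample_mean)
print(sample_cov)
```

Each row of `coeff` is one set of perturbation coefficients, one column per parameter.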

## Apply first perturbation coefficient to fission yields

This step is repeated for each perturbation coefficient to obtain `nsmp` perturbed sets of fission yields. The perturbation coefficients are given as ratio values, e.g., 1.05 for a perturbation of +5%.

In [None]:
nfpy_new = nfpy.custom_perturbation(pert=coeff.data.iloc[0,:], zam=zam, zap=list(coeff.data.iloc[0,:].index), e=e, mt=454)
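
Since the coefficients are ratios, applying one sample amounts to an element-wise multiplication of the best-estimate yields by the coefficients with matching ZAP. A minimal pandas sketch with made-up numbers follows; it shows the arithmetic only and is not the implementation of `custom_perturbation`.

```python
import pandas as pd

# Hypothetical best-estimate fission yields indexed by ZAP (illustrative values only)
fy = pd.Series([0.010, 0.020, 0.005], index=[360940, 360950, 561480], name="FY")

# One hypothetical set of perturbation coefficients: +5%, -2%, unchanged
coeff = pd.Series([1.05, 0.98, 1.00], index=fy.index)

# Perturbed yields: element-wise product, aligned on the ZAP index
fy_pert = fy * coeff
print(fy_pert)
```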

## Create an ENDF6 file with the perturbed nuclear data

In [None]:
tape_new = nfpy_new.to_endf6(tape)
file = tape_new.to_file("perturbed_fy")