In [None]:
%matplotlib notebook
from pyfitit import *

# PyFitIt PCA module

## Import the Data file and Plot it

Import the data file and plot it. The file must consist of an energy column followed by one column per spectrum.
Warning: every spectrum must have the same length (i.e. the same number of points) as the energy column.
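
As an illustration of the expected layout, the sketch below parses a tiny made-up table with plain NumPy. The numbers are hypothetical and only show how the energy column and the spectral columns are separated; they are not taken from `PdHC.dat`.

```python
import io
import numpy as np

# Hypothetical three-point dataset: column 0 is the energy,
# columns 1 and 2 are two spectra sampled on that energy grid.
text = """\
24340.0 0.10 0.12
24350.0 0.55 0.60
24360.0 1.02 0.98
"""

table = np.loadtxt(io.StringIO(text))
energy = table[:, 0]     # first column: energy grid
spectra = table[:, 1:]   # remaining columns: one spectrum per column

# Every spectral column has as many points as the energy column.
assert spectra.shape[0] == energy.size
print(energy.shape, spectra.shape)
```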

In [None]:
dataFileBrowser = openFile('PdHC.dat')

In [None]:
fileName = dataFileBrowser.chosenFileName
data = np.loadtxt(fileName)
energy = data[:, 0]                # first column: energy grid
data = np.delete(data, 0, axis=1)  # remaining columns: one spectrum each
plot_data(energy, data)

## Calculation (SVD)

This function returns the eigenvalues of the covariance matrix associated with the input dataset, together with its principal components (PCs). These arrays are used to calculate the statistical parameters that identify the number of pure species present in the data mixture.
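
The link between the decomposition and the eigenvalues can be sketched with plain NumPy. This is not the implementation of `calcSVD`: it assumes that no mean-centering is applied and that the "covariance matrix" is simply `data.T @ data`, so the eigenvalues are the squared singular values. The synthetic dataset and the name `svd_sketch` are hypothetical.

```python
import numpy as np

def svd_sketch(data):
    """data: 2-D array with energy points as rows and spectra as columns."""
    # Thin SVD: data = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    eigenvalues = s ** 2  # eigenvalues of data.T @ data (assumed definition)
    return u, s, vt, eigenvalues

# Synthetic mixture: 2 made-up "pure" spectra combined into 6 spectra.
rng = np.random.default_rng(0)
pure = rng.random((50, 2))
conc = rng.random((2, 6))
data = pure @ conc

u, s, vt, eigenvalues = svd_sketch(data)
print(eigenvalues)
```

For this noise-free mixture only the first two eigenvalues are significantly different from zero, which reflects the two species used to build it.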

In [None]:
principal_components,l = calcSVD(data)
print(principal_components, l)  # reuse the result instead of running calcSVD twice

## Statistical Parameters

Calculation of the statistical estimators used to identify the number of pure species in the acquired data mixture, i.e. the components that are not associated with noise.
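
For orientation, the real error (RE), imbedded error (IE) and indicator function (IND) can be sketched from the eigenvalues alone, following the usual textbook definitions. This is a minimal sketch and not the code of `MalinowskyParameters`: the function name, the example eigenvalues and the assumption that the data matrix has at least as many rows as columns are all hypothetical.

```python
import numpy as np

def malinowski_indicators(eigenvalues, n_rows):
    """eigenvalues: sorted in decreasing order, one per spectrum (column).
    n_rows: number of energy points, assumed >= number of spectra."""
    lam = np.asarray(eigenvalues, dtype=float)
    c = lam.size
    n = np.arange(1, c)  # candidate numbers of components: 1 .. c-1
    # Sum of the eigenvalues discarded when n components are kept.
    tail = np.array([lam[k:].sum() for k in n])
    re = np.sqrt(tail / (n_rows * (c - n)))  # real error
    ie = re * np.sqrt(n / c)                 # imbedded error
    ind = re / (c - n) ** 2                  # indicator function
    return n, re, ie, ind

# Made-up eigenvalues: two large ones followed by a flat noise floor.
n, re, ie, ind = malinowski_indicators([100, 10, 0.01, 0.01, 0.01, 0.01], n_rows=50)
print(n[np.argmin(ind)])
```

With these example eigenvalues the minimum of IND falls at two components, which matches the two values standing above the noise floor.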

In [None]:
statistic, pc = MalinowskyParameters(data, l)
statistic  # Table of statistical parameters.

## Save Statistical Parameters

The eigenvalues of the data covariance matrix (l) and the statistical parameters (statistic) calculated in the above cell are saved.

In [None]:
saveToFile('results/statistic.dat', statistic) # IND, F-test and IE factor.
saveToFile('results/eigenvalues.dat',l) # Eigenvalues of the Covariance matrix.

## Graphs: Statistical Analysis

Plot of the statistical parameters. In the Scree Plot, the number of pure spectra is identified at the elbow of the curve. For the IND and IE tests, the number of pure spectra should correspond to the minimum of these functions. Finally, for the F-Test, with the acceptance level fixed at 5%, the number of points lying above this value corresponds to the number of pure spectra in the dataset.

In [None]:
plotTestStatistic(statistic, pc, l)

## Number of PCs suggested by IND-factor and F-Test

Number of "pure" spectra suggested by the IND factor (which seems to be more accurate than IE, see Malinowski 1977) and by the F-Test.

In [None]:
recommendPCnumber(statistic)

## Graphs: Abstract Components

Plot of the abstract components versus energy. Selecting the "Normalized" radio button normalizes each spectrum by the area under its curve, which makes the comparison with other spectra more reliable.
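
A minimal sketch of area normalization, assuming the area is the signed trapezoidal integral over the energy grid. How `plotPCAcomponents` computes the area internally is not stated here, and the helper names are hypothetical.

```python
import numpy as np

def trapezoid_area(x, y):
    """Signed area under y(x) computed with the trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def normalize_by_area(x, y):
    """Scale y so that the area under the curve equals 1."""
    area = trapezoid_area(x, y)
    if area == 0.0:
        raise ValueError("curve has zero area and cannot be normalized")
    return y / area

x = np.linspace(0.0, 1.0, 101)
y = 3.0 * np.ones_like(x)   # flat made-up curve with area 3
y_norm = normalize_by_area(x, y)
print(trapezoid_area(x, y_norm))
```

Abstract components can change sign, so their signed area may be small or negative; in that case a normalization of this kind should be read with care.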

In [None]:
plotPCAcomponents(energy, principal_components)

## Graphs: Experimental vs Reconstructed spectra

This module compares each experimental spectrum with its reconstruction. The reconstructed spectrum is a function of the number of components chosen. The inset shows the residuals of the reconstruction. If the number of PCs has been chosen properly, the reconstructed curve should fit the experimental one adequately, and the residual plot should show no particular features (i.e. only an oscillatory, noise-like trend).
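
The reconstruction and its residuals can be sketched with a truncated SVD. This is an illustration on synthetic data, not the code behind `PCAcomparison`; the function name `reconstruct` and the noise level are assumptions.

```python
import numpy as np

def reconstruct(data, n_components):
    """Rank-n approximation of data and the corresponding residuals."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    k = n_components
    approx = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
    return approx, data - approx

# Synthetic mixture of 2 made-up species plus a small amount of noise.
rng = np.random.default_rng(1)
pure = rng.random((40, 2))
conc = rng.random((2, 5))
noise = 1e-3 * rng.standard_normal((40, 5))
data = pure @ conc + noise

for k in (1, 2, 3):
    _, residual = reconstruct(data, k)
    print(k, np.linalg.norm(residual))
```

The residual norm drops sharply when going from one to two components and changes little afterwards, because the remaining components only describe noise.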

In [None]:
PCAcomparison(energy,data)

## Data Interpolation

With this function the user can increase the number of points in each spectrum of the experimental dataset by setting the energy step between two consecutive points through the "step" parameter. This step is fundamental for the data normalization procedure (described below): the higher the number of points in each spectrum, the more accurate the normalization. If the user wishes to keep the original energy points, this step can be skipped.
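
The idea can be sketched with linear interpolation on a uniform grid. The interpolation scheme used by `interpolation` is not specified here, so the linear choice, the function name and the toy data are assumptions.

```python
import numpy as np

def interpolate_spectra(energy, data, step):
    """Resample every column of data on a uniform grid with the given step.
    energy must be strictly increasing."""
    n_points = int(np.floor((energy[-1] - energy[0]) / step + 1e-9)) + 1
    new_energy = energy[0] + step * np.arange(n_points)
    new_data = np.column_stack(
        [np.interp(new_energy, energy, data[:, j]) for j in range(data.shape[1])]
    )
    return new_energy, new_data

energy = np.array([0.0, 1.0, 2.0])
data = np.array([[0.0, 0.0],
                 [1.0, 2.0],
                 [2.0, 4.0]])
new_energy, new_data = interpolate_spectra(energy, data, step=0.5)
print(new_energy.size, new_data.shape)
```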

In [None]:
energy,data=interpolation(energy,data,step=0.05) 

## Data Normalization

This step is required in order to run the "Target Transformation" module that follows. Each experimental spectrum is normalized using this equation:
$$\sigma=\sqrt{\frac{1}{E_{2}-E_{1}}\int_{E_{1}}^{E_{2}}\mu(E)^{2}dE}$$
where $E_{1}$ and $E_{2}$ are respectively the first and last values of the energy column, and $\mu(E)$ represents the XANES values of each spectral column.
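
The factor $\sigma$ can be evaluated numerically with the trapezoidal rule, as in the sketch below. Dividing the spectrum by $\sigma$ is an assumption about how the factor is applied, and `rms_norm` is a hypothetical helper rather than part of PyFitIt.

```python
import numpy as np

def rms_norm(energy, mu):
    """sigma = sqrt( 1/(E2 - E1) * integral of mu(E)^2 dE ), trapezoidal rule."""
    squared = mu ** 2
    integral = np.sum(0.5 * (squared[1:] + squared[:-1]) * np.diff(energy))
    return float(np.sqrt(integral / (energy[-1] - energy[0])))

energy = np.linspace(24340.0, 24400.0, 601)
mu = 2.0 * np.ones_like(energy)   # flat made-up spectrum
sigma = rms_norm(energy, mu)
mu_normalized = mu / sigma         # assumed use of sigma
print(sigma, rms_norm(energy, mu_normalized))
```

A finer energy grid makes the numerical integral closer to the true one, which is why the interpolation step improves the normalization.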

In [None]:
data=normalization(energy,data)

## Target Transformation

The "Target Transformation" module retrieves from the experimental dataset a set of pure spectra and their related concentration profiles with a well-defined physical/chemical meaning. The technique uses a transformation matrix whose elements the user can modify directly by moving sliders. Once the number of PCs (i.e. the number of pure species in the dataset) has been identified, two working configurations are available. "Case: 1": the "pure" spectral profiles are not normalized. The first or the last spectrum of the experimental dataset (or both) can be fixed, which reduces the number of sliders that can be moved. "Case: 2": the "pure" spectral profiles are normalized. As in "Case: 1", the first or the last experimental spectrum (or both) can be fixed.
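
The role of the transformation matrix can be sketched as follows: the abstract spectra are multiplied by a matrix `T`, the abstract concentrations by its inverse, and their product still reproduces the data. This is a generic illustration and not the code of `targetTransformationPCA`; the synthetic data and the values in `T` (which the sliders would set) are hypothetical.

```python
import numpy as np

# Synthetic dataset built from 2 made-up species.
rng = np.random.default_rng(2)
true_spectra = rng.random((60, 2))
true_conc = rng.random((2, 8))
data = true_spectra @ true_conc

n = 2  # number of PCs chosen from the statistical analysis
u, s, vt = np.linalg.svd(data, full_matrices=False)
abstract_spectra = u[:, :n] * s[:n]   # shape: energy points x n
abstract_conc = vt[:n, :]             # shape: n x number of spectra

# Hypothetical transformation matrix (must be invertible).
T = np.array([[1.0, 0.3],
              [-0.5, 1.0]])

pure_spectra = abstract_spectra @ T
pure_conc = np.linalg.inv(T) @ abstract_conc

# Any invertible T leaves the reconstructed data unchanged.
print(np.allclose(pure_spectra @ pure_conc, data))
```

Since every invertible `T` reproduces the data equally well, the choice has to rely on physical or chemical constraints, for example fixing the first or last experimental spectrum as described above.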

In [None]:
pcaResult = targetTransformationPCA(energy, data,sign=-1,min_val=-5,max_val=5,step_val=0.05)

## Save Data and Images

These commands save the "pure" spectral profiles and their related concentrations obtained from the "Target Transformation" module, together with the transformation parameters and the figure.

In [None]:
saveToFile('results/params.txt', pcaResult.params)
pcaResult.fig.savefig('results/image.png', dpi=200)
saveToFile('results/Pure Spectra.txt', pcaResult.pureSpectra)
saveToFile('results/Pure Concentrations.txt', pcaResult.pureConcentrations)