# Example script

This Jupyter Notebook walks through a simple workflow with this library, starting from data fusion and ending with prediction.

## Step zero: install the library (and get the example data)
Let's install the package from `PyPI`.

In [None]:
%pip install chemfusekit

# Optional: download the example data from the repository (or upload your own files instead)
!wget https://github.com/f-aguzzi/tesi/raw/main/tests/qepas.xlsx
!wget https://github.com/f-aguzzi/tesi/raw/main/tests/rt.xlsx

## First step: Low-Level Data Fusion
- the `LLDF` class is used for data fusion
- the `LLDFSettings` class is a helper class for setting up `LLDF`
- `LLDF` data can then be exported, or used for further processing
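
To make the idea concrete, here is a minimal, library-independent sketch of what low-level data fusion with SNV preprocessing amounts to: each row is normalized on its own, then the feature blocks of the two tables are concatenated sample by sample. This is not `chemfusekit`'s implementation. The function names, the toy values, and the choice to normalize both blocks separately are assumptions made for illustration.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre one row on its own mean and
    scale it by its own sample standard deviation."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)
    return [(value - mu) / sigma for value in spectrum]


def fuse_low_level(table_a, table_b):
    """Concatenate the normalized features of two tables row by row.

    Both tables must list the same samples in the same order.
    """
    if len(table_a) != len(table_b):
        raise ValueError("both tables need the same number of samples")
    return [snv(row_a) + snv(row_b) for row_a, row_b in zip(table_a, table_b)]


# Toy data (hypothetical values): 2 samples, 3 + 2 features
sensor_a = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
sensor_b = [[10.0, 20.0], [5.0, 7.0]]

fused = fuse_low_level(sensor_a, sensor_b)
print(len(fused), len(fused[0]))  # 2 samples, 5 features each
```

The fused table keeps one row per sample and simply gains columns, which is why later steps can treat it like any single-sensor dataset.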

In [None]:
from chemfusekit.lldf import LLDFSettings, LLDF

# Initialize the settings for low-level data fusion
lldf_settings = LLDFSettings(
    qepas_path='qepas.xlsx',    # or the path to your own file
    qepas_sheet='Sheet1',
    rt_path='rt.xlsx',
    rt_sheet='Sheet1',
    preprocessing='snv'  # normalization preprocessing; other options: savgol or both
)

# Initialize and run low-level data fusion
lldf = LLDF(lldf_settings)
lldf.lldf()

In [None]:
# (optional) export the LLDF data to an Excel file
lldf.export_data('output_file.xlsx')

## Second step: PCA

The `PCA` class provides Principal Component Analysis tools. The `PCASettings` class supplies:
- the target variance that the reduced-component model must retain;
- a confidence level for statistical tests;
- the initial number of components for the analysis.

With these settings, the `PCA` class performs an automated analysis.
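
As an illustration of the target-variance criterion, the sketch below picks the smallest number of leading components whose cumulative share of the total variance reaches the target. It shows the general technique only. How `chemfusekit` selects components internally is not shown in this notebook, and the function name and the variance values are hypothetical.

```python
from itertools import accumulate


def components_for_variance(explained_variances, target_variance):
    """Return the smallest number of leading components whose cumulative
    share of the total variance reaches target_variance (0 < target <= 1)."""
    total = sum(explained_variances)
    for count, running in enumerate(accumulate(explained_variances), start=1):
        if running / total >= target_variance:
            return count
    # Fall back to all components if rounding keeps the ratio below the target
    return len(explained_variances)


# Hypothetical per-component variances, sorted from largest to smallest
variances = [6.0, 2.5, 1.0, 0.3, 0.2]
print(components_for_variance(variances, 0.9))  # prints 3
```

Raising the target keeps more components: with these values, a target of 0.97 needs four components instead of three.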

In [None]:
from chemfusekit.pca import PCASettings, PCA

# Initialize the settings for Principal Component Analysis
pca_settings = PCASettings(
    target_variance=0.99,
    confidence_level=0.05,
    initial_components=10,
    output=True # graphs will be printed
)

# Initialize and run the PCA class
pca = PCA(lldf.fused_data, pca_settings)
pca.pca()

# Print the number of components and the statistics
print(pca.components)
pca.pca_stats()

## Third step: LDA training

- the `LDA` class provides Linear Discriminant Analysis tools
- the `LDASettings` helper class holds the settings for the `LDA` class
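
When choosing a value for `components`, it helps to remember a general property of LDA: with C classes it can produce at most C - 1 discriminant axes, and never more than the number of features. The helper below is a hypothetical sanity check built on that property. Whether `chemfusekit` validates the value itself is not stated here, and the class and feature counts are made up.

```python
def max_lda_components(n_classes, n_features):
    """Upper bound on the number of discriminant axes LDA can produce."""
    if n_classes < 2:
        raise ValueError("LDA needs at least two classes")
    return min(n_classes - 1, n_features)


# Hypothetical dataset: 6 classes described by 300 fused features
print(max_lda_components(n_classes=6, n_features=300))  # prints 5
```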

In [None]:
from chemfusekit.lda import LDASettings, LDA

settings = LDASettings(
    components=(pca.components - 1),    # one less component than the number determined by PCA
    output=True # graphs will be printed
)

# Initialize and run the LDA class
lda = LDA(lldf.fused_data, settings)
lda.lda()

## Fourth step: LDA prediction

In [None]:
# Let's pick one sample and see if it gets recognized correctly:
x_data_sample = lldf.fused_data.x_train.iloc[119] # expected class: DMMP
# Drop the first entry and reshape the rest into a one-row DataFrame
x_data_sample = x_data_sample.iloc[1:].to_frame().transpose()

# Let's run the prediction:
predictions = lda.predict(x_data_sample)
print(predictions)
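
A single sample is only a spot check. As a minimal sketch, assuming predicted and expected labels are available as plain Python lists, the fraction of correct predictions can be computed as follows. The `accuracy` helper and every label other than DMMP are hypothetical.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one sample is required")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Hypothetical labels for four samples
expected_labels = ["DMMP", "DMMP", "Acetone", "Ethanol"]
predicted_labels = ["DMMP", "Acetone", "Acetone", "Ethanol"]
print(accuracy(predicted_labels, expected_labels))  # prints 0.75
```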