# Example script

This Jupyter Notebook walks through a simple workflow with this library, starting from data fusion and ending with prediction.

## Step zero: install the library (and get the example data)
Let's install the package from `PyPI`.

In [None]:
%pip install chemfusekit

# Optional: download the example data from the repository (you could upload your own files)
!wget https://github.com/f-aguzzi/tesi/raw/main/tests/qepas.xlsx
!wget https://github.com/f-aguzzi/tesi/raw/main/tests/rt.xlsx

# Automatically inline the graphs
%matplotlib inline

## First step: Low-Level Data Fusion
- the `LLDF` class is used for data fusion
- the `LLDFSettings` class is a helper class for setting up `LLDF`
- `LLDF` data can then be exported, or used for further processing
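
To give an intuition for what happens in this step, here is a minimal NumPy sketch of low-level data fusion: each block is preprocessed on its own (here with SNV, the standard normal variate), and the blocks are then joined column-wise, sample by sample. This is not `chemfusekit`'s implementation; the `snv` helper, the `block_a` and `block_b` arrays and their values are made up for illustration.

```python
import numpy as np


def snv(block: np.ndarray) -> np.ndarray:
    """Standard normal variate: center and scale each row (sample) on its own."""
    mean = block.mean(axis=1, keepdims=True)
    std = block.std(axis=1, ddof=1, keepdims=True)
    return (block - mean) / std


# Two hypothetical blocks measured on the same three samples (illustrative values)
block_a = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 4.0, 6.0, 8.0],
                    [0.5, 0.1, 0.9, 0.3]])
block_b = np.array([[10.2], [11.7], [9.8]])

# Low-level fusion: preprocess each block, then concatenate the columns
fused = np.hstack([snv(block_a), block_b])
print(fused.shape)  # (3, 5)
```

The first two rows of `block_a` differ only by a scale factor, so SNV maps them to the same values. This is why it is a common choice for removing multiplicative effects from spectra.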

In [None]:
from chemfusekit.lldf import LLDFSettings, LLDF, Table, GraphMode

# Initialize the settings for low-level data fusion
lldf_settings = LLDFSettings(output=GraphMode.NONE)

# Describe the input tables and the preprocessing applied to each
qepas_table = Table(
    file_path="qepas.xlsx",
    sheet_name="Sheet1",
    preprocessing="snv"
)
rt_table = Table(
    file_path="rt.xlsx",
    sheet_name="Sheet1",
    preprocessing="none"
)

tables = [qepas_table, rt_table]

# Initialize and run low-level data fusion
lldf = LLDF(lldf_settings, tables)
lldf.lldf()

In [None]:
# (optional) export the LLDF data to an Excel file
lldf.export_data('output_file.xlsx')

## Second step: PCA

The `PCA` class provides Principal Component Analysis tools. The `PCASettings` class holds its settings:
- a target variance level that the reduced-component model must retain;
- a confidence level for statistical tests;
- an initial number of components for the analysis.

Given these settings, the `PCA` class performs an automated analysis.
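
As a rough illustration of the target-variance criterion, the sketch below picks the smallest number of components whose cumulative explained-variance ratio reaches the target. It uses plain NumPy and is not `chemfusekit`'s implementation; the `components_for_variance` function and the toy matrix are hypothetical.

```python
import numpy as np


def components_for_variance(x: np.ndarray, target_variance: float) -> int:
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches target_variance."""
    centered = x - x.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance carried by each principal component
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    count = int(np.searchsorted(cumulative, target_variance)) + 1
    # Guard against rounding when the target is (almost) 1.0
    return min(count, len(cumulative))


# Toy data: 90% of the variance lies along the first axis, 10% along the second
x = np.array([[3.0, 0.0, 0.0],
              [-3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, -1.0, 0.0]])

print(components_for_variance(x, 0.85))  # 1
print(components_for_variance(x, 0.95))  # 2
```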

In [None]:
from chemfusekit.pca import PCASettings, PCA, GraphMode

# Initialize the settings for Principal Component Analysis
pca_settings = PCASettings(
    target_variance=0.99,
    confidence_level=0.05,
    initial_components=10,
    output=GraphMode.GRAPHIC # graphs will be printed as pictures
)

# Initialize and run the PCA class
pca = PCA(pca_settings, lldf.fused_data)
pca.pca()

# Print the number of components and the statistics
print(f"\nNumber of components: {pca.components}\n")
pca.pca_stats()

# Export data from PCA
pca_data = pca.export_data()

## Third step: LDA training

- the `LDA` class provides Linear Discriminant Analysis tools
- the `LDASettings` helper class holds the settings for the `LDA` class
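
For readers unfamiliar with the technique, here is a minimal NumPy sketch of LDA classification evaluated on a held-out split. It models each class as a Gaussian with a shared (pooled) covariance and assigns a sample to the class with the highest linear discriminant score. It is not `chemfusekit`'s implementation; `fit_lda`, `predict_lda`, the labels and the synthetic clusters are all hypothetical.

```python
import numpy as np


def fit_lda(x: np.ndarray, y: np.ndarray):
    """Fit class means, priors and a pooled covariance; return linear scores."""
    classes, idx = np.unique(y, return_inverse=True)
    means = np.array([x[idx == k].mean(axis=0) for k in range(len(classes))])
    priors = np.bincount(idx) / len(y)
    # Pooled within-class covariance shared by all classes
    residuals = x - means[idx]
    cov = residuals.T @ residuals / (len(x) - len(classes))
    weights = means @ np.linalg.pinv(cov)          # shape: (n_classes, n_features)
    biases = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)
    return classes, weights, biases


def predict_lda(model, x: np.ndarray) -> np.ndarray:
    """Assign each sample to the class with the highest discriminant score."""
    classes, weights, biases = model
    scores = x @ weights.T + biases
    return classes[np.argmax(scores, axis=1)]


# Synthetic, well-separated clusters with made-up labels
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
x = np.vstack([rng.normal(c, 1.0, size=(40, 2)) for c in centers])
y = np.repeat(np.array(["A", "B", "C"]), 40)

# Hold out 25% of the samples for testing
order = rng.permutation(len(x))
test_idx, train_idx = order[:30], order[30:]

model = fit_lda(x[train_idx], y[train_idx])
accuracy = np.mean(predict_lda(model, x[test_idx]) == y[test_idx])
print(f"Held-out accuracy: {accuracy:.2f}")
```

Scoring on samples the model has never seen is the point of a split test: accuracy measured on the training data alone tends to be optimistic.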

In [None]:
from chemfusekit.lda import LDASettings, LDA, GraphMode

settings = LDASettings(
    output=GraphMode.GRAPHIC,   # Graphs will be printed
    test_split=True     # Run split test
)

# Initialize and run the LDA class
lda = LDA(settings, pca_data)   # components will be determined automatically from the PCA data
lda.lda()

## Fourth step: LDA prediction

In [None]:
# Let's pick a single sample (row 119) and see if it gets recognized correctly:
x_data_sample = lldf.fused_data.x_train.iloc[119] # should be DMMP
x_data_sample = x_data_sample.iloc[1:].to_frame().transpose()

# Let's run the prediction:
predictions = lda.predict(x_data_sample)
print(predictions)