<a href="https://colab.research.google.com/github/mahynski/chemometric-carpentry/blob/main/notebooks/4_Conventional_Chemometric_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
❓ ***Objective***: This notebook will introduce some common chemometric models.  

🔁 ***Remember***: You can always revisit this notebook for reference again in the future.  Ideas and best practices will be reinforced in future notebooks, so don't worry about remembering everything the first time you see something new.

🧑 Author: Nathan A. Mahynski

📆 Date: May 30, 2024

---

<img src="https://pychemauth.readthedocs.io/en/latest/_images/pipeline.png" height=425 align="right"/>

So far we have introduced the concept of a pipeline, which is essentially a combination of:

1. pre-processing steps, ending in a
2. modeling step.

The PyChemAuth [classifier subpackage](https://pychemauth.readthedocs.io/en/latest/pychemauth.classifier.html)
contains a variety of models useful for classification and authentication, while the
 [regressor subpackage](https://pychemauth.readthedocs.io/en/latest/pychemauth.regressor.html)
 contains models used for regression.

These models can all be placed at the end of a pipeline like this:
```python
pipe = Pipeline(steps=[
    ('preprocessor_1', PCA_IA(n_components=3)),
    ('preprocessor_2', CorrectedScaler(pareto=True)),
    ('preprocessor_3', SNV(robust=True)),
    ...,
    ('final_model', PLSDA(...))
])

pipe.fit(X_train, y_train)
```
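The template above is schematic, so here is a minimal runnable sketch of the same idea. It assumes the iris dataset and uses scikit-learn stand-ins (`StandardScaler`, `PCA`, `LinearDiscriminantAnalysis`) in place of the PyChemAuth steps; those choices are illustrative only and are not part of the original pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris is used purely as example data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Pre-processing steps first, modeling step last
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),       # stand-in pre-processor
    ("pca", PCA(n_components=2)),       # stand-in pre-processor
    ("final_model", LinearDiscriminantAnalysis()),
])

# fit() trains every step in order using only the training data
pipe.fit(X_train, y_train)

# predict() re-applies the fitted pre-processing before the final model
predictions = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
```

Because the pre-processing is stored inside the pipeline, the test set is transformed with parameters learned from the training set only.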

---

Since the [PyChemAuth documentation](https://pychemauth.readthedocs.io/en/latest/chemometrics.html) already contains numerous notebooks designed to introduce these models and [how to use them](https://pychemauth.readthedocs.io/en/latest/examples.html), we will simply provide links to those notebooks here.


1. 📚 "Learn" links point to notebooks designed to introduce the model and some basic mathematics.

2. ⌨ "API" links point to the documentation for, or examples of, using these models in practice.

3. 🤝 "Interactive Tool" links point to [streamlit](https://streamlit.io/) web applications, hosted in the [community cloud](https://streamlit.io/cloud), that allow you to play around and explore these models and the effect of different hyperparameters.

---
   
* 📈 Regression Models
    * Ordinary Least Squares (OLS)
        * [Learn](https://pychemauth.readthedocs.io/en/latest/jupyter/learn/ols.html) | [sklearn API](https://scikit-learn.org/stable/modules/linear_model.html) | [Interactive Tool](https://chemometric-carpentry-ols.streamlit.app/)
    * Principal Components Analysis (PCA) and Regression (PCR)
        * [Learn](https://pychemauth.readthedocs.io/en/latest/jupyter/learn/pca_pcr.html) | [API](https://pychemauth.readthedocs.io/en/latest/jupyter/api/pca.html) | [Interactive PCA Tool](https://chemometric-carpentry-pca.streamlit.app/), [Interactive PCR Tool](https://chemometric-carpentry-pcr.streamlit.app/)
    * Partial Least-Squares (PLS) or Projection to Latent Structures
        * [Learn](https://pychemauth.readthedocs.io/en/latest/jupyter/learn/pls.html) | [API](https://pychemauth.readthedocs.io/en/latest/jupyter/api/pls.html) | [Interactive Tool](https://chemometric-carpentry-pls.streamlit.app/)
* ✅ Classification and Authentication Models
    * Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA)
        * [Learn](https://pychemauth.readthedocs.io/en/latest/jupyter/learn/lda.html) | [sklearn API](https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html) | [Interactive Tool](https://chemometric-carpentry-lda.streamlit.app/)
    * Partial Least-Squares-Discriminant Analysis (PLS-DA)
        * [Learn](https://pychemauth.readthedocs.io/en/latest/jupyter/learn/plsda.html) | [API](https://pychemauth.readthedocs.io/en/latest/jupyter/api/plsda.html) | [Interactive Tool](https://chemometric-carpentry-plsda.streamlit.app/)
    * Soft Independent Modeling of Class Analogies (SIMCA)
        * [Learn](https://pychemauth.readthedocs.io/en/latest/jupyter/learn/simca.html) | [API](https://pychemauth.readthedocs.io/en/latest/jupyter/api/simca.html) | [Interactive Tool](https://chemometric-carpentry-ddsimca.streamlit.app/)

# Examples

Here are some minimal code examples to illustrate how easy it is to use the models. Each snippet assumes that `X_train`, `y_train`, and `X_test` have already been defined.

## OLS

```python
from sklearn.linear_model import LinearRegression

# Fit model
model = LinearRegression(fit_intercept=True)
model.fit(X_train, y_train)

# Make predictions on test set
predictions = model.predict(X_test)
```
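To see what the fit recovers, here is a small self-contained sketch on synthetic, noise-free data. The data and the "true" slope and intercept are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y = 2*x + 1 with no noise
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(50, 1))
y_train = 2.0 * X_train[:, 0] + 1.0

model = LinearRegression(fit_intercept=True)
model.fit(X_train, y_train)

# The fitted parameters should match the generating line
slope = model.coef_[0]
intercept = model.intercept_

# Predict at two new points
X_test = np.array([[0.0], [5.0]])
predictions = model.predict(X_test)
```

With noise-free data the slope and intercept are recovered to numerical precision; with real measurements they are only estimates.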

## PCA

```python
from sklearn.decomposition import PCA
from pychemauth.preprocessing.scaling import CorrectedScaler

# Fit model
model = PCA(n_components=1)
scaler = CorrectedScaler(with_mean=True, with_std=True)
model.fit(scaler.fit_transform(X_train))

# Loadings (not yet scaled by eigenvalues)
loadings = model.components_

# Eigenvalues (for each component)
eigenvalues = model.explained_variance_

# Compute scores for test set
scores = model.transform(scaler.transform(X_test))
```
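The attributes used above can be checked against their definitions. The sketch below, on made-up random data, compares `explained_variance_` with the eigenvalues of the sample covariance matrix and rebuilds the scores by projecting the centered data onto the loadings.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples, 4 correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

model = PCA(n_components=4).fit(X)

# Eigenvalues of the sample covariance matrix, sorted in descending order
cov = np.cov(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Scores are the centered data projected onto the loadings
scores_manual = (X - model.mean_) @ model.components_.T

print(np.allclose(eigenvalues, model.explained_variance_))
print(np.allclose(scores_manual, model.transform(X)))
```

This is only a consistency check of scikit-learn's `PCA`; it does not include the scaling step used in the snippet above.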

## PCR

```python
from pychemauth.regressor.pcr import PCR

# Fit model
model = PCR(n_components=1)
model.fit(X_train, y_train)

# Make predictions on test set
predictions = model.predict(X_test)
```
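Conceptually, PCR is PCA followed by OLS on the scores. The sketch below builds that chain from scikit-learn parts on synthetic data; it illustrates the idea and is not PyChemAuth's implementation. The data, coefficients, and the use of `StandardScaler` are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 5 features with a linear response plus small noise
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 5))
true_coef = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_train = X_train @ true_coef + rng.normal(scale=0.1, size=60)
X_test = rng.normal(size=(10, 5))

# PCR sketch: scale, compress to 2 components, then regress on the scores
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X_train, y_train)
predictions = pcr.predict(X_test)

# Keeping every component discards nothing, so the result matches plain OLS
pcr_full = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
pcr_full_predictions = pcr_full.fit(X_train, y_train).predict(X_test)
ols_predictions = LinearRegression().fit(X_train, y_train).predict(X_test)
```

The comparison with OLS shows that the number of components is the only thing separating the two models: fewer components means a more constrained fit.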

## PLS

```python
from pychemauth.regressor.pls import PLS

# Fit model
model = PLS(n_components=1)
model.fit(X_train, y_train)

# Make predictions on test set
predictions = model.predict(X_test)
```

## LDA

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from pychemauth.preprocessing.scaling import CorrectedScaler

# Fit model
model = LDA(n_components=1)
scaler = CorrectedScaler(with_mean=True, with_std=True)
model.fit(scaler.fit_transform(X_train), y_train)

# Analogous to PCA loadings
scalings = model.scalings_.T

# Discriminability ratio (ratio of eigenvalues)
discr = model.explained_variance_ratio_

# Compute scores for dimensionality reduction
scores = model.transform(scaler.transform(X_test))

# Predict class if using as classifier (apply the same scaling used for fitting)
predictions = model.predict(scaler.transform(X_test))
```
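When a scaler and a model are kept as separate objects, new data must be scaled by hand before every call to `transform` or `predict`. Wrapping both in a pipeline removes that step. The sketch below assumes the iris dataset and uses scikit-learn's `StandardScaler` as a stand-in for `CorrectedScaler`, which may differ in detail.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris as example data, with string class labels
iris = load_iris()
X, y = iris.data, iris.target_names[iris.target]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Manual route: scale explicitly at fit time and again at predict time
scaler = StandardScaler()
model = LinearDiscriminantAnalysis(n_components=2)
model.fit(scaler.fit_transform(X_train), y_train)
X_test_scaled = scaler.transform(X_test)
scores = model.transform(X_test_scaled)
predictions = model.predict(X_test_scaled)

# Pipeline route: the scaling is applied automatically
pipe = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(n_components=2))
pipe.fit(X_train, y_train)
pipe_predictions = pipe.predict(X_test)

accuracy = np.mean(predictions == y_test)
```

Both routes give the same predictions; the pipeline simply makes it harder to forget the scaling.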

## PLS-DA

```python
from pychemauth.classifier.plsda import PLSDA

# Fit model
hard_plsda = PLSDA(n_components=3, style="hard")
soft_plsda = PLSDA(n_components=3, style="soft")
_ = hard_plsda.fit(X_train, y_train)
_ = soft_plsda.fit(X_train, y_train)

# Make predictions on test set
hard_predictions = hard_plsda.predict(X_test)
soft_predictions = soft_plsda.predict(X_test)

# We can visualize the results if we are modeling 2 or 3 classes.
_ = hard_plsda.visualize(styles=['hard'])
_ = soft_plsda.visualize(styles=['soft'])
```

## DD-SIMCA

```python
from pychemauth.classifier.simca import DDSIMCA_Model

# Select data from a single class to model
chosen_class = 'setosa'
mask = y_train == chosen_class
X_train_dds = X_train[mask]
y_train_dds = y_train[mask]

# Fit model
model = DDSIMCA_Model(n_components=1, scale_x=True)
_ = model.fit(X_train_dds)

# Visualize the results
_ = model.visualize(X_train_dds, y_train_dds)

# Predict class membership on any X
membership = model.predict(X_test)
```
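DD-SIMCA works with two distances from a PCA model of the target class: a score distance within the model plane and an orthogonal distance (the squared residual) away from it. The sketch below computes only those two distances with scikit-learn on the iris data. It leaves out the statistical acceptance boundary that DD-SIMCA derives from them, so it is not a substitute for `DDSIMCA_Model`; the `distances` helper is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: iris as example data; 'setosa' is the target class
iris = load_iris()
X, y = iris.data, iris.target_names[iris.target]
X_target = X[y == "setosa"]

# Model the target class only
n_components = 1
scaler = StandardScaler().fit(X_target)
pca = PCA(n_components=n_components).fit(scaler.transform(X_target))

def distances(X_new):
    """Return (score distance, orthogonal distance) for each row (hypothetical helper)."""
    Z = scaler.transform(X_new)
    T = pca.transform(Z)
    # Score distance: squared scores normalized by each component's eigenvalue
    h = np.sum(T**2 / pca.explained_variance_, axis=1)
    # Orthogonal distance: squared residual after reconstruction
    residual = Z - pca.inverse_transform(T)
    q = np.sum(residual**2, axis=1)
    return h, q

h_train, q_train = distances(X_target)
h_other, q_other = distances(X[y != "setosa"])
```

Samples from the other classes sit far from the setosa model, so their orthogonal distances are much larger on average than those of the training samples.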