
# 🧪 Multivariate Analysis: Concepts and Applications

Welcome to this notebook on **multivariate analysis**!  
Multivariate methods are essential for exploring, modelling, and understanding complex datasets — where many variables interact simultaneously.

We'll explore:
- ✨ Key concepts of multivariate analysis
- 🧩 Principal Component Analysis (PCA)
- 🎯 Partial Least Squares Discriminant Analysis (PLS-DA)
- 📈 Bayesian multivariate models
- 🔍 Machine learning approaches to classification and prediction

As a practical example, we'll use a **synthetic metabolomics dataset**, similar to real-world studies, to illustrate the methods.  
However, the techniques you will learn apply just as well to fields like nutrition, clinical data, finance, or engineering!

Let's dive in! 🚀

---

Before we start, let's set up the workspace, load the data and the necessary libraries:

<details>

<summary>Detailed description of libraries</summary>


#### 1. NumPy

- **Purpose**: NumPy is the foundational library for numerical computing in Python. It provides efficient array operations, mathematical functions, and linear algebra tools (e.g., matrix inversion, eigenvalues) used in data preprocessing and calculations like Hotelling’s T².
- **Used For**: Array manipulation, linear algebra (e.g., `la.inv` for inverse covariance in PCA), and mathematical operations.
- **Documentation**: [NumPy Documentation](https://numpy.org/doc/stable/)

#### 2. Pandas

- **Purpose**: Pandas offers data structures (e.g., DataFrame) and tools for data manipulation and analysis, ideal for handling tabular data like datasets for PCA or machine learning.
- **Used For**: Data loading (e.g., CSV files), preprocessing, and feature engineering before PCA or PLS regression.
- **Documentation**: [Pandas Documentation](https://pandas.pydata.org/docs/)

#### 3. Scikit-learn (sklearn)

- **Purpose**: Scikit-learn is a comprehensive machine learning library providing tools for data preprocessing, dimensionality reduction, regression, classification, and model evaluation.
- **Used For**:
  - `StandardScaler`: Standardizing features for PCA or PLS regression.
  - `PCA`: Dimensionality reduction for data visualization or analysis.
  - `PLSRegression`: Partial Least Squares regression for predictive modeling.
  - `train_test_split`: Splitting data into training and test sets.
  - `accuracy_score`: Evaluating classification model performance.
  - `RandomForestClassifier`: Building ensemble classification models.
- **Documentation**: [Scikit-learn Documentation](https://scikit-learn.org/stable/)

#### 4. Matplotlib

- **Purpose**: Matplotlib is a plotting library for creating static, interactive, and publication-quality visualizations, such as scatter plots or PCA biplots.
- **Used For**: Visualizing PCA results (e.g., scatter plots with Hotelling’s T² ellipses) or model performance metrics.
- **Documentation**: [Matplotlib Documentation](https://matplotlib.org/stable/)

#### 5. PyMC

- **Purpose**: PyMC is a probabilistic programming library for Bayesian statistical modeling and inference, enabling flexible model specification and sampling.
- **Used For**: Building Bayesian models to estimate parameters or uncertainty, potentially for metabolomics or dietary data analysis.
- **Documentation**: [PyMC Documentation](https://www.pymc.io/)

#### 6. ArviZ

- **Purpose**: ArviZ is a library for exploratory analysis of Bayesian models, providing tools for visualizing posterior distributions, convergence diagnostics, and model comparison.
- **Used For**: Analyzing and visualizing PyMC model outputs (e.g., trace plots, posterior predictive checks).
- **Documentation**: [ArviZ Documentation](https://python.arviz.org/en/stable/)

#### 7. SciPy

- **Purpose**: SciPy builds on NumPy to provide advanced scientific computing tools, including statistical distributions, optimization, and signal processing.
- **Used For**: Statistical functions (e.g., `chi2.ppf` for the chi-squared critical value behind the Hotelling’s T² ellipse) and linear algebra operations.
- **Documentation**: [SciPy Documentation](https://docs.scipy.org/doc/scipy/)

</details>



In [None]:
# Setup for Google Colab: Fetch datasets automatically or manually
import os
from google.colab import files

# Define the module and dataset for this notebook
MODULE = '10_mini_projects'  # e.g., '01_infrastructure'
DATASET = 'metabolomics_dataset.csv'  # e.g., 'hippo_diets.csv'
BASE_PATH = '/content/data-analysis-toolkit-FNS'
MODULE_PATH = os.path.join(BASE_PATH, 'notebooks', MODULE)
DATASET_PATH = os.path.join('data', DATASET)

# Step 1: Attempt to clone the repository (automatic method)
# Note: If you encounter a cloning error (e.g., 'fatal: destination path already exists'),
#       reset the runtime (Runtime > Restart runtime) and run this cell again.
try:
    print('Attempting to clone repository...')
    if os.path.exists(BASE_PATH):
        print('Repository already exists, skipping clone.')
    else:
        !git clone https://github.com/ggkuhnle/data-analysis-toolkit-FNS.git
    
    # Debug: Print directory structure
    print('Listing repository contents:')
    !ls {BASE_PATH}
    print('Listing notebooks directory contents:')
    !ls {BASE_PATH}/notebooks
    
    # Check if the module directory exists
    if not os.path.exists(MODULE_PATH):
        raise FileNotFoundError(f'Module directory {MODULE_PATH} not found. Check the repository structure.')
    
    # Set working directory to the notebook's folder
    os.chdir(MODULE_PATH)
    
    # Verify dataset is accessible
    if os.path.exists(DATASET_PATH):
        print(f'Dataset found: {DATASET_PATH} 🦛')
    else:
        raise FileNotFoundError(f'Dataset {DATASET} not found after cloning.')
except Exception as e:
    print(f'Cloning failed: {e}')
    print('Falling back to manual upload option...')

    # Step 2: Manual upload option
    print(f'Please upload {DATASET} manually.')
    print(f'1. Click the "Choose Files" button below.')
    print(f'2. Select {DATASET} from your local machine.')
    print(f'3. Ensure the file is placed in notebooks/{MODULE}/data/')
    
    # Create the data directory if it doesn't exist
    os.makedirs('data', exist_ok=True)
    
    # Prompt user to upload the dataset
    uploaded = files.upload()
    
    # Check if the dataset was uploaded
    if DATASET in uploaded:
        with open(DATASET_PATH, 'wb') as f:
            f.write(uploaded[DATASET])
        print(f'Successfully uploaded {DATASET} to {DATASET_PATH} 🦛')
    else:
        raise FileNotFoundError(f'Upload failed. Please ensure you uploaded {DATASET}.')

# Install required packages for this notebook
%pip install pandas numpy scipy scikit-learn matplotlib seaborn pymc arviz
print('Python environment ready.')

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pymc as pm
import arviz as az

from scipy.stats import f, chi2
import numpy.linalg as la

from sklearn.ensemble import RandomForestClassifier

import seaborn as sns

# Set seaborn style for professional, clean visuals
sns.set_style("whitegrid")

# Load metabolomics dataset (1000 participants, 200 metabolites)
df = pd.read_csv('data/metabolomics_dataset.csv')

# Extract features (metabolites) and labels
X = df.filter(like='Metabolite_')  # Columns like 'Metabolite_1', ..., 'Metabolite_200'
labels = df['Label']

## 1. Introduction to Multivariate Analysis 📊

Multivariate datasets, such as those in metabolomics, are like a galaxy of stars ✨: thousands of data points, each twinkling with information. Multivariate analysis helps us find patterns, classify samples (e.g., healthy vs. diseased), and uncover biomarkers. These methods are essential because such data typically show:

- **High-dimensionality**: Metabolomics data often have more variables (metabolites) than samples.
- **Correlations**: Metabolites do not act in isolation; they interact in complex and structured ways.
- **Noise**: Biological and technical variability can obscure true signals.

In this notebook, we will use Python with libraries such as `scikit-learn`, `PyMC`, and `pandas` to explore these techniques.  
If you are new to this area — no problem! We'll guide you through each step. 😊

---

### **Exercise 1**  
Why do you think multivariate methods are better than analysing each metabolite separately?  
Write your thoughts below — no code needed, just reflect.

<details>
<summary>💡 Hint</summary>

Think about how metabolites might be connected through biological pathways, and why examining them together could reveal patterns that are invisible when looking at them one by one.

</details>
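
To make the point of this exercise concrete, here is a minimal sketch on made-up data (two hypothetical, strongly correlated "metabolites", not our dataset). It compares a one-variable-at-a-time view (z-scores) with a multivariate view (squared Mahalanobis distance) for a sample that breaks the usual relationship between the two variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two strongly correlated hypothetical "metabolites" (correlation about 0.95)
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

# A sample whose values are unremarkable one at a time,
# but which breaks the usual relationship between the two variables
odd = np.array([1.5, -1.5])

# Univariate view: z-score of each variable separately
z_scores = (odd - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Multivariate view: squared Mahalanobis distance uses the covariance structure
diff = odd - X.mean(axis=0)
d2 = diff @ np.linalg.inv(np.cov(X.T)) @ diff

print('Univariate z-scores:', np.round(z_scores, 2))
print('Squared Mahalanobis distance:', round(float(d2), 1))
```

Each z-score on its own looks ordinary, while the multivariate distance is very large, because the sample sits far from the diagonal along which the two variables normally move together.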


## 2. Principal Component Analysis (PCA): The Unsupervised Explorer 🗺️

PCA is like a treasure map for your data — it reduces dimensionality by finding **principal components** (PCs) that capture the greatest variance.  
In metabolomics, PCA helps us:

- Visualise sample similarities and groupings 🧩
- Detect outliers 🚨
- Explore underlying structure without using class labels (unsupervised)

PCA projects high-dimensional data onto a smaller number of dimensions while preserving as much information as possible.  
It is often the first step in a metabolomics analysis to get a quick overview of the dataset.

We will implement PCA using `scikit-learn` and visualise the results with `matplotlib` and `seaborn`. 📈
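
Before using the library, it can help to see what PCA does under the hood. The following is a minimal sketch with NumPy on made-up data (the toy matrix and its two hidden factors are assumptions for illustration): center the data, take a singular value decomposition, and project onto the leading directions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 100 samples, 5 variables driven by 2 hidden factors
factors = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.1 * rng.normal(size=(100, 5))

# 1. Center each variable (PCA works on deviations from the mean)
Xc = X - X.mean(axis=0)

# 2. Singular value decomposition: rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Scores: coordinates of each sample along the first two directions
scores = Xc @ Vt[:2].T

# 4. Share of the total variance captured by each component
explained_ratio = S**2 / np.sum(S**2)

print('Explained variance ratio:', np.round(explained_ratio, 3))
print('Scores shape:', scores.shape)
```

Because the toy data were built from two factors, the first two components should carry almost all of the variance; real metabolomics data are rarely this tidy.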

---

### **Exercise 2**  
Before we dive into the code, think about this:  
Why might PCA sometimes *hide* important biological information?

<details>
<summary>💡 Hint</summary>

PCA is optimised for variance, not necessarily for biological relevance. Sometimes important differences (e.g., between healthy and diseased) might not be the biggest source of variance!

</details>
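
The hint can be illustrated with a small sketch on made-up data (the three variables and the group effect are assumptions, not part of our dataset). Two variables share a strong common factor that has nothing to do with the groups, while a third variable carries the group difference.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
labels = rng.integers(0, 2, size=n)          # two hypothetical groups

# Two variables share a strong common factor unrelated to the groups...
common = rng.normal(size=n)
x1 = common + 0.3 * rng.normal(size=n)
x2 = common + 0.3 * rng.normal(size=n)
# ...while a third, independent variable carries the group difference
x3 = 2.0 * labels + 0.5 * rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize the variables

# PCA via SVD of the standardized data
_, S, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt.T

def separation(t):
    """Difference between group means, in units of the component's SD."""
    return abs(t[labels == 1].mean() - t[labels == 0].mean()) / t.std()

sep = [separation(scores[:, k]) for k in range(3)]
print('Explained variance ratio:', np.round(S**2 / np.sum(S**2), 2))
print('Group separation per PC:', np.round(sep, 2))
```

In this toy setting PC1 captures the shared factor and says little about the groups; the group difference shows up on PC2. Keeping only the first component would hide it.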

## Step 1: Visualizing the Messy Raw Data 📊

Our dataset has 200 metabolites per participant, creating a high-dimensional maze. Direct visualization is nearly impossible (imagine a 200-dimensional scatter plot!). Instead, we’ll use two techniques to show the data’s “messiness”:

- **Correlation Heatmap**: Reveals pairwise correlations between metabolites. High correlations suggest redundant features, a common issue in metabolomics that PCA can address.
- **Pairwise Scatter Plots**: Shows relationships between a subset of metabolites, highlighting how scattered and unstructured the raw data appears.

These plots will demonstrate why we need PCA to simplify this complex dataset.

In [None]:
# Visualize raw data: Correlation heatmap
plt.figure(figsize=(10, 8))  # Larger size for 200 metabolites
X_df = df.filter(like='Metabolite_')  # Ensure we use the DataFrame
corr = X_df.corr()  # Compute pairwise correlations
sns.heatmap(corr, cmap='coolwarm', center=0, vmin=-1, vmax=1)
plt.title('Correlation Heatmap of 200 Metabolites')
plt.tight_layout()
plt.show()

### What’s Happening Here? 🤔
The heatmap shows correlations between our 200 metabolites. Notice the dense patterns of red (positive) and blue (negative) correlations — this redundancy makes the data “messy” and hard to interpret. Many metabolites move together, suggesting we can reduce dimensions without losing much information.
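
A heatmap gives a visual impression; one way to put a number on the redundancy is to count strongly correlated pairs. This is a minimal sketch on made-up data, and the cut-off of 0.7 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical toy data: 6 variables, the first 3 driven by one shared factor
shared = rng.normal(size=(200, 1))
block = shared + 0.3 * rng.normal(size=(200, 3))   # correlated block
indep = rng.normal(size=(200, 3))                  # unrelated variables
X = np.hstack([block, indep])

# Correlation matrix (variables are columns, hence rowvar=False)
corr = np.corrcoef(X, rowvar=False)

# Look at each pair once: upper triangle, excluding the diagonal
i, j = np.triu_indices_from(corr, k=1)
strong = np.abs(corr[i, j]) > 0.7

print(f'{strong.sum()} of {strong.size} pairs have |r| > 0.7')
```

The same pattern should work on `X_df.corr().values` from the cell above, where the number of pairs is much larger.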

Next, let’s try visualizing pairs of metabolites to see if patterns emerge naturally.

In [None]:
# Visualize raw data: Pairwise scatter plots for the first 5 metabolites
subset_cols = df.filter(like='Metabolite_').columns[:5]  # Select first 5 metabolite columns
sns.pairplot(df[subset_cols], diag_kind='kde', plot_kws={'alpha': 0.5})
plt.suptitle('Pairwise Scatter Plots of Raw Metabolomics Data', y=1.02)
plt.show()

### The High-Dimensional Challenge 🌪️
The scatter plots show relationships between just 5 of our 200 metabolites, and already it’s chaotic! Points are scattered, with no clear groupings, and we’re only seeing a tiny slice of the data. Imagine trying to plot all 200 dimensions — it’s impossible! This messiness is why PCA is our go-to tool: it finds the directions (principal components) that capture the most variance, simplifying the data into a 2D map we can explore.

Before PCA, we need to standardize the data so that every metabolite contributes on the same scale.

In [None]:
# Preprocess: Standardize features to ensure equal scale
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
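
For intuition, standardization is just a column-wise z-score. The sketch below reproduces it by hand on a tiny made-up matrix (the numbers are illustrative only); `StandardScaler` divides by the population standard deviation, which corresponds to `ddof=0` in NumPy.

```python
import numpy as np

# Hypothetical toy matrix: 4 samples, 2 variables on very different scales
X = np.array([[1.0, 1000.0],
              [2.0, 1500.0],
              [3.0, 2500.0],
              [4.0, 3000.0]])

# z-score each column: subtract the column mean, divide by the column SD.
# ddof=0 (population SD) matches what StandardScaler uses.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)

print('Column means after scaling:', X_scaled.mean(axis=0).round(6))
print('Column SDs after scaling:  ', X_scaled.std(axis=0).round(6))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates the variance that PCA tries to explain.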

## Step 2: Applying PCA 🧑‍🔬

Now that we’ve seen the raw data’s complexity, let’s apply PCA to reduce our 200 metabolites to 2 principal components (PCs). PCA identifies the directions of maximum variance, projecting our high-dimensional data onto a 2D plane. We’ll use `scikit-learn`’s `PCA` to do this efficiently.

Standardization (via `StandardScaler`) was critical to ensure metabolites with different scales (e.g., concentrations) don’t skew the results. Let’s transform the data and explore the results!

In [None]:
# Apply PCA to reduce to 2 components
pca = PCA(n_components=2)
pca_result = pca.fit_transform(X_scaled)

## Step 3: Detecting Outliers with Hotelling’s T² 🚨

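PCA helps us spot outliers: samples that deviate significantly from the main data cloud. We’ll use **Hotelling’s T²**, a statistic that measures how far each sample is from the center of the PCA scores, accounting for the data’s variance. Samples outside a 95% confidence ellipse are flagged as potential outliers. Keep in mind that, if the scores are roughly normally distributed, about 5% of ordinary samples are expected to fall outside this ellipse by chance, so a flagged sample is a candidate for inspection rather than proof of an error.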

We’ll compute T² scores using `numpy.linalg` and `scipy.stats.chi2`, then visualize them in our PCA plot.

In [None]:
# Ensure labels are numerical for plotting (if categorical, encode them)
if labels.dtype == 'object':
    labels = pd.Categorical(labels).codes  # Convert to numerical codes (e.g., 0, 1)

# Calculate Hotelling's T^2 for outlier detection
n_samples, n_components = pca_result.shape
mean_scores = np.mean(pca_result, axis=0)
cov_scores = np.cov(pca_result.T)
# Ensure covariance matrix is well-conditioned
if np.linalg.cond(cov_scores) < 1e6:  # Check condition number
    inv_cov = la.inv(cov_scores)
else:
    # Add small diagonal for numerical stability
    cov_scores += np.eye(n_components) * 1e-6
    inv_cov = la.inv(cov_scores)

# Compute T^2 scores
t2_scores = np.array([
    (pca_result[i] - mean_scores).T @ inv_cov @ (pca_result[i] - mean_scores)
    for i in range(n_samples)
])

# Use chi-squared distribution for 95% confidence ellipse (simpler for 2D PCA)
alpha = 0.05
critical_value = chi2.ppf(1 - alpha, df=n_components)  # Chi-squared with 2 degrees of freedom

# Identify outliers
outliers = t2_scores > critical_value

# Compute ellipse for 95% confidence region
# A covariance matrix is symmetric, so eigh is the appropriate solver (real eigenvalues)
eigenvalues, eigenvectors = np.linalg.eigh(cov_scores)
radii = np.sqrt(critical_value * eigenvalues)  # Scale by chi-squared critical value
theta = np.linspace(0, 2 * np.pi, 100)
ellipse = (eigenvectors @ np.diag(radii) @ np.array([np.cos(theta), np.sin(theta)])).T + mean_scores

# Visualize PCA results with Hotelling's T^2 ellipse
plt.figure(figsize=(8, 6))
plt.scatter(pca_result[:, 0], pca_result[:, 1], c=labels, cmap='viridis', alpha=0.7, label='Samples')
plt.scatter(pca_result[outliers, 0], pca_result[outliers, 1], c='red', s=100, marker='x', label='Outliers')
plt.plot(ellipse[:, 0], ellipse[:, 1], 'k--', label='95% Confidence Ellipse')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('PCA of Metabolomics Data (1000 Participants) 🗺️')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

# Print explained variance ratio and outlier count
print(f'Explained Variance Ratio: PC1 = {pca.explained_variance_ratio_[0]:.2f}, '
      f'PC2 = {pca.explained_variance_ratio_[1]:.2f}')
print(f'Number of Outliers Detected: {np.sum(outliers)}')
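
The T² computation in this step can also be written without a Python loop. The sketch below uses random numbers as a stand-in for the PCA scores (an assumption for illustration) and relies on the fact that, for 2 degrees of freedom, the chi-squared quantile has the closed form `-2 * ln(alpha)`.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-in for an (n_samples, 2) matrix of PCA scores
scores = rng.normal(size=(1000, 2)) * np.array([3.0, 1.5])

centered = scores - scores.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(scores, rowvar=False))

# T^2 for every sample at once: row-wise quadratic form d_i^T S^{-1} d_i
t2 = np.einsum('ij,jk,ik->i', centered, inv_cov, centered)

# For 2 degrees of freedom: chi2.ppf(1 - alpha, df=2) == -2 * ln(alpha)
alpha = 0.05
critical_value = -2.0 * np.log(alpha)

outliers = t2 > critical_value
print(f'Critical value: {critical_value:.3f}')
print(f'Flagged: {outliers.sum()} of {len(t2)} samples ({outliers.mean():.1%})')
```

The stand-in data contain no deliberate outliers, so the flagged fraction gives a feel for how many samples a 95% limit marks by chance alone.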

## Step 4: Interpreting the PCA Results 🧩

Wow, look at that plot! Compared to the raw data’s chaos, the 2D scores plot is far easier to read: groupings and outliers, where they exist, are visible at a glance. The **explained variance ratio** tells us what share of the total variance PC1 and PC2 capture (for example, values of 0.35 and 0.20 would mean 35% and 20%). Samples outside the ellipse may be unusual, for instance because of measurement errors or unique biological profiles.

But PCA isn’t perfect. Let’s reflect on its limitations before moving forward.

---

### **Exercise 2**
Why might PCA sometimes *hide* important biological information?

<details>
<summary>💡 Hint</summary>
PCA is optimised for variance, not necessarily for biological relevance. Important differences (e.g., between healthy and diseased) might not be the biggest source of variance!
</details>

---

### Learning Points
- **Raw Data Messiness**: The heatmap and scatter plots showed high dimensionality and redundancy, making PCA essential.
- **PCA’s Power**: PCA simplifies 200 metabolites into 2D, revealing patterns and outliers.
- **Preprocessing Matters**: Standardization ensures fair metabolite comparisons.
- **Outlier Detection**: Hotelling’s T² flags anomalies for further investigation.

*Ready to explore more of your data’s hidden treasures? Let’s keep going! 🧑‍🔬*

## 3. Partial Least Squares Discriminant Analysis (PLS-DA): The Supervised Classifier 🏷️

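PLS-DA is like a guided missile 🎯: it is supervised, meaning it uses class labels (e.g., "healthy" vs. "diseased") to find components that maximize the covariance between the metabolite data and the labels, which emphasizes group separation. That makes it well suited to classifying metabolomics samples, as long as performance is checked on held-out data.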

### 3.1 PLS-DA in Action

Our synthetic dataset already contains class labels in the `Label` column, so let's apply PLS-DA to it.

In [None]:

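# Encode labels as integer codes (e.g., 0, 1) so they can be used for regression and colouring
X = df.filter(like='Metabolite_')
y = pd.Categorical(df['Label']).codes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11088)
# Fit PLS-DA on the training set (PLSRegression standardizes X and y by default)
plsda = PLSRegression(n_components=2)
plsda.fit(X_train, y_train)
# Classify held-out samples by thresholding the continuous prediction at 0.5 (binary labels)
y_pred = (plsda.predict(X_test).ravel() > 0.5).astype(int)
print(f'Test accuracy: {accuracy_score(y_test, y_pred):.2f}')
# Scores plot for all samples, coloured by class
scores = plsda.transform(X)
plt.scatter(scores[:, 0], scores[:, 1], c=y, cmap='viridis', alpha=0.7)
plt.xlabel('PLS Component 1')
plt.ylabel('PLS Component 2')
plt.title('PLS-DA of Metabolomics Data')
plt.show()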


**Explanation**:
- **PLSRegression**: Used for PLS-DA by treating class labels as continuous (threshold at 0.5 for binary classification).
- **train_test_split**: Splits data to evaluate model performance.
- **Scores Plot**: Shows how well PLS-DA separates classes.
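
To see what "using the labels" means in practice, here is a minimal sketch of the first PLS component for a single response, written with NumPy on made-up data (the toy data and variable names are assumptions). For centered data, the first weight vector is proportional to `X.T @ y`, which is the unit-length direction whose scores have the largest covariance with the labels. `PLSRegression` additionally scales the variables by default, so its numbers will differ in detail.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 10

# Hypothetical toy data: binary labels, only variable 0 differs between groups
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, p))
X[:, 0] += 1.5 * y

# Center X and y (PLS works on centered data)
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First PLS weight vector for a single response: proportional to X^T y
w = Xc.T @ yc
w /= np.linalg.norm(w)

# Scores on the first PLS component
t = Xc @ w

# Covariance between scores and labels for the PLS direction...
cov_pls = np.cov(t, yc)[0, 1]
# ...compared with a random unit-length direction
r = rng.normal(size=p)
r /= np.linalg.norm(r)
cov_rand = np.cov(Xc @ r, yc)[0, 1]

print('Largest weight belongs to variable', int(np.argmax(np.abs(w))))
print(f'cov(score, label): PLS = {cov_pls:.3f}, random direction = {cov_rand:.3f}')
```

The weight vector points mostly at the one variable that differs between the groups, which is why PLS-DA loadings are often inspected when looking for candidate biomarkers.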

**Exercise 3**: Change `n_components` to 3 in the PLS-DA model. Does the accuracy improve? Why or why not?

<details>
<summary>💡 Hint</summary>
More components capture more variance but may lead to overfitting, especially with small datasets. Check the accuracy and consider the trade-off!
</details>

**Learn More**: Explore [PLS-DA in metabolomics](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6017634/) for real-world applications! 🧬

## 4. Bayesian Multivariate Models: Embracing Uncertainty 🌈

Bayesian methods are like a crystal ball 🔮—they model uncertainty and let us build flexible multivariate models. In metabolomics, Bayesian approaches can handle missing data, model latent variables, or perform regression.

### 4.1 Bayesian PCA with PyMC

Let's use `PyMC` to implement a simple Bayesian PCA model. This assumes metabolites are generated from latent PCs with Gaussian noise.

Sampling a model with this many latent variables can take several minutes, so please be patient.

In [None]:

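X = df.filter(like='Metabolite_').values
n_samples, n_features = X.shape  # Derive shapes from the data instead of hard-coding them
with pm.Model() as bayes_pca:
    z = pm.Normal('z', mu=0, sigma=1, shape=(n_samples, 2))   # Latent PCs
    w = pm.Normal('w', mu=0, sigma=1, shape=(n_features, 2))  # Loadings
    mu = pm.math.dot(z, w.T)
    X_obs = pm.Normal('X', mu=mu, sigma=0.1, observed=X)  # Gaussian noise with fixed SD 0.1
    trace = pm.sample(500, random_seed=11088, return_inferencedata=True)
# Posterior of the loadings of the first metabolite on both components
az.plot_posterior(trace, var_names=['w'], coords={'w_dim_0': [0]})
plt.show()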


**Explanation**:
- **z**: Latent PCs for each sample.
- **w**: Loadings (how metabolites contribute to PCs).
- **X**: Observed data modeled as a linear combination of PCs plus noise.
- **pm.sample**: Uses MCMC to estimate posterior distributions.
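
One property of this kind of model is worth knowing when reading the posterior of `w`: the predicted mean `z @ w.T` does not change if the latent space is rotated. The sketch below checks this with NumPy on made-up matrices (the shapes and the 30° angle are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical latent scores (z) and loadings (w), shaped like those in the model
z = rng.normal(size=(50, 2))
w = rng.normal(size=(8, 2))

# Any rotation of the 2D latent space...
angle = np.deg2rad(30)
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])

# ...applied to both z and w leaves the predicted mean z @ w.T unchanged
mu_original = z @ w.T
mu_rotated = (z @ R) @ (w @ R).T

print('Same predicted mean:', np.allclose(mu_original, mu_rotated))
```

Because rotated solutions fit the data equally well, individual loadings may have wide or multi-modal posteriors, and chains may disagree with each other. Convergence diagnostics (for example `az.summary(trace, var_names=['w'])`) are worth checking before interpreting the plot.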

**Exercise 4**: Increase the number of posterior draws (`500` to `1000`) in `pm.sample`. Does the posterior distribution change significantly? Why?

<details>
<summary>💡 Solution</summary>
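More draws reduce the Monte Carlo error of the posterior summaries, so the plotted distributions look smoother. The posterior itself (its mean and width) should not change much if the chains have converged: its width reflects the data and the model, not the number of draws.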
</details>

**Learn More**: Dive into [PyMC's documentation](https://www.pymc.io/welcome.html) for more Bayesian modeling ideas! 🧠

## 5. Machine Learning: A Quick Dip into Random Forests 🌳

Machine learning (ML) is like a superpower for metabolomics—models like Random Forests can classify samples or identify important metabolites (potential biomarkers).

### 5.1 Random Forest Classifier

Let's use a Random Forest to classify our samples and find important metabolites.

In [None]:
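X = df.filter(like='Metabolite_')
y = df['Label']
# Hold out 20% of the samples so accuracy is measured on unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11088)
rf = RandomForestClassifier(n_estimators=100, random_state=11088)
rf.fit(X_train, y_train)
print(f'Test accuracy: {accuracy_score(y_test, rf.predict(X_test)):.2f}')
# Rank metabolites by importance and plot the 10 highest
importance = rf.feature_importances_
top10 = np.argsort(importance)[::-1][:10]
plt.bar(range(10), importance[top10], tick_label=X.columns[top10])
plt.xticks(rotation=90, ha='right')
plt.title('Top 10 Metabolite Importances')
plt.xlabel('Metabolite')
plt.ylabel('Importance')
plt.tight_layout()
plt.show()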


**Explanation**:
- **RandomForestClassifier**: Builds multiple decision trees and aggregates their predictions.
- **feature_importances_**: Shows which metabolites contribute most to classification (potential biomarkers!).
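
As a self-contained illustration of both points, here is a minimal sketch on synthetic data from `make_classification` (the toy data and all parameter values are assumptions, not our metabolomics dataset). It measures accuracy on held-out samples and ranks features by importance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 features, of which the first 3 are informative
X, y = make_classification(n_samples=400, n_features=20, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Accuracy on samples the forest has not seen during training
test_accuracy = accuracy_score(y_test, rf.predict(X_test))

# Indices of the five most important features, highest first
top5 = np.argsort(rf.feature_importances_)[::-1][:5]

print(f'Held-out accuracy: {test_accuracy:.2f}')
print('Top 5 feature indices:', top5.tolist())
```

Impurity-based importances sum to 1 and can be spread across correlated features, so a highly ranked metabolite is best treated as a candidate for follow-up rather than a confirmed biomarker.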

**Exercise 5**: Increase `n_estimators` to 200. Does the accuracy improve? Plot the feature importances again—are the top metabolites the same?

<details>
<summary>💡 Hint</summary>
More trees reduce variance but may not change feature rankings much if the model is stable. Compare the plots visually!
</details>

**Learn More**: Check out [Random Forests in scikit-learn](https://scikit-learn.org/stable/modules/ensemble.html#forests-of-randomized-trees) for more ML fun! 🚀

## 6. Using Principal Components in Regression: Biomarker Detection 🔍

Now, let's use PCA scores as predictors in a regression model to predict a continuous outcome (e.g., disease severity). This combines dimensionality reduction with predictive modeling.

### 6.1 PCA + Linear Regression

We'll recompute the PCA scores as in Section 2 and regress a synthetic outcome (`Severity`) on them.

In [None]:

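X = df.filter(like='Metabolite_')
y = df['Severity']
# Standardize, then reduce to 2 principal components
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
pca_result = pca.fit_transform(X_scaled)
# Regress the outcome on the PCA scores
lr = LinearRegression()
lr.fit(pca_result, y)
y_pred = lr.predict(pca_result)
print(f'In-sample R^2: {lr.score(pca_result, y):.2f}')
# Predicted vs. true values; the dashed line marks perfect prediction
plt.scatter(y, y_pred, c='purple', alpha=0.7)
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--')
plt.xlabel('True Severity')
plt.ylabel('Predicted Severity')
plt.title('PCA + Linear Regression')
plt.savefig('pca_regression.png')
plt.show()  # Display the figure in the notebook after saving it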

**Explanation**:
- **pca_result**: PCA scores (PCs) used as predictors.
- **LinearRegression**: Models the relationship between PCs and the outcome.
- **Scatter Plot**: Shows how well predictions match the true outcome.

**Exercise 6**: Use only PC1 (`pca_result[:, 0].reshape(-1, 1)`) in the regression. Does the model perform worse? Why?

<details>
<summary>💡 Solution</summary>
Using only PC1 reduces the information available to the model, likely worsening performance unless PC1 captures most of the relevant variation. Compare the scatter plots!
</details>
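
The comparison in this exercise can be sketched with NumPy alone. The example below uses made-up data (the variables and the outcome are assumptions for illustration), computes PCA scores via SVD, and compares the in-sample R² of a least-squares fit using PC1 only with one using PC1 and PC2.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300

# Hypothetical toy data: 6 standardized variables and a continuous outcome
X = rng.normal(size=(n, 6))
X[:, 1] += 0.8 * X[:, 0]                      # induce some correlation
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * Xs[:, 0] - 1.0 * Xs[:, 2] + rng.normal(size=n)

# PCA scores via SVD (first two components)
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt[:2].T

def r_squared(predictors, outcome):
    """In-sample R^2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(A, outcome, rcond=None)
    residuals = outcome - A @ coef
    return 1.0 - residuals.var() / outcome.var()

r2_pc1 = r_squared(scores[:, :1], y)
r2_pc12 = r_squared(scores, y)
print(f'R^2 with PC1 only:    {r2_pc1:.3f}')
print(f'R^2 with PC1 and PC2: {r2_pc12:.3f}')
```

In-sample R² cannot decrease when a predictor is added, so the fairer question is whether the gain holds up on held-out data.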

## 7. Summary: Your Metabolomics Toolkit 🧰

Here's what you've learned:

- **PCA** 🗺️: Unsupervised, reduces dimensionality, explores data structure.
- **PLS-DA** 🏷️: Supervised, classifies samples, maximizes group separation.
- **Bayesian Models** 🔮: Handle uncertainty, flexible for complex problems.
- **Random Forests** 🌳: ML for classification and biomarker detection.
- **PCA + Regression** 🔍: Uses PCs for predictive modeling (e.g., disease severity).

**Final Exercise**: Pick a real metabolomics dataset (e.g., from [MetaboLights](https://www.ebi.ac.uk/metabolights/)) and apply one of these methods. Share your findings in a short paragraph!

**What's Next?** Try advanced methods like t-SNE, SVMs, or deep learning for metabolomics. Keep exploring, and happy analyzing! 😄