# Linear Discriminant Analysis (LDA)

## Problem Type
**Linear Discriminant Analysis (LDA)** is primarily used for:
- **Classification** problems
- **Supervised** learning

### How LDA Works
- **Dimensionality reduction and classification:**
  - Projects data onto a lower-dimensional space while maintaining the class-discriminatory information.
- **Maximizes class separability:**
  - Finds a linear combination of features that best separates two or more classes.
  - Maximizes the ratio of between-class variance to within-class variance.
- **Assumes normal distribution:**
  - Assumes that each class follows a Gaussian distribution with a different mean but the same covariance matrix.
- **Decision boundary:**
  - Produces decision boundaries that are linear in the input features, a direct consequence of the shared-covariance assumption.
- **Training process:**
  - Involves estimating class means, shared covariance, and priors, and then using these to define the linear discriminants.
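
The variance-ratio criterion above can be sketched directly in NumPy. This is a minimal illustration, not scikit-learn's implementation: the synthetic three-class data, the variable names, and the use of `scipy.linalg.eigh` for the generalized eigenproblem are all assumptions made for the example.

```python
import numpy as np
from scipy.linalg import eigh

# Synthetic data (assumed for illustration): three Gaussian classes
# with different means and the same identity covariance.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 1.0], [0.0, 3.0, -1.0]])
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(50, 3)) for m in means])
y = np.repeat(np.arange(3), 50)

classes = np.unique(y)
overall_mean = X.mean(axis=0)
n_features = X.shape[1]

S_w = np.zeros((n_features, n_features))  # within-class scatter
S_b = np.zeros((n_features, n_features))  # between-class scatter
for c in classes:
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    centered = X_c - mean_c
    S_w += centered.T @ centered
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_b += len(X_c) * (diff @ diff.T)

# Solve S_b w = lambda * S_w w; each eigenvalue is the ratio of
# between-class to within-class scatter along its eigenvector.
eigvals, eigvecs = eigh(S_b, S_w)
order = np.argsort(eigvals)[::-1]  # largest ratio first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# At most n_classes - 1 directions carry discriminative information.
n_discriminants = len(classes) - 1
W = eigvecs[:, :n_discriminants]
X_projected = X @ W
print("Separation ratios:", eigvals)
print("Projected shape:", X_projected.shape)
```

With three classes, only two eigenvalues are meaningfully above zero, which is why the projection keeps at most `n_classes - 1` dimensions.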

### Key Tuning Parameters
- **`solver`:**
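  - **Description:** Algorithm used to fit the model (`svd`, `lsqr`, `eigen`).
  - **Impact:** `svd` does not compute the covariance matrix, so it is recommended for data with many features, but it does not support shrinkage. `lsqr` (least squares) and `eigen` (eigenvalue decomposition) both estimate the covariance matrix and can be combined with shrinkage; `lsqr` supports classification only, not `transform`.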
  - **Default:** `svd`.
- **`shrinkage`:**
  - **Description:** Regularizes the covariance estimate; accepts `None`, `'auto'` (Ledoit-Wolf estimate), or a float between 0 and 1. Supported only when `solver` is `lsqr` or `eigen`.
  - **Impact:** Helps prevent overfitting, especially with high-dimensional data.
  - **Default:** `None`.
- **`n_components`:**
  - **Description:** Number of linear discriminants to compute.
  - **Impact:** Sets the dimensionality of the output of `transform`; it does not affect classification. Cannot exceed `min(n_classes - 1, n_features)`.
  - **Default:** `None` (set to `min(n_classes - 1, n_features)`).
- **`priors`:**
  - **Description:** Class priors used in the model; can be uniform or specified based on prior knowledge.
  - **Impact:** Adjusts the model's bias toward certain classes based on prior probabilities.
  - **Default:** `None` (automatically inferred from the training data).
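
The interaction between these parameters can be checked with a short sketch. It assumes the wine dataset (the same one used in the code cells of this notebook) and fits on the full data purely for illustration; the variable names are hypothetical.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)  # 178 samples, 13 features, 3 classes

# Default solver: supports transform; n_components is capped at
# min(n_classes - 1, n_features) = 2 for this dataset.
lda_svd = LinearDiscriminantAnalysis(solver="svd", n_components=2).fit(X, y)
X_2d = lda_svd.transform(X)
print("Projected shape:", X_2d.shape)

# With priors=None, the priors are the class proportions in the training data.
print("Inferred priors:", lda_svd.priors_)

# Shrinkage needs a solver that estimates the covariance matrix.
lda_shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
print("Training accuracy with shrinkage:", lda_shrunk.score(X, y))

# Combining shrinkage with the svd solver is rejected at fit time.
try:
    LinearDiscriminantAnalysis(solver="svd", shrinkage="auto").fit(X, y)
    svd_shrinkage_rejected = False
except (NotImplementedError, ValueError):
    svd_shrinkage_rejected = True
print("svd + shrinkage rejected:", svd_shrinkage_rejected)
```

Shrinkage tends to matter most when the number of features is large relative to the number of samples, which is not the case for this dataset, so the example shows the mechanics rather than a performance gain.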

### Pros vs Cons

| Pros                                                  | Cons                                                   |
|-------------------------------------------------------|--------------------------------------------------------|
| Effective for small sample sizes with normally distributed data | Assumes linear decision boundaries, which may not capture complex relationships |
| Reduces dimensionality while retaining class information | Sensitive to the assumption of equal covariance matrices across classes |
| Computationally efficient and interpretable           | Performance degrades if classes have similar means     |
| Robust to overfitting, especially with regularization | May struggle with non-Gaussian distributions and outliers |
| Works well with linearly separable classes            | Less effective with a large number of features relative to observations |

### Evaluation Metrics
- **Accuracy (Classification):**
  - **Description:** Ratio of correct predictions to total predictions.
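  - **Good Value:** Higher is better. As a rough rule of thumb, values above 0.85 are often considered strong, but the threshold depends on the task and on class balance, so compare against a baseline such as always predicting the majority class.
  - **Bad Value:** Values at or below the majority-class baseline (0.5 for a balanced binary problem) suggest the model has learned little.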
- **Precision (Classification):**
  - **Description:** Proportion of positive identifications that were actually correct.
  - **Good Value:** Higher values indicate fewer false positives; crucial when false positives are costly, and more informative than accuracy on imbalanced datasets.
  - **Bad Value:** Low values suggest many false positives.
- **Recall (Classification):**
  - **Description:** Proportion of actual positives that were correctly identified.
  - **Good Value:** Higher values indicate fewer false negatives; important in recall-sensitive applications.
  - **Bad Value:** Low values suggest many false negatives.
- **F1 Score (Classification):**
  - **Description:** Harmonic mean of Precision and Recall.
  - **Good Value:** Higher values indicate a good balance between Precision and Recall.
  - **Bad Value:** Low values suggest a poor balance between Precision and Recall.
- **AUC-ROC (Classification):**
  - **Description:** Measures the ability of the model to distinguish between classes across all thresholds.
  - **Good Value:** Values closer to 1 indicate strong separability between classes.
  - **Bad Value:** Values near 0.5 suggest random guessing.
- **Log Loss (Classification):**
  - **Description:** Negative mean log-likelihood of the true labels under the predicted class probabilities; confident wrong predictions are penalized heavily.
  - **Good Value:** Lower values indicate better model calibration and performance.
  - **Bad Value:** Higher values suggest poor probabilistic predictions.
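
To make the definitions above concrete, here is a small sketch that computes each metric with `sklearn.metrics`. The labels and scores are made up for illustration, and a binary problem is assumed to keep the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    log_loss,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical labels and predicted probabilities of class 1
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.6, 0.35, 0.8, 0.9])
y_pred = (y_score >= 0.5).astype(int)  # threshold at 0.5 -> [0, 0, 1, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)    # 4 of 6 predictions correct
precision = precision_score(y_true, y_pred)  # 2 of 3 predicted positives are correct
recall = recall_score(y_true, y_pred)        # 2 of 3 actual positives are found
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# These two use the probabilities rather than the thresholded labels
auc = roc_auc_score(y_true, y_score)         # 7 of 9 positive/negative pairs ranked correctly
logloss = log_loss(y_true, y_score)

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1 Score:  {f1:.3f}")
print(f"AUC-ROC:   {auc:.3f}")
print(f"Log Loss:  {logloss:.3f}")
```

For a multiclass problem, `precision_score`, `recall_score`, and `f1_score` need an `average` argument (for example `average="macro"`), and `roc_auc_score` needs the full probability matrix together with `multi_class="ovr"` or `multi_class="ovo"`.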



In [None]:
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import classification_report, log_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [None]:
# Load the wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # keep class proportions in both splits
)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
# Initialize the LDA model; every value below is the scikit-learn default, spelled out for clarity
lda = LinearDiscriminantAnalysis(
    solver="svd",  # Using SVD solver
    shrinkage=None,  # No shrinkage applied (only used with 'lsqr' or 'eigen')
    n_components=None,  # Automatically determine based on number of classes
    priors=None,  # Use class priors inferred from the data
)

# Train the model
lda.fit(X_train, y_train)

# Predict probabilities
y_pred_proba = lda.predict_proba(X_test)

# Predict class labels
y_pred = lda.predict(X_test)

In [None]:
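# Evaluate the predicted probabilities using log loss (lower is better)
# Passing labels guards against a class being absent from y_test
logloss = log_loss(y_test, y_pred_proba, labels=lda.classes_)
print(f"Log Loss: {logloss:.2f}")

# Print per-class precision, recall, and F1 score
print("Classification Report:")
print(classification_report(y_test, y_pred))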