### What are normalization and standardization, and how are they helpful?
### 1. **Normalization**
Normalization (min-max scaling) rescales each feature to a fixed range, usually [0, 1]. It’s commonly used when the data doesn’t follow a Gaussian (normal) distribution or when the algorithm expects inputs in a bounded range. Because the result depends on the minimum and maximum values, it is sensitive to outliers.

**Formula**:
\[
X_{\text{normalized}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}
\]

- **Purpose**: Useful when features have different scales or ranges (e.g., age and income) and a bounded output range is wanted.
- **Applications**: Useful in distance-based algorithms such as k-nearest neighbors (KNN), where features with large ranges would otherwise dominate the distance. Neural networks are not distance-based, but they also tend to train more reliably when inputs are on a similar, small scale.
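
As a minimal sketch of the formula above, assuming NumPy is available, the helper below rescales a single feature to [0, 1]. The function name `min_max_normalize`, the sample ages, and the choice to map a constant feature to zeros are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def min_max_normalize(x):
    """Rescale a 1-D feature to the range [0, 1] using (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A constant feature has no spread; mapping it to zeros is an assumed convention.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

ages = [18, 30, 42, 66]  # made-up values
normalized_ages = min_max_normalize(ages)
print(normalized_ages)
```

In a real pipeline, the minimum and maximum would be computed on the training set only and then reused to transform validation and test data.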

### 2. **Standardization**
Standardization transforms the data so that it has a mean of 0 and a standard deviation of 1. It centers the data by subtracting the mean and then scales it by dividing by the standard deviation.

**Formula**:
\[
X_{\text{standardized}} = \frac{X - \mu}{\sigma}
\]
where \(\mu\) is the mean, and \(\sigma\) is the standard deviation.

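- **Purpose**: Useful when features are roughly Gaussian or when the data contains outliers that would compress a min-max range. Standardization does not make the data Gaussian; it only shifts and rescales it.
- **Applications**: Commonly used with regularized linear and logistic regression, support vector machines (SVMs), and PCA. These methods do not assume normally distributed features, but they are sensitive to feature scale: penalties, margins, and variance-based components all depend on the units of each feature.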
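
The standardization formula can be sketched the same way. This assumes NumPy and uses the population standard deviation (`ddof=0`); the function name `standardize` and the income values are made up for illustration.

```python
import numpy as np

def standardize(x):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0), an assumed choice
    if sigma == 0:
        # A constant feature cannot be scaled; returning zeros is an assumed convention.
        return np.zeros_like(x)
    return (x - mu) / sigma

incomes = [30_000, 45_000, 60_000, 85_000]  # made-up values
z_scores = standardize(incomes)
print(z_scores.mean(), z_scores.std())
```

After the transform, the mean is zero and the standard deviation is one, up to floating-point error.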

### Why These Techniques Are Helpful
1. **Improved Model Performance**: Many algorithms are sensitive to the scale of data. Properly scaled data can lead to faster training times and more accurate models.
2. **Prevents Bias in Distance-Based Algorithms**: Normalizing or standardizing ensures that features with larger ranges don't dominate distance calculations in algorithms like KNN and k-means clustering.
3. **Stability and Convergence**: Optimization algorithms, like gradient descent, converge faster when data is on a consistent scale. This can make training more efficient and less prone to numerical instability.
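
To make the distance-based point concrete, here is a small sketch with made-up (age, income) rows. On the raw data, income dominates the Euclidean distance, so the person with a very different age looks closest; after standardizing each column, the person with a similar age is closest. The data and the helper `distances_from_first` are assumptions for illustration.

```python
import numpy as np

# Rows are (age in years, income in dollars); all values are made up.
people = np.array([
    [25, 50_000],  # a: the reference person
    [60, 50_500],  # b: very different age, similar income
    [26, 51_000],  # c: similar age, slightly different income
    [45, 58_000],  # d
], dtype=float)

def distances_from_first(X):
    """Euclidean distance from row 0 to every other row."""
    return np.linalg.norm(X[1:] - X[0], axis=0 + 1)

raw = distances_from_first(people)

# Standardize each column so both features contribute on a comparable scale.
standardized = (people - people.mean(axis=0)) / people.std(axis=0)
scaled = distances_from_first(standardized)

names = ["b", "c", "d"]
print("nearest to a (raw):   ", names[raw.argmin()])
print("nearest to a (scaled):", names[scaled.argmin()])
```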

In summary, both techniques aim to bring data to a comparable scale, making it easier for models to learn patterns effectively.

### What techniques can be used to address multicollinearity in multiple linear regression?

### 1. **Remove Highly Correlated Predictors**
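   - **Approach**: Identify pairs of highly correlated predictors and remove one predictor from each pair.
   - **Method**: Calculate the correlation matrix of the predictors and flag pairs whose absolute correlation is high (typically above 0.8 or 0.9). Keep the predictor that is easier to measure or interpret. Pairwise correlations can miss collinearity that involves three or more predictors.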
   - **Trade-Off**: This can simplify the model but may lose potentially valuable information.
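
A minimal sketch of this filter, assuming NumPy and synthetic data. The greedy keep-first rule, the 0.9 threshold, and the function name `drop_correlated` are illustrative choices; in practice, which member of a correlated pair to keep is a judgment call.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedily keep columns whose |correlation| with every kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, size=200)
height_in = height_cm / 2.54 + rng.normal(0, 0.1, size=200)  # near-duplicate of height_cm
weight_kg = rng.normal(70, 12, size=200)                     # generated independently

X = np.column_stack([height_cm, height_in, weight_kg])
kept = drop_correlated(X, ["height_cm", "height_in", "weight_kg"])
print(kept)
```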

### 2. **Principal Component Analysis (PCA)**
   - **Approach**: PCA transforms correlated predictors into a set of linearly uncorrelated components.
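   - **Method**: Create new predictor variables (principal components) that are linear combinations of the original variables, keep the leading components that capture most of the variance, and regress the response on those components. Standardize the predictors first if they are on different scales. Note that the components are chosen without looking at the response variable.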
   - **Trade-Off**: PCA can reduce multicollinearity but may make interpretation harder, as new components don’t correspond directly to original variables.
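
The decorrelating effect can be sketched with NumPy's SVD on synthetic data. The helper `pca_scores` is a hypothetical name, and the two predictors here are generated on a similar scale, so the standardization step is skipped.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project centered data onto its leading principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # strongly correlated with x1
X = np.column_stack([x1, x2])

scores = pca_scores(X, n_components=2)
corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
print(corr_before, corr_after)
```

The original predictors are almost perfectly correlated, while the component scores are uncorrelated up to floating-point error. A regression would then use only the first component or the first few.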

### 3. **Partial Least Squares Regression (PLS)**
   - **Approach**: Similar to PCA, PLS reduces the number of predictors to a smaller set of uncorrelated components, while also considering the response variable.
   - **Method**: PLS creates components by maximizing the covariance between predictors and the target variable.
   - **Trade-Off**: It handles multicollinearity and maintains some interpretability, though it still alters the original predictor structure.
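
To illustrate the difference from PCA, the sketch below computes only the first component direction of each method on synthetic data. For a single response, the first PLS weight vector is proportional to `X.T @ y` on centered data, which is the unit direction with the largest covariance with the response. This is not a full PLS implementation; the function names and data are assumptions.

```python
import numpy as np

def first_pls_direction(X, y):
    """Unit vector w maximizing cov(Xw, y): proportional to X^T y on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

def first_pca_direction(X):
    """Unit vector of maximum variance in X (ignores y)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(scale=3.0, size=n)  # high variance, unrelated to y
x2 = rng.normal(scale=1.0, size=n)  # low variance, drives y
X = np.column_stack([x1, x2])
y = 2.0 * x2 + rng.normal(scale=0.1, size=n)

def abs_cov_with_y(direction):
    """Absolute sample covariance between the projected scores and y."""
    Xc = X - X.mean(axis=0)
    return abs(np.cov(Xc @ direction, y)[0, 1])

w_pls = first_pls_direction(X, y)
pls_cov = abs_cov_with_y(w_pls)
pca_cov = abs_cov_with_y(first_pca_direction(X))
print(w_pls, pls_cov, pca_cov)
```

PCA's first direction follows the high-variance predictor even though it carries no signal, whereas the PLS direction leans toward the predictor that actually relates to the response.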

### 4. **Regularization Techniques (Ridge and Lasso Regression)**
   - **Approach**: Regularization adds a penalty on coefficient size to the regression objective, which stabilizes the estimates when predictors are correlated.
   - **Ridge Regression**: Adds an L2 penalty (sum of squared coefficients). It shrinks the coefficients of correlated predictors toward each other and toward zero, but keeps every predictor in the model.
   - **Lasso Regression**: Adds an L1 penalty (sum of absolute coefficients), which can shrink some coefficients exactly to zero, effectively selecting a subset of predictors. Among a group of highly correlated predictors, it tends to keep one and drop the rest.
   - **Trade-Off**: Regularization does not remove the correlation itself; it trades a small bias in the coefficients for lower variance. The penalty strength must be tuned (for example, by cross-validation), and the shrunken coefficients are harder to interpret than ordinary least squares estimates.
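
A rough sketch of the ridge case only, using the closed-form solution on centered data with NumPy. The data is synthetic (two nearly identical predictors, each with a true coefficient of 1), and `ridge_fit` with `alpha=1.0` is an illustrative choice rather than a recommended setting. Lasso has no closed form and is omitted here.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data: beta = (X'X + alpha * I)^-1 X'y.

    With alpha = 0 this reduces to ordinary least squares.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)  # true coefficients are (1, 1)

ols_coef = ridge_fit(X, y, alpha=0.0)
ridge_coef = ridge_fit(X, y, alpha=1.0)
print("OLS:  ", ols_coef)
print("ridge:", ridge_coef)
```

With predictors this collinear, the least squares coefficients can land far from (1, 1) and change a lot between samples, while the ridge coefficients stay close to the true values.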

### 5. **Variance Inflation Factor (VIF)**
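   - **Approach**: Calculate the VIF for each predictor to quantify the extent of multicollinearity. VIF is a diagnostic rather than a remedy: it tells you which predictors to act on.
   - **Method**: For predictor \(j\), \(\text{VIF}_j = 1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on all the others. A high VIF (common rules of thumb are above 5 or above 10) suggests that the predictor is highly collinear with others. Consider removing or combining predictors with high VIF scores one at a time, recomputing the VIFs after each change.
   - **Trade-Off**: This technique helps identify problematic predictors, though acting on it may require removing some potentially useful variables.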
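
As a sketch, VIFs can be computed with NumPy alone by using the identity that VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of the inverse correlation matrix. The function name `vif` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def vif(X):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                  # generated independently

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

The two collinear predictors get VIFs far above 10, while the independent one stays close to 1.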

### 6. **Centering Variables**
   - **Approach**: Subtract the mean of each predictor variable from its values to center them around zero.
   - **Method**: Centering does not change the correlation between two distinct predictors. It mainly reduces the correlation between a predictor and terms built from it, such as interaction terms or squared terms.
   - **Trade-Off**: Centering preserves all the information in the predictors and makes the intercept easier to interpret, but it leaves collinearity between different predictors untouched.
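
A small sketch with synthetic data shows both effects of centering: the correlation between a predictor and its interaction term drops sharply, while the correlation between the two predictors themselves does not change. The means, scales, and variable names are made up.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(50, 5, size=n)
x2 = 2 * x1 + rng.normal(0, 5, size=n)  # correlated with x1, mean far from zero

# Raw predictors: the interaction term x1 * x2 is nearly a rescaled copy of x1.
raw_interaction_corr = corr(x1, x1 * x2)

# Centered predictors: the interaction term is built from deviations around the mean.
c1, c2 = x1 - x1.mean(), x2 - x2.mean()
centered_interaction_corr = corr(c1, c1 * c2)

print("x1 vs x1*x2, raw:     ", raw_interaction_corr)
print("x1 vs x1*x2, centered:", centered_interaction_corr)
print("x1 vs x2, raw / centered:", corr(x1, x2), corr(c1, c2))
```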

### 7. **Data Collection or Transformation**
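   - **Approach**: Gathering more data usually does not change how strongly the predictors are correlated, but it lowers the variance of the coefficient estimates, which is the main harm multicollinearity causes. Collecting data over a wider range of conditions, where the predictors vary more independently, can reduce the correlation itself. Transformations such as a log, or combining related predictors into a ratio or difference, sometimes help; polynomial terms tend to add collinearity unless the variable is centered first.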
   - **Trade-Off**: Data collection may not be feasible, and transformations may make interpretation more challenging.
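
A rough simulation, under assumed settings, of what a larger sample does: the correlation between the predictors stays about the same, while the standard errors of the coefficients shrink. The population correlation of 0.95, the sample sizes, and the known noise level `sigma` are all illustrative assumptions.

```python
import numpy as np

def coef_std_errors(X, sigma):
    """OLS coefficient standard errors for a known noise level sigma (centered X)."""
    Xc = X - X.mean(axis=0)
    return sigma * np.sqrt(np.diag(np.linalg.inv(Xc.T @ Xc)))

def simulate_predictors(n, rng, rho=0.95):
    """Draw two standard-normal predictors with population correlation rho."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    return np.column_stack([x1, x2])

rng = np.random.default_rng(0)
small = simulate_predictors(50, rng)
large = simulate_predictors(5000, rng)

corr_small = np.corrcoef(small, rowvar=False)[0, 1]
corr_large = np.corrcoef(large, rowvar=False)[0, 1]
se_small = coef_std_errors(small, sigma=1.0)
se_large = coef_std_errors(large, sigma=1.0)

print("correlation:", corr_small, corr_large)
print("std errors: ", se_small, se_large)
```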

### Summary Table of Techniques

| Technique | Advantage | Drawback |
|-----------|-----------|----------|
| Remove Highly Correlated Predictors | Simplifies the model | Potential loss of information |
| PCA | Reduces multicollinearity effectively | Reduces interpretability |
| PLS | Maintains relevance to the target variable | Alters predictor structure |
| Ridge/Lasso Regression | Stabilizes coefficient estimates | Biased coefficients; penalty strength must be tuned |
| VIF Analysis | Identifies collinear predictors | Diagnostic only; a remedy is still needed |
| Centering | Reduces correlation between predictors and their interaction terms | Does not affect collinearity between distinct predictors |
| Data Collection/Transformation | More data lowers coefficient variance | Not always feasible |

Each of these techniques offers a way to address multicollinearity, so the choice depends on the specific context, goals, and requirements of the model.