# Regression

## Linear Regression 

Linear regression is a fundamental statistical and machine learning technique used for predictive analysis and modeling relationships between variables.

Linear regression aims to predict the value of a dependent variable based on one or more independent variables by fitting a linear equation to the observed data.

Basic concept:
It establishes a linear relationship between variables, represented by a straight line (in simple linear regression) or a hyperplane (in multiple linear regression).

- Simple linear regression: involves one independent variable and one dependent variable.
- Multiple linear regression: involves multiple independent variables and one dependent variable.
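
Both cases can be sketched with ordinary least squares. This is a minimal illustration, assuming NumPy is available; the helper name `fit_linear` and the noise-free toy data are made up for the example.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares fit; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)                    # simple regression: one feature column
    A = np.column_stack([np.ones(len(X)), X])   # prepend a column of ones for the intercept
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta[0], beta[1:]

# Simple linear regression: data generated from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
intercept, coef = fit_linear(x, 2 * x + 1)

# Multiple linear regression: data generated from y = 1 + 2*x1 - 3*x2
X2 = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
intercept2, coef2 = fit_linear(X2, 1 + 2 * X2[:, 0] - 3 * X2[:, 1])
```

With one feature the fit is a line (one slope plus an intercept); with two features it is a plane, and the same code handles both.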


![image](https://images.spiceworks.com/wp-content/uploads/2022/04/07040339/25-4.png)


**Assumptions:**
Linear regression relies on certain assumptions, such as linearity, independence of errors, homoscedasticity, and normality of residuals.

Homoscedasticity: The variance of residuals should be constant across all levels of the independent variables.
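
One rough way to eyeball homoscedasticity is to compare the residual variance for low and high values of the independent variable. The sketch below is an informal check on synthetic data, not a formal statistical test; the function name `residual_variance_ratio` and both datasets are assumptions made for the illustration.

```python
import numpy as np

def residual_variance_ratio(x, y):
    """Fit y ~ a + b*x, then compare residual variance for high vs. low x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    high = x > np.median(x)
    return residuals[high].var() / residuals[~high].var()

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 200)
y_const = 2 * x + rng.normal(0.0, 1.0, size=x.size)        # noise level stays constant
y_grow = 2 * x + rng.normal(0.0, 1.0, size=x.size) * x     # noise level grows with x

ratio_const = residual_variance_ratio(x, y_const)
ratio_grow = residual_variance_ratio(x, y_grow)
```

A ratio near 1 is consistent with constant variance; a ratio well above 1, as expected for `y_grow`, suggests the assumption is violated.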


## **Underfitting and Overfitting**

![](https://docs.aws.amazon.com/images/machine-learning/latest/dg/images/mlconcepts_image5.png)

![](https://cdn.prod.website-files.com/6108e07db6795265f203a636/64491a56b7454104ae269887_Overvvsunder%20%281%29.jpg)


**Underfitting**
Definition:
Underfitting occurs when a model is too simple to capture the underlying patterns in the data. This results in poor performance on both the training set and unseen data.

Causes:
- Model simplicity: using a model that is too simple, such as a linear model for data that has a non-linear relationship.
- Insufficient training: not training the model long enough, or using poorly chosen hyperparameters.
- Inadequate data: having too few training samples, or data that does not represent the full range of possible values.
- High bias: models that assume overly simplistic relationships have high bias and are prone to underfitting.

Symptoms:
- Poor performance on both the training set and new data.
- High bias and low variance in model predictions.
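
The "linear model for non-linear data" case can be reproduced in a few lines. This is a sketch on synthetic data, assuming NumPy; the quadratic ground truth, the noise level, and the variable names are chosen only for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
x_train = np.linspace(-3, 3, 40)
x_test = np.linspace(-2.9, 2.9, 25)
y_train = x_train ** 2 + rng.normal(0, 0.3, x_train.size)   # quadratic relationship
y_test = x_test ** 2 + rng.normal(0, 0.3, x_test.size)

line = np.polyfit(x_train, y_train, deg=1)    # too simple for this data
curve = np.polyfit(x_train, y_train, deg=2)   # matches the true shape

line_train = mse(y_train, np.polyval(line, x_train))
line_test = mse(y_test, np.polyval(line, x_test))
curve_train = mse(y_train, np.polyval(curve, x_train))
curve_test = mse(y_test, np.polyval(curve, x_test))
```

The straight line has a high error on both the training and the test set, which is the underfitting signature; the quadratic fit brings both errors down to roughly the noise level.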

**Overfitting**
Definition:
Overfitting occurs when a model becomes too closely adapted to the specific details and noise in the training data, rather than learning the general underlying patterns.

Characteristics:
- High accuracy on training data
- Poor performance on new, unseen data
- High variance and low bias in predictions

Causes:
- Complex models that are too flexible relative to the amount of training data
- Insufficient training data
- Noisy data with errors or random fluctuations
- Training for too long or with too many iterations

Detection:
- Large gap between training and validation performance
- Learning curves that show divergence between training and validation errors
- Poor performance on cross-validation tests
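
The gap between training and validation error can be measured directly. The sketch below, assuming NumPy, fits a simple and a very flexible polynomial to the same small synthetic dataset; the linear ground truth, the degrees 1 and 9, and the `gaps` dictionary are illustrative choices.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 10)               # deliberately small training set
x_val = np.linspace(-0.95, 0.95, 200)
y_train = 2 * x_train + rng.normal(0, 0.5, x_train.size)
y_val = 2 * x_val + rng.normal(0, 0.5, x_val.size)

gaps = {}
for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    val_err = mse(y_val, np.polyval(coeffs, x_val))
    # Store (training error, validation error, gap) per degree
    gaps[degree] = (train_err, val_err, val_err - train_err)
```

The degree-9 polynomial passes through all ten training points, so its training error is essentially zero while its validation error is not. That gap is the signal to look for.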

Consequences:
- Reduced ability to generalize to new data
- Unreliable predictions in real-world applications
- Capturing noise rather than true underlying patterns

Prevention and mitigation:
- Use simpler models or reduce model complexity
- Increase the amount of training data
- Apply regularization techniques (e.g., L1/L2 regularization)
- Use early stopping during training
- Perform cross-validation
- Use ensemble methods
- Improve data quality and remove noise
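
As an example of one of these techniques, k-fold cross-validation can be used to compare models of different complexity on held-out data. This is a minimal NumPy sketch, not a replacement for a library implementation; `k_fold_mse`, the candidate degrees, and the synthetic data are assumptions made for the illustration.

```python
import numpy as np

def k_fold_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a polynomial of the given degree over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), k)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
y = x ** 2 + rng.normal(0, 0.5, x.size)        # quadratic ground truth plus noise

cv_scores = {d: k_fold_mse(x, y, d) for d in (1, 2, 8)}
best_degree = min(cv_scores, key=cv_scores.get)
```

Because every point is scored by a model that never saw it during fitting, the cross-validated error penalizes both the underfit degree-1 model and any model that only memorizes its training folds.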

**Regularization**
Regularization is a common way to overcome overfitting.
L1 and L2 regularization discourage large weights, which keeps the model from fitting the training data too specifically.

- L1 is a good choice when some features might not be relevant, because it can drive their weights to exactly zero.
- L2 is a good choice for correlated features, because it tends to spread the weight across them instead of favoring one.
Regularization aims to reduce model complexity and prevent overfitting by adding a penalty term to the loss function.
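
Written out, the penalized loss might look like the following. This assumes a mean squared error loss; the symbols $J$, $w$, $\lambda$ and $\Omega$ are notation chosen here for illustration and do not come from a specific library.

$$
J(w) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda\,\Omega(w),
\qquad
\Omega_{L1}(w) = \sum_{j} |w_j|,
\qquad
\Omega_{L2}(w) = \sum_{j} w_j^2
$$

Here $\lambda \ge 0$ controls the strength of the penalty: $\lambda = 0$ recovers the unregularized loss, and larger values push the weights harder toward zero.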

Lasso (L1) Regularization:
Encourages sparsity by shrinking some coefficients to exactly zero.
Useful for feature selection in high-dimensional datasets.
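
The "exactly zero" behavior comes from the soft-thresholding step used by common Lasso solvers. Below is a small coordinate-descent sketch, assuming NumPy, no intercept, and the objective stated in the docstring; the function names and the synthetic data are illustrative, and a library solver would be the usual choice in practice.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at exactly zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    col_norms = (X ** 2).sum(axis=0) / n
    for _ in range(n_iters):
        for j in range(p):
            # Residual with feature j's current contribution removed
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n
            w[j] = soft_threshold(rho, alpha) / col_norms[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]      # the third feature is irrelevant

w = lasso_coordinate_descent(X, y, alpha=0.1)
```

The weight of the irrelevant third feature is set to zero by the threshold, while the two useful weights are kept but shrunk slightly toward zero.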

Ridge (L2) Regularization:
Shrinks all coefficients towards zero, but rarely makes them exactly zero.
Effective in handling multicollinearity.
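
For a squared-error loss, ridge regression has a closed-form solution, which makes the multicollinearity point easy to see. The sketch assumes NumPy and no intercept; `ridge_fit`, the value of `lam`, and the duplicated-feature data are chosen for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

x = np.linspace(-1, 1, 50)
X = np.column_stack([x, x])      # two perfectly correlated features
y = 2.0 * x

w = ridge_fit(X, y, lam=1.0)
```

Without the penalty, `X.T @ X` is singular here and the least-squares weights are not unique. Adding `lam * I` makes the system solvable, and the solution splits the weight evenly between the two identical features.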

Elastic Net:
Combines L1 and L2 penalties, offering a balance between feature selection and handling multicollinearity.
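
The combined penalty can be written as a small function. This sketch assumes NumPy and uses two separate strengths, `lam_l1` and `lam_l2`, as hypothetical parameter names; libraries often use a different parameterization, such as one overall strength plus a mixing ratio.

```python
import numpy as np

def elastic_net_penalty(w, lam_l1, lam_l2):
    """Return lam_l1 * ||w||_1 + lam_l2 * ||w||_2^2 for a weight vector w."""
    w = np.asarray(w, dtype=float)
    return float(lam_l1 * np.abs(w).sum() + lam_l2 * np.square(w).sum())

w = [1.0, -2.0, 0.0]
lasso_only = elastic_net_penalty(w, lam_l1=0.5, lam_l2=0.0)   # pure L1 penalty
ridge_only = elastic_net_penalty(w, lam_l1=0.0, lam_l2=0.1)   # pure L2 penalty
combined = elastic_net_penalty(w, lam_l1=0.5, lam_l2=0.1)     # both terms together
```

Setting either strength to zero recovers the plain Lasso or Ridge penalty, so Elastic Net can be tuned anywhere between the two.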


![](https://miro.medium.com/v2/resize:fit:1400/1*rVTCIffI2D_-i_CGeHwF6A.png)
