# ML Basics

**What is linear regression?**

In simple terms, linear regression is a method of finding the straight line that best fits the given data, i.e. the best linear relationship between the independent and dependent variables.
In technical terms, linear regression is a machine learning algorithm that fits a linear relationship between the independent and dependent variables in the given data. The fit is most commonly obtained with the least squares method, which minimises the sum of squared residuals.
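
As a minimal sketch of the least squares idea, the snippet below fits a line to a handful of points using the closed-form solution for a single independent variable. The data values and the helper names `fit_line` and `sum_squared_residuals` are made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data (hypothetical), roughly following y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
ssr = sum_squared_residuals(xs, ys, intercept, slope)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, SSR={ssr:.3f}")
```

Any other choice of intercept and slope gives a larger sum of squared residuals on the same data, which is what "best fit" means here.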

**State the assumptions in a linear regression model.**
1. The assumption about the form of the model:
- It is assumed that there is a linear relationship between the dependent and independent variables. It is known as the ‘linearity assumption’.

2. Assumptions about the residuals:
 - Normality assumption: 
    - It is assumed that the error terms, ε(i), are normally distributed.
 - Zero mean assumption: 
    - It is assumed that the residuals have a mean value of zero.
 - Constant variance assumption: 
    - It is assumed that the residual terms have the same (but unknown) variance, $\sigma^2$.
    - This assumption is also known as the assumption of homogeneity or homoscedasticity.
 - Independent error assumption: 
    - It is assumed that the residual terms are independent of each other, i.e. their pair-wise covariance is zero.
 - If the residuals are not normally distributed, it suggests that they are not purely random noise, i.e. the model has not captured some of the structure in the data.
 - If the expectation (mean) of the residuals, E(ε(i)), is zero, the expectations of the target variable and the model become the same, which is one of the targets of the model.
 - The residuals (the observed estimates of the error terms) should be independent. This means that there is no correlation between the residuals and the predicted values, or among the residuals themselves. If some correlation is present, it implies that there is some relation that the regression model is not able to identify.

3. Assumptions about the predictors (independent variables):
 - The independent variables are measured without error.
 - The independent variables are linearly independent of each other, i.e. there is no multicollinearity in the data.
 - If the independent variables are not linearly independent of each other, the uniqueness of the least squares solution (or normal equation solution) is lost.
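
The loss of uniqueness under multicollinearity can be illustrated with a small sketch. It assumes two predictor columns and no intercept; the data and the helper name `gram_determinant` are hypothetical. When one column is an exact multiple of the other, the matrix $X^T X$ from the normal equations has determinant zero, and different coefficient vectors give identical predictions.

```python
def gram_determinant(x1, x2):
    """Determinant of the 2x2 matrix X^T X for two predictor columns."""
    a = sum(v * v for v in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    return a * d - b * b


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0 * v for v in x1]   # exact multiple of x1: perfectly collinear
x3 = [1.0, 0.0, 2.0, 1.0]    # not a multiple of x1

det_collinear = gram_determinant(x1, x2)     # singular: normal equations have no unique solution
det_independent = gram_determinant(x1, x3)   # non-singular: unique solution exists

# Two different coefficient vectors, (3, 0) and (1, 1), predict the same values
# when x2 = 2 * x1, so least squares cannot tell them apart.
preds_a = [3.0 * u + 0.0 * v for u, v in zip(x1, x2)]
preds_b = [1.0 * u + 1.0 * v for u, v in zip(x1, x2)]

print(det_collinear, det_independent, preds_a == preds_b)
```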


**What is the use of regularisation? Explain L1 and L2 regularisations.**

Regularisation is a technique used to tackle overfitting. A very complex model tends to fit the noise in the training data and overfit, while a very simple model may fail to capture the underlying pattern and underfit. Regularisation addresses this trade-off by constraining the complexity of the model.
It works by adding a penalty on the magnitude of the coefficients (betas) to the cost function, so that large coefficients are penalised and the fitted coefficients stay small. This helps the model capture the trends in the data while preventing it from becoming too complex and overfitting.

- L1 or LASSO regularisation: Here, the sum of the absolute values of the coefficients, scaled by the regularisation parameter λ, is added to the cost function. This can be seen in the following equation; the highlighted part corresponds to the L1 or LASSO regularisation. This technique tends to give sparse solutions, in which some coefficients become exactly zero, so it also performs feature selection.
![image.png](attachment:46a0a222-6022-48d6-967e-d7f5c9ac2f52.png)
- L2 or Ridge regularisation: Here, the sum of the squares of the coefficients, scaled by the regularisation parameter λ, is added to the cost function. This can be seen in the following equation, where the highlighted part corresponds to the L2 or Ridge regularisation.

![image.png](attachment:0b38e40e-7eeb-41a5-a5ed-a03109ca625e.png)
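
For reference, the two penalised cost functions are commonly written in the form below. The notation is an assumption and may differ from the images above: $y_i$ are the observed targets, $\hat{y}_i$ the model predictions, $\beta_j$ the coefficients, and $\lambda$ the regularisation parameter.

$$J_{\text{L1}}(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

$$J_{\text{L2}}(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

In both cases the first term is the usual sum of squared residuals and the second term is the penalty.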

Introducing a penalty to the sum of the weights means that the model has to “distribute” its weights optimally, so naturally most of this “resource” will go to the simple features that explain most of the variance, with complex features getting small or zero weights.

https://towardsdatascience.com/intuitions-on-l1-and-l2-regularisation-235f2db4c261
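
The difference between the two penalties can be sketched on a toy problem. The example assumes two orthogonal predictors and no intercept, so each coefficient can be solved for separately in closed form. The data, the value of λ, and the helper names are hypothetical. The L1 penalty sets the weak feature's coefficient to exactly zero, while the L2 penalty only shrinks it.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_coef(x, y, lam):
    """Minimiser of sum((y - b*x)^2) + lam * b^2 for one predictor column."""
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimiser of sum((y - b*x)^2) + lam * |b| for one predictor column."""
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    return soft_threshold(sxy, lam / 2.0) / sxx


# Two orthogonal predictors: x_a drives the target strongly, x_b only weakly.
x_a = [1.0, 1.0, -1.0, -1.0]
x_b = [1.0, -1.0, 1.0, -1.0]
y = [2.0 * a + 0.05 * b for a, b in zip(x_a, x_b)]

lam = 1.0  # illustrative value of the regularisation parameter
ridge_a, ridge_b = ridge_coef(x_a, y, lam), ridge_coef(x_b, y, lam)
lasso_a, lasso_b = lasso_coef(x_a, y, lam), lasso_coef(x_b, y, lam)

print("ridge:", ridge_a, ridge_b)
print("lasso:", lasso_a, lasso_b)
```

With correlated predictors the coefficients no longer decouple like this, and LASSO is usually solved iteratively (for example by coordinate descent) rather than in closed form.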

**How to choose the value of the regularisation parameter (λ)?**

Selecting the regularisation parameter is a tricky business. If the value of λ is too high, it will lead to extremely small values of the regression coefficients β, which will lead to the model underfitting (high bias, low variance). On the other hand, if λ is 0 or very small, the model will tend to overfit the training data (low bias, high variance).
There is no closed-form rule for selecting λ; it is usually chosen empirically. One approach is to take sub-samples of the data and run the algorithm multiple times on the different sets, which is the idea behind cross-validation. Here, the person has to decide how much variance can be tolerated. Once the user is satisfied with the variance, that value of λ can be chosen for the full dataset.
One thing to note is that the value of λ selected this way is optimal for those subsets, not necessarily for the entire training data.
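
One common way to run the algorithm on different subsets is k-fold cross-validation. The sketch below is only an illustration under simplifying assumptions: a single predictor, ridge regression without an intercept, synthetic data, and a hypothetical list of candidate λ values. It picks the candidate with the lowest average validation error.

```python
import random


def ridge_slope(xs, ys, lam):
    """Ridge estimate of the slope for one predictor and no intercept."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=3):
    """Mean squared validation error of ridge with penalty lam, averaged over k folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        val = [i for i in range(n) if i % k == fold]      # held-out indices
        train = [i for i in range(n) if i % k != fold]    # indices used for fitting
        beta = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - beta * xs[i]) ** 2 for i in val) / len(val)
    return total / k


def choose_lambda(xs, ys, candidates, k=3):
    """Return the candidate value with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cv_error(xs, ys, lam, k))


# Synthetic data (hypothetical): y = 2x plus Gaussian noise
rng = random.Random(0)
xs = [float(i) for i in range(1, 13)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = choose_lambda(xs, ys, candidates)
print("selected lambda:", best_lam)
```

The selected value depends on the folds and the candidate grid, so it is best read as a reasonable choice for this data rather than a uniquely correct one.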