## Bias vs Variance tradeoff

In the past few years, machine learning has become a very popular tool used across a wide range of industries. It allows computers to learn from data and improve the accuracy of their predictions by themselves, but it has also been in the news lately because of the controversy around its use in predicting real-world outcomes. How good a machine learning model is generally depends on the data used to train it, and this is where the bias vs variance tradeoff comes into the picture.

So what exactly are bias and variance in the ML world? There is a lot of confusion around these terms, so it's important to understand them clearly. Let's shed some light on them.

# Bias

Bias in machine learning refers to the unintentional, systematic errors introduced into the predictions of a machine learning algorithm by the inherent limitations of the learning algorithm or of the data. These errors lead to incorrect predictions and can distort the overall performance of the model.

In simple terms, bias measures how far the model's predictions are, on average, from the true values; in practice it shows up as error on the training data. A model with high bias has a large error on the training data as well as on the test data.
NOTE: High bias generally implies underfitting, which happens when the model is not trained enough or is too simple (for example, too shallow a tree in decision-tree-based algorithms).

The sum of squared errors (SSE, also known as the residual sum of squares) is a good way to gauge bias, where each error is the difference between an observed value and the value predicted by the model. The errors are squared so that negative errors do not cancel out positive ones. The larger the SSE on the training data, the more bias the model is likely to have (some error always remains because of noise in the data).


![Screenshot from 2022-06-20 18-35-30.png](attachment:c83a6533-51ab-4280-8349-2e29468efdc3.png)
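
To make the SSE concrete, here is a minimal sketch in Python with NumPy of $\text{SSE} = \sum_i (y_i - \hat{y}_i)^2$. The data is synthetic and assumed purely for illustration (a quadratic relationship plus Gaussian noise), and the two "models" are NumPy polynomial fits: a straight line and a quadratic. Because the straight line cannot follow the curve, its training SSE should come out much larger, which is the high-bias situation described above.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a curved, quadratic
# relationship plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

def sse(y_true, y_pred):
    """Sum of squared errors. Residuals are squared so that negative
    and positive errors do not cancel each other out."""
    residuals = y_true - y_pred
    return float(np.sum(residuals**2))

# Degree-1 polynomial = straight line: too simple for this data (high bias).
line_pred = np.polyval(np.polyfit(x, y, deg=1), x)
# Degree-2 polynomial matches the shape of the true relationship (low bias).
curve_pred = np.polyval(np.polyfit(x, y, deg=2), x)

sse_line = sse(y, line_pred)
sse_curve = sse(y, curve_pred)
print(f"SSE, straight line: {sse_line:.1f}")
print(f"SSE, quadratic:     {sse_curve:.1f}")
```

The quadratic's SSE does not drop to zero: what remains is mostly the noise in the data, which no model should try to reproduce.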

**Reasons for bias**

* **Bias due to underfitting**:

   High bias generally implies underfitting. What is an underfit model? When a model
   doesn't perform well on either the training data or the test data, it is said to be
   underfitting. This happens when the model is not trained enough or is too simple
   (for example, too shallow a tree in decision-tree-based algorithms).
   
   ![Screenshot from 2022-06-20 23-53-23.png](attachment:e1ad7a14-ad8d-42e8-8a32-806cce06d1ae.png)

* **Bias due to data:**

  One of the most common sources of bias in machine learning is bias in the data itself. Data bias arises from the way the data is selected, collected, or processed. Data can be biased in many different ways, including through the nature of the data itself, the way it is labeled, and the way it is used by the machine learning algorithm.
  
  That's why the data used to train an ML algorithm should be representative of the whole population and shouldn't be skewed in any way, be it by gender, race, religion, or anything else. Ideally, it should be a random sample from the whole population.

  
* **Bias due to the limitations of the ML algorithm being used:**

  If the machine learning algorithm being used is not flexible enough to capture the true relationship in the dataset, it leads to bias.
  
  ![Screenshot from 2022-06-20 19-39-16.png](attachment:1f401932-beb0-43f8-926f-9d0b044490be.png)
  
  For example, the dataset in the diagram above is not linear, so a straight regression line
  cannot capture the relationship in the data. If we use a linear regression model here, we
  will have the problem of high bias.
  
  Thus, while training a model our general aim should be low bias, but we shouldn't push the
  bias too low, otherwise we will face the problem of overfitting, which in turn leads to high
  variance.
  So what is variance? Let's dive into it.
  


# Variance

While bias measures how far a model's predictions are from the true values, variance measures how much the model's predictions for the same input vary when it is trained on different samples of data.

In other words, a model with low bias is likely to produce more accurate predictions than a model with high bias, whereas a model with low variance is likely to produce more consistent predictions than a model with high variance.

![Screenshot from 2022-06-20 23-54-37.png](attachment:a098c4b6-5254-4695-b4b0-9b4b28b01bb5.png)


Thus, the difference in fits between different data sets is variance.
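
One way to see variance directly is to simulate it: draw many training sets from the same population, fit the same kind of model to each, and measure how much their predictions for a single input spread out. The sketch below is only an illustration under assumptions that are not from the text: the true relationship is taken to be $\sin(\pi x)$ plus Gaussian noise, and NumPy polynomial fits of degree 1 (simple) and degree 9 (complex) stand in for the two models.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    # Assumed "true relationship" for this illustration.
    return np.sin(np.pi * x)

def predictions_at(x0, degree, n_datasets=200, n_points=20, noise=0.3):
    """Fit a polynomial of the given degree to many independent training sets
    drawn from the same population; return each fitted model's prediction at x0."""
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        y = true_f(x) + rng.normal(scale=noise, size=n_points)
        coefs = np.polyfit(x, y, deg=degree)
        preds[i] = np.polyval(coefs, x0)
    return preds

x0 = 0.5
var_simple = predictions_at(x0, degree=1).var()
var_complex = predictions_at(x0, degree=9).var()
print(f"variance of predictions at x0, degree 1: {var_simple:.4f}")
print(f"variance of predictions at x0, degree 9: {var_complex:.4f}")
```

The more flexible degree-9 model chases the noise in each training set, so its predictions at the same point disagree much more from one dataset to the next.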

**Reasons for variance**

* **Variance due to overfitting**:

  One of the most common reasons for variance is overfitting. A model overfits when it
  performs really well on the training data but gives high error (inaccurate predictions)
  on new data (test data) that it hasn't seen before. And how does overfitting arise? Yes,
  you guessed it: when we train our model too much (for example, with a large number of
  training cycles) or when we use too complex a model (for example, too deep a tree in
  decision trees).
  There are two types of information in a dataset: signal and noise. The aim of any machine
  learning algorithm should be to learn only the signal, without learning any noise, but in
  practice that is not possible: no matter how good your model is, it will always learn some noise.
  When we overtrain our model in the hope of capturing the signal (the general pattern) more
  fully, it also learns a large amount of random noise, which is a problem. Noise can be thought
  of as patterns that don't really exist in the true population (the training data is only a
  sample of the true population).
  
 
  
  ![Screenshot from 2022-06-20 18-35-19.png](attachment:3523f4f9-a19f-4487-a43e-485f667a903e.png)
  
  For example, the model in the picture above is trained too much (or is too complex), and thus
  has high variance and almost zero bias on the training data. But is this a good model? No,
  absolutely not. This is a classic case of overfitting: the model works almost perfectly on the
  training data and on datasets that are nearly identical to it, but that rarely happens in the
  real world. New, randomly selected data (test data) may follow the same general pattern, but it
  will not contain the same random noise that the model has memorised.
  So, on new data it will usually give much less accurate predictions.
  
  
  
  ![Screenshot from 2022-06-20 20-32-19.png](attachment:d49b496a-8e64-4510-9c87-a16d818f45de.png)
   
 * **Variance due to a small sample size:**
 
    A machine learning model can also suffer from high variance if the sample used for training is too small for it to learn any general patterns.
    
    ![Screenshot from 2022-06-20 22-14-40.png](attachment:fb5d9221-17bf-499c-bd30-b90d7fec03bf.png)
    
    With such a small sample (which is not representative of the whole population), any
    algorithm, however good, will struggle to learn the true relationship (the general pattern
    in the dataset).
    
    Like in the picture below, the model has low bias, but it has high variance, as it'll
    perform completely differently on the test data (or any other data it comes across).
   
   ![Screenshot from 2022-06-20 22-46-22.png](attachment:5fece734-9856-4140-b352-3db386486f14.png)
    

   The real regression line should be the one shown in red in the picture below, but we'll get
   the one shown in black.
   To solve this problem, a little bias is sometimes deliberately introduced into the model in
   the hope that it will learn the true relationship (shown by the red regression line). This
   is called good bias. Regularization is one such technique for dealing with this problem in
   linear regression. It is a vast topic in itself, and I could write a complete blog on it.
   For now, just know that there are two common types of regularization: L1 or Lasso
   regularization and L2 or Ridge regularization. Both add a penalty on the size of the model's
   coefficients (the slopes) to the loss function (the sum of squared errors): L1 penalizes the
   sum of their absolute values and L2 the sum of their squares. In this way they introduce a
   little bias into the model in exchange for lower variance.
 
   
  ![Screenshot from 2022-06-20 22-45-35.png](attachment:4dfa3c5b-9fd2-4d71-8ae6-9e0b708dde2d.png)
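
The overfitting, small-sample, and regularization points above can be reproduced in a small experiment. This sketch assumes synthetic data ($\sin(\pi x)$ plus noise), only 12 training points and a degree-11 polynomial, so the unpenalized model has as many coefficients as there are data points and can pass through every training point. The second model adds an L2 (ridge) penalty, `lam`, on the non-intercept coefficients; the value `1e-2` is an arbitrary choice for illustration, not a tuned one, and all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def true_f(x):
    # Assumed "true relationship" for this illustration.
    return np.sin(np.pi * x)

def make_data(n, noise=0.3):
    x = rng.uniform(-1, 1, n)
    return x, true_f(x) + rng.normal(scale=noise, size=n)

def design(x, degree):
    # Polynomial features: columns 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def fit(x, y, degree, lam=0.0):
    """Minimise ||Xw - y||^2 + lam * ||w[1:]||^2 (ridge / L2 penalty).
    lam = 0 gives ordinary least squares. Solved as an augmented
    least-squares problem instead of forming X^T X explicitly."""
    X = design(x, degree)
    k = X.shape[1]
    P = np.sqrt(lam) * np.eye(k)
    P[0, 0] = 0.0  # leave the intercept unpenalised
    X_aug = np.vstack([X, P])
    y_aug = np.concatenate([y, np.zeros(k)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w

def mse(w, x, y, degree):
    return float(np.mean((design(x, degree) @ w - y) ** 2))

degree = 11
x_train, y_train = make_data(12)   # tiny sample: 12 points, 12 coefficients
x_test, y_test = make_data(1000)   # fresh data from the same population

w_plain = fit(x_train, y_train, degree)
w_ridge = fit(x_train, y_train, degree, lam=1e-2)

train_plain = mse(w_plain, x_train, y_train, degree)
test_plain = mse(w_plain, x_test, y_test, degree)
train_ridge = mse(w_ridge, x_train, y_train, degree)
test_ridge = mse(w_ridge, x_test, y_test, degree)
print(f"no penalty: train MSE {train_plain:.3g}, test MSE {test_plain:.3g}")
print(f"ridge:      train MSE {train_ridge:.3g}, test MSE {test_ridge:.3g}")
```

Expect the unpenalized fit to have a training error close to zero but a much larger test error, while the ridge fit accepts a little training error (the "good bias") in exchange for a lower test error. Solving ridge as an augmented least-squares system rather than through the normal equations is a design choice for numerical stability, since high-degree polynomial features are badly conditioned.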
  
  
# Bias Variance Tradeoff
  
So, as we have seen above, zero bias (as in overfitting) and sometimes even low bias (when the sample size is too small) can be problematic, because they come with high variance. But we also don't want a model with high bias (an underfit model). That's why we need to trade off bias against variance.
  
In general, the ideal model is one that accurately learns the true relationship, with both low bias and low variance, which means it produces consistently good results across a wide range of datasets. To achieve this we need a model at the sweet spot between a simple and a complex model. There are a number of techniques for finding this sweet spot, but the most common ones are regularization, bagging, and boosting.
  
  
The following diagram illustrates the bias-variance tradeoff:

![Screenshot from 2022-06-20 23-36-18.png](attachment:c262edbb-24eb-4083-91b1-1ccd76b95d68.png)
  
As model complexity increases (towards overfitting), bias decreases but variance increases, and beyond a certain point the total error (bias² + variance, plus the irreducible error due to noise) increases as well.
Conversely, as model complexity decreases (towards underfitting), variance decreases and bias increases.
In practice, our main objective is to minimise the total error; we don't care about bias and variance individually. But as the diagram makes clear, the total error reaches its minimum where bias and variance are properly balanced, which is also the point where model complexity is optimal (neither too simple nor too complex).
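
The curve in the diagram comes from the standard decomposition of the expected squared error at a point $x$, where $\sigma^2$ is the variance of the noise:

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2$$

A minimal simulation of this decomposition is sketched below. It assumes a toy setup that is not from the text: the true function is $\sin(\pi x)$, the noise standard deviation is 0.3, and polynomial degree (1 to 10) serves as the measure of model complexity. For each degree it trains many models on independent datasets and estimates bias² and variance from their predictions.

```python
import numpy as np

rng = np.random.default_rng(3)
NOISE = 0.3  # assumed noise standard deviation

def true_f(x):
    # Assumed "true relationship" for this illustration.
    return np.sin(np.pi * x)

x_eval = np.linspace(-0.9, 0.9, 50)  # points where bias and variance are measured

def decompose(degree, n_datasets=300, n_points=30):
    """Train one model per independent training set, then average over x_eval:
    bias^2   = (mean prediction - true value)^2
    variance = spread of the predictions around their mean."""
    preds = np.empty((n_datasets, x_eval.size))
    for i in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        y = true_f(x) + rng.normal(scale=NOISE, size=n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_eval)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_f(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

results = {d: decompose(d) for d in range(1, 11)}
for d, (b2, v) in results.items():
    total = b2 + v + NOISE**2  # expected test error, including irreducible noise
    print(f"degree {d:2d}: bias^2 {b2:.4f}  variance {v:.4f}  total {total:.4f}")

best = min(results, key=lambda d: sum(results[d]))
print("degree with the lowest bias^2 + variance:", best)
```

Low degrees should show a large bias² and a small variance, high degrees the reverse, and the lowest total should fall somewhere in between, mirroring the U-shaped total-error curve. The exact numbers depend on the assumed noise level, sample size and random seed.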