## Q1: Define overfitting and underfitting in machine learning. What are the consequences of each, and how can they be mitigated?

Overfitting: 

         When a model performs very well for training data but has poor performance with test data (new data), it is known as overfitting. In this case, the machine learning model learns the details and noise in the training data such that it negatively affects the performance of the model on test data. 
         Overfitting can happen due to low bias and high variance.

Reasons for Overfitting are as follows:

1) Data used for training is not cleaned and contains noise (garbage values) in it
2) The model has a high variance
3) The size of the training dataset used is not enough
4) The model is too complex 

Ways to Tackle Overfitting

1) Using K-fold cross-validation
2) Using Regularization techniques such as Lasso and Ridge
3) Training model with sufficient data
4) Adopting ensembling techniques.
         
Underfitting:
            When a model has not learned the patterns in the training data well and is unable to generalize well on the new data, it is known as underfitting. An underfit model has poor performance on the training data and will result in unreliable predictions. 
            
            Underfitting occurs due to high bias and low variance.
            
Reasons for Underfitting:

1) Data used for training is not cleaned and contains noise (garbage values) in it
2) The model has a high bias
3) The size of the training dataset used is not enough
4) The model is too simple


Ways to Tackle Underfitting:

1) Increase the number of features in the dataset
2) Increase model complexity
3) Reduce noise in the data
4) Increase the training duration (for example, the number of epochs)
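
As a rough illustration of both failure modes, the sketch below fits polynomials of increasing degree to noisy data. The sine target, the noise level, the sample sizes and the degrees are all assumptions chosen for the demo, and it relies on NumPy. A degree-1 fit is too simple (underfitting), while a high-degree fit typically drives the training error down without a matching gain on test data (overfitting).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a smooth nonlinear target (an assumed toy problem)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    # Store (training error, test error) for each model complexity.
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree={degree}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Comparing the two columns per degree is the point: a large gap between training and test error suggests overfitting, while a high error in both suggests underfitting.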

## Q2: How can we reduce overfitting? Explain in brief.

Example of Overfitting:

Let’s say we want to predict if a student will land a job interview based on her resume.
Now, assume we train a model from a dataset of 10,000 resumes and their outcomes.
Next, we try the model out on the original dataset, and it predicts outcomes with 99% accuracy… wow!
But when we run the model on a new (“unseen”) dataset of resumes, we only get 50% accuracy.
Our model doesn’t generalize well from our training data to unseen data.
If our model does much better on the training set than on the test set, then we’re likely overfitting.

For example, it would be a big red flag if our model saw 99% accuracy on the training set but only 55% accuracy on the test set.


How to Prevent Overfitting in Machine Learning:

Cross-validation:
   
Cross-validation is a powerful preventative measure against overfitting.
Use your initial training data to generate multiple mini train-test splits. Use these splits to tune your model.
In standard k-fold cross-validation, we partition the data into k subsets, called folds. Then, we iteratively train the algorithm on k-1 folds while using the remaining fold as the test set (called the “holdout fold”).

Cross-validation allows you to tune hyperparameters with only your original training set. 
This allows you to keep your test set as a truly unseen dataset for selecting your final model.
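
The k-fold procedure described above can be sketched in plain Python. The helper name `k_fold_indices` is hypothetical, and the sketch uses contiguous, unshuffled folds for readability; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, holdout_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        holdout = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, holdout
        start += size

for train_idx, holdout_idx in k_fold_indices(n_samples=10, k=5):
    print("train:", train_idx, "holdout:", holdout_idx)
```

Each sample lands in the holdout fold exactly once, so every candidate hyperparameter setting can be scored on k different holdout folds and the scores averaged.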

Train with sufficient data:

It won’t work every time, but training with more data can help algorithms detect the signal better.
Of course, that’s not always the case. If we just add more noisy data, this technique won’t help. That’s why you should always ensure your data is clean and relevant.

Remove features:

Some algorithms have built-in feature selection.
For those that don’t, you can manually improve their generalizability by removing irrelevant input features.
There are several feature selection heuristics you can use.
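
One simple heuristic, shown here only as an example, is a correlation filter: keep the features whose absolute Pearson correlation with the target reaches some threshold. The function names, the threshold of 0.3 and the toy columns below are all made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_features(columns, target, threshold=0.3):
    """Keep the names of features whose |correlation| with the target is >= threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Made-up data: one informative feature and one irrelevant feature.
columns = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "shoe_size": [8, 7, 9, 9, 7, 8],
}
target = [52, 55, 61, 64, 70, 74]
print(select_features(columns, target))
```

A filter like this only detects linear, one-feature-at-a-time relationships, so it is a starting point rather than a complete feature selection method.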

Early stopping:

When you’re training a learning algorithm iteratively, you can measure how well each iteration of the model performs.
Up until a certain number of iterations, new iterations improve the model. After that point, however, the model’s ability to generalize can weaken as it begins to overfit the training data.
Early stopping refers to stopping the training process before the learner passes that point.
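
A common way to implement this is a "patience" rule: stop once the validation loss has not improved for a fixed number of iterations, and keep the model from the best iteration. The sketch below assumes the validation losses are already recorded in a list; the function name, the patience value and the loss values are illustrative.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose model should be kept.

    Training is treated as stopped once the validation loss has failed to
    improve for `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Made-up validation losses: they improve at first, then degrade.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(val_losses))  # prints 3
```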

Regularization:

Regularization refers to a broad range of techniques for artificially forcing your model to be simpler.
The method will depend on the type of learner you’re using. For example, you could prune a decision tree, use dropout on a neural network, or add a penalty parameter to the cost function in regression.
The regularization method is a hyperparameter as well, which means it can be tuned through cross-validation.

Ensembling:

Ensembles are machine learning methods for combining predictions from multiple separate models. There are a few different methods for ensembling, but the two most common are:

Bagging attempts to reduce the chance of overfitting complex models.

    a) It trains a large number of “strong” learners in parallel.
    b) A strong learner is a model that’s relatively unconstrained.
    c) Bagging then combines all the strong learners together in order to “smooth out” their predictions.
    
Boosting attempts to improve the predictive flexibility of simple models.

    a) It trains a large number of “weak” learners in sequence.
    b) A weak learner is a constrained model (e.g. you could limit the max depth of each decision tree).
    c) Each one in the sequence focuses on learning from the mistakes of the one before it.
    d) Boosting then combines all the weak learners into a single strong learner.
    
While bagging and boosting are both ensemble methods, they approach the problem from opposite directions.

Bagging uses complex base models and tries to “smooth out” their predictions, while boosting uses simple base models and tries to “boost” their aggregate complexity.
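
The boosting steps above can be sketched with depth-1 regression trees ("stumps") as the weak learners. This is a minimal gradient-boosting sketch for squared error on made-up one-dimensional data; the function names, the number of rounds and the learning rate are assumptions. Bagging would instead train its learners independently on bootstrap samples and average their predictions.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one threshold, two constant leaves)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right leaf is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    """Predict with a fitted stump: one constant per side of the threshold."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each new stump is fitted to the residuals (mistakes) of the ensemble so far."""
    preds = [0.0] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return stumps, history

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.1, 0.9, 4.2, 8.8, 16.1, 24.9, 36.2, 48.8]  # roughly x**2, made-up data
stumps, history = boost(xs, ys)
print(f"training MSE after 1 round: {history[0]:.2f}, after 20 rounds: {history[-1]:.2f}")
```

No single stump can represent the curve, but the training error falls round after round as the stumps are combined, which is the sense in which boosting builds a strong learner from weak ones.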

## Q3: Explain underfitting. List scenarios where underfitting can occur in ML.

Underfitting:
            Underfitting is a common problem in machine learning where a model is not able to capture the underlying patterns in the training data and therefore performs poorly on both the training data and new, unseen data. 
            In other words, the model is too simple to represent the complexity of the data and fails to capture important relationships between the input and output variables. 

List of scenarios where underfitting can occur:

1) Model complexity: 
            An underfitted model doesn’t effectively capture the relationship between the input and output data because it is too simple.
            
2) Insufficient training data:
            A model may underfit the data if there is not enough training data to capture the underlying patterns. In such cases, the model may generalize poorly to new, unseen data.
 
3) Feature selection:
            If we select features that are not relevant or informative, the model may not be able to capture the underlying patterns in the data and may underfit the data.
            
4) Regularization:
            Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function, which discourages the model from using overly complex solutions. However, if the regularization parameter is set too high, the model may become too simple and underfit the data.
            
5) Preprocessing:
            Preprocessing the data before training the model is important to ensure that the data is in an appropriate format and that any outliers or noise are removed. If the data is not preprocessed properly, the model may underfit the data.


## Q4: Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and variance, and how do they affect model performance?

To make predictions, our model will analyze our data and find patterns in it. Using these patterns, we can make generalizations about certain instances in our data. 
Our model’s job is to learn these patterns from the training data and apply them to the test set to make predictions.

Bias:
        Bias is the difference between the model’s average prediction and the actual value. It comes from the simplifying assumptions that our model makes about our data in order to predict new data.
        During training, our model ‘sees’ the data a certain number of times to find patterns in it. If the model is too simple, or does not work on the data for long enough, it will not find the patterns and the bias will be high.
        
Low Bias : The difference between the predicted and actual values is small, i.e. our model captures the patterns in the training data well and predicts well.

        The line of best fit passes close to most of the data points.

High Bias : The difference between the actual and predicted values is large.
    This is caused by assumptions that are too basic, so the model can’t capture the important features of our data. This means that our model hasn’t captured the patterns in the training data and hence cannot perform well on the testing data either. If this is the case, our model cannot perform on new data and cannot be sent into production. 
    
       The line of best fit lies far from most of the data points.
       High bias leads to underfitting.
    
Variance:

 Variance is the counterpart of bias. If our model is allowed to view the data too many times, it will learn very well for only that data. It will capture most patterns in the data, but it will also learn from the unnecessary data present, i.e. from the noise.

    We can define variance as the model’s sensitivity to fluctuations in the training data. A model with high variance may learn from noise, which causes it to treat trivial features as important.
    
    Example : our model has learned our training data extremely well, which has taught it to identify cats. But when given new data, such as a picture of a fox, our model predicts a cat, as that is what it has learned. This happens when the variance is high: the model captures all the features of the data given to it, including the noise, and tunes itself to that data. It predicts the training data very well, but it cannot predict new data well because it is too specific to the training data. 
    
    i.e. a high-variance model tries to make the fitted line pass through each and every data point, which causes overfitting.
    
Low Variance :
            The model is trained with sufficient data and its predictions change little from one training sample to another, so its accuracy on test data stays close to its accuracy on training data.
            
High Variance :
            The model is fitted too closely to the training data, including the noise. It tries to fit each and every data point of the sample, so its predictions change a lot when the sample changes.
            
bias-variance tradeoff:
 
For any model, we have to find the perfect balance between Bias and Variance. 
This ensures that our model captures the essential patterns in the data while ignoring the noise present in it. This is called the Bias-Variance Tradeoff. 
It helps optimize the error in our model and keeps it as low as possible.

An optimized model will be sensitive to the patterns in our data, but at the same time will be able to generalize to new data. In this, both the bias and variance should be low so as to prevent overfitting and underfitting.
        We need low bias and low variance.
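
The tradeoff is often written as a decomposition of the expected squared error at a point $x$. As a sketch, assume the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and let $\hat{f}$ be the model fitted on a randomly drawn training set (these symbols are introduced here for illustration and are not part of the notes above):

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Making the model more complex usually lowers the first term and raises the second, so the total error is smallest at an intermediate complexity. The noise term cannot be reduced by any model.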





## Q5: Discuss some common methods for detecting overfitting and underfitting in machine learning models. How can you determine whether your model is overfitting or underfitting?

The easiest way to detect overfitting is to perform cross-validation.
The most commonly used method is known as k-fold cross validation 

How to check if the model is overfitting or underfitting:

There are several ways to detect over- or under-fitting in a machine learning model:

Plot the learning curves: 

        Learning curves show the model’s performance on training and validation data over time as the model is being trained. If the model is overfitting, you will see that the training error continues to decrease over time, while the validation error starts to increase after a certain point. This indicates that the model is beginning to memorise the training data and fails to generalise to new, unseen data. If the model is underfitting, both the training error and the validation error stay high.

Evaluate the model on a holdout set: 

        A holdout set is a subset of the data that is not used during training but is used to evaluate the model after training. If the model performs well on the training data but poorly on the holdout set, it may be overfitting the training data.
        
Use cross-validation: 

             Cross-validation is a technique where the data is divided into k folds, and the model is trained on k-1 folds and evaluated on the remaining fold, once for each fold. If the model performs well on the training folds but poorly on the validation folds, it is likely overfitting.
             
Regularise the model: 

    Regularization is a method that adds a penalty term to the loss function to stop the model from fitting the training data too closely. By changing the regularisation parameter, you can control how complex the model is: a value that is too low allows overfitting, and a value that is too high makes the model too simple.
    
Use simpler models:

        If your complex model is overfitting the data, you can use simpler models less prone to overfitting, such as linear models or decision trees with low depth.
        
    In general, it’s crucial to monitor the model’s performance during training and evaluation and to be aware of the trade-off between model complexity and generalisation performance.
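
The checks above all come down to comparing training error with validation (or holdout) error. A minimal sketch of that comparison follows; the function name and both thresholds are illustrative assumptions, not standard values, and suitable thresholds depend on the problem.

```python
def diagnose_fit(train_error, val_error, high_error=0.20, max_gap=0.05):
    """Label a model from its training and validation error rates.

    `high_error` and `max_gap` are illustrative thresholds, not standard values.
    """
    if train_error > high_error:
        return "underfitting"   # poor even on the data it was trained on
    if val_error - train_error > max_gap:
        return "overfitting"    # good on training data, much worse on unseen data
    return "reasonable fit"

# The resume example from Q2: 99% training accuracy vs 55% test accuracy.
print(diagnose_fit(train_error=0.01, val_error=0.45))  # overfitting
print(diagnose_fit(train_error=0.35, val_error=0.37))  # underfitting
print(diagnose_fit(train_error=0.08, val_error=0.10))  # reasonable fit
```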

## Q6: Compare and contrast bias and variance in machine learning. What are some examples of high bias and high variance models, and how do they differ in terms of their performance?

| Aspect | Bias | Variance |
|---|---|---|
| 1) Model complexity | High model complexity tends to give low bias | High model complexity tends to give high variance |
| 2) Parametric and linear algorithms | High bias | Low variance |
| 3) Non-parametric and non-linear algorithms | Low bias | High variance |
| 4) Causes | High bias: underfitting | High variance: overfitting |
| 5) Examples | Low bias: KNN, Decision Tree. High bias: Linear Regression | Low variance: Linear Regression. High variance: SVM |

Some examples of high bias and high variance models:

| Algorithm | Bias | Variance |
|---|---|---|
| Linear Regression | High bias | Low variance |
| Decision Tree | Low bias | High variance |
| Bagging | Low bias | High variance (less than Decision Tree) |
| Random Forest | Low bias | High variance (less than Decision Tree and Bagging) |

**high bias and high variance models, and how do they differ in terms of their performance**

A high bias model is one that has oversimplified the problem and is strongly biased towards a particular hypothesis. Such models typically underfit the data and may have low accuracy on both the training and testing datasets. They may miss important patterns or relationships in the data and make very basic assumptions about the data. For example, a linear regression model has high bias if it tries to fit a straight line to a dataset that has a complex nonlinear relationship. This model will have low training accuracy and will also have low accuracy on new data.

A high variance model, on the other hand, is one that is overly complex and has high variance towards the training data. Such models typically overfit the data and have high accuracy on the training dataset, but may have poor accuracy on new data. They may fit noise in the data and are very sensitive to small fluctuations in the training dataset. For example, a decision tree with a large number of branches can be an example of a high variance model. This model can fit the training data very well, but can have poor accuracy on new data due to overfitting.

In general, a model that has high bias has low variance and is less sensitive to changes in the training data. Such models may be underfitting the data and need more complexity to fit the data well. A model that has high variance has low bias and is more sensitive to changes in the training data. Such models may be overfitting the data and need regularization techniques to reduce variance.

In terms of performance, a high bias model will have low training and testing accuracy, while a high variance model will have high training accuracy and low testing accuracy. The ideal model is one that has low bias and low variance, which means that it fits the data well and generalizes well to new data. Achieving this balance is known as the bias-variance tradeoff, and finding the optimal balance can be a challenging task in machine learning.
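
The contrast between a high bias model and a high variance model can be estimated by simulation. The sketch below assumes a sine-shaped true function with Gaussian noise, and compares a degree-1 polynomial (high bias) with a degree-9 polynomial (high variance) fitted on many freshly drawn noisy training sets. It relies on NumPy, and every constant in it is an assumption chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 25)      # fixed inputs
f_true = np.sin(np.pi * x)          # assumed "true" function
noise_sd = 0.3

def bias_variance(degree, n_datasets=300):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    preds = np.empty((n_datasets, x.size))
    for i in range(n_datasets):
        y = f_true + rng.normal(0.0, noise_sd, size=x.size)  # fresh noisy training set
        preds[i] = np.polyval(np.polyfit(x, y, deg=degree), x)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - f_true) ** 2))   # average prediction vs truth
    variance = float(np.mean(preds.var(axis=0)))          # spread across training sets
    return bias_sq, variance

for degree in (1, 9):
    bias_sq, variance = bias_variance(degree)
    print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

Under these assumptions the straight line shows the larger squared bias and the degree-9 polynomial shows the larger variance, matching the description above.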


## Q7: What is regularization in machine learning, and how can it be used to prevent overfitting? Describe some common regularization techniques and how they work.

With the help of regularization we can address a few common issues:

    1) Reducing model complexity
    2) Penalizing the loss function
    3) Reducing model overfitting
         
Regularization is a technique that reduces the error on unseen data by adding a penalty to the function fitted on the training set, which helps avoid overfitting.

Regularization penalizes the coefficients.
In an overfit model, the coefficients are generally inflated.
Thus, regularization adds a penalty on the parameters and keeps them from being weighted too heavily.

A penalty based on the coefficients is added to the cost function of the linear model. Thus, if a coefficient inflates, the cost function increases, and the linear regression model keeps the coefficients small in order to minimize the cost function.

The commonly used regularization techniques are : 

1) L2 regularization or Ridge Regression:
        
         Consider a regression model with low bias and high variance: it fits the training data very well, but its error on the test data is high. Such a model is overfit.
         
         Solution: increase the bias slightly in order to lower the variance. This is done by shrinking the slope of the best fit line.
          The model performs slightly worse on the training set, but it performs more consistently on the test set.
          
          The ridge penalty reduces the slope, so the model becomes less sensitive to changes in the independent variable.
          L2 Regularization is also known as Ridge. The penalty term added to the cost function is the sum of the squared coefficients. Unlike LASSO, Ridge can shrink a coefficient close to 0 but not exactly to 0. Ridge distributes the coefficient values across all the features.
          
             Least Squares Regression = Min (sum of squared residuals)
             
             
             Ridge Regression = Min (sum of squared residuals + alpha*slope**2)
                 
                          alpha*slope**2 = penalty term
                          
          Alpha effect: 1) As alpha increases, the slope of the regression line is reduced and the line becomes more horizontal
                        2) As alpha increases, the model becomes less sensitive to variation in the independent variable
2) L1 regularization or LASSO regularization:
        
        LASSO is similar to ridge regression, but the penalty term is alpha times the absolute value of the slope.
        
                    LASSO Regression = Min (sum of squared residuals + alpha*|slope|) 
                 
                          alpha*|slope| = penalty term
                          
          Like ridge regression, LASSO trades a small increase in bias for lower variance. Unlike ridge, it can shrink a coefficient to exactly 0.
          
          Alpha effect: 1) As alpha increases, the slope of the regression line is reduced and the line becomes more horizontal
                        2) As alpha increases, the model becomes less sensitive to variation in the independent variable
                        
                        
        L1 Regularization is also known as LASSO (Least Absolute Shrinkage and Selection Operator). The penalty term added to the cost function is the sum of the absolute values of the coefficients. Because the absolute value is used, a coefficient can be reduced to exactly 0, and the corresponding feature is effectively discarded. Thus LASSO performs regularization as well as feature selection.
    

        
3) ElasticNet Regression:

        ElasticNet Regression is a linear model built by applying both L1 and L2 penalty terms. 
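
The effect of alpha on the slope can be checked with closed-form solutions for a single feature. The sketch below assumes exactly the objectives written above (sum of squared residuals plus the penalty), an unpenalized intercept, and made-up data. The function names are hypothetical, and library implementations may scale the penalty differently.

```python
def _centered_sums(xs, ys):
    """Return (Sxy, Sxx) after centering, so the intercept drops out."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy, sxx

def ridge_slope(xs, ys, alpha):
    """Minimiser of sum of squared residuals + alpha * slope**2."""
    sxy, sxx = _centered_sums(xs, ys)
    return sxy / (sxx + alpha)

def lasso_slope(xs, ys, alpha):
    """Minimiser of sum of squared residuals + alpha * |slope| (soft-thresholding)."""
    sxy, sxx = _centered_sums(xs, ys)
    shrunk = max(abs(sxy) - alpha / 2.0, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

def elastic_net_slope(xs, ys, alpha_l1, alpha_l2):
    """Minimiser with both penalties: alpha_l1 * |slope| + alpha_l2 * slope**2."""
    sxy, sxx = _centered_sums(xs, ys)
    shrunk = max(abs(sxy) - alpha_l1 / 2.0, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / (sxx + alpha_l2)

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up data with a slope close to 2
for alpha in (0, 10, 50):
    print(f"alpha={alpha:2d}  ridge={ridge_slope(xs, ys, alpha):.3f}  "
          f"lasso={lasso_slope(xs, ys, alpha):.3f}")
```

With alpha = 0 both reduce to the ordinary least squares slope. As alpha grows, the ridge slope shrinks towards 0 without reaching it, while the LASSO slope becomes exactly 0 once alpha is large enough.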
        


In norm notation, the two penalties are:

L1 regularization or LASSO regularization:

       Lasso Regression adds the “absolute value of magnitude” of the coefficients as the penalty term to the loss function (L). 
       
        ||W||1 = |W1|+|W2|+|W3|+...+|Wn|
        
        
L2 regularization or Ridge Regression:

        Ridge regression adds the “squared magnitude” of the coefficients as the penalty term to the loss function (L), i.e. the squared L2 norm. 
        
    ||W||2**2 = W1**2+W2**2+W3**2+...+Wn**2
    

**NOTE: during regularization the output function (y_hat) does not change. The change is only in the loss function.**

The output function is:

    y_hat = w1*x1 + w2*x2 + ... + wn*xn + b
    
**The loss function before regularization:**

        Loss = Error(y, y_hat)
        
**The loss function after L1 regularization:**

        Loss = Error(y, y_hat) + lambda * (sum over i = 1..n of |wi|)
        
**The loss function after L2 regularization:**

        Loss = Error(y, y_hat) + lambda * (sum over i = 1..n of wi**2)
        
**For logistic regression, where y_hat = sigmoid(wx + b), the loss function is defined as:**

         L(y_hat, y) = -[y log(y_hat) + (1 - y) log(1 - y_hat)]
         
**Loss function with no regularization:**

         L = -[y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b))]
         
 Let’s say the model overfits the data with the above loss function.    


**Loss function with L1 regularization:**
            
            L = -[y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b))] + lambda*||w||1    
            
**Loss function with L2 regularization:**

            L = -[y log(sigmoid(wx + b)) + (1 - y) log(1 - sigmoid(wx + b))] + lambda*||w||2**2    

lambda is a hyperparameter known as the regularization constant, and it is greater than zero:
                            lambda > 0
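
The regularized logistic loss can be written out in plain Python to show that only the loss changes, not the prediction. This is a minimal sketch: the function names and the toy data are assumptions, the data term is averaged over the samples, and the bias term is left unpenalized, which is a common convention but not the only one.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, X, y):
    """Mean binary cross-entropy of a logistic regression model."""
    total = 0.0
    for features, label in zip(X, y):
        y_hat = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        total += -(label * math.log(y_hat) + (1 - label) * math.log(1 - y_hat))
    return total / len(y)

def regularized_loss(weights, bias, X, y, lam, penalty="l2"):
    """Log loss plus lam * ||w||_1 (L1) or lam * ||w||_2**2 (L2); bias is not penalised."""
    if penalty == "l1":
        reg = sum(abs(w) for w in weights)
    elif penalty == "l2":
        reg = sum(w * w for w in weights)
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return log_loss(weights, bias, X, y) + lam * reg

# Made-up data: four samples with two features each.
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.0]]
y = [0, 1, 0, 1]
w, b = [1.0, -2.0], 0.0
print("no regularization:", round(log_loss(w, b, X, y), 4))
print("L1, lambda=0.1:   ", round(regularized_loss(w, b, X, y, 0.1, "l1"), 4))
print("L2, lambda=0.1:   ", round(regularized_loss(w, b, X, y, 0.1, "l2"), 4))
```

For the same weights, the L1 and L2 losses differ from the unregularized loss only by lambda times the penalty, so larger weights cost more and the optimizer is pushed towards smaller ones.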
                    