Q1: Define overfitting and underfitting in machine learning. What are the consequences of each, and how can they be mitigated?
- Overfitting: A model overfits when it fits the training data too closely, capturing noise or random fluctuations rather than the underlying patterns. It then performs very well on the training data but poorly on unseen data.
- Underfitting: A model underfits when it is too simple to capture the underlying patterns in the data. It then performs poorly on both the training data and unseen data.

Consequences:
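- Overfitting: High training accuracy but low test accuracy. The model generalizes poorly, and its predictions are sensitive to small changes in the training data.
- Underfitting: Low accuracy on both training and test data, because the model misses structure that is actually present in the data.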

Mitigation:
- Overfitting: Reduce model complexity, use more training data, employ regularization techniques, and consider feature selection.
- Underfitting: Increase model complexity, add more relevant features, and choose a more expressive algorithm.
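
Both failure modes can be reproduced on a toy problem. The following is a minimal sketch, assuming synthetic data drawn from a noisy sine curve and NumPy polynomial fits standing in for "simple" and "complex" models. The degrees, noise level, and sample sizes are illustrative choices, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(2*pi*x) on [0, 1] (synthetic, for illustration only)."""
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def fit_and_score(degree):
    """Fit a polynomial by least squares; return (train MSE, test MSE)."""
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    test_mse = np.mean((model(x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 1 is too simple for a sine curve, degree 12 is very flexible for 15 points.
results = {degree: fit_and_score(degree) for degree in (1, 3, 12)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Training error can only go down as the degree grows, so it is the comparison with test error that matters: a high error on both sets points to underfitting, while a low training error combined with a clearly higher test error points to overfitting.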

Q2: How can we reduce overfitting? Explain in brief.
To reduce overfitting, you can:
1. Reduce model complexity by using simpler algorithms or reducing the number of features.
2. Increase the amount of training data to help the model generalize better.
3. Use regularization techniques like L1 or L2 regularization.
4. Use cross-validation to choose hyperparameters (such as regularization strength) based on held-out performance rather than training performance.
5. Use early stopping: monitor the model's performance on a validation set and stop training when that performance stops improving.
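
Early stopping (item 5) can be sketched with plain gradient descent. This is a minimal illustration, assuming synthetic sine data, a linear model on polynomial features, and hypothetical settings for `learning_rate`, `patience`, and `max_epochs`; none of these values come from the answer above.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_split(n):
    """Synthetic noisy sine data, expanded into polynomial features x^0..x^9."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=n)
    return np.vander(x, N=10, increasing=True), y

X_train, y_train = make_split(20)
X_val, y_val = make_split(100)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(X_train.shape[1])
best_w, best_val, best_epoch = w.copy(), mse(X_val, y_val, w), 0
learning_rate, patience, max_epochs = 0.05, 50, 5000  # illustrative settings
val_history = [best_val]

for epoch in range(1, max_epochs + 1):
    # One full-batch gradient step on the training MSE.
    grad = 2.0 / len(y_train) * X_train.T @ (X_train @ w - y_train)
    w -= learning_rate * grad

    val_loss = mse(X_val, y_val, w)
    val_history.append(val_loss)
    if val_loss < best_val:
        # Remember the weights that did best on the validation set so far.
        best_w, best_val, best_epoch = w.copy(), val_loss, epoch
    elif epoch - best_epoch >= patience:
        break  # no validation improvement for `patience` epochs

print(f"ran {epoch} epochs, best epoch {best_epoch}, best val MSE {best_val:.3f}")
```

The model that is kept is `best_w`, not the final `w`. Whether the patience rule actually fires before `max_epochs` depends on the data and the learning rate.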


Q3: Explain underfitting. List scenarios where underfitting can occur in ML.
Underfitting occurs when a machine learning model is too simple to capture the underlying patterns in the data. It can happen in scenarios like:
- Using a linear model for inherently non-linear data.
- Using a small or inadequate feature set.
- Using a model with too little capacity for the task, such as a very shallow decision tree on data with many interacting features.
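
The first two scenarios can be seen together in a small sketch. It assumes a synthetic quadratic target and uses ordinary least squares via NumPy; the data and the helper `r_squared` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3.0, 3.0, 100)
y = x**2 + rng.normal(scale=0.5, size=x.size)  # non-linear ground truth (assumed)

def r_squared(features, target):
    """Least-squares fit on the given feature matrix; returns in-sample R^2."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    residual = target - features @ coef
    return 1.0 - residual.var() / target.var()

ones = np.ones_like(x)
r2_linear = r_squared(np.column_stack([ones, x]), y)           # y ~ a + b*x
r2_quadratic = r_squared(np.column_stack([ones, x, x**2]), y)  # adds an x^2 feature

print(f"linear features:    R^2 = {r2_linear:.3f}")
print(f"quadratic features: R^2 = {r2_quadratic:.3f}")
```

The straight line scores poorly even on the data it was fitted to, which is the defining sign of underfitting. Adding the missing `x**2` feature to the same least-squares model removes most of the error.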


Q4: Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and variance, and how do they affect model performance?
The bias-variance tradeoff is a fundamental concept in machine learning:

- Bias: The error introduced by overly simplistic assumptions in the learning algorithm, which cause the model to miss relevant structure. High bias leads to underfitting.
- Variance: The error introduced by the model's sensitivity to the particular training set it was fitted on. Highly flexible models tend to have high variance, which leads to overfitting.

The two typically move in opposite directions: increasing model complexity reduces bias but increases variance, while decreasing complexity increases bias but reduces variance. Because expected test error includes both terms, good model performance usually comes from an intermediate level of complexity that balances them.
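
For squared-error loss, the tradeoff can be written as the standard decomposition of expected prediction error. The symbols are introduced here for illustration: it is assumed that data are generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, that $\hat{f}$ is the model fitted on a randomly drawn training set, and that expectations are taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, which is why changing complexity trades one against the other while the noise term stays fixed.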


Q5: Discuss some common methods for detecting overfitting and underfitting in machine learning models. How can you determine whether your model is overfitting or underfitting?
Common methods for detecting overfitting and underfitting include:
- Learning Curves: Plot training and validation performance against training-set size (or training iterations). A persistent gap between the two curves suggests overfitting; two curves that converge at a poor score suggest underfitting.
- Cross-Validation: Using k-fold cross-validation helps assess model generalization on different data subsets.
- Validation Set Performance: Monitoring the performance on a separate validation set can reveal signs of overfitting.
- Visual Inspection: Visualizing the model's fit to the data can provide insights into overfitting or underfitting.

In practice, a model that scores well on the training data but much worse on validation data is overfitting, while a model that scores poorly on both is underfitting. Adjust the model or training process accordingly.
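
A learning curve can be computed by hand as a rough sketch. The example below assumes synthetic sine data, a fixed degree-6 polynomial model, and an arbitrary list of training-set sizes; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n):
    """Synthetic noisy sine data (an assumption made for this sketch)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_pool, y_pool = sample(160)  # training pool, used in growing prefixes
x_val, y_val = sample(200)    # fixed validation set

degree = 6
sizes = [10, 20, 40, 80, 160]
curve = []
for n in sizes:
    # Fit on the first n training points only, then score on both sets.
    model = np.polynomial.Polynomial.fit(x_pool[:n], y_pool[:n], degree)
    train_mse = np.mean((model(x_pool[:n]) - y_pool[:n]) ** 2)
    val_mse = np.mean((model(x_val) - y_val) ** 2)
    curve.append((n, train_mse, val_mse))
    print(f"n={n:3d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}  "
          f"gap={val_mse - train_mse:.3f}")
```

Reading the printed table row by row gives the same information as the plot: a wide train/validation gap at small sizes that narrows as data is added is the typical overfitting pattern, whereas two errors that stay close together at a high value would indicate underfitting.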

Q6: Compare and contrast bias and variance in machine learning. What are some examples of high bias and high variance models, and how do they differ in terms of their performance?
Bias and variance are two sources of error in machine learning models:
- High bias (underfitting): The model is too simple to capture the underlying patterns, resulting in poor performance on both training and test data.
- High variance (overfitting): The model is overly complex, capturing noise and random fluctuations in the training data, leading to excellent training performance but poor test performance.

High bias models are typically too simplistic, while high variance models are overly complex. The goal is to strike a balance between bias and variance to achieve the best generalization performance.

Examples:
- High bias: Linear regression applied to non-linear data.
- High variance: A deep neural network with too many layers and parameters for a small dataset.
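
Bias and variance can also be estimated empirically by refitting a model on many resampled training sets. The sketch below assumes a known sine ground truth and uses low- and high-degree polynomials as stand-ins for the high-bias and high-variance examples above; the function `bias_variance` and its defaults are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
x_train = np.linspace(0.0, 1.0, 20)
x_eval = np.linspace(0.0, 1.0, 50)

def true_f(x):
    """Assumed ground-truth function for this sketch."""
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_repeats=200, noise=0.3):
    """Estimate average squared bias and variance of a polynomial fit."""
    preds = np.empty((n_repeats, x_eval.size))
    for i in range(n_repeats):
        # Each repeat draws a fresh noisy training set from the same process.
        y_train = true_f(x_train) + rng.normal(scale=noise, size=x_train.size)
        model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
        preds[i] = model(x_eval)
    bias_sq = np.mean((preds.mean(axis=0) - true_f(x_eval)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

bias_simple, var_simple = bias_variance(degree=1)
bias_complex, var_complex = bias_variance(degree=9)
print(f"degree 1: bias^2={bias_simple:.4f}  variance={var_simple:.4f}")
print(f"degree 9: bias^2={bias_complex:.4f}  variance={var_complex:.4f}")
```

The simple model's average prediction stays far from the true curve no matter which training set it sees (high bias), while the flexible model is right on average but its individual fits differ more from one training set to the next (high variance).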

Q7: What is regularization in machine learning, and how can it be used to prevent overfitting? Describe some common regularization techniques and how they work.
Regularization refers to techniques that prevent overfitting by constraining model complexity, most commonly by adding a penalty term to the model's loss function. The penalty encourages smaller coefficients or weights, making the model less likely to fit noise in the data. Common regularization techniques include:

1. L1 Regularization (Lasso): Adds the sum of the absolute values of the coefficients, scaled by a strength parameter, as a penalty term. This promotes sparsity by driving some coefficients exactly to zero.
2. L2 Regularization (Ridge): Adds the sum of the squared coefficients, scaled by a strength parameter, as a penalty term. This shrinks large coefficients toward zero but rarely makes them exactly zero.
3. Elastic Net: Combines both L1 and L2 regularization to strike a balance between feature selection and coefficient shrinkage.
4. Dropout: Used in neural networks, it randomly drops a fraction of neurons during training, preventing the model from relying too heavily on specific neurons.
5. Early Stopping: Halts the training process when the validation loss starts increasing, preventing the model from overfitting.
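
The different effects of the L1 and L2 penalties can be sketched in NumPy. This is an illustration under stated assumptions: synthetic data with only two informative features, a hand-written closed-form ridge solver, and a basic proximal gradient (ISTA) solver for the lasso. The helper names `ridge` and `lasso` and the strength `lam` are hypothetical, not a library API.

```python
import numpy as np

rng = np.random.default_rng(5)
n_samples, n_features = 200, 8
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0])  # two informative features (assumed)
y = X @ true_w + rng.normal(scale=0.5, size=n_samples)
lam = 0.2  # penalty strength, an illustrative choice

def ridge(X, y, lam):
    """Minimize (1/2n)*||y - Xw||^2 + (lam/2)*||w||^2 in closed form."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def lasso(X, y, lam, n_iter=1000):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # 1 / Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y) / n)  # gradient step on the squared loss
        # Soft-thresholding: shrink every entry and clip small ones to exactly zero.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

w_ridge = ridge(X, y, lam)
w_lasso = lasso(X, y, lam)
print("ridge:", np.round(w_ridge, 3))
print("lasso:", np.round(w_lasso, 3))
```

Ridge shrinks every coefficient but leaves all of them non-zero, while the soft-thresholding step in the lasso solver can set coefficients of uninformative features exactly to zero, which is the sparsity effect described in item 1.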

Regularization techniques help in controlling the complexity of the model, reducing overfitting, and improving generalization to unseen data.