Q1: **Define overfitting and underfitting in machine learning. What are the consequences of each, and how can they be mitigated?**

**Overfitting:**
- **Definition:** Overfitting occurs when a model learns the details and noise in the training data to the extent that it negatively impacts the performance of the model on new, unseen data. Essentially, the model becomes too complex and captures noise along with the underlying patterns.
- **Consequences:** The model performs well on training data but fails to generalize to new data, leading to poor performance on unseen examples.
- **Mitigation:** Techniques to mitigate overfitting include:
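  - **Cross-validation:** Use k-fold cross-validation to estimate performance on unseen data. It does not change the model by itself, but it exposes overfitting and guides choices such as model complexity and regularization strength.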
  - **Regularization:** Add penalties to the loss function that discourage overly complex models.
  - **Simplifying the model:** Reduce model complexity by reducing the number of features, pruning decision trees, or using simpler algorithms.

**Underfitting:**
- **Definition:** Underfitting occurs when a model is too simple to capture the underlying pattern of the data. The model fails to learn from the training data and performs poorly on both training and new data.
- **Consequences:** The model shows low accuracy and high error rates on both training and test datasets.
- **Mitigation:** Techniques to mitigate underfitting include:
  - **Increasing model complexity:** Add more features, use more complex models, or tune hyperparameters to allow the model to capture more patterns in the data.
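  - **Reducing regularization or training longer:** Loosen overly strong penalties, or train for more epochs if the model has not converged. Adding more training data rarely helps here, because a model that is too simple cannot make use of it.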
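
Both failure modes can be seen in a small experiment. The sketch below is an illustration under assumed settings (a sine-shaped target, Gaussian noise, and polynomial degrees 1, 3, and 12 are all choices made here, not part of the original answer). It fits polynomials of increasing degree and compares training and test error.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Sample n noisy points from an assumed target function sin(pi * x)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# degree -> (training MSE, test MSE)
results = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.3f}  "
          f"test MSE={results[degree][1]:.3f}")
```

Training error can only go down as the degree grows, because each higher-degree polynomial contains the lower-degree ones as special cases. The straight line (degree 1) is expected to show high error on both sets, which is underfitting. A gap between low training error and higher test error at degree 12 would indicate overfitting.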

Q2: **How can we reduce overfitting? Explain in brief.**

To reduce overfitting, we can:
- **Regularization:** Add penalties to the loss function (e.g., L1 or L2 regularization) to discourage overly complex models.
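- **Cross-validation:** Use k-fold cross-validation to estimate performance on unseen data and to select hyperparameters (model size, regularization strength) that generalize, rather than ones that merely fit the training set.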
- **Reduce model complexity:** Simplify the model by reducing the number of features, pruning decision trees, or using simpler algorithms.
- **Early stopping:** Stop training the model when performance on a validation set starts to degrade.
- **Ensemble methods:** Combine predictions from multiple models to improve generalization.
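
Early stopping is the easiest of these to show in a few lines. The following is a minimal sketch, not a library API: the function name `early_stopping_epoch` and the `patience` and `min_delta` parameters are assumptions introduced here. It takes a sequence of per-epoch validation losses and reports where training would stop.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up validation losses: improving at first, then degrading.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"best epoch: {best_epoch}, stopped at epoch: {stop_epoch}")
```

In a real training loop one would typically save the model weights at `best_epoch` and restore them after stopping.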

Q3: **Explain underfitting. List scenarios where underfitting can occur in ML.**

**Underfitting:** Underfitting occurs when a model is too simple to capture the underlying patterns in the data.

**Scenarios where underfitting can occur:**
- **Simple models:** Models with insufficient complexity or not enough parameters.
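- **Insufficient training:** Stopping training after too few epochs, before the model has converged. (Too few training examples, by contrast, usually leads to overfitting.)
- **High bias:** Models that make strong, restrictive assumptions (for example, a linear model on nonlinear data) or that are regularized too heavily.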
- **Ignoring relevant features:** When important features or relationships in the data are not considered by the model.
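
The last scenario can be illustrated with synthetic data. In this sketch the data-generating process (two features with weights 2 and 3, plus small noise) and the helper name `fit_and_score` are assumptions made for the example. The same linear model is fitted once without the second feature and once with it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Assumed ground truth: the target depends on both features.
y = 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=n)

def fit_and_score(features, target):
    """Fit ordinary least squares with an intercept; return training MSE."""
    design = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.mean((design @ weights - target) ** 2))

mse_missing = fit_and_score(x1.reshape(-1, 1), y)       # x2 ignored
mse_full = fit_and_score(np.column_stack([x1, x2]), y)  # both features

print(f"training MSE without x2: {mse_missing:.3f}")
print(f"training MSE with x2:    {mse_full:.3f}")
```

The model that ignores `x2` cannot reach a low error even on its own training data, which is the signature of underfitting.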

Q4: **Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and variance, and how do they affect model performance?**

**Bias-variance tradeoff:**
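- **Definition:** The bias-variance tradeoff describes the tension between two sources of prediction error: bias (error from erroneous or overly simple assumptions) and variance (sensitivity to small fluctuations in the training set).
- **Relationship:** The two usually move in opposite directions as model complexity changes. Making a model more flexible lowers bias but raises variance; making it simpler does the reverse. High-bias models tend to underfit, whereas high-variance models tend to overfit by capturing noise rather than true patterns. The goal is the level of complexity that minimizes their combined error.
- **Impact on model performance:**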
  - **High bias:** Leads to underfitting and poor performance on both training and test data.
  - **High variance:** Leads to overfitting and good performance on training data but poor generalization to new data.
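
For squared-error loss, the tradeoff can be written out explicitly. Assume data generated as $y = f(x) + \varepsilon$, with zero-mean noise of variance $\sigma^2$ that is independent of the training set, and a model $\hat{f}_D$ trained on a random dataset $D$ (this notation is introduced here for illustration). The expected error at a point $x$ then splits into three terms:

$$
\mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big]
= \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Only the first two terms depend on the model, so choosing model complexity amounts to trading one against the other.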

Q5: **Discuss some common methods for detecting overfitting and underfitting in machine learning models. How can you determine whether your model is overfitting or underfitting?**

**Detecting Overfitting:**
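- **Validation and test sets:** Evaluate the model on held-out validation and test datasets. Training error that is much lower than validation error indicates overfitting.
- **Learning curves:** Plot training and validation performance as a function of training set size or epochs. A validation curve that flattens or worsens while the training curve keeps improving signals overfitting.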
- **Cross-validation:** Use k-fold cross-validation to assess model performance on multiple splits of the data.
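
K-fold cross-validation can be sketched from scratch with NumPy. The helper names `k_fold_indices` and `cross_val_mse`, the polynomial model, and the synthetic data are assumptions for this example; libraries such as scikit-learn provide tested equivalents.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled, disjoint folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def cross_val_mse(x, y, degree, k=5):
    """Average training and validation MSE of a polynomial fit over k folds."""
    train_scores, val_scores = [], []
    for train_idx, val_idx in k_fold_indices(len(x), k):
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        train_pred = np.polyval(coeffs, x[train_idx])
        val_pred = np.polyval(coeffs, x[val_idx])
        train_scores.append(np.mean((train_pred - y[train_idx]) ** 2))
        val_scores.append(np.mean((val_pred - y[val_idx]) ** 2))
    return float(np.mean(train_scores)), float(np.mean(val_scores))

# Synthetic data, assumed for illustration only.
data_rng = np.random.default_rng(0)
x = data_rng.uniform(-1.0, 1.0, size=60)
y = np.sin(np.pi * x) + data_rng.normal(scale=0.2, size=60)

for degree in (1, 3, 10):
    train_mse, val_mse = cross_val_mse(x, y, degree)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```

Every sample is used for validation exactly once, so the averaged validation score is less dependent on one lucky or unlucky split than a single hold-out set would be.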

**Detecting Underfitting:**
- **Model complexity:** Check whether the model is too simple for the data. If a more flexible model lowers training error substantially, the original was underfitting.
- **Performance metrics:** Compare the model's performance on training and validation sets; underfit models perform poorly on both.
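
The comparison of training and validation error can be turned into a rough rule of thumb. The function below is a hypothetical helper: its name, the `acceptable_error` target, and the `gap_tolerance` threshold are all assumptions that would need to be set per problem, not standard values.

```python
def diagnose_fit(train_error, val_error, acceptable_error, gap_tolerance):
    """Classify a model from its training and validation error.

    acceptable_error: highest training error considered good enough
                      (problem-specific assumption).
    gap_tolerance:    largest validation-minus-training gap considered
                      normal (problem-specific assumption).
    """
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if val_error - train_error > gap_tolerance:
        # Fits the training data well but does not generalize.
        return "overfitting"
    return "reasonable fit"

print(diagnose_fit(0.30, 0.32, acceptable_error=0.10, gap_tolerance=0.05))
print(diagnose_fit(0.02, 0.25, acceptable_error=0.10, gap_tolerance=0.05))
print(diagnose_fit(0.05, 0.07, acceptable_error=0.10, gap_tolerance=0.05))
```

The check for underfitting comes first, so a model with both high training error and a large gap is reported as underfitting. Real diagnosis usually looks at full learning curves rather than two numbers.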

Q6: **Compare and contrast bias and variance in machine learning. What are some examples of high bias and high variance models, and how do they differ in terms of their performance?**

**Bias:**
- **Definition:** Bias measures how far the model's average prediction (averaged over possible training sets) is from the true value. High-bias models make strong assumptions and fail to capture complex patterns in the data.
- **Example:** Linear regression applied to a strongly nonlinear relationship, or with too few informative features, is typically high bias.

**Variance:**
- **Definition:** Variance measures the sensitivity of the model to small fluctuations in the training data. High variance models are overly complex and capture noise as well as true patterns.
- **Example:** Decision trees with unlimited depth can exhibit high variance.

**Performance:**
- **High bias:** Underfits the data, leading to poor performance on both training and test datasets.
- **High variance:** Overfits the data, leading to excellent performance on training data but poor generalization to new data.
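
Bias and variance can be estimated empirically by retraining the same model on many noisy versions of a dataset. The sketch below makes several simplifying assumptions: a sine target, fixed input locations with only the noise resampled, and polynomial degrees 1 and 9 standing in for a high-bias and a high-variance model.

```python
import numpy as np

rng = np.random.default_rng(42)

def true_fn(x):
    """Assumed ground-truth function for the simulation."""
    return np.sin(np.pi * x)

x_train = np.linspace(-1.0, 1.0, 25)   # fixed inputs, shared by all datasets
x_eval = np.linspace(-0.9, 0.9, 50)    # points where predictions are compared

def bias_variance(degree, n_datasets=200, noise=0.3):
    """Estimate squared bias and variance of a polynomial fit of given degree."""
    preds = np.empty((n_datasets, x_eval.size))
    for i in range(n_datasets):
        # Each dataset differs only in its random noise.
        y_train = true_fn(x_train) + rng.normal(scale=noise, size=x_train.size)
        preds[i] = np.polyval(np.polyfit(x_train, y_train, degree), x_eval)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

bias_sq_linear, var_linear = bias_variance(degree=1)
bias_sq_flexible, var_flexible = bias_variance(degree=9)

print(f"degree 1: bias^2={bias_sq_linear:.4f}  variance={var_linear:.4f}")
print(f"degree 9: bias^2={bias_sq_flexible:.4f}  variance={var_flexible:.4f}")
```

The straight line gives nearly the same (wrong) answer on every dataset, so its error is dominated by bias. The degree-9 polynomial is right on average but changes noticeably from one dataset to the next, so its error is dominated by variance.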

Q7: **What is regularization in machine learning, and how can it be used to prevent overfitting? Describe some common regularization techniques and how they work.**

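**Regularization:** Regularization covers techniques that constrain a model to prevent overfitting, most commonly by adding a penalty term on the model's weights to the loss function. A strength hyperparameter (often written λ or alpha) controls how heavily the penalty counts.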

**Common regularization techniques:**
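- **L1 Regularization (Lasso):** Adds the sum of the absolute values of the coefficients as a penalty term. It can drive some coefficients exactly to zero, which acts as feature selection.
- **L2 Regularization (Ridge):** Adds the sum of the squared coefficients as a penalty term. It shrinks all coefficients toward zero but rarely makes them exactly zero.
- **Elastic Net:** Combines the L1 and L2 penalties, with a mixing parameter that balances sparsity against smooth shrinkage.
- **Dropout:** Used in neural networks. Randomly selected neurons are zeroed out during each training step so the network cannot rely on any single neuron; at inference time all neurons are active.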
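
The techniques above can be sketched in NumPy. Everything named here is an assumption made for illustration: the function names, the synthetic data, the penalty strength `lam`, and the elastic net parametrization (libraries scale the two terms in different ways). Ridge is shown through its closed-form solution without an intercept, and dropout in its common "inverted" form, which rescales the surviving activations.

```python
import numpy as np

def l1_penalty(w, lam):
    """Lasso penalty: lam times the sum of absolute coefficient values."""
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    """Ridge penalty: lam times the sum of squared coefficient values."""
    return lam * np.sum(w ** 2)

def elastic_net_penalty(w, lam, l1_ratio):
    """One possible mix of the two penalties, weighted by l1_ratio in [0, 1]."""
    return lam * (l1_ratio * np.sum(np.abs(w)) + (1.0 - l1_ratio) * np.sum(w ** 2))

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 in closed form (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def dropout(activations, drop_prob, rng):
    """Inverted dropout for training: zero units with probability drop_prob
    (must be < 1) and rescale the rest so the expected value is unchanged."""
    keep_mask = rng.random(activations.shape) >= drop_prob
    return activations * keep_mask / (1.0 - drop_prob)

# Synthetic regression problem: only the first 3 of 10 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(scale=0.5, size=30)

w_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)

print("OLS weight norm:  ", np.linalg.norm(w_ols))
print("Ridge weight norm:", np.linalg.norm(w_ridge))
print("Dropout sample:   ", dropout(np.ones(8), drop_prob=0.5, rng=rng))
```

The ridge solution always has a smaller weight norm than the unpenalized one for any positive `lam`, which is the shrinkage effect described above. Lasso has no closed-form solution in general and is usually fitted with iterative methods such as coordinate descent, so only its penalty is shown here.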