**1. What is Logistic Regression, and how does it differ from Linear Regression?**

* **Linear Regression:**
    * Predicts a continuous output variable.
    * Models the dependent variable as a linear function of the independent variables (a straight line with one predictor, a hyperplane with several).
    * Output range: (-∞, +∞).
* **Logistic Regression:**
    * Predicts a categorical output variable (typically binary, 0 or 1).
    * Models the probability of a certain outcome occurring.
    * Output range: (0, 1).
    * Uses the sigmoid function to map the linear combination of inputs to a probability.


**2. What is the mathematical equation of Logistic Regression?**

The equation is:

* `p(y=1|x) = 1 / (1 + e^(-z))`
    * Where:
        * `p(y=1|x)` is the probability of the output being 1 given the input features `x`.
        * `e` is Euler's number (approximately 2.71828).
        * `z = b0 + b1*x1 + b2*x2 + ... + bn*xn` is the linear combination of input features, where `b0` is the intercept and `b1`, `b2`, ..., `bn` are the coefficients.
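
As a minimal sketch of the equation above, the following computes `z` and passes it through the sigmoid. The function names and the coefficient values are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, b0, b):
    """Return p(y=1|x) for features x, intercept b0, and coefficients b."""
    z = b0 + sum(bi * xi for bi, xi in zip(b, x))  # linear combination
    return sigmoid(z)

# Hypothetical intercept and coefficients, not fitted to any data.
p = predict_proba([2.0, 1.0], b0=-1.0, b=[0.5, 0.25])
print(round(p, 4))
```

Here `z = -1.0 + 0.5*2.0 + 0.25*1.0 = 0.25`, so the predicted probability is a little above 0.5.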

**3. Why do we use the Sigmoid function in Logistic Regression?**

* The sigmoid function (also called the logistic function) maps any real number into the open interval (0, 1).
* This makes it ideal for representing probabilities.
* It provides a smooth, S-shaped curve that's differentiable, which is important for optimization.
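
The differentiability point can be made concrete: the sigmoid's derivative has the closed form `s(z) * (1 - s(z))`. The sketch below checks that identity against a finite-difference estimate. The branch on the sign of `z` is one common way to avoid overflow in `exp`; it is an implementation choice, not part of the definition.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return ez / (1.0 + ez)

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare the closed form with a central finite difference.
h = 1e-6
for z in (-3.0, 0.0, 2.5):
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
    print(z, round(sigmoid_grad(z), 6), round(numeric, 6))
```

The derivative peaks at `z = 0` (value 0.25) and shrinks toward zero in both tails, which is why gradients become small when the model is very confident.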


**4. What is the cost function of Logistic Regression?**

* The cost function is typically the **cross-entropy** (or log loss) function.
* For binary classification, the loss for a single example is:
    * `Cost(h(x), y) = -y * log(h(x)) - (1 - y) * log(1 - h(x))`
    * Where:
        * `h(x)` is the predicted probability (output of the sigmoid function).
        * `y` is the actual label (0 or 1).
* The goal is to minimize the average of this loss over all training examples.
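
A small sketch of the log loss, averaged over a dataset. The clipping constant `eps` is an assumption added here to keep `log(0)` from occurring; the labels and probabilities are made up.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy over paired labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

good = log_loss([1, 0, 1], [0.9, 0.2, 0.6])  # mostly right
bad = log_loss([1, 0, 1], [0.1, 0.8, 0.4])   # mostly wrong
print(good, bad)
```

Confident wrong predictions are penalized heavily, so `bad` comes out much larger than `good`.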

**5. What is Regularization in Logistic Regression? Why is it needed?**

* Regularization adds a penalty term to the cost function to prevent overfitting.
* Overfitting occurs when a model learns the training data too well, including noise, and performs poorly on unseen data.
* Regularization reduces the complexity of the model by shrinking the coefficients.

**6. Explain the difference between Lasso, Ridge, and Elastic Net regression.**

* **Ridge Regression (L2 Regularization):**
    * Adds the sum of squared coefficients, scaled by λ, to the cost function.
    * Shrinks coefficients towards zero but rarely makes them exactly zero.
    * Keeps every feature in the model while damping the influence of less important ones.
* **Lasso Regression (L1 Regularization):**
    * Adds the sum of absolute coefficient values, scaled by λ, to the cost function.
    * Can shrink some coefficients to exactly zero, effectively performing feature selection.
    * Useful when you suspect many features are irrelevant.
* **Elastic Net Regression:**
    * Combines L1 and L2 regularization.
    * Provides a balance between feature selection and coefficient shrinkage.
    * It helps when there are highly correlated features.
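
The three penalty terms can be sketched as follows. Scaling conventions vary between textbooks and libraries (some halve the L2 term, for example), so the exact form here, including the `l1_ratio` mixing parameter, is an assumption for illustration.

```python
def l2_penalty(coefs, lam):
    """Ridge penalty: lam * sum of squared coefficients."""
    return lam * sum(b * b for b in coefs)

def l1_penalty(coefs, lam):
    """Lasso penalty: lam * sum of absolute coefficients."""
    return lam * sum(abs(b) for b in coefs)

def elastic_net_penalty(coefs, lam, l1_ratio):
    """Weighted mix of the two; l1_ratio=1 is pure L1, 0 is pure L2."""
    return (l1_ratio * l1_penalty(coefs, lam)
            + (1.0 - l1_ratio) * l2_penalty(coefs, lam))

coefs = [0.5, -2.0, 0.0]  # the intercept is usually left out of the penalty
print(l1_penalty(coefs, 1.0))                # 2.5
print(l2_penalty(coefs, 1.0))                # 4.25
print(elastic_net_penalty(coefs, 1.0, 0.5))  # 3.375
```

Note how the large coefficient (-2.0) dominates the L2 penalty because it is squared, while L1 charges every unit of magnitude at the same rate.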

**7. When should we use Elastic Net instead of Lasso or Ridge?**

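* When you have a dataset with many features, some of which are correlated.
* When you want feature selection (as with Lasso) but also the stabilizing shrinkage of Ridge.
* Lasso tends to keep one feature from a group of highly correlated features and drop the rest somewhat arbitrarily; Elastic Net is more stable because its L2 part tends to keep or drop such features together.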

**8. What is the impact of the regularization parameter (λ) in Logistic Regression?**

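* λ (lambda) controls the strength of regularization.
* A larger λ increases regularization, shrinking coefficients more aggressively; if it is too large, the model underfits.
* A smaller λ reduces regularization, allowing coefficients to be larger.
* When λ is zero, there is no regularization.
* Some libraries expose the inverse instead: scikit-learn's `LogisticRegression` takes `C`, the inverse of the regularization strength, so a smaller `C` means stronger regularization.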
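
To see the shrinkage effect, here is a rough sketch that fits a one-feature model by batch gradient descent with an L2 penalty of `(lam / 2) * sum(b_j^2)`. The dataset, learning rate, and epoch count are made-up values for illustration, and leaving the intercept unpenalized is a common convention rather than a requirement.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, lam, lr=0.1, epochs=2000):
    """Gradient descent on mean log loss + (lam / 2) * sum(b_j^2)."""
    n, d = len(X), len(X[0])
    b0, b = 0.0, [0.0] * d
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j in range(d):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n  # intercept: no penalty term
        b = [bj - lr * (gj / n + lam * bj) for bj, gj in zip(b, g)]
    return b0, b

# Tiny made-up dataset; the classes overlap, so the optimum is finite.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]
for lam in (0.0, 0.1, 1.0):
    b0, b = fit(X, y, lam)
    print(lam, round(b[0], 3))
```

The fitted coefficient gets smaller as `lam` grows, which is the shrinkage described above.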


**9. What are the key assumptions of Logistic Regression?**

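* Binary output variable (for the standard binary model).
* Independence of observations.
* Little to no multicollinearity among predictors.
* Linearity between the log-odds of the outcome and the predictor variables.
* Adequate sample size, since the coefficient estimates become unreliable when there are few examples of either class.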

**10. What are some alternatives to Logistic Regression for classification tasks?**

* Support Vector Machines (SVMs)
* Decision Trees
* Random Forests
* Gradient Boosting Machines (e.g., XGBoost, LightGBM)
* Neural Networks

**11. What are Classification Evaluation Metrics?**

* **Accuracy:** Fraction of all predictions that are correct.
* **Precision:** Correct positive predictions out of all positive predictions.
* **Recall (Sensitivity):** Correct positive predictions out of all actual positives.
* **F1-score:** Harmonic mean of precision and recall.
* **AUC-ROC:** Area under the Receiver Operating Characteristic curve, measuring the model's ability to distinguish between classes.
* **Confusion Matrix:** A table showing true positives, true negatives, false positives, and false negatives.
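
The threshold-based metrics above all follow from the four confusion-matrix counts. A minimal sketch, with made-up labels and with the convention (an assumption here) that a metric is reported as 0.0 when its denominator is zero:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))       # (3, 3, 1, 1)
print(classification_metrics(y_true, y_pred))
```

AUC-ROC is not included because it is computed from predicted probabilities across all thresholds rather than from a single set of hard predictions.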
 


**12. How does class imbalance affect Logistic Regression?**

* Class imbalance occurs when one class has significantly more samples than the other.
* It can lead to biased models that favor the majority class.
* Techniques to handle it include:
    * Oversampling the minority class.
    * Undersampling the majority class.
    * Using class weights in the cost function.
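
As a sketch of the class-weight approach: one common heuristic sets each class weight to `n_samples / (n_classes * class_count)`, which is the formula scikit-learn documents for `class_weight="balanced"`. The weighted loss below is an illustrative implementation, not a library API.

```python
import math
from collections import Counter

def balanced_class_weights(y):
    """Weight each class inversely to its frequency."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(y_true, y_prob, weights, eps=1e-15):
    """Log loss where each example is scaled by its class weight."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += w * (-y * math.log(p) - (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum

y = [0] * 8 + [1] * 2            # made-up 80/20 imbalance
weights = balanced_class_weights(y)
print(weights)                   # {0: 0.625, 1: 2.5}

always_majority = [0.1] * 10     # a model that always leans towards class 0
print(weighted_log_loss(y, always_majority, {0: 1.0, 1: 1.0}))
print(weighted_log_loss(y, always_majority, weights))
```

With balanced weights, the errors on the two minority examples count as much as the eight majority examples combined, so a model that ignores the minority class is no longer cheap.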

**13. What is Hyperparameter Tuning in Logistic Regression?**

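* Hyperparameter tuning involves finding the best values for settings that are not learned from the data (e.g., λ, penalty type, solver).
* Methods include:
    * Grid search: evaluate every combination in a predefined grid.
    * Random search: evaluate randomly sampled combinations.
    * Cross-validation: not a search method in itself, but the usual way to score each candidate during a grid or random search.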
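
The way these pieces fit together can be sketched in plain Python. Everything here is hypothetical scaffolding: `k_fold_indices` splits without shuffling (real data would normally be shuffled or stratified first), and `toy_score` is a placeholder for "fit on the training fold, then return a validation score".

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, val
        start = stop

def grid_search(candidates, n, k, score_fn):
    """Return (best_candidate, best_mean_score); higher scores are better."""
    best, best_score = None, float("-inf")
    for cand in candidates:
        scores = [score_fn(cand, tr, va) for tr, va in k_fold_indices(n, k)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best, best_score = cand, mean
    return best, best_score

def toy_score(lam, train_idx, val_idx):
    # Placeholder only: pretends validation quality peaks at lam = 0.1.
    return -(lam - 0.1) ** 2

best_lam, _ = grid_search([0.01, 0.1, 1.0], n=10, k=5, score_fn=toy_score)
print(best_lam)
```

Random search has the same shape, except that `candidates` is sampled from a distribution instead of enumerated from a grid.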

**14. What are different solvers in Logistic Regression? Which one should be used?**

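* Solvers are the optimization algorithms that minimize the cost function. Common ones in scikit-learn include:
    * `liblinear`: Suitable for small datasets; supports L1 and L2 penalties.
    * `lbfgs`: Good for small to medium datasets; supports L2 or no penalty.
    * `sag`: Suitable for large datasets; supports L2 or no penalty.
    * `saga`: Suitable for large datasets; supports L1, L2, and Elastic Net penalties.
* The choice depends on the dataset size and regularization type. `sag` and `saga` converge quickly only when features are on a similar scale, so standardize the inputs first.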

**15. How is Logistic Regression extended for multiclass classification?**

* **One-vs-Rest (OvR) or One-vs-All (OvA):** Train a binary logistic regression classifier for each class against all other classes.
* **Softmax Regression (Multinomial Logistic Regression):** Directly models the probabilities of multiple classes.
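
A minimal sketch of One-vs-Rest prediction, assuming the per-class binary models have already been trained. The `models` dictionary and its coefficients are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ovr_predict(x, models):
    """Score x with every class-vs-rest model and pick the highest.

    models maps a class label to (intercept, coefficients).
    """
    scores = {
        label: sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))
        for label, (b0, b) in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical coefficients, not fitted to any data.
models = {
    "a": (0.0, [2.0, 0.0]),
    "b": (0.0, [-1.0, 1.0]),
    "c": (-1.0, [0.0, 0.5]),
}
label, scores = ovr_predict([1.0, 0.0], models)
print(label, scores)
```

Because each classifier is trained independently, the per-class scores do not have to sum to 1, unlike the output of a softmax model.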


**16. What are the advantages and disadvantages of Logistic Regression?**

* **Advantages:**
    * Easy to implement and interpret.
    * Efficient for training.
    * Provides probability estimates.
* **Disadvantages:**
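    * Assumes a linear relationship between the predictors and the log-odds of the outcome.
    * Sensitive to outliers.
    * May not perform well with complex, non-linear relationships unless the features are engineered to capture them (e.g., interaction or polynomial terms).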

**17. What are some use cases of Logistic Regression?**

* Medical diagnosis (e.g., predicting disease risk).
* Credit risk assessment.
* Spam detection.
* Customer churn prediction.

**18. What is the difference between Softmax Regression and Logistic Regression?**

* **Logistic Regression:** Binary classification.
* **Softmax Regression:** Multiclass classification.
* Logistic regression uses the sigmoid function, and Softmax uses the Softmax function.
* Softmax outputs a probability distribution over multiple classes.
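
The relationship between the two functions can be checked directly: softmax over the two scores `[z, 0]` gives the same probability as `sigmoid(z)`. The sketch below subtracts the maximum score before exponentiating, a standard trick for numerical stability that does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# Two-class softmax with the second score fixed at 0 matches the sigmoid.
z = 0.8
print(softmax([z, 0.0])[0], sigmoid(z))
```

In that sense binary logistic regression is the two-class special case of softmax regression.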

**19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?**

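* **Softmax:** Preferred when classes are mutually exclusive. It fits a single model whose class probabilities sum to 1, which is often more efficient than training one model per class.
* **OvR:** Trains an independent binary classifier per class, so it is simpler to implement and its per-class probabilities need not sum to 1. Independent per-class classifiers also suit problems where classes are not mutually exclusive (multi-label).
* For most problems with mutually exclusive classes, Softmax is the better option.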

**20. How do we interpret coefficients in Logistic Regression?**

* Coefficients represent the change in the log-odds of the outcome for a one-unit change in the predictor, holding other predictors constant.
* A positive coefficient increases the log-odds (and thus the probability), while a negative coefficient decreases it.
* To get the odds ratio, exponentiate the coefficient: `exp(coefficient)`. The odds ratio is the multiplicative change in the odds of the outcome for a one-unit increase in the predictor. For example, a coefficient of 0.7 gives an odds ratio of about 2.01, meaning the odds roughly double.
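
The odds-ratio interpretation can be verified numerically. In this sketch the intercept and coefficient are hypothetical values; the point is that raising the predictor by one unit multiplies the odds by `exp(b1)` regardless of the starting value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1.0 - p)

b0, b1 = -1.0, 0.7  # hypothetical intercept and coefficient
odds_ratio = math.exp(b1)

p_at_2 = sigmoid(b0 + b1 * 2.0)
p_at_3 = sigmoid(b0 + b1 * 3.0)
observed_ratio = odds(p_at_3) / odds(p_at_2)

print(round(odds_ratio, 4), round(observed_ratio, 4))
print(round(p_at_2, 4), round(p_at_3, 4))
```

The odds change by a constant factor, but the change in probability depends on where you start on the S-curve, which is why coefficients are interpreted on the odds scale rather than the probability scale.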