# 1. What is Logistic Regression, and how does it differ from Linear Regression?
Logistic Regression is a statistical model used for binary or multi-class classification problems.  
It predicts the probability that an observation belongs to a certain class using the logistic (sigmoid) function.  
Difference from Linear Regression:  
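- Linear Regression predicts continuous values; Logistic Regression predicts a probability between 0 and 1, which is then thresholded (commonly at 0.5) to assign a class.  
- Both models start from a linear combination of the inputs. Linear Regression uses that value directly as its prediction, whereas Logistic Regression passes it through the sigmoid function to obtain a probability.  
- Linear Regression is typically fit by minimizing squared error, whereas Logistic Regression is fit by minimizing log loss (maximizing likelihood).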

# 2. Explain the role of the Sigmoid function in Logistic Regression.
The sigmoid function transforms any real-valued number into a value between 0 and 1, making it suitable for probability estimation.  
Formula:  
sigmoid(z) = 1 / (1 + e^(-z))  
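Here, z = b0 + b1X1 + b2X2 + ... + bnXn is the linear combination of the features X1, ..., Xn, with intercept b0 and coefficients b1, ..., bn.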
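
As a minimal sketch of the formula above, the snippet below computes z and applies the sigmoid in plain Python. The helper names `sigmoid` and `predict_proba` and all numeric values are made up for illustration; they are not taken from any library or dataset.

```python
import math

def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    # Branch on the sign of z so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(intercept, coefs, features):
    """Compute z = b0 + b1*X1 + ... + bn*Xn, then apply the sigmoid."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return sigmoid(z)

# Illustrative (made-up) values: b0 = -1.0, b = [0.8, -0.5], X = [2.0, 1.0]
p = predict_proba(-1.0, [0.8, -0.5], [2.0, 1.0])
print("z = 0 gives:", sigmoid(0))
print("Predicted probability:", round(p, 4))
```

Large negative z values give probabilities near 0, large positive values give probabilities near 1, and z = 0 gives exactly 0.5.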

# 3. What is Regularization in Logistic Regression and why is it needed?
Regularization is a technique to prevent overfitting by adding a penalty term to the loss function.  
It discourages large coefficient values, improving generalization.  
Types:  
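- L1 (Lasso): adds the sum of absolute coefficient values, Σ|β_j|, which can shrink some coefficients exactly to zero.  
- L2 (Ridge): adds the sum of squared coefficient values, Σβ_j², which shrinks coefficients toward zero without eliminating them.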
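
To make the penalty term concrete, here is a rough sketch of a penalized loss in plain Python. It uses a generic "log loss + lam * penalty" form; the helper names, the strength parameter `lam`, and the example numbers are assumptions for illustration. scikit-learn scales its objective differently and exposes `C`, the inverse of the regularization strength, so a smaller `C` means a stronger penalty.

```python
import math

def log_loss(y_true, y_prob):
    """Average binary cross-entropy (the unpenalized logistic loss)."""
    eps = 1e-15  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def penalized_loss(y_true, y_prob, coefs, lam, penalty="l2"):
    """Log loss plus lam times the L1 or L2 penalty on the coefficients."""
    if penalty == "l1":
        reg = sum(abs(b) for b in coefs)   # sum of |beta_j|
    elif penalty == "l2":
        reg = sum(b * b for b in coefs)    # sum of beta_j squared
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return log_loss(y_true, y_prob) + lam * reg

# Made-up labels, predicted probabilities, and coefficients (intercept excluded)
y_true = [1, 0, 1]
y_prob = [0.9, 0.2, 0.7]
coefs = [2.0, -0.5]

base = log_loss(y_true, y_prob)
loss_l1 = penalized_loss(y_true, y_prob, coefs, lam=0.1, penalty="l1")
loss_l2 = penalized_loss(y_true, y_prob, coefs, lam=0.1, penalty="l2")
print("Unpenalized:", base)
print("With L1:", loss_l1)
print("With L2:", loss_l2)
```

Because the penalty grows with coefficient size, the optimizer has to trade a better fit to the training data against keeping the coefficients small.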

# 4. What are some common evaluation metrics for classification models, and why are they important?
- Accuracy: Proportion of correct predictions; can be misleading when the classes are imbalanced.  
- Precision: True Positives / (True Positives + False Positives) — important when false positives are costly.  
- Recall (Sensitivity): True Positives / (True Positives + False Negatives) — important when false negatives are costly.  
- F1-Score: Harmonic mean of Precision and Recall — balances both metrics.  
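- ROC-AUC: Area under the ROC curve; summarizes how well the model ranks positive examples above negative ones across all classification thresholds (0.5 is chance level, 1.0 is perfect).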
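
The definitions above can be sketched directly from the confusion-matrix counts. The snippet below is an illustrative plain-Python version with hypothetical helper names and made-up labels and scores; in practice, `sklearn.metrics` provides equivalent functions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 3 positives, 5 negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

m = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(m)
print("ROC-AUC:", auc)
```

The pairwise AUC computation is quadratic in the number of samples, so it is meant only to show what the metric measures, not to be used on large datasets.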

# 5. Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
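from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd

# No CSV file is provided, so export the built-in breast cancer dataset to a CSV
# (the file name is an arbitrary choice) and load it back with pandas.
load_breast_cancer(as_frame=True).frame.to_csv("breast_cancer.csv", index=False)
df = pd.read_csv("breast_cancer.csv")

# Separate the features from the label column and hold out 20% for testing
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model and report accuracy on the held-out test set
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))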

# 6. Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.
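# Reuses X_train, X_test, y_train, y_test from question 5.
# L2 is the default penalty; C (default 1.0) is the inverse of the regularization strength.
model_l2 = LogisticRegression(penalty='l2', max_iter=1000)
model_l2.fit(X_train, y_train)
print("Coefficients:", model_l2.coef_)
print("Accuracy:", model_l2.score(X_test, y_test))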

# 7. Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report.
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report

iris = load_iris()
X_train_i, X_test_i, y_train_i, y_test_i = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
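# Note: the multi_class parameter is deprecated as of scikit-learn 1.5. On newer versions,
# OneVsRestClassifier(LogisticRegression(max_iter=1000)) gives the same one-vs-rest setup.
model_multi = LogisticRegression(multi_class='ovr', max_iter=1000)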
model_multi.fit(X_train_i, y_train_i)
y_pred_i = model_multi.predict(X_test_i)
print(classification_report(y_test_i, y_pred_i))

# 8. Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.
from sklearn.model_selection import GridSearchCV

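# Reuses X_train and y_train from question 5.
# The liblinear solver is used because it supports both the L1 and L2 penalties.
params = {'C': [0.01, 0.1, 1, 10], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}
grid = GridSearchCV(LogisticRegression(max_iter=1000), params, cv=5)
grid.fit(X_train, y_train)
print("Best Params:", grid.best_params_)
# best_score_ is the mean cross-validated accuracy of the best parameter combination
print("Best CV Accuracy:", grid.best_score_)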

# 9. Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling.
from sklearn.preprocessing import StandardScaler

# Without scaling
model_no_scale = LogisticRegression(max_iter=1000)
model_no_scale.fit(X_train, y_train)
print("Accuracy without scaling:", model_no_scale.score(X_test, y_test))

# With scaling (fit the scaler on the training data only, so no test-set statistics leak in)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model_scale = LogisticRegression(max_iter=1000)
model_scale.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", model_scale.score(X_test_scaled, y_test))

# 10. Imagine you are working at an e-commerce company...
Steps:  
1) Data Handling:  
   - Handle missing values.  
   - Encode categorical variables.  
   - Remove irrelevant features.  
2) Feature Scaling:  
   - Standardize or normalize numerical features, fitting the scaler on the training set only.  
3) Balancing Classes:  
   - Use SMOTE (Synthetic Minority Oversampling Technique) on the training set only, or set class_weight='balanced' in Logistic Regression.  
4) Hyperparameter Tuning:  
   - Use GridSearchCV to tune C, penalty, and solver parameters.  
5) Model Evaluation:  
   - Use Precision, Recall, F1-score, and ROC-AUC due to imbalanced data.  
6) Business Context:  
   - Prefer Recall so that most likely responders are captured, accepting some extra false positives in order to minimize lost opportunities.
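
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is a synthetic imbalanced set from `make_classification` (about 5% positives), so missing-value handling and categorical encoding are not shown; class weighting is used in place of SMOTE; and `scoring="recall"` is one possible reading of the business-context step.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): ~5% positive class
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)

# Stratify so the train and test sets keep the same class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is re-fit on each CV training fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", solver="liblinear", max_iter=1000)),
])

# Tune C and penalty; liblinear supports both l1 and l2
param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="recall")
search.fit(X_train, y_train)

# Evaluate on the held-out test set with imbalance-aware metrics
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
print("Best params:", search.best_params_)
print(classification_report(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_score))
```

Selecting on recall alone can favor models that over-predict the positive class, so the precision column of the report is worth checking before settling on a configuration.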
