# LogisticRegression

## `LogisticRegression` Parameters

| **Param**           | **Default** | **What it does**                                                                               |
|------------------------|------------------------|------------------------|
| `penalty`           | `'l2'`      | Type of regularization: `'l1'`, `'l2'`, `'elasticnet'`, or `None` (the string `'none'` was deprecated in scikit-learn 1.2). Helps avoid overfitting. |
| `dual`              | `False`     | Use the **dual formulation** (only for `liblinear` + `l2`). Mostly leave it `False`.           |
| `tol`               | `1e-4`      | Tolerance for stopping criteria. Smaller = more precise, but slower.                           |
| `C`                 | `1.0`       | Inverse of regularization strength. Smaller = stronger regularization (roughly `1/alpha` in Ridge/Lasso terms). |
| `fit_intercept`     | `True`      | Whether to fit the bias term (intercept). Almost always `True`.                                |
| `intercept_scaling` | `1`         | Only matters when `solver='liblinear'` and `fit_intercept=True`.                               |
| `class_weight`      | `None`      | Handle imbalanced classes: `'balanced'` or dict.                                               |
| `random_state`      | `None`      | Seed for reproducibility. Only used by solvers that shuffle the data: `'sag'`, `'saga'`, and `'liblinear'`. |
| `solver`            | `'lbfgs'`   | Optimizer used to fit the model: `'lbfgs'`, `'liblinear'`, `'newton-cg'`, `'newton-cholesky'`, `'sag'`, `'saga'`. |
| `max_iter`          | `100`       | Maximum number of solver iterations. Increase it if you get a `ConvergenceWarning`.            |
| `multi_class`       | `'auto'`    | `'ovr'` (One-vs-Rest) or `'multinomial'`. `'auto'` uses `'ovr'` for binary targets or `solver='liblinear'`, otherwise `'multinomial'`. Deprecated since scikit-learn 1.5. |
| `verbose`           | `0`         | Prints optimization info. Good for debugging.                                                  |
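| `warm_start`        | `False`     | Reuse the previous solution as the starting point for the next `fit` (useful when refitting in a loop). Has no effect with `liblinear`. |
| `n_jobs`            | `None`      | Number of CPU cores used to parallelize over classes when `multi_class='ovr'`. Ignored by `liblinear`. Set to `-1` to use all cores. |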
| `l1_ratio`          | `None`      | Only for `penalty='elasticnet'` + `solver='saga'`. Controls balance between L1 and L2.         |
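
To make the `C` and `l1_ratio` rows concrete, here is a minimal sketch on synthetic data. The dataset from `make_classification` and the specific values of `C` and `l1_ratio` are assumptions chosen for illustration, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration only)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X = StandardScaler().fit_transform(X)

# Smaller C = stronger L2 penalty = coefficients shrink toward zero
coef_norms = {}
for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))
    print(f"C={C:<6} ||coef|| = {coef_norms[C]:.3f}")

# Elastic net requires solver='saga'; l1_ratio=0 is pure L2, l1_ratio=1 is pure L1
enet = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5,
                          C=0.1, max_iter=5000, random_state=0).fit(X, y)
n_zero = int(np.sum(enet.coef_ == 0))
print(f"Elastic net: {n_zero} of {enet.coef_.size} coefficients are exactly zero")
```

The coefficient norm should grow as `C` grows, and the L1 part of the elastic-net penalty can push some coefficients to exactly zero, which plain L2 does not do.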

Examples on three datasets:

-   Titanic

    ``` python
    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Load Titanic dataset (Seaborn provides it)
    titanic = sns.load_dataset('titanic')

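    # Data Preprocessing: fill missing ages with the median, drop rows without a port
    # (plain assignment avoids pandas' chained-assignment warning with inplace=True)
    titanic['age'] = titanic['age'].fillna(titanic['age'].median())
    titanic = titanic.dropna(subset=['embarked'])

    # Encoding 'sex' with a manual mapping (male = 0, female = 1)
    titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})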

    # One-Hot Encoding 'embarked' and 'pclass'
    titanic = pd.get_dummies(titanic, columns=['embarked', 'pclass'], drop_first=True)

    # Features (X) and Target (y)
    X = titanic[['age', 'sibsp', 'parch', 'fare', 'sex',
                 'embarked_Q', 'embarked_S', 'pclass_2', 'pclass_3']]
    y = titanic['survived']

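    # Split Data (before scaling, so the test set does not leak into the scaler)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Feature Scaling (fit on the training set only)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)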

    # Train Logistic Regression Model
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # Predictions and Evaluation
    y_pred = model.predict(X_test)

    # Accuracy
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy * 100:.2f}%")

    # Confusion Matrix
    conf_matrix = confusion_matrix(y_test, y_pred)
    labels = ['Not Survived', 'Survived']
    sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False,
                xticklabels=labels, yticklabels=labels)
    plt.title('Confusion Matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

    # Classification Report
    print(classification_report(y_test, y_pred))
    ```

-   Iris

    ``` python
    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Load dataset
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['target'] = iris.target
    df['species'] = df['target'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

    # Features and target
    X = df[iris.feature_names]
    y = df['target']

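    # Train-test split (before scaling, so the test set does not leak into the scaler)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Scale (fit on the training set only)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)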

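    # Train model: lbfgs already fits a multinomial model for multiclass targets,
    # so the multi_class argument (deprecated since scikit-learn 1.5) is omitted
    model = LogisticRegression(solver='lbfgs', max_iter=200)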
    model.fit(X_train, y_train)

    # Predict and evaluate
    y_pred = model.predict(X_test)
    print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")

    # Confusion matrix
    conf_matrix = confusion_matrix(y_test, y_pred)
    sns.heatmap(conf_matrix, annot=True, cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
    plt.title("Confusion Matrix")
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.show()

    # Classification report
    print(classification_report(y_test, y_pred, target_names=iris.target_names))
    ```

-   Breast Cancer Wisconsin

    ``` python
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    import seaborn as sns
    import matplotlib.pyplot as plt

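    # Load data (assumes a local CSV of the Wisconsin Diagnostic dataset
    # with 'id' and 'diagnosis' columns)
    df = pd.read_csv('breast_cancer.csv')
    # Drop the 'id' column and any 'Unnamed: ...' columns, without touching
    # feature columns that merely contain the letters "id"
    df = df.loc[:, ~df.columns.str.contains(r'^(?:Unnamed|id$)', case=False)]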

    # Encode target
    df['diagnosis'] = df['diagnosis'].map({'M': 1, 'B': 0})

    # Features and target
    X = df.drop('diagnosis', axis=1)
    y = df['diagnosis']

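    # Train-test split (before scaling, so the test set does not leak into the scaler)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Scaling (fit on the training set only)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)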

    # Train model
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Predict & evaluate
    y_pred = model.predict(X_test)

    print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
    conf_matrix = confusion_matrix(y_test, y_pred)
    sns.heatmap(conf_matrix, annot=True, cmap='Blues', xticklabels=['Benign', 'Malignant'], yticklabels=['Benign', 'Malignant'])
    plt.title("Confusion Matrix")
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.show()

    print(classification_report(y_test, y_pred, target_names=['Benign', 'Malignant']))
    ```

| Use Case              | How Logistic Regression Handles It                                 |
|------------------------------------|------------------------------------|
| Binary classification | ✅ Out-of-the-box                                                  |
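| Multiclass (basic)    | ✅ With One-vs-Rest (OvR): `multi_class='ovr'`, or `OneVsRestClassifier(LogisticRegression())` now that `multi_class` is deprecated |
| Multiclass (better)   | ✅ With the multinomial (softmax) loss, which is the default for multiclass targets with `lbfgs`, `newton-cg`, `sag`, or `saga` |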
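
The two multiclass rows can be compared side by side. The sketch below assumes the Iris data from above and uses `OneVsRestClassifier` as one way to get OvR behavior without the deprecated `multi_class` argument. The pipeline setup and split settings are illustrative choices, not part of the examples above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Multinomial: one softmax model over all three classes (lbfgs default)
multinomial = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
# One-vs-Rest: one binary model per class, wrapped explicitly
ovr = make_pipeline(StandardScaler(),
                    OneVsRestClassifier(LogisticRegression(max_iter=200)))

scores = {}
for name, clf in [("multinomial", multinomial), ("one-vs-rest", ovr)]:
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name:12s} accuracy: {scores[name]:.3f}")

# Each row holds one probability per class and sums to 1
proba = multinomial.predict_proba(X_test[:3])
print(proba.round(3))
```

Wrapping the scaler and the model in a pipeline also keeps the scaler fitted on the training split only.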