|                        | Linear Regression | Logistic Regression |
|------------------------|-------------------|---------------------|
| Model Equation (with $p$ features) | $Y = w_0 + w_1X_1 + w_2X_2 + \dots + w_pX_p$ | $P(Y=1) = \frac{1}{1 + e^{-(w_0 + w_1X_1 + w_2X_2 + \dots + w_pX_p)}}$ |
| Output Range           | Any real value | Probabilities between 0 and 1 |
| Loss Function          | Mean Squared Error (MSE)<br> $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ | Binary Cross-Entropy Loss (Log Loss)<br> $\text{Log Loss} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$ |
| Assumptions            | The target is linearly related to the independent variables | The log-odds of $Y=1$ are linearly related to the independent variables |
| Interpretation         | Each coefficient is the change in the target for a unit change in its feature, holding the other features fixed | Each coefficient is the change in the log-odds of $Y=1$ for a unit change in its feature, holding the other features fixed |
| Application            | Regression tasks predicting continuous values | Binary classification tasks |
| Optimization Objective | Minimize the sum of squared differences between predictions and actual values | Minimize the log loss between predicted probabilities and actual labels |

In the loss functions, $y_i$ is the true target value of the $i$-th data point, $\hat{y}_i$ is the corresponding prediction (for logistic regression, the predicted probability that $Y=1$), and $n$ is the number of data points.
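To make the two loss functions in the table concrete, here is a minimal sketch in plain Python. The function names, the `eps` clipping constant, and the toy values are assumptions chosen for illustration; they are not taken from any library.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mse(y_true, y_pred):
    """Mean squared error: the average of (y_i - y_hat_i)^2."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy; probabilities are clipped by eps to avoid log(0)."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n

# Hypothetical regression targets and predictions
y_reg = [3.0, 5.0, 7.0]
y_reg_hat = [2.5, 5.0, 8.0]
print("MSE:", mse(y_reg, y_reg_hat))

# Hypothetical binary labels and linear scores w_0 + w_1 X_1 + ...
y_cls = [1, 0, 1]
scores = [2.0, -1.0, 0.0]
probs = [sigmoid(z) for z in scores]
print("Log loss:", log_loss(y_cls, probs))
```

Note that a score of 0 maps to a probability of exactly 0.5, which is why the sign of the linear score decides the predicted class at the usual 0.5 threshold.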

In [2]:
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a random binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

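# Create a logistic regression model trained with stochastic gradient descent (SGD).
# Note: loss='log_loss' requires scikit-learn >= 1.1; older versions name it loss='log'.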
clf = SGDClassifier(loss='log_loss', max_iter=1000, tol=1e-3, random_state=42)

# Train the model on the training set
clf.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = clf.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy}")

Accuracy: 0.835
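
As a rough illustration of the idea behind training logistic regression with stochastic gradient descent, the sketch below updates the weights one sample at a time using the gradient of the log loss. It is a simplified assumption-laden version, not scikit-learn's implementation: it omits the regularization and learning-rate schedule that `SGDClassifier` applies by default, and `fit_logistic_sgd`, `predict`, `lr`, `epochs`, and the toy data are hypothetical names and values chosen for this example.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_sgd(X, y, lr=0.1, epochs=200, seed=0):
    """Fit weights w and bias b by per-sample gradient steps on the log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            err = sigmoid(z) - y[i]  # d(log loss)/dz for one sample
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b

def predict(X, w, b):
    """Predict class 1 when the linear score is non-negative (probability >= 0.5)."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0 for x in X]

# Hypothetical one-feature toy data: class 1 for positive x, class 0 for negative x
X_toy = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic_sgd(X_toy, y_toy)
preds = predict(X_toy, w, b)
acc = sum(p == t for p, t in zip(preds, y_toy)) / len(y_toy)
print(f"Toy accuracy: {acc}")
```

The per-sample error term `sigmoid(z) - y[i]` is what makes the update so simple: it is the derivative of the binary cross-entropy with respect to the linear score.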
