# ***Logistic Regression*** 

<hr>

### Overview 

Logistic regression is a classification algorithm used to predict binary outcomes (e.g. 0 or 1, Yes or No). It models the probability that a given input belongs to a particular class.


### Goal
The goal of binary logistic regression is to train a classifier that can make a binary decision about the class of a new input observation.
We want to find parameters w (weight vector) and b (bias term) such that the predicted probabilities ŷ are as close as possible to the actual labels y ∈ {0,1}.

### How it works

1. Linear Model

The core of logistic regression is a linear model, similar to linear regression: it computes a weighted sum of the input features plus a bias, z = w·x + b. The points where z = 0 form a line (or a hyperplane in higher dimensions) that separates the data points into two classes. However, instead of predicting a continuous output, logistic regression predicts the probability that a given input belongs to a particular class.
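As a minimal sketch of the linear part, assuming NumPy and made-up values for `w`, `b`, and `X` (none of them come from the dataset used later in this notebook), the score for each observation could be computed like this:

```python
import numpy as np

# Hypothetical weights, bias and inputs, chosen only for illustration
w = np.array([0.5, -1.0])      # one weight per feature
b = 0.25                       # bias term
X = np.array([[2.0, 1.0],      # two observations, two features each
              [0.0, 3.0]])

# Linear score for every observation: z = X @ w + b
z = X @ w + b
print(z)
```

A positive score puts the observation on the class-1 side of the hyperplane, a negative score on the class-0 side.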

2. Sigmoid Function

The key distinction of logistic regression is the use of the sigmoid function (also called the logistic function), σ(z) = 1 / (1 + e^(−z)). The sigmoid function converts any real-valued number into a value between 0 and 1, which can be interpreted as a probability.
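The sigmoid can be sketched in a few lines of NumPy. The helper name `sigmoid` and the sample inputs are illustrative assumptions, not part of any library:

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Large negative scores approach 0, a score of 0 gives 0.5,
# and large positive scores approach 1
print(sigmoid([-5.0, 0.0, 5.0]))
```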

3. Decision Boundary

Once we have the probabilities from the sigmoid function, we classify the data using a decision threshold, which defines the decision boundary. If the predicted probability P(y=1∣X) is greater than 0.5, the model classifies the instance as class 1; otherwise, it is classified as class 0.
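A short sketch of the thresholding step, assuming a handful of made-up probabilities (the array `probs` below is hypothetical):

```python
import numpy as np

# Hypothetical predicted probabilities P(y=1 | x)
probs = np.array([0.10, 0.50, 0.51, 0.93])
threshold = 0.5

# Strictly greater than the threshold -> class 1, otherwise class 0
labels = (probs > threshold).astype(int)
print(labels)  # [0 0 1 1]
```

The threshold does not have to be 0.5. Lowering it trades precision for recall on class 1, which can be useful when missing a positive case is costly.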

4. Model Training

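The logistic regression model is trained by adjusting the weights w and bias b so that the predicted probabilities match the actual labels in the training data as closely as possible. This is typically framed as Maximum Likelihood Estimation (MLE): choose the parameters that make the observed labels most probable. There is no closed-form solution, so the parameters are found iteratively, for example with Gradient Descent.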

5. Cost Function 

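The cost function used in logistic regression is called the log loss or binary cross-entropy loss. It is the negative average log-likelihood of the labels, so minimizing it is equivalent to maximizing the likelihood. The goal is to minimize this cost function during training so that the predicted probabilities fit the labels better.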
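For labels y and predicted probabilities ŷ, the log loss is −mean(y·log(ŷ) + (1 − y)·log(1 − ŷ)). Below is a minimal NumPy sketch. The function name `binary_cross_entropy`, the clipping constant `eps`, and the example values are assumptions made for illustration:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average log loss; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob)
                    + (1.0 - y_true) * np.log(1.0 - y_prob))

y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # probabilities close to the labels
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # probabilities far from the labels

loss_right = binary_cross_entropy(y_true, mostly_right)
loss_wrong = binary_cross_entropy(y_true, mostly_wrong)
print(loss_right, loss_wrong)
```

The loss is small when confident predictions are correct and grows quickly when confident predictions are wrong.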


### Training steps
1. Initialize parameters

Start by setting initial values for the weights and bias. These are often set to zeros or to small random values.

2. Compute predictions

Use the current weights and bias to compute the predicted probabilities for each data point using the sigmoid function.

3. Calculate loss

Compute the binary cross-entropy loss, which measures the difference between the predicted probabilities and the actual labels.

4. Update parameters

Adjust the weights and bias to minimize the loss function using optimization techniques like Gradient Descent. This involves computing the gradient of the loss function with respect to each parameter and moving the parameters a small step in the direction that reduces the loss.

5. Repeat

The gradient descent process continues iteratively:

* the weights and bias are updated based on the gradients
* the predictions (the hypothesis) are recalculated
* the cost function is computed again

This loop repeats until one of the following happens:
* the parameters converge (i.e. further updates no longer change them significantly)
* a pre-defined number of iterations is reached
* a specific tolerance (threshold) is met (the change in the cost function between iterations is very small)
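The training steps above can be sketched as a plain batch gradient descent loop. This is an illustrative NumPy sketch under stated assumptions, not how scikit-learn trains its model: the function name `fit_logistic_regression`, the learning rate `lr`, the stopping values `max_iter` and `tol`, and the synthetic data are all made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_logistic_regression(X, y, lr=0.1, max_iter=1000, tol=1e-6):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)            # 1. initialize parameters
    b = 0.0
    prev_loss = np.inf
    for _ in range(max_iter):           # 5. repeat, at most max_iter times
        p = sigmoid(X @ w + b)          # 2. compute predictions
        loss = log_loss(y, p)           # 3. calculate loss
        if abs(prev_loss - loss) < tol:
            break                       # change in loss below the tolerance
        prev_loss = loss
        error = p - y                   # gradient of the loss w.r.t. the score z
        w -= lr * (X.T @ error) / n_samples   # 4. update parameters
        b -= lr * error.mean()
    return w, b

# Synthetic, linearly separable data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = fit_logistic_regression(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(w, b, accuracy)
```

The gradient has the simple form `X.T @ (p - y) / n` because the derivative of the sigmoid cancels against the derivative of the log loss.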

### Prediction
The output is a probability between 0 and 1 (computed with the sigmoid function). To get a class label, compare the probability with a threshold, typically 0.5.


### Pros and cons
Pros:
+ Simple and fast
+ Easy to interpret (each weight shows how much a feature contributes)
+ Good baseline for binary classification

Cons:
- Assumes linear relationship between input features and the log-odds
- Doesn't capture complex patterns unless features are engineered or transformed
- Sensitive to outliers


### Things to Keep in Mind

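* scale features. Logistic regression is sensitive to the scale of the input features, especially when it is regularized (scikit-learn applies L2 regularization by default) or trained with gradient-based solvers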

* it assumes that there is a linear relationship between the input features and the log-odds of the dependent variable. If the relationship is more complex, logistic regression might not perform well unless you engineer or transform features (e.g. by adding polynomial features or interaction terms)

* logistic regression can be sensitive to outliers. Because the decision boundary is linear, points with extreme feature values can pull it toward them and disproportionately affect the model's performance

* if the independent variables (features) are highly correlated with each other, it may affect the model, mainly the reliability of its coefficients. Multicollinearity can lead to unstable coefficient estimates and inflated standard errors. Using techniques like Principal Component Analysis (PCA) or removing correlated features can help mitigate this issue

* watch out for class imbalance. If the dataset contains imbalanced classes, logistic regression may be biased towards predicting the majority class. Consider using techniques such as class weighting, oversampling the minority class, or undersampling the majority class to handle imbalanced datasets
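Two of the points above, feature scaling and class imbalance, can be handled directly in scikit-learn. The sketch below is one possible setup, not the only one. The synthetic dataset and all parameter values are assumptions chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data (roughly 90% class 0), for illustration only
X, y = make_classification(n_samples=1000, n_features=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# The pipeline fits the scaler on the training data only, and
# class_weight="balanced" reweights samples inversely to class frequency
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(class_weight="balanced"))
clf.fit(X_train, y_train)

score = clf.score(X_test, y_test)
print(score)
```

Putting the scaler inside the pipeline means the test split is transformed with statistics learned from the training split, which avoids leaking information from the test data.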



<hr>

# CODE

In [30]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import StandardScaler


data

In [31]:
# load data
iris = load_iris()
X = iris.data
# binary target: 1 = setosa, 0 = any other species
y = (iris.target == 0).astype(int)

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [32]:
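# scale data
# note: the scaler is fit on the full dataset here for simplicity; to avoid leaking
# test-set statistics, fit it on the training split only and then transform the test split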
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)

print(f"Number of samples in the training set: {X_train.shape[0]}")
print(f"Number of samples in the test set: {X_test.shape[0]}")
print(f"Number of features: {X_train.shape[1]}")

Number of samples in the training set: 105
Number of samples in the test set: 45
Number of features: 4


training

In [33]:
# create a logistic regression model
model = LogisticRegression()

# train the model on the training data
model.fit(X_train, y_train)

predictions

In [34]:
y_pred = model.predict(X_test)  # predicted class labels
probs = model.predict_proba(X_test)[:, 1]  # probability of belonging to class 1

evaluation

In [35]:
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

report = classification_report(y_test, y_pred, target_names=["Not Setosa", "Setosa"])
print("\nClassification Report:\n", report)

Accuracy: 100.00%

Classification Report:
               precision    recall  f1-score   support

  Not Setosa       1.00      1.00      1.00        26
      Setosa       1.00      1.00      1.00        19

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



In [36]:
results_df = pd.DataFrame({
    "True Label": y_test,
    "Predicted": y_pred,
    "Probability of Setosa": probs
})

print("\nSample Predictions:\n")
results_df.head(10)



Sample Predictions:



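|   | True Label | Predicted | Probability of Setosa |
|---|---|---|---|
| 0 | 0 | 0 | 0.01423 |
| 1 | 1 | 1 | 0.968898 |
| 2 | 0 | 0 | 1.9e-05 |
| 3 | 0 | 0 | 0.013948 |
| 4 | 0 | 0 | 0.003849 |
| 5 | 1 | 1 | 0.938806 |
| 6 | 0 | 0 | 0.072348 |
| 7 | 0 | 0 | 0.000976 |
| 8 | 0 | 0 | 0.001758 |
| 9 | 0 | 0 | 0.032311 |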
