In [1]:
import pandas as pd
import numpy as np
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [None]:
!cp /content/drive/MyDrive/ml_scratch/sigmoid.png  sigmoid.png

## Sigmoid Function

The **sigmoid function** is an S-shaped function that maps any real-valued number into the range $(0, 1)$. It is often used as an activation function in machine learning, especially in logistic regression and neural networks.

The formula is:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

### Explanation

- Input value: $x \in \mathbb{R}$
- Output: $\sigma(x) \in (0, 1)$
- The function "squashes" input values into a probability-like output
- It is smooth and differentiable, which is useful for optimization

The function is point-symmetric about $(0, 0.5)$: $\sigma(0) = 0.5$ and $\sigma(-x) = 1 - \sigma(x)$. For large positive inputs, the output approaches 1; for large negative inputs, it approaches 0.
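
As a quick numerical check of these properties, here is a minimal NumPy sketch. The helper name `sigmoid` and the sample inputs are illustrative choices, not part of the notebook's own cells.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative inputs (assumed for this example).
xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
ys = sigmoid(xs)

print(ys)            # every value lies strictly between 0 and 1
print(sigmoid(0.0))  # 0.5

# Point symmetry about (0, 0.5): sigmoid(-x) equals 1 - sigmoid(x).
print(np.allclose(sigmoid(-xs), 1.0 - ys))
```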


![Sigmoid Function](sigmoid.png)



## Binary Cross-Entropy Loss

**Binary Cross-Entropy (BCE)**, also known as **log loss**, is a commonly used loss function for binary classification problems. It measures the difference between two probability distributions: the predicted probabilities and the actual labels.

The formula for binary cross-entropy loss is:

$$
BCE = -\left[ y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y}) \right]
$$


### Explanation

- True label: $y \in \{0, 1\}$
- Predicted probability: $\hat{y} \in (0, 1)$
- The function penalizes confident but incorrect predictions more strongly


For example, predicting $\hat{y} = 0.01$ when $y = 1$ costs $-\log(0.01) \approx 4.6$, while predicting $\hat{y} = 0.9$ costs only $-\log(0.9) \approx 0.11$. BCE is well-suited for models that output probabilities (e.g., using a sigmoid activation).

For a dataset with multiple samples, the average BCE over all examples is typically computed:

$$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \cdot \log(\hat{y}_i) + (1 - y_i) \cdot \log(1 - \hat{y}_i) \right]
$$
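
The averaged formula can be sketched in a few lines of NumPy. This is an illustrative version, not the notebook's own implementation: the function name `binary_cross_entropy`, the clipping constant `eps`, and the sample labels and predictions are all assumptions made for the example.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-9):
    """Mean BCE over all samples.

    Predictions are clipped to [eps, 1 - eps] so that log() never sees 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return per_sample.mean()

# Made-up labels and two sets of predictions.
y = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # confident and correct
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # confident and incorrect

loss_good = binary_cross_entropy(y, mostly_right)
loss_bad = binary_cross_entropy(y, mostly_wrong)
print(loss_good, loss_bad)  # the second loss is much larger than the first
```

Clipping is one common way to avoid `log(0)`; adding a small constant inside the logarithm is another.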

# Logistic Regression Explained (Simple and Clear)

**Logistic Regression** is one of the most popular algorithms in **machine learning**, especially for **binary classification problems**—where the goal is to classify data into one of two categories, such as spam or not spam, pass or fail, yes or no.

---

## 🔍 What Is Logistic Regression?

Despite the name, **logistic regression is used for classification**, not regression. It predicts the **probability** that a given input belongs to a particular class. For example, it might predict whether an email is spam (`1`) or not spam (`0`).

It works by applying a **logistic function (also called the sigmoid function)** to a linear combination of input features. This function squashes the output to a value between **0 and 1**, which can be interpreted as a probability.

---

## 🧮 The Math (Simplified)

Let’s break it down step by step.

### 1. Linear combination of features:

$$
z = w_1x_1 + w_2x_2 + \ldots + w_nx_n + b
$$

Where:
- $x_1, x_2, \ldots, x_n$ are the input features.
- $w_1, w_2, \ldots, w_n$ are the weights.
- $b$ is the bias term.

### 2. Apply the sigmoid function:

$$
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}
$$

This transforms $z$ into a **probability** $\hat{y}$ between 0 and 1.

### 3. Make a prediction:

- If the output probability is > 0.5 → predict **class 1**
- If ≤ 0.5 → predict **class 0**
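
The three steps above can be sketched for a couple of inputs as follows. The weights, bias, and feature values are hypothetical numbers picked only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical weights, bias, and inputs chosen only for illustration.
w = np.array([0.8, -0.4])
b = 0.1
X = np.array([
    [2.0, 1.0],    # sample 1
    [-1.0, 3.0],   # sample 2
])

z = X @ w + b                    # step 1: linear combination of features
p = 1.0 / (1.0 + np.exp(-z))     # step 2: sigmoid turns z into a probability
labels = (p > 0.5).astype(int)   # step 3: threshold at 0.5

print(z)       # sample 1 gives 1.3, sample 2 gives -1.9
print(p)
print(labels)  # sample 1 is assigned class 1, sample 2 class 0
```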

---

## ✅ Why Use Logistic Regression?

- Simple to **implement and interpret**
- Works well when the classes are **approximately linearly separable**
- Outputs **probabilities**, not just labels

---

## 🧠 Training the Model

Logistic regression is trained using **maximum likelihood estimation**. The model learns weights and bias that **maximize the likelihood** of the observed data.

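The loss function used is called **log loss** or **binary cross-entropy**. It is the negative average log-likelihood of the labels, so minimizing it is equivalent to maximizing the likelihood. It penalizes incorrect and overconfident predictions.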
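
The gradient that gradient descent follows can be derived in a few lines. This is a sketch in the notation used above, where $x_{ij}$ denotes feature $j$ of sample $i$, $z_i$ is the linear combination for sample $i$, $\hat{y}_i = \sigma(z_i)$, and $\alpha$ is an assumed symbol for the learning rate.

Since $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the per-sample loss $\ell_i = -\left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$ satisfies:

$$
\frac{\partial \ell_i}{\partial z_i}
= \left( -\frac{y_i}{\hat{y}_i} + \frac{1 - y_i}{1 - \hat{y}_i} \right) \hat{y}_i (1 - \hat{y}_i)
= \hat{y}_i - y_i
$$

Averaging over the $N$ samples and applying the chain rule gives:

$$
\frac{\partial \mathcal{L}}{\partial w_j} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)\, x_{ij},
\qquad
\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)
$$

Each gradient descent step then updates the parameters as:

$$
w_j \leftarrow w_j - \alpha \frac{\partial \mathcal{L}}{\partial w_j},
\qquad
b \leftarrow b - \alpha \frac{\partial \mathcal{L}}{\partial b}
$$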

---

## ⚠️ Limitations

- Assumes a **linear relationship** between features and output (in log-odds space)
- Not suitable for **complex, non-linear patterns** without transformations
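
To illustrate what "transformations" can mean here, the sketch below uses the XOR pattern, which no straight line in the original two features can separate. Adding the product $x_1 x_2$ as a third feature makes the classes separable by a model that is still linear in its inputs. The weights and bias are hand-picked for the demonstration, not learned.

```python
import numpy as np

# XOR: no straight line in (x1, x2) separates class 0 from class 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Add the interaction feature x1 * x2 (a hand-chosen transformation).
X_aug = np.column_stack([X, X[:, 0] * X[:, 1]])

# Hand-picked (not learned) parameters, linear in the augmented features.
w = np.array([1.0, 1.0, -2.0])
b = -0.5

z = X_aug @ w + b
p = 1.0 / (1.0 + np.exp(-z))
pred = (p > 0.5).astype(int)

print(pred)  # matches y for all four points
```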

---

## 📦 Real-World Use Cases

- Email **spam detection**
- **Loan default** prediction
- **Disease diagnosis** (e.g., predicting diabetes)
- **Customer churn** prediction

---

## 🧾 Conclusion

Logistic Regression is a powerful and beginner-friendly **classification algorithm**. While it may not capture deep patterns in complex data, its **simplicity, speed, and interpretability** make it an excellent first choice for many binary classification problems.


In [None]:
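def compute_loss(y_true, y_pred):
  # Per-sample binary cross-entropy; epsilon keeps log() away from log(0).
  epsilon = 1e-9
  A = y_true * np.log(y_pred + epsilon)
  B = (1 - y_true) * np.log(1 - y_pred + epsilon)
  bce = -(A + B)  # parentheses, not brackets: unary minus is undefined for a list
  return bce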


In [6]:
class LogisticRegression:
  def __init__(self,learning_rate=0.001, n_iters=1000):
    self.weights=None
    self.bias=None
    self.lr=learning_rate
    self.n_iters=n_iters
    self.losses=[]

  def sigmoid(self,x):
    return 1/(1+np.exp(-x))

  def compute_loss(self,y_true,y_pred):
    epsilon=1e-9
    y1=y_true*np.log(y_pred+epsilon)
    y2=(1-y_true)*np.log(1-y_pred+epsilon)
    return -np.mean(y1+y2)

  def feed_forward(self,X):
    Z=np.dot(X,self.weights)+self.bias
    A=self.sigmoid(Z)
    return A

  def fit(self,X,y):
    n_samples,n_features=X.shape

    ##init parameters
    self.weights=np.zeros(n_features)
    self.bias=0

    for _ in range(self.n_iters):
      A= self.feed_forward(X)
      self.losses.append(self.compute_loss(y,A))
      dz=A-y ## gradient of the per-sample BCE loss w.r.t. Z (not the sigmoid derivative alone)

      dw=(1/n_samples)*np.dot(X.T,dz)
      db=(1/n_samples)*np.sum(dz)

      self.weights-=self.lr*dw
      self.bias-=self.lr*db

  def predict(self,X):
    threshold = .5
    y_predicted=self.feed_forward(X)  # predicted probabilities
    return (y_predicted > threshold).astype(int)  # class labels 0/1
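
The update in `fit` uses `dz = A - y` as the gradient of the loss with respect to `Z`. One way to sanity-check that expression is to compare the analytic gradients with central finite differences. The sketch below is standalone: the helper names, the random data, and the step size `h` are assumptions for illustration, and the loss omits the small epsilon so that the comparison is exact up to rounding.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(w, b, X, y):
    """Mean binary cross-entropy of a logistic model with parameters (w, b)."""
    a = sigmoid(X @ w + b)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

# Small random problem (illustrative data, fixed seed for reproducibility).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)
b = 0.3

# Analytic gradients, in the same form as the fit() loop.
dz = sigmoid(X @ w + b) - y
dw = X.T @ dz / len(y)
db = dz.mean()

# Central finite differences along each coordinate.
h = 1e-6
num_dw = np.array([
    (bce(w + h * e, b, X, y) - bce(w - h * e, b, X, y)) / (2 * h)
    for e in np.eye(3)
])
num_db = (bce(w, b + h, X, y) - bce(w, b - h, X, y)) / (2 * h)

print(np.allclose(dw, num_dw, atol=1e-6), abs(db - num_db) < 1e-6)
```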

In [11]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
# from sklearn.linear_model import LogisticRegression
from sklearn import datasets
import numpy as np

dataset = datasets.load_breast_cancer()
X, y = dataset.data, dataset.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

regressor = LogisticRegression(learning_rate=0.0001, n_iters=1000)  # the custom LogisticRegression class defined above (the scikit-learn import is commented out)
regressor.fit(X_train, y_train)
predictions = regressor.predict(X_test)

cm = confusion_matrix(y_test, predictions)
accuracy = accuracy_score(y_test, predictions)
sens = recall_score(y_test, predictions)  # Sensitivity = Recall
precision = precision_score(y_test, predictions)
f_score = f1_score(y_test, predictions)

print("Test accuracy: {0:.3f}".format(accuracy))
print("Confusion Matrix:\n", cm)
print("Sensitivity (Recall): {0:.3f}".format(sens))
print("Precision: {0:.3f}".format(precision))
print("F1 Score: {0:.3f}".format(f_score))


Test accuracy: 0.930
Confusion Matrix:
 [[39  6]
 [ 2 67]]
Sensitivity (Recall): 0.971
Precision: 0.918
F1 Score: 0.944
