## Supervised Learning
### Definition: 
A type of machine learning where the model is trained on labeled data. The goal is to learn a mapping from input features to output labels.
### Examples:
- *Classification*: Predicting a categorical label. E.g., predicting whether a passenger survived the Titanic disaster.
- *Regression*: Predicting a continuous value. E.g., predicting house prices.
## Unsupervised Learning
### Definition: 
A type of machine learning where the model is trained on unlabeled data. The goal is to find patterns or structure in the data.
### Examples:
- *Clustering*: Grouping similar instances. E.g., grouping customers based on purchasing behavior.
- *Dimensionality Reduction*: Reducing the number of features while retaining important information. E.g., Principal Component Analysis (PCA).

## Classification with Logistic Regression
### Concept of Classification
*Classification*: The process of predicting the categorical label of new instances based on past observations. In the context of the Titanic dataset, classification can be used to predict whether a passenger survived (Survived = 1) or did not survive (Survived = 0). It can be understood as finding a boundary that separates the classes in the feature space.

### Logistic Regression for Binary Classification
*Logistic Regression*: A statistical method for binary classification that models the probability of a binary outcome using the logistic function (*sigmoid function*).

### Logistic Function (Sigmoid Function)
- Formula: σ(z) = 1 / (1 + e<sup>−z</sup>),
where z = b<sub>0</sub> + b<sub>1</sub>x<sub>1</sub> + b<sub>2</sub>x<sub>2</sub> + … + b<sub>n</sub>x<sub>n</sub>

- Explanation:

  - 𝑧 is a linear combination of the input features 𝑥<sub>1</sub>, 𝑥<sub>2</sub>, …, 𝑥<sub>𝑛</sub> and the coefficients 𝑏<sub>0</sub>, 𝑏<sub>1</sub>, …, 𝑏<sub>𝑛</sub>.
  - The logistic function squashes the output to a range between 0 and 1, representing the probability of the positive class (e.g., Survived = 1).
  - 𝑏<sub>0</sub> (Intercept): This is the bias term, a constant that shifts the decision boundary.
  - b<sub>1</sub> , b<sub>2</sub> , ..., b<sub>n</sub>  (Coefficients): These are the weights assigned to each feature, determining the influence of each feature on the prediction.

- More info on intercepts and coefficients
  - Think of the intercept as the starting point of a journey. It’s the baseline or initial value before any of the features come into play.
  - In the context of logistic regression, b<sub>0</sub> is a constant term that sets the baseline probability of the outcome when all feature values are zero. It shifts the decision boundary without depending on any input features.
  - Each coefficient represents the importance and impact of one factor (feature) on the outcome.
  - In logistic regression, each coefficient 𝑏<sub>𝑖</sub> represents the weight or influence of the corresponding feature x<sub>i</sub> on the log-odds of the outcome. With a positive coefficient, the predicted probability rises as the feature value increases; with a negative coefficient, it falls.
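
The role of the intercept and the coefficients can be checked numerically. The sketch below uses only the Python standard library; the values of `b0` and `b1` and the helper `log_odds` are made up for illustration and are not fitted to any data. It shows that the baseline probability at x = 0 is σ(b0), and that each one-unit increase in the feature adds `b1` to the log-odds.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical parameters chosen only for illustration.
b0 = -1.0   # intercept
b1 = 0.8    # coefficient of a single feature x

baseline = sigmoid(b0)            # probability when x = 0
p_at_1 = sigmoid(b0 + b1 * 1.0)   # probability when x = 1
p_at_2 = sigmoid(b0 + b1 * 2.0)   # probability when x = 2

# A positive coefficient raises the probability as x grows.
print(baseline, p_at_1, p_at_2)

# Each unit increase in x adds b1 to the log-odds.
print(log_odds(p_at_2) - log_odds(p_at_1))
```

Flipping the sign of `b1` in this sketch makes the three probabilities decrease instead of increase.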





### Example Calculation

*Given Data*:

- Let's assume we have two features, 𝑥<sub>1</sub> (age) and 𝑥<sub>2</sub> (fare), and the logistic regression model coefficients are 𝑏<sub>0</sub> = −3, 𝑏<sub>1</sub> = 0.05, and 𝑏<sub>2</sub> = 0.1.

*Linear Combination*:
- z = −3 + 0.05⋅x<sub>1</sub> + 0.1⋅x<sub>2</sub>

*Logistic Function*:
- σ(z) = 1 / (1 + e<sup>−(−3 + 0.05⋅x<sub>1</sub> + 0.1⋅x<sub>2</sub>)</sup>)

*Prediction Example*:
- For a passenger with x<sub>1</sub> = 25 (age) and x<sub>2</sub> = 50 (fare):
    - z = −3 + 0.05⋅25 + 0.1⋅50 = −3 + 1.25 + 5 = 3.25
    - σ(3.25) = $\frac{1}{1+e^{-3.25}}$ ≈ 0.963
    - The predicted probability of survival is about 0.963, indicating a high likelihood of survival.
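
The arithmetic in the prediction example can be reproduced in a few lines of plain Python. This is a minimal sketch using the coefficients stated in the example; the function name `predict_probability` is hypothetical and exists only for this illustration.

```python
import math

# Coefficients from the worked example: intercept, age weight, fare weight.
b0, b1, b2 = -3.0, 0.05, 0.1

def predict_probability(age, fare):
    """Return sigma(b0 + b1*age + b2*fare)."""
    z = b0 + b1 * age + b2 * fare
    return 1.0 / (1.0 + math.exp(-z))

# Passenger with age 25 and fare 50.
z = b0 + b1 * 25 + b2 * 50
p = predict_probability(25, 50)

print("z =", round(z, 2))
print("probability =", round(p, 3))
```

Trying other ages and fares with `predict_probability` shows how quickly the sigmoid saturates toward 0 or 1 once z moves a few units away from zero.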



## Logistic Regression for Multiclass Classification
### When to Use: 
- When the target variable has more than two classes.
### Approaches:
- *One-vs-Rest (OvR)*: Train a separate binary classifier for each class, treating all other classes as the negative class.
- *One-vs-One (OvO)*: Train a binary classifier for every pair of classes.
- *Softmax Regression (Multinomial Logistic Regression)*: Extends logistic regression to handle multiple classes directly by modeling the probability distribution over multiple classes.
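
To make the softmax approach concrete, here is a small standard-library sketch of the softmax function itself. The three class scores are hypothetical values standing in for the linear scores a fitted model would produce; nothing here comes from a trained model.

```python
import math

def softmax(scores):
    """Turn a list of real-valued class scores into probabilities that sum to 1."""
    m = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores for three classes (illustration only).
scores = [2.0, 1.0, -0.5]
probs = softmax(scores)

# The predicted class is the one with the highest probability.
predicted_class = probs.index(max(probs))

print(probs)
print(predicted_class)
```

With only two classes and scores `[z, 0]`, `softmax` returns σ(z) for the first class, which is the sense in which softmax regression extends binary logistic regression.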



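## Application on the Titanic Dataset

- Objective: Predict whether a passenger survived (Survived) based on features like class, sex, age, fare, etc.
- Features: pclass, sex, age, sibsp, parch, fare, embarked, etc.
- Label: Survived (1 = survived, 0 = did not survive)

The cells below use randomly generated dummy data rather than the Titanic data itself, but they follow the same binary-classification workflow.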

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


In [20]:

# Generate dummy data
np.random.seed(0)
data_size = 100

# Features
X = np.random.rand(data_size,5)
print(X)

[[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]
 [0.64589411 0.43758721 0.891773   0.96366276 0.38344152]
 [0.79172504 0.52889492 0.56804456 0.92559664 0.07103606]
 [0.0871293  0.0202184  0.83261985 0.77815675 0.87001215]
 [0.97861834 0.79915856 0.46147936 0.78052918 0.11827443]
 [0.63992102 0.14335329 0.94466892 0.52184832 0.41466194]
 [0.26455561 0.77423369 0.45615033 0.56843395 0.0187898 ]
 [0.6176355  0.61209572 0.616934   0.94374808 0.6818203 ]
 [0.3595079  0.43703195 0.6976312  0.06022547 0.66676672]
 [0.67063787 0.21038256 0.1289263  0.31542835 0.36371077]
 [0.57019677 0.43860151 0.98837384 0.10204481 0.20887676]
 [0.16130952 0.65310833 0.2532916  0.46631077 0.24442559]
 [0.15896958 0.11037514 0.65632959 0.13818295 0.19658236]
 [0.36872517 0.82099323 0.09710128 0.83794491 0.09609841]
 [0.97645947 0.4686512  0.97676109 0.60484552 0.73926358]
 [0.03918779 0.28280696 0.12019656 0.2961402  0.11872772]
 [0.31798318 0.41426299 0.0641475  0.69247212 0.56660145]
 [0.26538949 0

In [21]:

# Binary target variable
y = (np.sum(X, axis=1) > 2.5).astype(int) # If the sum of all 5 features is > 2.5, label as 1 else 0


In [4]:

# Create a DataFrame
columns = ['Feature1', 'Feature2', 'Feature3', 'Feature4', 'Feature5']
df = pd.DataFrame(X, columns=columns)
df['Target'] = y

print("First 5 rows of the DataFrame:")
print(df.head())


First 5 rows of the DataFrame:
   Feature1  Feature2  Feature3  Feature4  Feature5  Target
0  0.548814  0.715189  0.602763  0.544883  0.423655       1
1  0.645894  0.437587  0.891773  0.963663  0.383442       1
2  0.791725  0.528895  0.568045  0.925597  0.071036       1
3  0.087129  0.020218  0.832620  0.778157  0.870012       1
4  0.978618  0.799159  0.461479  0.780529  0.118274       1


In [5]:

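# Split the data into training and testing sets.
# Note: train_size=0.2 keeps only 20% of the rows (20 of 100) for training and
# leaves 80% for testing; test_size=0.2 is the more common choice.
X_train, X_test, y_train, y_test = train_test_split(df[columns], df['Target'], train_size=0.2, random_state=0)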
X_test.head()
y_test.head()

In [6]:

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
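
What the standardization step computes can be sketched by hand for a single feature column. The example below uses only the standard library and made-up values; the helpers `fit_standardizer` and `transform` are hypothetical names, not part of scikit-learn. The key point is that the mean and standard deviation are learned from the training data only and then reused on the test data.

```python
import statistics

def fit_standardizer(column):
    """Return the mean and population standard deviation of a training column."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)   # population std (ddof = 0), as StandardScaler uses
    return mean, std

def transform(column, mean, std):
    """Apply (x - mean) / std using statistics learned elsewhere."""
    return [(x - mean) / std for x in column]

# Hypothetical values for one feature (illustration only).
train_col = [0.2, 0.4, 0.6, 0.8]
test_col = [0.5, 1.0]

mean, std = fit_standardizer(train_col)          # fit on training data only
train_scaled = transform(train_col, mean, std)
test_scaled = transform(test_col, mean, std)     # reuse the training statistics

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics on the test column is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.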


In [7]:

# Create and train the logistic regression model
log_reg = LogisticRegression()
log_reg.fit(X_train_scaled, y_train)


In [8]:

# Get the coefficients (b0, b1, ..., bn)
intercept = log_reg.intercept_[0]
coefficients = log_reg.coef_[0]

print("\nIntercept (b0):", intercept)
print("Coefficients (b1, b2, ..., bn):", coefficients)



Intercept (b0): -0.26845513640664465
Coefficients (b1, b2, ..., bn): [1.24838117 1.69707095 1.34220839 1.45224051 1.53986514]


In [9]:

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))



In [10]:
# Calculate the linear combination z
z = intercept + np.dot(X_test_scaled, coefficients)

# np.dot(X_test_scaled, coefficients): This computes the dot product of the standardized test set features and the coefficients. 
# The dot product sums the product of each feature value and its corresponding coefficient for each instance.


# Apply the sigmoid function to get the probabilities
probabilities = sigmoid(z)

# Display the probabilities
print("\nFirst 5 predicted probabilities:")
print(probabilities[:5])


First 5 predicted probabilities:
[0.10428142 0.37936845 0.84841636 0.99980969 0.92966092]
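
The probabilities become class predictions once a decision threshold is applied. The sketch below copies the five probabilities printed above and uses a threshold of 0.5, which is a conventional default and an assumption here rather than something the notebook sets explicitly.

```python
# First five predicted probabilities, copied from the output above.
probabilities = [0.10428142, 0.37936845, 0.84841636, 0.99980969, 0.92966092]

# Assumed decision threshold; 0.5 is the usual default for binary classification.
threshold = 0.5

# Label 1 when the probability of the positive class reaches the threshold.
predicted_labels = [1 if p >= threshold else 0 for p in probabilities]

print(predicted_labels)  # [0, 0, 1, 1, 1]
```

In the notebook itself, `log_reg.predict(X_test_scaled)` should give the same labels for these rows, and `log_reg.predict_proba(X_test_scaled)[:, 1]` should match the manually computed probabilities.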
