<a href="https://colab.research.google.com/github/shash365/iitkgp-aiml/blob/main/Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Logistic regression** is a statistical and machine learning method used for binary classification tasks, where the target variable has two possible outcomes (e.g., 0 or 1, true or false). It models the probability of an event occurring by applying the logistic (sigmoid) function to a linear combination of input features. The output is a value between 0 and 1, interpreted as the likelihood of belonging to a specific class. Logistic regression is widely used in fields like healthcare for predicting outcomes such as disease presence.
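As a minimal sketch of the logistic (sigmoid) function described above, the snippet below maps a linear score `z = w * x + b` to a probability. The weight `w` and bias `b` are made-up values chosen only for illustration; they are not fitted to any data.

```python
import math

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weight and bias, for illustration only.
w, b = 1.5, -4.0

for x in (0.0, 2.0, 4.0):
    z = w * x + b      # linear combination of the input feature
    p = sigmoid(z)     # interpreted as the probability of the positive class
    print(f"x={x:.1f}  z={z:+.2f}  p={p:.3f}")
```

A score of 0 maps to a probability of exactly 0.5, and larger scores move the probability toward 1. This plain version can overflow for very large negative scores, which a production implementation would guard against.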

Logistic Regression takes the following types of inputs:

1. **Numerical Features**: Continuous variables such as age, income, or temperature. These are directly used in the model.

2. **Categorical Features**: Variables like gender or city, which must be converted into numerical form. One-hot encoding is the usual choice; label encoding assigns integers that the model treats as ordered, so it suits only categories with a natural order.

3. **Binary Features**: Variables that represent true/false or 0/1 scenarios, such as whether a customer has a membership.

4. **Normalized/Scaled Features**: Although not strictly required, input features are often normalized or scaled for better model performance and faster convergence.

Ensure inputs are preprocessed appropriately for optimal results.
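The preprocessing steps listed above can be sketched with plain NumPy. The `age` and `city` arrays are hypothetical toy data, and the manual scaling and encoding are shown only to make the idea concrete; in practice scikit-learn's `StandardScaler` and `OneHotEncoder` are the usual tools for this.

```python
import numpy as np

# Hypothetical toy data: one numerical and one categorical feature.
age = np.array([25.0, 40.0, 31.0, 58.0])
city = np.array(["Delhi", "Mumbai", "Delhi", "Kolkata"])

# Standardize the numerical feature to zero mean and unit variance.
age_scaled = (age - age.mean()) / age.std()

# One-hot encode the categorical feature: one 0/1 column per category.
categories = np.unique(city)                         # sorted unique labels
city_onehot = (city[:, None] == categories).astype(float)

# Stack into a single 2-D feature matrix of shape (n_samples, n_features).
X_demo = np.column_stack([age_scaled, city_onehot])
print(categories)
print(X_demo.shape)
```

Each row of `city_onehot` contains a single 1 marking that sample's city, so no artificial ordering between cities is introduced.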

**Benefits of Logistic Regression over Linear Regression**:

Logistic Regression offers several benefits over Linear Regression, particularly for classification problems:

1. **Handles Binary Outputs**: Logistic Regression is designed primarily for binary classification and outputs class probabilities, whereas Linear Regression predicts unbounded continuous values and is poorly suited to such tasks.

2. **Bounded Output**: Logistic Regression outputs probabilities (values between 0 and 1), unlike Linear Regression, which can produce values outside this range.

3. **Non-Linear Decision Boundaries**: While the model itself is linear, Logistic Regression can approximate non-linear boundaries by combining features or using transformations.

4. **Better for Categorical Targets**: Logistic Regression directly models categorical outcomes without additional encoding steps.

5. **Probabilistic Interpretation**: Provides the likelihood of class membership, aiding in decision-making.

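6. **Robustness to Outliers**: In classification tasks, Logistic Regression is often less sensitive than Linear Regression to extreme points that lie far on the correct side of the decision boundary, because the sigmoid saturates there. Mislabeled or misclassified outliers can still influence the fit.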

Regression predicts continuous numerical values, while classification focuses on categorizing data into discrete classes. Linear regression algorithms are commonly used for regression tasks, whereas logistic regression is employed for classification problems.
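To illustrate the bounded-output point, here is a small sketch that fits an ordinary least-squares line directly to 0/1 labels. It uses the same tumor-size values that appear later in this notebook; the variable names `sizes`, `labels`, and `line_pred` are introduced here only for the example.

```python
import numpy as np

# Tumor sizes (cm) and 0/1 labels, the same values used later in this notebook.
sizes = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
                  4.92, 4.37, 4.96, 4.52, 3.69, 5.88])
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# Fit a straight line to the labels by least squares.
# np.polyfit returns coefficients from the highest power down: [slope, intercept].
slope, intercept = np.polyfit(sizes, labels, deg=1)
line_pred = slope * sizes + intercept

# A straight line is unbounded, so some predictions fall outside [0, 1].
print("min prediction:", line_pred.min())
print("max prediction:", line_pred.max())
```

On this data the line dips below 0 for the smallest tumor and rises above 1 for the largest, so its output cannot be read as a probability.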

**Applications of Logistic Regression**: Logistic Regression is widely used in various fields for classification tasks. Here are some common applications:

1. **Healthcare**: Predicting disease presence (e.g., diabetes, cancer) based on patient data.
2. **Finance**: Assessing credit risk or predicting loan defaults.
3. **Marketing**: Classifying customers for targeted campaigns (e.g., likelihood of purchase or churn).
4. **Human Resources**: Predicting employee attrition or recruitment success.
5. **E-commerce**: Recommending products by predicting user preferences.
6. **Social Sciences**: Modeling survey responses or voting behavior.
7. **Fraud Detection**: Identifying fraudulent transactions or activities.

Its simplicity, interpretability, and efficiency make it well suited to binary classification problems.

Let us start by importing the required module.

In [1]:
import numpy as np

In [2]:
# The feature is the tumor size; the label is 0/1, indicating whether the tumor is benign (0) or malignant (1)
# Input values along with output labels

In [5]:
# X holds the tumor sizes in centimeters.
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)

# Note: scikit-learn expects a 2-D feature array of shape (n_samples, n_features),
# so the 1-D row of sizes is reshaped into a single column.
# y records whether each tumor is cancerous (0 for "No", 1 for "Yes").
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

In [7]:
# Import model

from sklearn.linear_model import LogisticRegression
logr = LogisticRegression()

In [8]:
logr.fit(X,y)

In [23]:
# Let's predict the class for a tumor size entered by the user

tum_size = float(input('Enter Tumor size'))
# reshape(-1, 1) turns the single value into a column vector (one sample, one feature)
y_pred = logr.predict(np.array([tum_size]).reshape(-1,1))
if y_pred[0] == 0:
  print('Tumor is benign')
else:
  print('Tumor is malignant')

Enter Tumor size5.3
Tumor is malignant
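Besides the hard 0/1 label from `predict`, scikit-learn's `LogisticRegression` also exposes `predict_proba`, which returns the estimated probability of each class. The sketch below refits the same model on the same data so that it runs on its own; the query sizes in `sizes_to_check` are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same training data as above.
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = LogisticRegression().fit(X, y)

# Hypothetical tumor sizes (cm) to score, one sample per row.
sizes_to_check = np.array([[1.0], [3.5], [5.3]])

# Each row holds one probability per class, in the order of logr.classes_ (0, then 1).
proba = logr.predict_proba(sizes_to_check)
for size, (p_benign, p_malignant) in zip(sizes_to_check.ravel(), proba):
    print(f"{size:.1f} cm -> P(benign)={p_benign:.3f}, P(malignant)={p_malignant:.3f}")
```

`predict` reports class 1 whenever the second column exceeds 0.5, so the probabilities show how close a given prediction is to the decision threshold.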


In [27]:
# Let's get the coefficient: the change in log-odds per 1 cm increase in tumor size

log_odds = logr.coef_
print(log_odds)

# Exponentiate the log-odds coefficient to get the odds ratio
odds = np.exp(log_odds)
print(odds)

[[1.39514829]]
[[4.03557295]]


This means that each 1 cm increase in tumor size multiplies the odds of the tumor being malignant by about 4.
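The following sketch works through that interpretation using the coefficient printed above. The starting odds of 1 (a 50% probability) are a hypothetical baseline picked for illustration, and `prob_from_odds` is a helper introduced only for this example.

```python
import math

coef = 1.39514829              # log-odds coefficient printed above
odds_ratio = math.exp(coef)    # factor by which the odds change per +1 cm

def prob_from_odds(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)

# Hypothetical baseline: odds of 1 (probability 0.5), for illustration only.
start_odds = 1.0
for extra_cm in range(3):
    odds = start_odds * odds_ratio ** extra_cm
    print(f"+{extra_cm} cm: odds={odds:.2f}, probability={prob_from_odds(odds):.3f}")
```

The odds are multiplied by the same factor for every additional centimeter, while the probability rises by a smaller amount each time as it approaches 1.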

In [28]:
# Let's calculate the probabilities with respect to tumor size

In [29]:
# Create a function to convert the model's log-odds into probabilities

def logit2prob(logr, x):
  log_odds = logr.coef_ * x + logr.intercept_      # Linear predictor (same form as y = mx + c), which gives the log-odds
  odds = np.exp(log_odds)
  prob = odds / (1 + odds)
  return prob

print(logit2prob(logr, X))

[[0.60749168]
 [0.19267555]
 [0.12774788]
 [0.00955056]
 [0.08037781]
 [0.0734485 ]
 [0.88362857]
 [0.77901203]
 [0.88924534]
 [0.81293431]
 [0.57718238]
 [0.96664398]]


In [35]:
print(X)

[[3.78]
 [2.44]
 [2.09]
 [0.14]
 [1.72]
 [1.65]
 [4.92]
 [4.37]
 [4.96]
 [4.52]
 [3.69]
 [5.88]]


Each probability corresponds to the tumor size in the same row: for example, a tumor of size 3.78 cm has a 60.75% chance of being cancerous according to the model.
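As a sanity check, the manually computed probabilities can be compared with scikit-learn's own `predict_proba`, and the tumor size at which the probability crosses 0.5 can be derived by setting the log-odds to zero. This is a self-contained sketch that refits the model with default settings; `manual`, `builtin`, and `boundary_cm` are names introduced only for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same training data as above.
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = LogisticRegression().fit(X, y)

# Manual probability: sigmoid of the linear predictor (the log-odds).
log_odds = logr.coef_ * X + logr.intercept_
manual = 1.0 / (1.0 + np.exp(-log_odds))

# Built-in probability of class 1, kept as a column to match the manual shape.
builtin = logr.predict_proba(X)[:, [1]]
print("manual matches predict_proba:", np.allclose(manual, builtin))

# The probability is 0.5 where the log-odds are zero: coef * x + intercept = 0.
boundary_cm = -logr.intercept_[0] / logr.coef_[0, 0]
print("decision boundary (cm):", boundary_cm)
```

Tumors larger than `boundary_cm` are classified as malignant by `predict`, and smaller ones as benign.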