In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/hr-data/HR_comma_sep.csv


# Logistic Regression

## 1. Introduction
- **Logistic Regression** is a supervised learning algorithm used for binary classification problems (where the target variable has two possible outcomes, such as 0 or 1).
- Unlike **Linear Regression**, which predicts continuous values, **Logistic Regression** predicts the probability of a certain class or event, such as pass/fail, win/loss, or spam/ham.

## 2. Logistic Function (Sigmoid Function)
- The key difference between **Logistic Regression** and **Linear Regression** is the use of the **logistic function** (also called the **sigmoid function**) to predict probabilities.

### **Sigmoid Function:**
$
\sigma(z) = \frac{1}{1 + e^{-z}}
$

Where:
- $z$ = the input to the function, which is a linear combination of the input features and their coefficients ($z = \beta_0 + \beta_1 \cdot x_1 + \beta_2 \cdot x_2 + \dots + \beta_n \cdot x_n$).
- $e$ = Euler’s number, approximately $2.718$.
- $\sigma(z)$ = the output of the sigmoid function, which is a value between 0 and 1.

### **Interpretation:**
- The sigmoid function maps any real-valued number to a value between 0 and 1. This makes it suitable for predicting probabilities.
- The closer $\sigma(z)$ is to 1, the more likely the positive class (1) is. The closer it is to 0, the more likely the negative class (0) is.
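
The mapping described above can be checked numerically. The following is a minimal sketch using NumPy; the function name `sigmoid` and the sample inputs are chosen here for illustration and are not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued inputs to the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs: a large negative value, zero, and a large positive value.
z_values = np.array([-5.0, 0.0, 5.0])
probabilities = sigmoid(z_values)

# Roughly 0.0067, 0.5 and 0.9933: negative z leans toward class 0,
# positive z toward class 1, and z = 0 sits exactly in the middle.
print(probabilities)
```

For very large negative inputs this direct formula can trigger an overflow warning in `np.exp`; `scipy.special.expit` is one numerically stable alternative if SciPy is available.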

## 3. Hypothesis for Logistic Regression
- In **Logistic Regression**, instead of directly predicting $y$ as in **Linear Regression**, we predict the probability of $y$ being 1, given the input features.

### **Hypothesis Equation:**
$
h_\theta(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot x_1 + \dots + \beta_n \cdot x_n)}}
$

Where:
- $h_\theta(x)$ = predicted probability that $y = 1$, where $\theta$ denotes the vector of coefficients $(\beta_0, \beta_1, \dots, \beta_n)$.
- $\beta_0, \beta_1, \dots, \beta_n$ = coefficients of the model.
- $x_1, x_2, \dots, x_n$ = independent variables (features).

### **Thresholding:**
- To make predictions, we can apply a threshold:
  - If $h_\theta(x) \geq 0.5$, predict class 1.
  - If $h_\theta(x) < 0.5$, predict class 0.
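
The hypothesis and the thresholding rule can be sketched together as follows. The helper names `predict_proba` and `predict`, the coefficient values, and the sample inputs are all hypothetical and serve only to illustrate the rule.

```python
import numpy as np

def predict_proba(X, beta):
    """Return P(y = 1 | x) for each row of X.

    X    : array of shape (m, n) holding the features.
    beta : array of shape (n + 1,), with beta[0] the intercept.
    """
    z = beta[0] + X @ beta[1:]
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, beta, threshold=0.5):
    """Predict class 1 when the probability reaches the threshold, else class 0."""
    return (predict_proba(X, beta) >= threshold).astype(int)

# Hypothetical coefficients for a single feature: z = -1 + 2 * x.
beta = np.array([-1.0, 2.0])
X = np.array([[0.0], [0.5], [1.0]])

print(predict_proba(X, beta))  # about 0.27, 0.5 and 0.73
print(predict(X, beta))        # [0 1 1]
```

The threshold of 0.5 is a default rather than a requirement; it can be moved when false positives and false negatives have different costs.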

## 4. Cost Function for Logistic Regression
- The cost function used in **Logistic Regression** differs from that of **Linear Regression** because the model outputs probabilities. Combining Mean Squared Error (MSE) with the sigmoid yields a non-convex cost, so **Logistic Regression** uses the **log-loss** (also called **cross-entropy loss**) instead, which is convex in the parameters.

### **Log-Loss (Cost Function):**
$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]
$

Where:
- $m$ = number of training examples.
- $y_i$ = actual label (0 or 1) for the $i^{th}$ training example.
- $h_\theta(x_i)$ = predicted probability for the $i^{th}$ training example.

### **Explanation:**
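- When the actual label $y_i = 1$, the second term vanishes and only $y_i \log(h_\theta(x_i))$ contributes. The cost for that example is $-\log(h_\theta(x_i))$, which is near 0 when the predicted probability is near 1 and grows without bound as it approaches 0.
- When the actual label $y_i = 0$, the first term vanishes and only $(1 - y_i) \log(1 - h_\theta(x_i))$ contributes. The cost for that example is $-\log(1 - h_\theta(x_i))$, which is near 0 when the predicted probability is near 0 and grows without bound as it approaches 1.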
- This cost function penalizes incorrect predictions heavily, especially when the predicted probability is far from the actual label.
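
To make the penalty concrete, here is a small sketch of the log-loss in NumPy. The function name `log_loss`, the clipping constant `eps`, and the example probabilities are assumptions made for illustration.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    """Average cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs for predictions of exactly 0 or 1.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.9, 0.1])  # close to the true labels
confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])  # far from the true labels

print(log_loss(y, confident_right))  # about 0.105
print(log_loss(y, confident_wrong))  # about 2.303
```

The second loss is roughly twenty times the first, which is the heavy penalty for confident mistakes described above.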


## 5. Gradient Descent in Logistic Regression
- Just like in **Linear Regression**, we can use **Gradient Descent** to minimize the cost function and find the optimal parameters $\beta_0, \beta_1, \dots, \beta_n$.
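- The key difference is that **Logistic Regression** has no closed-form solution, so the parameters must be found iteratively. The log-loss is convex in the parameters, so **Gradient Descent** with a suitable learning rate moves toward the global minimum rather than getting stuck in a local one.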

### **Gradient Descent Update Rule:**
For each parameter $\beta_j$, we update it as follows:
$
\beta_j = \beta_j - \alpha \cdot \frac{\partial}{\partial \beta_j} J(\beta)
$

Where:
- $\alpha$ = learning rate (controls the step size in each iteration).
- $J(\beta)$ = cost function (log-loss).
- $\frac{\partial}{\partial \beta_j} J(\beta)$ = partial derivative of the cost function with respect to $\beta_j$.

### **Gradient Descent for Logistic Regression:**
For Logistic Regression, the partial derivatives are calculated as:
$
\frac{\partial}{\partial \beta_j} J(\beta) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i) \cdot x_{ij}
$

Where:
- $x_{ij}$ is the $j^{th}$ feature of the $i^{th}$ training example, with $x_{i0} = 1$ so that the same formula also covers the intercept $\beta_0$.
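
The update rule and gradient above can be combined into a short batch gradient descent loop. This is only a sketch under stated assumptions: the function name `fit_logistic_gd`, the learning rate, the iteration count, and the six-point toy dataset are all made up for illustration and are unrelated to the HR data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, alpha=0.1, n_iters=1000):
    """Fit coefficients by batch gradient descent on the log-loss.

    Returns beta of shape (n + 1,), with beta[0] the intercept.
    """
    m = X.shape[0]
    # Prepend a column of ones so that x_{i0} = 1 handles the intercept.
    Xb = np.column_stack([np.ones(m), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        h = sigmoid(Xb @ beta)        # predicted probabilities
        grad = Xb.T @ (h - y) / m     # (1/m) * sum((h_i - y_i) * x_ij)
        beta -= alpha * grad          # gradient descent update
    return beta

# Toy one-feature dataset that is deliberately not perfectly separable.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

beta = fit_logistic_gd(X, y)
print(beta)  # the slope beta[1] is positive: larger x makes class 1 more likely
```

A fixed number of iterations keeps the sketch short; in practice one would usually monitor the cost and stop once it no longer decreases.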

## 6. Key Concepts

### **Decision Boundary:**
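- The **decision boundary** is the surface that separates the two predicted classes (0 and 1). In Logistic Regression this boundary is always linear in the input features: a single threshold point with one feature, a line with two features, and a hyperplane in general. It becomes curved only if non-linear terms (for example, polynomial features) are added as inputs.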
- The decision boundary is where the predicted probability $h_\theta(x) = 0.5$.
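
As a short worked step, assuming the hypothesis defined in Section 3, setting the predicted probability to 0.5 reduces to a linear equation in the features:

$
h_\theta(x) = 0.5 \iff e^{-z} = 1 \iff z = \beta_0 + \beta_1 \cdot x_1 + \dots + \beta_n \cdot x_n = 0
$

For two features with $\beta_2 \neq 0$, this is the line:

$
x_2 = -\frac{\beta_0 + \beta_1 \cdot x_1}{\beta_2}
$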

### **Overfitting and Underfitting:**
- Like other machine learning models, **Logistic Regression** can suffer from **overfitting** (when the model fits the training data too well and performs poorly on unseen data) and **underfitting** (when the model is too simple to capture the relationship between the features and the target).

### **Regularization:**
- **Regularization** techniques like **L2 regularization (Ridge)** or **L1 regularization (Lasso)** can help prevent overfitting by adding a penalty term to the cost function.
- **L2 regularization** adds the sum of squared coefficients to the cost function:
  $
  J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \beta_j^2
  $
- **L1 regularization** adds the absolute values of the coefficients:
  $
  J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right] + \frac{\lambda}{m} \sum_{j=1}^{n} |\beta_j|
  $
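- $\lambda \geq 0$ is a hyperparameter that controls the amount of regularization: larger values shrink the coefficients more strongly. Both sums start at $j = 1$, so the intercept $\beta_0$ is not penalized.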
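
The effect of the L2 penalty can be sketched by adding it to the cost and gradient and comparing fits with and without it. The function names `l2_cost_and_grad` and `fit`, the value `lam=5.0`, and the toy data are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_cost_and_grad(beta, Xb, y, lam):
    """L2-regularized log-loss and its gradient.

    Xb must contain a leading column of ones; beta[0] (the intercept)
    is excluded from the penalty, matching the sum from j = 1.
    """
    m = Xb.shape[0]
    h = sigmoid(Xb @ beta)
    h_safe = np.clip(h, 1e-15, 1 - 1e-15)  # avoid log(0) in the cost
    cost = -np.mean(y * np.log(h_safe) + (1 - y) * np.log(1 - h_safe))
    cost += lam / (2 * m) * np.sum(beta[1:] ** 2)
    grad = Xb.T @ (h - y) / m
    grad[1:] += lam / m * beta[1:]
    return cost, grad

def fit(Xb, y, lam, alpha=0.1, n_iters=2000):
    """Plain gradient descent on the regularized cost."""
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        _, grad = l2_cost_and_grad(beta, Xb, y, lam)
        beta -= alpha * grad
    return beta

# Toy one-feature dataset (not the HR data), with a column of ones prepended.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
Xb = np.column_stack([np.ones_like(x), x])

beta_plain = fit(Xb, y, lam=0.0)
beta_reg = fit(Xb, y, lam=5.0)
print(beta_plain, beta_reg)  # the regularized slope is closer to zero
```

Note that scikit-learn's `LogisticRegression` expresses the same idea through `C`, the inverse of the regularization strength, so a smaller `C` corresponds to a larger $\lambda$.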

## 7. Conclusion
- **Logistic Regression** is a simple yet powerful algorithm for binary classification problems. It is widely used because of its interpretability and efficiency.
- The **sigmoid function** enables us to model probabilities, and the **log-loss** cost function penalizes confident but wrong predictions heavily, which pushes the predicted probabilities toward the true labels.
- **Gradient Descent** helps optimize the model parameters, and regularization techniques can be applied to prevent overfitting.



In [2]:
import pandas as pd
hr_data = pd.read_csv('/kaggle/input/hr-data/HR_comma_sep.csv')
hr_data.head()

Unnamed: 0,satisfaction_level,last_evaluation,number_project,average_montly_hours,time_spend_company,Work_accident,left,promotion_last_5years,Department,salary
0,0.38,0.53,2,157,3,0,1,0,sales,low
1,0.8,0.86,5,262,6,0,1,0,sales,medium
2,0.11,0.88,7,272,4,0,1,0,sales,medium
3,0.72,0.87,5,223,5,0,1,0,sales,low
4,0.37,0.52,2,159,3,0,1,0,sales,low
