### <div align="center">Supervised Machine Learning - Regression</div>

##### 3.1: Simple Linear Regression
##### 3.2: Multiple Linear Regression
- Simple Linear Regression (one independent variable): y = m⋅x + c, where m is the slope (coefficient), c is the intercept (the point where the line crosses the y-axis), x is the independent variable, and y is the dependent variable.
- Linear regression establishes the relationship between dependent and independent variables.
- Multiple Linear Regression (multiple independent variables): y = m1⋅x1 + m2⋅x2 + … + b, where m1, m2, … are coefficients, x1, x2, … are independent variables, and b is the intercept.
- The slope is also called the gradient.
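
The two equations above can be tried out on toy data. The following is a minimal sketch using NumPy only; the data values are hypothetical and chosen to lie exactly on a known line or plane, so the recovered slope, coefficients, and intercepts can be checked by eye.

```python
import numpy as np

# --- Simple linear regression: y = m*x + c ---
# Hypothetical toy data that lies exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a degree-1 polynomial (a straight line); returns [slope, intercept]
m, c = np.polyfit(x, y, deg=1)


def predict(x_new, m, c):
    """Apply the line equation y = m*x + c."""
    return m * x_new + c


print(f"slope m = {m:.2f}, intercept c = {c:.2f}")
print("prediction at x = 5:", round(predict(5.0, m, c), 2))

# --- Multiple linear regression: y = m1*x1 + m2*x2 + b ---
# Hypothetical toy data generated from m1 = 3, m2 = -1, b = 0.5
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0]])
y_multi = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

# Append a column of ones so the intercept b is learned as a third coefficient
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y_multi, rcond=None)
print("m1, m2, b =", np.round(coef, 2))
```

In practice a library such as scikit-learn does this fitting for you; the sketch only shows that the coefficients and intercept are the quantities being learned.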

##### 3.5: Cost Function
- The best-fit line is the line whose slope m and intercept b minimize the error between predicted and actual values.
- Understanding the cost function is essential for understanding Gradient Descent.
- Gradient Descent is one of the most important optimization concepts in supervised machine learning.
- Line, Slope, Intercept are the essential basic terms.
- Error/Loss: Difference between the predicted and actual Y value.
- Mean Absolute Error (MAE): The average of the absolute differences between predictions and actual observations, disregarding the direction of each error.
- Mean Squared Error (MSE): The average of the squared differences between predictions and actual observations. Squaring gives larger errors more weight.
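
As a small worked example of the two cost functions defined above, the sketch below computes MAE and MSE by hand with NumPy. The actual and predicted values are hypothetical.

```python
import numpy as np

# Hypothetical actual and predicted values
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 13.0])

# Error (loss) per observation: actual minus predicted -> [1, 0, -1, -4]
errors = y_actual - y_pred

# MAE: average of absolute errors -> (1 + 0 + 1 + 4) / 4
mae = np.mean(np.abs(errors))

# MSE: average of squared errors -> (1 + 0 + 1 + 16) / 4
mse = np.mean(errors ** 2)

print("MAE:", mae)
print("MSE:", mse)
```

The single error of 4 contributes 16 to the MSE sum but only 4 to the MAE sum, which is why MSE highlights larger errors.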

##### 3.6: Derivatives and Partial Derivatives
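- The derivative is the slope of the tangent line to a curve at a given point.
- Power rule: if y = x**n, then y′ = n⋅x**(n−1). For example, y = x**3 gives y′ = 3⋅x**2.
- For a linear equation the slope is a single constant; for a non-linear equation the slope changes from point to point, so it is described by the derivative.
- Slope is a constant, whereas the derivative is a function of x.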
- The purpose of a Partial Derivative is to measure how a function changes as one of its variables is varied while keeping the other variables constant.
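
The ideas above can be checked numerically. This is a minimal sketch that approximates derivatives with a central difference; the helper `numerical_derivative` and the example functions `cube` and `g` are hypothetical names chosen for illustration.

```python
def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def cube(x):
    return x ** 3


# Derivative of x**3 is 3*x**2, so the slope at x = 2 should be close to 12
slope_at_2 = numerical_derivative(cube, 2.0)


def g(x, y):
    return x ** 2 * y + y


# Partial derivative with respect to x at (3, 2): vary x, hold y fixed at 2.
# Analytically this is 2*x*y = 12.
dg_dx = numerical_derivative(lambda x: g(x, 2.0), 3.0)

# Partial derivative with respect to y at (3, 2): vary y, hold x fixed at 3.
# Analytically this is x**2 + 1 = 10.
dg_dy = numerical_derivative(lambda y: g(3.0, y), 2.0)

print(round(slope_at_2, 4), round(dg_dx, 4), round(dg_dy, 4))
```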

##### 3.7: Chain Rule
- Chain rule: A technique used to compute the derivative of a function composed of multiple functions.
- Application: Chain rule will be used in the Gradient Descent Technique.
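
To make the chain rule concrete, the sketch below differentiates a composed function h(x) = (3x + 1)**2 in two ways: analytically, as outer derivative times inner derivative, and numerically. The function names are hypothetical.

```python
def inner(x):
    """Inner function g(x) = 3x + 1."""
    return 3 * x + 1


def outer(u):
    """Outer function f(u) = u**2."""
    return u ** 2


def composite(x):
    """h(x) = f(g(x)) = (3x + 1)**2."""
    return outer(inner(x))


def chain_rule_derivative(x):
    d_outer = 2 * inner(x)  # f'(u) = 2u, evaluated at u = g(x)
    d_inner = 3             # g'(x) = 3
    return d_outer * d_inner


x0 = 2.0
analytic = chain_rule_derivative(x0)

# Numerical check with a central difference
h = 1e-6
numeric = (composite(x0 + h) - composite(x0 - h)) / (2 * h)

print("chain rule:", analytic, "numerical:", round(numeric, 4))
```

The squared error inside MSE has the same shape (a square wrapped around a linear expression), which is where the chain rule enters Gradient Descent.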

##### 3.10: Gradient Descent Theory
- Gradient Descent: An optimization method used in linear regression to find the best-fit line by iteratively adjusting the slope (m) and intercept (b) to minimize the cost function, usually the mean squared error (MSE).
- Efficiency: Since testing every combination for MSE is impractical, gradient descent efficiently minimizes MSE with fewer iterative adjustments.
- Mean Squared Error (MSE): MSE = (1/n) ∑ (yi − (m⋅xi + b))**2
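
For reference, the gradients that Gradient Descent follows can be derived from the MSE formula with the chain rule. The block below is a sketch of that derivation; the symbol α for the learning rate is an assumption introduced here, not notation from these notes.

```latex
\mathrm{MSE}(m, b) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2

% Partial derivatives (chain rule: derivative of the square times derivative of the inside)
\frac{\partial\,\mathrm{MSE}}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr)
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)

% Update step, repeated for each iteration
m \leftarrow m - \alpha\,\frac{\partial\,\mathrm{MSE}}{\partial m}
\qquad
b \leftarrow b - \alpha\,\frac{\partial\,\mathrm{MSE}}{\partial b}
```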

##### 3.11: Gradient Descent Implementation
- In a role as a Data Scientist or AI Engineer, your daily tasks will not typically involve implementing Gradient Descent. Instead, you will use ML libraries. However, a solid understanding of this concept is beneficial for both your work and potential interviews.
- Adjusting the learning rate and epochs based on observed outputs (m, L) will enable you to obtain the desired outcomes in the Gradient Descent Implementation.
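
A minimal sketch of such an implementation is shown below, assuming MSE as the cost function and plain batch Gradient Descent. The function name `gradient_descent`, the toy data, and the chosen learning rate and epoch count are all hypothetical; other data would need different settings.

```python
import numpy as np


def gradient_descent(x, y, learning_rate=0.01, epochs=1000):
    """Fit y = m*x + b by minimizing MSE with batch gradient descent."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        residual = y - (m * x + b)
        # Partial derivatives of MSE with respect to m and b
        grad_m = -(2.0 / n) * np.sum(x * residual)
        grad_b = -(2.0 / n) * np.sum(residual)
        # Step against the gradient
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    cost = np.mean((y - (m * x + b)) ** 2)
    return m, b, cost


# Hypothetical toy data on the line y = 2x + 3
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 3.0

m, b, cost = gradient_descent(x, y, learning_rate=0.05, epochs=5000)
print(f"m = {m:.4f}, b = {b:.4f}, cost = {cost:.6f}")
```

If the learning rate is too large the updates overshoot and the cost grows instead of shrinking; if it is too small, many more epochs are needed to converge.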

##### 3.12: Why MSE (Not MAE)
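- Mean Squared Error (MSE) is our go-to cost function for Gradient Descent because:
  1. It penalizes large errors more heavily, so it is sensitive to outliers.
  2. It's continuously differentiable, whereas MAE has no derivative where the error is exactly zero.
- In rare scenarios, Mean Absolute Error (MAE) is our pick for Gradient Descent when we're dealing with lots of outliers and do not want them to dominate the fit.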
- Even though these principles are basic, in real-world scenarios, you'll be using machine learning libraries directly.
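
The difference in outlier sensitivity can be seen on a small example. In this sketch the values are hypothetical: the two sets of predictions are identical except for one prediction that is off by 10.

```python
import numpy as np


def mae(actual, predicted):
    return np.mean(np.abs(actual - predicted))


def mse(actual, predicted):
    return np.mean((actual - predicted) ** 2)


y_actual = np.array([10.0, 12.0, 14.0, 16.0])

# Every prediction is off by exactly 1
y_pred_clean = np.array([11.0, 11.0, 15.0, 15.0])

# Same as above, but the last prediction is off by 10 (an outlier)
y_pred_outlier = np.array([11.0, 11.0, 15.0, 26.0])

print("clean   -> MAE:", mae(y_actual, y_pred_clean), "MSE:", mse(y_actual, y_pred_clean))
print("outlier -> MAE:", mae(y_actual, y_pred_outlier), "MSE:", mse(y_actual, y_pred_outlier))
```

One outlier roughly triples the MAE here but multiplies the MSE by more than 25, so a model trained on MSE is pulled much harder toward fitting that point.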

##### 3.13: Model Evaluation (Train, Test Split)
- Just as obtaining a driver's license involves passing a test, not just learning to drive, a machine learning model must be evaluated on data it has not seen: we split the dataset into training and testing parts and measure the model's performance on the test part.
- We utilize the train_test_split() function from the sklearn library for this purpose.
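
A minimal sketch of the split on hypothetical toy arrays is shown below. Passing a fixed `random_state` makes the shuffle repeatable, so the same rows land in the test set on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
```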


##### 3.14: Model Evaluation - Metrics
- To evaluate the performance of a Machine Learning model, we can use metrics such as MSE, MAE, or R2 score.
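- The R2 score is easier to interpret than the other metrics because it does not depend on the scale of the target: 1.0 is a perfect fit, and 0 means the model does no better than always predicting the mean.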
- The parameter random_state is used in the train_test_split() function to ensure reproducibility.
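
The three metrics are available in `sklearn.metrics`. The sketch below computes them on hypothetical actual and predicted values so the numbers can be verified by hand.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 1 + 0) / 4
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 1 + 0) / 4
r2 = r2_score(y_true, y_pred)              # 1 - (sum of squared errors 2) / (total variation 20)

print("MAE:", mae)
print("MSE:", mse)
print("R2 :", r2)
```

MAE and MSE are expressed in the units (or squared units) of the target, while R2 is unitless, which is what makes it convenient for comparing models across datasets.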


##### 3.18: Data preprocessing - One hot encoding
- Data Science/AI Project Stages
  Data Collection -> Data Preprocessing (cleaning bad data; feature engineering, i.e. creating new features, for example with one-hot encoding) -> Model Training & Evaluation -> Model Tuning
- Multicollinearity occurs when two or more independent variables are highly correlated, making it difficult to distinguish their individual effects on the dependent variable.
- In simple terms, computers understand numbers, not text. The process of turning text into numbers is called encoding.
- We use label encoding for ordinal categories that have a specific order. For nominal categories, which don't have an order, we use one-hot encoding.
- One-hot encoding transforms categorical data into a binary vector format that's easier for machine learning to understand. Each category is represented by a binary vector, with a "1" at the position corresponding to the category and "0"s everywhere else.
- One-hot encoding itself introduces multicollinearity: the dummy columns always sum to 1, so any one of them can be predicted from the others (the dummy variable trap).
- To handle this, we drop one of the dummy columns during or after the one-hot encoding process.
- The Pandas library includes a built-in function named get_dummies() for implementing one-hot encoding. By using the drop_first parameter, we can eliminate the first dummy column.
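
The sketch below shows both kinds of encoding on a hypothetical toy DataFrame: a manual mapping for an ordinal column and `get_dummies()` with `drop_first=True` for a nominal column. The column names and category values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: 'size' is ordinal, 'city' is nominal
toy_df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "city": ["Delhi", "Mumbai", "Pune", "Delhi"],
})

# Label encoding for the ordinal column: the integers preserve the order
size_order = {"small": 0, "medium": 1, "large": 2}
toy_df["size"] = toy_df["size"].map(size_order)

# One-hot encoding for the nominal column. drop_first=True removes the
# first dummy column (city_Delhi) to avoid the dummy variable trap.
encoded = pd.get_dummies(toy_df, columns=["city"], drop_first=True)

print(encoded)
```

A row with 0 in both `city_Mumbai` and `city_Pune` is implicitly Delhi, so no information is lost by dropping the first column.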

##### 3.20: Polynomial Regression
- Simple linear regression models a straight-line (y=b0+b1x) relationship between a dependent variable y and an independent variable x. Polynomial regression extends this by including higher powers of x for complex, non-linear relationships.
- y = b0 + b1*X + b2*X**2 + b3*X**3 + … + bn*X**n ; b0, b1, …, bn: coefficients ; n: degree
- Polynomial Regression with degree=1 is nothing but a Linear Regression.
- Deciding the degree will be based on trial and error, as well as domain knowledge.
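
One common way to do this with scikit-learn is to generate the powers of x with `PolynomialFeatures` and feed them to `LinearRegression`. The sketch below assumes that approach and uses hypothetical data generated from an exact quadratic, so the degree-2 model should fit almost perfectly while the straight line cannot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data from an exact quadratic: y = 1 + 2x + 0.5x^2
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + 0.5 * x.ravel() ** 2

# Degree 1: ordinary straight-line fit
linear_fit = LinearRegression().fit(x, y)

# Degree 2: add an x^2 column, then fit a linear model on [x, x^2]
poly_fit = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(x, y)

linear_r2 = linear_fit.score(x, y)
poly_r2 = poly_fit.score(x, y)

print("R2 with degree 1:", round(linear_r2, 4))
print("R2 with degree 2:", round(poly_r2, 4))
```

Polynomial regression is still linear in its coefficients, which is why the same `LinearRegression` estimator can be reused once the extra power columns exist.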

##### 3.23: Overfitting and Underfitting
- Overfitting: Occurs when a model learns too much detail and noise from the training data, affecting its performance on new data.
- Underfitting: Happens when a model is too simple and cannot learn the data pattern, leading to poor performance on all data.
- Balanced Fit: Achieved when a model accurately learns the training data's patterns and performs well on unseen data.
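
The three cases can be observed by fitting polynomials of increasing degree to the same noisy data and comparing training and test error. The sketch below is illustrative only: the data is synthetic, the degrees (1, 4, 15) are arbitrary choices, and the exact numbers depend on the random seed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a sine-shaped pattern plus random noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 40)).reshape(-1, 1)
y = np.sin(3 * x.ravel()) + rng.normal(0, 0.2, 40)

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

results = {}
for degree in (1, 4, 15):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. A model is overfitting when its training error keeps falling while its test error rises, and underfitting when both errors stay high.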

##### 3.24: Reasons and Remedies for Overfitting/Underfitting
- Overfitting can happen due to any one or a combination of the following:
  1. Reason: Poor model or hyperparameter selection. Solution: Better model or hyperparameter selection (an overfit model tries to pass through every point, so the fitted line becomes a zigzag instead of a smooth curve).
  2. Reason: Insufficient training data. Solution: Sufficient training data.
  3. Reason: Poor feature selection. Solution: Careful feature selection.
  4. Reason: Inadequate validation. Solution: Adequate validation (if the train and test datasets both come from the same region of the data, points from other regions will not fit the derived model).
  5. Reason: Lack of regularization. Solution: Apply regularization (a technique used to reduce overfitting; if we apply too much of it, the model goes from one extreme to the other, from overfitting to underfitting).
- Underfitting can happen due to any one or a combination of the following:
  1. Reason: Model is too simple. Solution: Use a more complex model that can capture the data patterns.
  2. Reason: Insufficient training data. Solution: Sufficient training data.
  3. Reason: Insufficient features or poor feature engineering. Solution: Better feature selection/engineering.
  4. Reason: Insufficient training time. Solution: Sufficient training time (increase the number of epochs/iterations).
  5. Reason: Inadequate validation. Solution: Adequate validation.
  6. Reason: Excessive regularization. Solution: Adequate regularization.

##### 3.25: L1 and L2 Regularization
- L1 and L2 regularization are effective tools for reducing overfitting. Applying L2 regularization to Linear Regression gives Ridge Regression. In the use case below, we increase the penalty (alpha) to strengthen the regularization.
- Similarly, applying L1 regularization to Linear Regression gives Lasso Regression.
- Linear Regression can also overfit when the number of features is excessive; regularization reduces this problem.
- The choice between Ridge and Lasso Regression, and of parameters such as alpha, depends on multiple factors and is typically determined through trial and error.
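
The effect of the two penalties on the learned coefficients can be sketched on synthetic data. In the example below only the first two of ten features actually influence the target; the data, the seed, and the alpha values are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data: 10 features, but only the first two drive the target
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.5).fit(X, y)    # L1 penalty: can set coefficients to exactly 0

for name, model in [("Linear", ols), ("Ridge", ridge), ("Lasso", lasso)]:
    print(
        f"{name:6s} non-zero coefficients: {np.count_nonzero(model.coef_):2d}, "
        f"coefficient norm: {np.linalg.norm(model.coef_):.3f}"
    )
```

Ridge keeps every feature but pulls the coefficients toward zero, while Lasso tends to remove uninformative features entirely, which is one of the factors to weigh when choosing between them.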

##### 3.26: Bias-Variance Trade-off
- Bias measures how far a model's predictions are, on average, from the true pattern; high bias means the model fails to capture the pattern even in the training dataset.
- Bias occurs when an algorithm misses significant patterns in the data due to its simplicity, while Variance occurs when an algorithm changes significantly based on minor differences in the training data.
- BUVO: Bias-Underfitting (High bias often leads to underfitting), Variance-Overfitting.
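- Overfitting: Training Error - Low, Test Error - High (the model has learned too much detail from the training data).
- Underfitting: Training Error - High, Test Error - High (the model is too simple to learn the pattern).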

### Linear Regression Use case
You are a data scientist / AI engineer at a healthcare consulting firm. You have been provided with a dataset named **`"patient_health_data.csv"`**, which includes records of various health indicators for a group of patients. The dataset comprises the following columns:

- `age:` The age of the patient.
- `bmi:` Body Mass Index of the patient.
- `blood_pressure:` The blood pressure of the patient.
- `cholesterol:` Cholesterol levels of the patient.
- `glucose:` Glucose levels of the patient.
- `insulin:` Insulin levels of the patient.
- `heart_rate:` Heart rate of the patient.
- `activity_level:` Activity level of the patient.
- `diet_quality:` Quality of diet of the patient.
- `smoking_status:` Whether the patient smokes (Yes or No).
- `alcohol_intake:` The amount of alcohol intake by the patient.
- `health_risk_score:` A composite score representing the overall health risk of a patient.

Your task is to use this dataset to build a linear regression model to predict the health risk score based on the given predictor variables. Additionally, you will use L1 (Lasso) and L2 (Ridge) regularization techniques to improve the model's performance. 

#### Import necessary libraries

```python
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
```

#### Step 1: Data Preparation and Exploration
1. Import the data from the **`"patient_health_data.csv"`** file and store it in a variable df.
2. Display the number of rows and columns in the dataset.
3. Display the first few rows of the dataset to get an overview.
4. Check for any missing values in the dataset and handle them appropriately.
5. Encode the categorical variable `'smoking_status'` by converting 'Yes' to 1 and 'No' to 0.

```python
# Import the data from the "patient_health_data.csv" file and store it in a variable 'df'
df = pd.read_csv("../data/patient_health_data.csv")

# Display the number of rows and columns in the dataset
print("Number of rows and columns:", df.shape)

# Display the first few rows of the dataset to get an overview
print("First few rows of the dataset:")
df.head()
```

```text
Number of rows and columns: (250, 12)
First few rows of the dataset:
```

| | age | bmi | blood_pressure | cholesterol | glucose | insulin | heart_rate | activity_level | diet_quality | smoking_status | alcohol_intake | health_risk_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58 | 24.865215 | 122.347094 | 165.730375 | 149.289441 | 22.306844 | 75.866391 | 1.180237 | 7.675409 | No | 0.824123 | 150.547752 |
| 1 | 71 | 19.103168 | 136.852028 | 260.610781 | 158.584646 | 13.869817 | 69.481114 | 7.634622 | 8.933057 | No | 0.85291 | 160.32035 |
| 2 | 48 | 22.316562 | 137.592457 | 177.342582 | 178.760166 | 22.849816 | 69.386962 | 7.917398 | 3.501119 | Yes | 4.740542 | 187.487398 |
| 3 | 34 | 22.196893 | 153.164775 | 234.594764 | 136.351714 | 15.140336 | 95.348387 | 3.19291 | 2.745585 | No | 2.226231 | 148.773138 |
| 4 | 62 | 29.837173 | 92.768973 | 276.106498 | 158.753516 | 17.228576 | 77.680975 | 7.044026 | 8.918348 | No | 3.944011 | 170.609655 |


```python
# Check for any missing values in the dataset and handle them appropriately
print("Missing values in the dataset:")
df.isna().sum()
```

```text
Missing values in the dataset:

age                  0
bmi                  0
blood_pressure       0
cholesterol          0
glucose              0
insulin              0
heart_rate           0
activity_level       0
diet_quality         0
smoking_status       0
alcohol_intake       0
health_risk_score    0
dtype: int64
```

```python
# Encode the categorical variable 'smoking_status' by converting 'Yes' to 1 and 'No' to 0.
df['smoking_status'] = df['smoking_status'].apply(lambda x: 1 if x == 'Yes' else 0)
```

#### Step 2: Train Linear Regression Models
1. Select the features and the target variable for modeling.
2. Split the data into training and test sets with a test size of 25%.
3. Initialize and train a Linear Regression model, and evaluate its performance using R-squared.
4. Initialize and train a Lasso Regression model with various alpha values provided in a list: [0.01, 0.1, 1.0, 10.0], and evaluate its performance using R-squared.
5. Initialize and train a Ridge Regression model with various alpha values provided in a list: [0.01, 0.1, 1.0, 10.0], and evaluate its performance using R-squared.

```python
df.head()
```

| | age | bmi | blood_pressure | cholesterol | glucose | insulin | heart_rate | activity_level | diet_quality | smoking_status | alcohol_intake | health_risk_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58 | 24.865215 | 122.347094 | 165.730375 | 149.289441 | 22.306844 | 75.866391 | 1.180237 | 7.675409 | 0 | 0.824123 | 150.547752 |
| 1 | 71 | 19.103168 | 136.852028 | 260.610781 | 158.584646 | 13.869817 | 69.481114 | 7.634622 | 8.933057 | 0 | 0.85291 | 160.32035 |
| 2 | 48 | 22.316562 | 137.592457 | 177.342582 | 178.760166 | 22.849816 | 69.386962 | 7.917398 | 3.501119 | 1 | 4.740542 | 187.487398 |
| 3 | 34 | 22.196893 | 153.164775 | 234.594764 | 136.351714 | 15.140336 | 95.348387 | 3.19291 | 2.745585 | 0 | 2.226231 | 148.773138 |
| 4 | 62 | 29.837173 | 92.768973 | 276.106498 | 158.753516 | 17.228576 | 77.680975 | 7.044026 | 8.918348 | 0 | 3.944011 | 170.609655 |


```python
# Select the features and target variable for modeling
X = df.drop(['health_risk_score'], axis=1)
y = df['health_risk_score']

# Split the data into training and test sets with a test size of 25%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```

```python
# Initialize and train a Linear Regression model, and evaluate its performance using R-squared
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
linear_r2 = linear_model.score(X_test, y_test)
print("Linear Regression R-squared:", linear_r2)
```

```text
Linear Regression R-squared: 0.7643620906757489
```


```python
# Initialize and train a Lasso Regression model with various alpha values provided in a list, and evaluate its performance using R-squared
lasso_alphas = [0.01, 0.1, 1.0, 10.0]
for alpha in lasso_alphas:
    lasso_model = Lasso(alpha=alpha)
    lasso_model.fit(X_train, y_train)
    lasso_r2 = lasso_model.score(X_test, y_test)
    print(f"Lasso Regression R-squared (alpha={alpha}):", lasso_r2)
```

```text
Lasso Regression R-squared (alpha=0.01): 0.7645437646395714
Lasso Regression R-squared (alpha=0.1): 0.7660509914802165
Lasso Regression R-squared (alpha=1.0): 0.781976368357514
Lasso Regression R-squared (alpha=10.0): 0.7873364302158368
```


- Lasso regression gives a small improvement in R-squared over plain Linear Regression (`0.764`): `0.782` with alpha `1.0` and `0.787` with alpha `10.0`, the best of the values tried.

```python
# Initialize and train a Ridge Regression model with various alpha values provided in a list, and evaluate its performance using R-squared
ridge_alphas = [0.01, 0.1, 1.0, 10.0]
for alpha in ridge_alphas:
    ridge_model = Ridge(alpha=alpha)
    ridge_model.fit(X_train, y_train)
    ridge_r2 = ridge_model.score(X_test, y_test)
    print(f"Ridge Regression R-squared (alpha={alpha}):", ridge_r2)
```

```text
Ridge Regression R-squared (alpha=0.01): 0.7643631589390539
Ridge Regression R-squared (alpha=0.1): 0.7643727707489341
Ridge Regression R-squared (alpha=1.0): 0.7644686367656156
Ridge Regression R-squared (alpha=10.0): 0.7654030812954538
```


- Ridge regression performs similarly to Linear Regression on this dataset (R-squared stays between `0.764` and `0.765`). However, for complex datasets where overfitting is likely, both Lasso and Ridge regression can improve on the unregularized model.