# Machine Problem: Treatment Efficacy Prediction Engine

## 1. Project Overview

**Objective:**
Develop a machine learning regression system that predicts a patient's **Improvement Score** (0-10) based on their demographic profile, medical condition, and prescribed treatment plan.

**The Problem:**
Doctors currently prescribe medication based on general guidelines, but patient responses vary widely. By predicting the Improvement Score *before* treatment begins, this tool aims to help physicians choose the most effective treatment plan (drug, dosage, and duration) for a specific individual, in effect acting as a personalized-medicine recommender.

In [2]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Load data
df = pd.read_csv("real_drug_dataset.csv")
df = df.drop(columns=['Patient_ID'])  # Patient_ID is only an identifier and carries no predictive information.

In [19]:
# Test if csv can be read successfully
df.head(10)

| | Age | Gender | Condition | Drug_Name | Dosage_mg | Treatment_Duration_days | Side_Effects | Improvement_Score |
|---|---|---|---|---|---|---|---|---|
| 0 | 56 | Male | Infection | Ciprofloxacin | 50 | 9 | Nausea | 8.5 |
| 1 | 69 | Male | Hypertension | Metoprolol | 500 | 24 | Tiredness | 8.7 |
| 2 | 46 | Female | Depression | Bupropion | 100 | 25 | Dry mouth | 5.4 |
| 3 | 32 | Male | Diabetes | Glipizide | 850 | 44 | Low blood sugar | 6.4 |
| 4 | 60 | Male | Depression | Bupropion | 850 | 35 | Anxiety | 5.3 |
| 5 | 25 | Female | Infection | Ciprofloxacin | 850 | 50 | Dizziness | 6.7 |
| 6 | 78 | Male | Diabetes | Glipizide | 250 | 40 | Nausea | 6.5 |
| 7 | 38 | Male | Pain Relief | Paracetamol | 100 | 15 | Liver issues | 8.2 |
| 8 | 56 | Male | Depression | Escitalopram | 850 | 56 | Nausea | 9.0 |
| 9 | 75 | Male | Diabetes | Metformin | 850 | 19 | Nausea | 9.1 |


### Phase 1: Exploratory Data Analysis (EDA)

1. **Univariate Analysis:** Plot a histogram of `Improvement_Score`. Is the distribution approximately normal (bell-shaped) or skewed?

In [None]:
# Histogram
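
One possible way to approach the histogram is sketched below. It runs on synthetic scores (the `scores` series is a hypothetical stand-in, not the real data) so that it is self-contained; in the notebook, `df["Improvement_Score"]` would take its place. The bin count of 20 is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for df["Improvement_Score"]; swap in the real column.
rng = np.random.default_rng(0)
scores = pd.Series(rng.uniform(0, 10, size=200).round(1), name="Improvement_Score")

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(scores, bins=20, edgecolor="black")
ax.set_xlabel("Improvement_Score")
ax.set_ylabel("Number of patients")
ax.set_title("Distribution of Improvement_Score")

# Skewness near 0 suggests a roughly symmetric distribution; clearly
# positive or negative values point to a right or left skew.
skewness = scores.skew()
print(f"Skewness: {skewness:.3f}")
```

A skewness value alone does not establish normality; it is only a quick numeric companion to the visual check.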


2. **Bivariate Analysis:**
* Does `Age` correlate with `Improvement_Score`? (Scatter plot.)
* Do certain drugs (`Drug_Name`) consistently perform better for certain conditions (`Condition`)? (Box plots.)

In [None]:
# Scatter plot


# Box plots
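
The two bivariate questions above could be explored along the following lines. This is a minimal sketch on a synthetic miniature of the dataset (the `demo` frame and its random scores are assumptions made for illustration); with the real data, `demo` would simply be replaced by `df`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical miniature of the dataset; values are random, not real results.
rng = np.random.default_rng(1)
n = 120
demo = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Condition": rng.choice(["Diabetes", "Infection"], size=n),
    "Improvement_Score": rng.uniform(3, 10, size=n).round(1),
})
# Give each row a drug that belongs to its condition.
drugs_by_condition = {
    "Diabetes": ["Metformin", "Glipizide"],
    "Infection": ["Ciprofloxacin", "Azithromycin"],
}
demo["Drug_Name"] = [rng.choice(drugs_by_condition[c]) for c in demo["Condition"]]

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(12, 4))

# Scatter plot: Age against Improvement_Score.
sns.scatterplot(data=demo, x="Age", y="Improvement_Score", ax=ax_scatter)

# Box plots: score distribution per drug, colored by condition.
sns.boxplot(data=demo, x="Drug_Name", y="Improvement_Score",
            hue="Condition", dodge=False, ax=ax_box)
ax_box.tick_params(axis="x", rotation=45)
fig.tight_layout()

# Numeric companions to the plots.
age_corr = demo["Age"].corr(demo["Improvement_Score"])  # Pearson correlation
median_by_drug = demo.groupby(["Condition", "Drug_Name"])["Improvement_Score"].median()
print(f"Pearson r (Age vs. score): {age_corr:.3f}")
print(median_by_drug)
```

Grouping the medians by `Condition` first keeps the comparison between drugs that treat the same condition.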

3. **Correlation Matrix:** Use a heatmap to see whether `Dosage_mg` and `Treatment_Duration_days` are correlated.

In [None]:
# Heatmap 
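
A correlation heatmap could be produced roughly as follows. The `demo` frame is synthetic and purely illustrative; the color map and figure size are arbitrary choices. Only numeric columns are passed to `corr()`, since string columns such as `Gender` cannot be correlated directly.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for df; values are random.
rng = np.random.default_rng(2)
n = 200
demo = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Gender": rng.choice(["Female", "Male"], size=n),
    "Dosage_mg": rng.choice([50, 100, 250, 500, 850], size=n),
    "Treatment_Duration_days": rng.integers(5, 60, size=n),
    "Improvement_Score": rng.uniform(0, 10, size=n).round(1),
})

# Pearson correlation matrix over the numeric columns only.
corr = demo.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Pearson correlation of numeric features")

dosage_duration_corr = corr.loc["Dosage_mg", "Treatment_Duration_days"]
print(f"Dosage_mg vs. Treatment_Duration_days: {dosage_duration_corr:.3f}")
```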

---

### Phase 2: Data Preprocessing & Feature Engineering
1. **Encoding:** Convert `Gender`, `Condition`, and `Drug_Name` into numbers using **One-Hot Encoding** (`pd.get_dummies`).

In [21]:
# List the distinct values and their counts for Gender, Condition, and Drug_Name
print(df.Gender.value_counts().sort_index())
print(df.Condition.value_counts().sort_index())
print(df.Drug_Name.value_counts().sort_index())

Gender
Female    477
Male      523
Name: count, dtype: int64
Condition
Depression      176
Diabetes        207
Hypertension    194
Infection       215
Pain Relief     208
Name: count, dtype: int64
Drug_Name
Amlodipine          74
Amoxicillin         66
Azithromycin        70
Bupropion           66
Ciprofloxacin       79
Escitalopram        55
Glipizide           67
Ibuprofen           64
Insulin Glargine    78
Losartan            66
Metformin           62
Metoprolol          54
Paracetamol         62
Sertraline          55
Tramadol            82
Name: count, dtype: int64


In [None]:
# One hot encoding
print(f"Original Data:\n{df}\n")

df_encoded = pd.get_dummies(df, columns=['Gender', 'Condition', 'Drug_Name'], drop_first=True)
print(f"One-Hot Encoded Data using Pandas:\n{df_encoded}\n")

Original Data:
     Age  Gender     Condition      Drug_Name  Dosage_mg  \
0     56    Male     Infection  Ciprofloxacin         50   
1     69    Male  Hypertension     Metoprolol        500   
2     46  Female    Depression      Bupropion        100   
3     32    Male      Diabetes      Glipizide        850   
4     60    Male    Depression      Bupropion        850   
..   ...     ...           ...            ...        ...   
995   18    Male  Hypertension       Losartan        100   
996   35  Female     Infection   Azithromycin         50   
997   49  Female    Depression     Sertraline        850   
998   64    Male    Depression   Escitalopram        850   
999   66  Female  Hypertension     Metoprolol        500   

     Treatment_Duration_days     Side_Effects  Improvement_Score  
0                          9           Nausea                8.5  
1                         24        Tiredness                8.7  
2                         25        Dry mouth                5.

2. **Scaling:** Normalize `Dosage_mg` and `Age` using **MinMax Scaler** or **Standard Scaler** so that features with large numeric ranges do not dominate scale-sensitive models.

In [None]:
# Code
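
As an illustration of the two scalers, the sketch below applies scikit-learn's `MinMaxScaler` and `StandardScaler` to the `Age` and `Dosage_mg` values of the first five rows shown by `df.head(10)`. The frame name `demo` is an assumption for this example. In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split, to avoid leaking test information.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# First five rows of Age and Dosage_mg from df.head(10).
demo = pd.DataFrame({
    "Age": [56, 69, 46, 32, 60],
    "Dosage_mg": [50, 500, 100, 850, 850],
})
cols = ["Age", "Dosage_mg"]

# Min-max scaling maps each column onto the range [0, 1].
minmax_scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(demo[cols]), columns=cols, index=demo.index
)

# Standard scaling gives each column zero mean and unit variance.
standard_scaled = pd.DataFrame(
    StandardScaler().fit_transform(demo[cols]), columns=cols, index=demo.index
)

print(minmax_scaled)
print(standard_scaled)
```

Tree-based models are largely insensitive to feature scale, so scaling matters mostly for the linear baseline.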

3. **Feature Engineering (The "Secret Sauce"):**
* Create a new feature: `Total_Drug_Exposure = Dosage_mg * Treatment_Duration_days`.
* Create a binned categorical feature: `Age_Group` (e.g., Young, Middle, Senior).

In [None]:
# Code
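
Both engineered features could be built as in the sketch below, which uses the first six rows shown by `df.head(10)`. The age boundaries (35 and 60) are assumptions chosen for illustration; the assignment does not specify them.

```python
import pandas as pd

# First six rows of the relevant columns from df.head(10).
demo = pd.DataFrame({
    "Age": [56, 69, 46, 32, 60, 25],
    "Dosage_mg": [50, 500, 100, 850, 850, 850],
    "Treatment_Duration_days": [9, 24, 25, 44, 35, 50],
})

# Total exposure: dose multiplied by the number of treatment days.
demo["Total_Drug_Exposure"] = demo["Dosage_mg"] * demo["Treatment_Duration_days"]

# Age groups via binning. The edges are hypothetical:
# (0, 35] -> Young, (35, 60] -> Middle, (60, 120] -> Senior.
demo["Age_Group"] = pd.cut(
    demo["Age"], bins=[0, 35, 60, 120], labels=["Young", "Middle", "Senior"]
)

print(demo)
```

`Age_Group` is categorical, so it would still need one-hot encoding before being passed to a model.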


---

### Phase 3: Model Development
1. **Baseline Model:** Train a simple **Linear Regression** and calculate the R² score. (Note: it will likely be low. This is your baseline to beat.)

In [None]:
# Linear Regression 
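
A baseline could be set up roughly as follows. The sketch uses a synthetic frame (`demo`) whose target is pure random noise, so an R² near zero is expected here and says nothing about the real dataset. The 80/20 split and `random_state=42` are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the preprocessed data; the target is random.
rng = np.random.default_rng(42)
n = 300
demo = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Dosage_mg": rng.choice([50, 100, 250, 500, 850], size=n),
    "Treatment_Duration_days": rng.integers(5, 60, size=n),
    "Gender": rng.choice(["Female", "Male"], size=n),
})
demo["Improvement_Score"] = rng.uniform(0, 10, size=n).round(1)

# One-hot encode the categorical column and separate features from target.
X = pd.get_dummies(demo.drop(columns="Improvement_Score"),
                   columns=["Gender"], drop_first=True, dtype=float)
y = demo["Improvement_Score"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

baseline = LinearRegression().fit(X_train, y_train)
baseline_r2 = r2_score(y_test, baseline.predict(X_test))
print(f"Baseline R^2 on the test split: {baseline_r2:.3f}")
```

Reusing the same train/test split for the later models keeps their scores comparable with this baseline.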


2. **Advanced Model 1:** Train a **Decision Tree Regressor**. A tree can capture non-linear relationships and feature interactions that a linear model misses (e.g., a high dosage might help younger patients but harm older ones).

In [None]:
# Decision Tree Regressor
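
To make the dosage-and-age example concrete, the sketch below builds synthetic data in which a high dosage raises the score for patients under 50 and lowers it for older patients, then compares a shallow tree with a linear model. The effect sizes, the age cutoff of 50, the 500 mg threshold, and `max_depth=3` are all assumptions invented for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with a built-in Age x Dosage interaction (hypothetical).
rng = np.random.default_rng(7)
n = 400
X = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Dosage_mg": rng.choice([50, 100, 250, 500, 850], size=n),
})
high_dose = X["Dosage_mg"] >= 500
young = X["Age"] < 50
effect = np.where(high_dose & young, 2.0, np.where(high_dose & ~young, -2.0, 0.0))
y = pd.Series(5.0 + effect + rng.normal(0, 0.5, size=n), name="Improvement_Score")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A shallow tree limits overfitting while still allowing interaction splits.
tree = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

tree_r2 = r2_score(y_test, tree.predict(X_test))
linear_r2 = r2_score(y_test, linear.predict(X_test))
print(f"Decision tree R^2: {tree_r2:.3f}")
print(f"Linear model R^2:  {linear_r2:.3f}")
```

On the real dataset the gap between the two models depends on whether such interactions actually exist, which this synthetic example cannot show.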


3. **Advanced Model 2 (Champion):** Train a **Random Forest Regressor** or **Gradient Boosting Regressor**. These combine many trees to reduce errors.

In [None]:
# Random Forest


# Gradient Boosting
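
Both ensemble candidates could be trained and compared as sketched below. The data is synthetic, with the same invented Age-by-dosage interaction as an illustration, and the hyperparameters (`n_estimators`, `min_samples_leaf`, `learning_rate`) are untuned assumptions, not recommended values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a built-in Age x Dosage interaction (hypothetical).
rng = np.random.default_rng(7)
n = 400
X = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Dosage_mg": rng.choice([50, 100, 250, 500, 850], size=n),
})
high_dose = X["Dosage_mg"] >= 500
young = X["Age"] < 50
effect = np.where(high_dose & young, 2.0, np.where(high_dose & ~young, -2.0, 0.0))
y = pd.Series(5.0 + effect + rng.normal(0, 0.5, size=n), name="Improvement_Score")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Untuned, illustrative hyperparameters.
models = {
    "Random Forest": RandomForestRegressor(
        n_estimators=200, min_samples_leaf=5, random_state=42
    ),
    "Gradient Boosting": GradientBoostingRegressor(
        n_estimators=200, learning_rate=0.05, max_depth=3, random_state=42
    ),
}

# Fit each model and record its R^2 on the held-out split.
r2_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2_scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name} R^2: {r2_scores[name]:.3f}")
```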



---

### Phase 4: Evaluation & Interpretation

1. **Metrics:** Report **MAE** (Mean Absolute Error) and **RMSE** (Root Mean Squared Error).
2. **Feature Importance:** Extract which factors mattered most. Was it the *Drug Name* or the *Duration*? 
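
The metrics and the importance ranking could be computed along these lines. The sketch uses synthetic data in which the target is deliberately constructed from `Treatment_Duration_days`, so the resulting ranking is an artifact of that construction and not a finding about the real dataset. RMSE is computed as the square root of `mean_squared_error`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: the score is built from duration plus noise (hypothetical).
rng = np.random.default_rng(3)
n = 400
X = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Dosage_mg": rng.choice([50, 100, 250, 500, 850], size=n),
    "Treatment_Duration_days": rng.integers(5, 60, size=n),
})
y = 3.0 + 0.1 * X["Treatment_Duration_days"] + rng.normal(0, 0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# MAE: average absolute error. RMSE: penalizes large errors more heavily.
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")

# Impurity-based importances, one per feature, summing to 1.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

With one-hot encoded columns, the importances of all `Drug_Name_*` dummies would need to be summed to answer whether the drug or the duration mattered more.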
