# DIABETES PATIENT RE-ADMISSION PREDICTION

### Introduction
Hospital readmission can indicate issues in patient care and may reveal areas where medical treatment could be improved. Predicting whether a patient will be readmitted allows healthcare providers to adapt treatments proactively, potentially reducing readmission rates and improving patient outcomes. This project focuses on identifying factors associated with readmission in diabetic patients and aims to build a model that predicts readmission risk.

### Problem Statement
Using this dataset, I aim to predict one of three possible patient outcomes:

**1. No Readmission:** The patient was not readmitted after discharge.

**2. Readmission in Less than 30 Days:** Early readmission suggests potential issues with the initial treatment, which could be adjusted to reduce recurrence.

**3. Readmission in More than 30 Days:** Although less severe than early readmission, this still suggests that follow-up care may need to be adjusted to the patient's condition.

The objective of the project is to develop models that predict these readmission categories, enabling more proactive and effective patient care strategies.
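As a minimal sketch of how the three outcomes could be turned into a model target, the snippet below maps the raw `readmitted` strings (`NO`, `<30`, `>30`) to integer class labels. The helper name `encode_readmitted` and the particular integer codes are assumptions made for illustration, and the toy series stands in for the real column.

```python
import pandas as pd

# Assumed integer codes for the three outcomes; the ordering is arbitrary.
READMISSION_CLASSES = {"NO": 0, "<30": 1, ">30": 2}


def encode_readmitted(series: pd.Series) -> pd.Series:
    """Map raw `readmitted` strings to integer class labels.

    Raises ValueError if a value outside the three known categories appears,
    so unexpected labels are caught instead of silently becoming NaN.
    """
    unknown = set(series.unique()) - set(READMISSION_CLASSES)
    if unknown:
        raise ValueError(f"Unexpected readmitted values: {sorted(map(str, unknown))}")
    return series.map(READMISSION_CLASSES).astype("int64")


# Toy values for illustration only (not taken from the dataset).
toy = pd.Series(["NO", ">30", "<30", "NO"], name="readmitted")
encoded = encode_readmitted(toy)
print(encoded.tolist())
```

On the loaded data this would be called as `encode_readmitted(df['readmitted'])`.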

### Data Description
The dataset spans 10 years of clinical data (1999-2008) from 130 hospitals and healthcare networks across the United States. It contains 50 columns describing patient health, demographics, and hospital treatment. Every record is an inpatient diabetic encounter that meets the following criteria:

- Hospital admission (inpatient encounter).
- Diagnosis of diabetes during the encounter.
- Length of stay ranging from 1 to 14 days.
- Laboratory tests and medication administration during the encounter.
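
The length-of-stay criterion can be checked directly once the data is loaded. The sketch below assumes the stay is recorded in days in the `time_in_hospital` column; `check_length_of_stay` is a hypothetical helper, and the toy frame stands in for the real file.

```python
import pandas as pd


def check_length_of_stay(data: pd.DataFrame, low: int = 1, high: int = 14) -> pd.DataFrame:
    """Return the rows whose `time_in_hospital` lies outside [low, high]."""
    # Series.between is inclusive on both ends by default.
    out_of_range = ~data["time_in_hospital"].between(low, high)
    return data[out_of_range]


# Toy rows for illustration only; the last one violates the 1-14 day criterion.
toy = pd.DataFrame({"encounter_id": [1, 2, 3], "time_in_hospital": [1, 14, 15]})
violations = check_length_of_stay(toy)
print(f"Encounters outside the expected range: {len(violations)}")
```

An empty result from `check_length_of_stay(df)` would be consistent with the stated criterion.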

**Key Attributes in the Dataset**

The data contains a range of attributes, including:
- Patient Demographics: Patient ID, race, gender, age.
- Admission Details: Admission type, duration of stay, medical specialty of the admitting physician.
- Clinical Tests and Results: Number of lab tests, HbA1c test results, primary and secondary diagnoses.
- Medication Data: Number of medications administered, diabetic medications prescribed.
- Hospital Visit History: Number of outpatient, inpatient, and emergency visits in the year before the current hospitalization.
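
One way to keep these attribute groups handy during exploration is a small lookup from group name to column names. The column names below appear in the dataset, but the assignment of columns to groups is my own reading of the list above, and `COLUMN_GROUPS` and `select_group` are hypothetical names.

```python
import pandas as pd

# Assumed mapping from attribute group to dataset columns (illustrative, not exhaustive).
COLUMN_GROUPS = {
    "demographics": ["patient_nbr", "race", "gender", "age"],
    "admission": ["admission_type_id", "time_in_hospital", "medical_specialty"],
    "clinical": ["num_lab_procedures", "A1Cresult", "diag_1", "diag_2", "diag_3"],
    "medication": ["num_medications", "diabetesMed"],
    "visit_history": ["number_outpatient", "number_inpatient", "number_emergency"],
}


def select_group(data: pd.DataFrame, group: str) -> pd.DataFrame:
    """Return the columns of `data` that belong to the named attribute group.

    Columns listed for the group but absent from `data` are skipped.
    """
    columns = [col for col in COLUMN_GROUPS[group] if col in data.columns]
    return data[columns]


# Toy frame for illustration only.
toy = pd.DataFrame({
    "race": ["Caucasian", "AfricanAmerican"],
    "gender": ["Female", "Male"],
    "num_medications": [1, 18],
})
demographics = select_group(toy, "demographics")
print(demographics.columns.tolist())
```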

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

In [2]:
# Load data
df = pd.read_csv('diabetic_data.csv')
df.head(3)

| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2278392 | 8222157 | Caucasian | Female | [0-10) | ? | 6 | 25 | 1 | 1 | ... | No | No | No | No | No | No | No | No | No | NO |
| 1 | 149190 | 55629189 | Caucasian | Female | [10-20) | ? | 1 | 1 | 7 | 3 | ... | No | Up | No | No | No | No | No | Ch | Yes | >30 |
| 2 | 64410 | 86047875 | AfricanAmerican | Female | [20-30) | ? | 1 | 1 | 7 | 2 | ... | No | No | No | No | No | No | No | No | Yes | NO |


In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101766 entries, 0 to 101765
Data columns (total 50 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   encounter_id              101766 non-null  int64 
 1   patient_nbr               101766 non-null  int64 
 2   race                      101766 non-null  object
 3   gender                    101766 non-null  object
 4   age                       101766 non-null  object
 5   weight                    101766 non-null  object
 6   admission_type_id         101766 non-null  int64 
 7   discharge_disposition_id  101766 non-null  int64 
 8   admission_source_id       101766 non-null  int64 
 9   time_in_hospital          101766 non-null  int64 
 10  payer_code                101766 non-null  object
 11  medical_specialty         101766 non-null  object
 12  num_lab_procedures        101766 non-null  int64 
 13  num_procedures            101766 non-null  int64 
 14  num_

In [47]:
def data_structure(data):
    """
    Function provides an overview of the dataset by showing:
    - List of column names
    - Shape of the dataframe
    - Unique number of patients
    - Unique races and genders
    - Missing values per column
    
    Arguments:
    data : pd.DataFrame
        The dataset to analyze.
    
    Returns:
    None
    """
    # List of column names
    columns = data.columns
    print(f"\nList of columns in dataframe:\n {columns}")
    
    # Shape of data
    print(f"\nShape of the dataframe: {data.shape}")
    
    # Unique patients
    unique_patients = data['patient_nbr'].nunique()
    print(f"\nUnique number of patients: {unique_patients}")
    
    # Unique races
    unique_races = data['race'].unique()
    print(f"\nUnique races: {unique_races}")
    
    # Unique genders
    unique_genders = data['gender'].unique()
    print(f"\nUnique genders: {unique_genders}")
    
    # Missing values per column.
    # Note: isna() counts NaN only; '?' placeholders (e.g. in 'weight') are not included.
    missing_counts = data.isna().sum()
    missing_values = missing_counts[missing_counts > 0]
    missing_percentage = (missing_values / len(data)) * 100
    print(f"\nColumns with missing values:\n{missing_values}\n\nMissing values in percentage:\n{missing_percentage.round(2)}\n")

In [48]:
data_structure(df)


List of columns in dataframe:
 Index(['encounter_id', 'patient_nbr', 'race', 'gender', 'age', 'weight',
       'admission_type_id', 'discharge_disposition_id', 'admission_source_id',
       'time_in_hospital', 'payer_code', 'medical_specialty',
       'num_lab_procedures', 'num_procedures', 'num_medications',
       'number_outpatient', 'number_emergency', 'number_inpatient', 'diag_1',
       'diag_2', 'diag_3', 'number_diagnoses', 'max_glu_serum', 'A1Cresult',
       'metformin', 'repaglinide', 'nateglinide', 'chlorpropamide',
       'glimepiride', 'acetohexamide', 'glipizide', 'glyburide', 'tolbutamide',
       'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone',
       'tolazamide', 'examide', 'citoglipton', 'insulin',
       'glyburide-metformin', 'glipizide-metformin',
       'glimepiride-pioglitazone', 'metformin-rosiglitazone',
       'metformin-pioglitazone', 'change', 'diabetesMed', 'readmitted'],
      dtype='object')

Shape of the dataframe: (101766, 50)

In [58]:
df['weight'].unique()

array(['?', '[75-100)', '[50-75)', '[0-25)', '[100-125)', '[25-50)',
       '[125-150)', '[175-200)', '[150-175)', '>200'], dtype=object)
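
Since `?` shows up among the weight categories, it looks like the dataset's marker for a missing value, which `isna()` does not count. The sketch below measures how common the placeholder is in each column. `placeholder_share` is a hypothetical helper, the assumption that `?` always means "missing" is mine, and the toy rows are not real data.

```python
import pandas as pd


def placeholder_share(data: pd.DataFrame, placeholder: str = "?") -> pd.Series:
    """Percentage of entries per column equal to `placeholder`.

    Columns that never contain the placeholder are left out of the result.
    """
    share = data.eq(placeholder).mean() * 100
    return share[share > 0].round(2).sort_values(ascending=False)


# Toy rows for illustration only.
toy = pd.DataFrame({
    "weight": ["?", "[75-100)", "?", "?"],
    "race": ["Caucasian", "?", "AfricanAmerican", "Caucasian"],
    "time_in_hospital": [1, 3, 2, 2],
})
share = placeholder_share(toy)
print(share)
```

Calling `placeholder_share(df)` on the full data would show which columns rely on the placeholder and how heavily.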

In [59]:
null_weight = df[df['weight'] == '?']
null_weight

| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2278392 | 8222157 | Caucasian | Female | [0-10) | ? | 6 | 25 | 1 | 1 | ... | No | No | No | No | No | No | No | No | No | NO |
| 1 | 149190 | 55629189 | Caucasian | Female | [10-20) | ? | 1 | 1 | 7 | 3 | ... | No | Up | No | No | No | No | No | Ch | Yes | >30 |
| 2 | 64410 | 86047875 | AfricanAmerican | Female | [20-30) | ? | 1 | 1 | 7 | 2 | ... | No | No | No | No | No | No | No | No | Yes | NO |
| 3 | 500364 | 82442376 | Caucasian | Male | [30-40) | ? | 1 | 1 | 7 | 2 | ... | No | Up | No | No | No | No | No | Ch | Yes | NO |
| 4 | 16680 | 42519267 | Caucasian | Male | [40-50) | ? | 1 | 1 | 7 | 1 | ... | No | Steady | No | No | No | No | No | Ch | Yes | NO |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 101761 | 443847548 | 100162476 | AfricanAmerican | Male | [70-80) | ? | 1 | 3 | 7 | 3 | ... | No | Down | No | No | No | No | No | Ch | Yes | >30 |
| 101762 | 443847782 | 74694222 | AfricanAmerican | Female | [80-90) | ? | 1 | 4 | 5 | 5 | ... | No | Steady | No | No | No | No | No | No | Yes | NO |
| 101763 | 443854148 | 41088789 | Caucasian | Male | [70-80) | ? | 1 | 1 | 7 | 1 | ... | No | Down | No | No | No | No | No | Ch | Yes | NO |
| 101764 | 443857166 | 31693671 | Caucasian | Female | [80-90) | ? | 2 | 3 | 7 | 10 | ... | No | Up | No | No | No | No | No | Ch | Yes | NO |
