# ClinicRecall AI – No-Show Prediction Model
This notebook implements a logistic regression model that predicts patient no-shows from a real-world medical appointment dataset (`KaggleV2-May-2016.csv`).

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
import seaborn as sns

## Load Dataset

In [2]:
df = pd.read_csv('/Users/gnanreddybobba/Desktop/KaggleV2-May-2016.csv')
df.head()

| | PatientId | AppointmentID | Gender | ScheduledDay | AppointmentDay | Age | Neighbourhood | Scholarship | Hipertension | Diabetes | Alcoholism | Handcap | SMS_received | No-show |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29872500000000.0 | 5642903 | F | 2016-04-29T18:38:08Z | 2016-04-29T00:00:00Z | 62 | JARDIM DA PENHA | 0 | 1 | 0 | 0 | 0 | 0 | No |
| 1 | 558997800000000.0 | 5642503 | M | 2016-04-29T16:08:27Z | 2016-04-29T00:00:00Z | 56 | JARDIM DA PENHA | 0 | 0 | 0 | 0 | 0 | 0 | No |
| 2 | 4262962000000.0 | 5642549 | F | 2016-04-29T16:19:04Z | 2016-04-29T00:00:00Z | 62 | MATA DA PRAIA | 0 | 0 | 0 | 0 | 0 | 0 | No |
| 3 | 867951200000.0 | 5642828 | F | 2016-04-29T17:29:31Z | 2016-04-29T00:00:00Z | 8 | PONTAL DE CAMBURI | 0 | 0 | 0 | 0 | 0 | 0 | No |
| 4 | 8841186000000.0 | 5642494 | F | 2016-04-29T16:07:23Z | 2016-04-29T00:00:00Z | 56 | JARDIM DA PENHA | 0 | 1 | 1 | 0 | 0 | 0 | No |


## Data Cleaning & Feature Engineering

In [3]:
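# Parse both timestamp columns.
df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])
df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
# AppointmentDay carries no time-of-day, so same-day bookings come out as -1 here
# and are removed by the filter on the next line.
df['DaysBetween'] = (df['AppointmentDay'] - df['ScheduledDay']).dt.days
df = df[df['DaysBetween'] >= 0].copy()  # .copy() avoids SettingWithCopyWarning below
# Encode the target: 0 = showed up, 1 = no-show.
df['No-show'] = df['No-show'].map({'No': 0, 'Yes': 1})
# Drop identifiers, raw timestamps, and the neighbourhood name.
df = df.drop(columns=['PatientId', 'AppointmentID', 'ScheduledDay', 'AppointmentDay', 'Neighbourhood'])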
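
The `DaysBetween` calculation subtracts a full timestamp from a date that is stored at midnight, which has a side effect worth seeing on a small example. The sketch below uses two illustrative rows: the first copies the timestamps of row 0 in the preview above, and the second is a hypothetical booking made two days ahead. It compares the raw difference with a calendar-day difference obtained by normalizing `ScheduledDay` to midnight first. The `sample`, `raw_days`, and `calendar_days` names are assumptions for this illustration, not part of the notebook.

```python
import pandas as pd

# Row 0 mirrors the preview above; row 1 is a hypothetical booking made two days ahead.
sample = pd.DataFrame({
    "ScheduledDay": pd.to_datetime(["2016-04-29T18:38:08Z", "2016-04-27T08:00:00Z"]),
    "AppointmentDay": pd.to_datetime(["2016-04-29T00:00:00Z", "2016-04-29T00:00:00Z"]),
})

# Raw difference, as in the cell above: a same-day booking is negative
# because AppointmentDay is midnight while ScheduledDay has a time-of-day.
raw_days = (sample["AppointmentDay"] - sample["ScheduledDay"]).dt.days

# Calendar-day difference: strip the time-of-day from ScheduledDay first.
calendar_days = (sample["AppointmentDay"] - sample["ScheduledDay"].dt.normalize()).dt.days

print(raw_days.tolist())       # [-1, 1]
print(calendar_days.tolist())  # [0, 2]
```

With the raw difference, a `DaysBetween >= 0` filter removes the same-day row. Whether same-day bookings should be kept is a modelling decision; the normalized version is one way to keep them if that is the intent.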

## Encode Gender

In [4]:
df['Gender'] = LabelEncoder().fit_transform(df['Gender'])  # F=0, M=1
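
As a quick check of the `F=0, M=1` comment, the sketch below fits a `LabelEncoder` on a small hand-written list (an assumption standing in for the `Gender` column) and reads the mapping back from `classes_`. `LabelEncoder` assigns codes in sorted label order, which is why `F` receives 0.

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written stand-in for the Gender column (illustrative values only).
encoder = LabelEncoder()
codes = encoder.fit_transform(["F", "M", "F", "M"])

# classes_ holds the labels in sorted order; a label's position is its integer code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}

print(codes.tolist())  # [0, 1, 0, 1]
print(mapping)         # {'F': 0, 'M': 1}
```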

## Train Logistic Regression Model

In [5]:
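# Features are every remaining column; the target is the encoded No-show flag.
X = df.drop(columns='No-show')
y = df['No-show']
# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# max_iter is raised from the default of 100 to give the solver room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)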
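
The split above is a plain random split. An optional variation, not used in this notebook, is to pass `stratify=y` so that the train and test sets keep the same share of no-shows. The sketch below illustrates this on synthetic labels. The `X_demo` and `y_demo` arrays and the 28% positive rate are assumptions chosen for the example, not values taken from the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels (assumption): 1000 rows, 280 positives and 720 negatives.
y_demo = np.array([1] * 280 + [0] * 720)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo keeps the class proportions equal in both partitions.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(y_test_demo), int(y_test_demo.sum()))  # 200 56
```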

## Evaluate Model

In [6]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

Accuracy: 0.7151195108393552

Classification Report:
               precision    recall  f1-score   support

           0       0.72      1.00      0.83     10301
           1       0.30      0.00      0.00      4091

    accuracy                           0.72     14392
   macro avg       0.51      0.50      0.42     14392
weighted avg       0.60      0.72      0.60     14392


Confusion Matrix:
 [[10285    16]
 [ 4084     7]]
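
The headline numbers can be re-derived from the confusion matrix alone, which makes it easier to see what the model is doing. The sketch below copies the matrix from the output above and recomputes accuracy plus precision and recall for the no-show class. It also computes the accuracy of always predicting "shows up", which is added here as a point of comparison. The variable names are assumptions for this illustration.

```python
import numpy as np

# Confusion matrix copied from the output above.
# Rows are true labels (0, 1); columns are predicted labels (0, 1).
cm = np.array([[10285, 16],
               [4084, 7]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()
precision_no_show = tp / (tp + fp)  # of the predicted no-shows, how many were real
recall_no_show = tp / (tp + fn)     # of the real no-shows, how many were caught

# Accuracy of a trivial baseline that always predicts class 0 ("shows up").
majority_baseline = (tn + fp) / cm.sum()

print(f"accuracy           {accuracy:.4f}")
print(f"precision (class 1) {precision_no_show:.4f}")
print(f"recall (class 1)    {recall_no_show:.4f}")
print(f"majority baseline  {majority_baseline:.4f}")
```

The model flags only 23 of the 14392 test appointments as no-shows and catches 7 of the 4091 real ones, so its accuracy (about 0.7151) sits marginally below the always-predict-0 baseline (about 0.7157).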
