In [27]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import re
from scipy.stats import ttest_ind

This data comes from:
    https://www.kaggle.com/joniarroba/noshowappointments

In [28]:
# Load the 300,000-row appointments CSV (a copy of the Kaggle data hosted on GitHub)
path = ("https://raw.githubusercontent.com/gurkpet/Thinkful-Lessons/890653404caeea1894d89761cd3032eeade46209/"
        "Thinkful%201.5.1%20-%20Narrative%20Analytics%20Capstone/No-show-Issue-Comma-300k.csv")
df = pd.read_csv(path, low_memory=False)

For this analysis we have two groups of patients: those who received an SMS reminder about their doctor's appointment and those who did not. We hypothesize that patients who received the SMS reminder are more likely to show up to their appointment than patients who did not.
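In symbols, writing $p_{\text{SMS}}$ and $p_{\text{no SMS}}$ for the probability of showing up with and without a reminder (notation chosen here for illustration), the null and alternative hypotheses are:

```latex
H_0:\ p_{\text{SMS}} = p_{\text{no SMS}}
\qquad\text{vs.}\qquad
H_1:\ p_{\text{SMS}} > p_{\text{no SMS}}
```

The alternative is one-sided because the hypothesis predicts a direction (reminders increase attendance), not merely a difference.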

In [29]:
df.head()

,Age,Gender,AppointmentRegistration,ApointmentData,DayOfTheWeek,Status,Diabetes,Alcoolism,HiperTension,Handcap,Smokes,Scholarship,Tuberculosis,Sms_Reminder,AwaitingTime
0,19,M,2014-12-16T14:46:25Z,2015-01-14T00:00:00Z,Wednesday,Show-Up,0,0,0,0,0,0,0,0,-29
1,24,F,2015-08-18T07:01:26Z,2015-08-19T00:00:00Z,Wednesday,Show-Up,0,0,0,0,0,0,0,0,-1
2,4,F,2014-02-17T12:53:46Z,2014-02-18T00:00:00Z,Tuesday,Show-Up,0,0,0,0,0,0,0,0,-1
3,5,M,2014-07-23T17:02:11Z,2014-08-07T00:00:00Z,Thursday,Show-Up,0,0,0,0,0,0,0,1,-15
4,38,M,2015-10-21T15:20:09Z,2015-10-27T00:00:00Z,Tuesday,Show-Up,0,0,0,0,0,0,0,1,-6


In [31]:
#Check for unique items in Sms_Reminder Column
df['Sms_Reminder'].unique()

array([0, 1, 2], dtype=int64)

In [32]:
# Sms_Reminder should be boolean (0 or 1), but it contains a third value, 2.
# Rows with the value 2 are only 799 of 300,000 (about 0.27%), so we can drop them.
df.groupby('Sms_Reminder').count()

Sms_Reminder,Age,Gender,AppointmentRegistration,ApointmentData,DayOfTheWeek,Status,Diabetes,Alcoolism,HiperTension,Handcap,Smokes,Scholarship,Tuberculosis,AwaitingTime
0,128547,128547,128547,128547,128547,128547,128547,128547,128547,128547,128547,128547,128547,128547
1,170654,170654,170654,170654,170654,170654,170654,170654,170654,170654,170654,170654,170654,170654
2,799,799,799,799,799,799,799,799,799,799,799,799,799,799
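To put "relatively small portion" into numbers, the counts in the table above can be turned into a share of all rows. The sketch below simply copies those three counts; on the full DataFrame, `df['Sms_Reminder'].value_counts(normalize=True)` reports the same proportions directly.

```python
# Row counts per Sms_Reminder value, copied from the groupby output above
counts = {0: 128547, 1: 170654, 2: 799}

total = sum(counts.values())
share_of_twos = counts[2] / total  # fraction of rows that will be dropped

print(f"{total} rows; {share_of_twos:.2%} have Sms_Reminder == 2")
# prints: 300000 rows; 0.27% have Sms_Reminder == 2
```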


In [33]:
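# Drop the rows where Sms_Reminder equals 2. .copy() makes df an independent DataFrame,
# so later column assignments don't act on a possible view (which can trigger SettingWithCopyWarning).
df = df[df.Sms_Reminder != 2].copy()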

In [34]:
#check that we correctly dropped the data we want to drop
df['Sms_Reminder'].unique()

array([0, 1], dtype=int64)

In [35]:
# Encode Status as 1 for 'Show-Up' and 0 for every other value (i.e. no-shows)
df['Status'] = np.where(df['Status']=='Show-Up', 1, 0)
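One caveat with `np.where` is that any label other than `'Show-Up'` (a typo, a missing value, an unexpected category) silently becomes 0 and is counted as a no-show. A minimal alternative sketch uses `Series.map` with an explicit dictionary, which leaves unknown labels as NaN so they can be caught. It runs on a toy Series, and it assumes the dataset's other label is spelled `"No-Show"`; check `df['Status'].unique()` before the conversion to confirm.

```python
import pandas as pd

# Toy Status column; "No-Show" is an assumed spelling of the dataset's other label
status = pd.Series(["Show-Up", "No-Show", "Show-Up", "No-Show"])

# map() returns NaN for any label missing from the dictionary,
# whereas np.where would silently turn it into 0
encoded = status.map({"Show-Up": 1, "No-Show": 0})

# Fail loudly if an unexpected label slipped through
unexpected = status[encoded.isna()].unique()
if len(unexpected) > 0:
    raise ValueError(f"Unexpected Status labels: {list(unexpected)}")

encoded = encoded.astype(int)
print(encoded.tolist())  # [1, 0, 1, 0]
```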

In [39]:
#Check the data to make sure we converted the Show Up and No-show data to 1s and 0s.
df.Status.unique()

array([1, 0], dtype=int64)

In [37]:
# Separate the data into our two groups.
received_sms = df[df['Sms_Reminder'] == 1]
no_sms = df[df['Sms_Reminder'] == 0]

In [38]:
# Run an independent two-sample t-test on the 0/1 Status column,
# i.e. compare the two groups' show-up rates.
print(ttest_ind(received_sms['Status'], no_sms['Status']))

Ttest_indResult(statistic=0.393983868910256, pvalue=0.6935932295794931)
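Two points help when reading this result. First, because Status is coded 0/1, each group's mean is its show-up rate, so the t-test is comparing show-up rates. Second, `ttest_ind` is two-sided by default, while the hypothesis is directional; SciPy 1.6 and later accept `alternative="greater"` for a one-sided test. With the positive statistic above, the one-sided p-value is half the two-sided one (roughly 0.35), still far from significant. The sketch below illustrates both points on made-up 0/1 arrays, not on the real data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Toy 0/1 show-up indicators (1 = showed up); NOT the real dataset
sms = np.array([1, 1, 0, 1, 1, 1, 0, 1])
no_sms = np.array([1, 0, 1, 1, 0, 1, 1, 0])

# The mean of a 0/1 array is the group's show-up rate
print("show-up rate, SMS:   ", sms.mean())     # 0.75
print("show-up rate, no SMS:", no_sms.mean())  # 0.625

# Two-sided (SciPy's default) vs. one-sided test of H1: SMS rate > no-SMS rate
two_sided = ttest_ind(sms, no_sms)
one_sided = ttest_ind(sms, no_sms, alternative="greater")
print(two_sided.pvalue, one_sided.pvalue)
```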


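The high p-value of our t-test (about 0.69) means the difference in show-up rates between the two groups is not statistically significant: we cannot reject the null hypothesis that patients who received an SMS reminder show up at the same rate as those who did not.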
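A non-significant p-value says no difference was detected; a confidence interval for the difference in show-up rates additionally shows how large an effect the data can still rule out. The sketch below uses the normal-approximation (Wald) interval, a common choice for large samples. The counts are invented for illustration; for the notebook's groups, `shows` would be the sum of Status and `n` the number of rows in each group.

```python
import numpy as np
from scipy.stats import norm

def diff_in_proportions_ci(shows_a, n_a, shows_b, n_b, level=0.95):
    """Wald confidence interval for p_a - p_b (normal approximation)."""
    p_a = shows_a / n_a
    p_b = shows_b / n_b
    diff = p_a - p_b
    # Unpooled standard error of the difference in proportions
    se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = norm.ppf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff - z * se, diff + z * se

# Toy counts (NOT the real results): 7,000 of 10,000 vs. 6,950 of 10,000 showed up
low, high = diff_in_proportions_ci(7000, 10000, 6950, 10000)
print(f"95% CI for the difference in show-up rates: [{low:.4f}, {high:.4f}]")
```

An interval that contains zero and is narrow supports "no meaningful effect" more strongly than the p-value alone.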

<h1>Conclusion from the Data</h1>

It seems that the SMS reminder has no detectable effect on whether patients show up: the no-show rates of patients who received reminders and of patients who did not are statistically indistinguishable.

This leaves medical professionals to absorb the cost of no-show patients: appointment time that is reserved but goes unused.

A potential solution would be to let patients cancel their appointments by text, since it seems that patients who don't intend to go to their appointments decided so beforehand, regardless of the text reminder.

To test this, we could randomly split patients into two groups. One group receives a text that reminds them of their appointment and also lets them cancel it by replying 'cancel'. The other group receives no message at all, because the existing data shows that SMS reminders have no effect on attendance. The test will need to run for at least a week, because the show-up rate varies by day of the week.
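The random split described above could be sketched as follows. This is an illustration under stated assumptions: patients are assigned once (not per appointment), the arm names `"sms_cancel"` and `"control"` and the patient IDs are hypothetical, and the two arms are kept as close to equal size as possible. A fixed `seed` makes the assignment reproducible for auditing.

```python
import numpy as np
import pandas as pd

def assign_groups(patient_ids, seed=None):
    """Randomly assign each patient to one of two arms of (nearly) equal size.

    The arm names "sms_cancel" and "control" are hypothetical labels.
    """
    rng = np.random.default_rng(seed)
    # Repeat the two labels until there is one per patient, e.g.
    # ["sms_cancel", "control", "sms_cancel", ...], then shuffle them in place
    arms = np.resize(np.array(["sms_cancel", "control"]), len(patient_ids))
    rng.shuffle(arms)
    return pd.DataFrame({"patient_id": list(patient_ids), "arm": arms})

# Example with made-up patient IDs
assignment = assign_groups([f"P{i:03d}" for i in range(10)], seed=42)
print(assignment["arm"].value_counts())
```

Balancing the labels before shuffling guarantees equal arm sizes; drawing each patient's arm independently would also be random but could leave the arms unbalanced in a small trial.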

If the group given the chance to cancel their appointment has a lower no-show rate among its uncancelled appointments than the control group, we can conclude that the intervention has a positive effect.
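The success criterion above can be computed as a no-show rate restricted to uncancelled appointments. The sketch below assumes a hypothetical trial log with one row per appointment and made-up columns `arm`, `cancelled`, and `showed_up`; the values are invented. Since the control arm cannot cancel, its rate covers all of its appointments.

```python
import pandas as pd

# Hypothetical trial log: one row per appointment (made-up values)
trial = pd.DataFrame({
    "arm":       ["sms_cancel"] * 5 + ["control"] * 5,
    "cancelled": [1, 0, 0, 1, 0,      0, 0, 0, 0, 0],
    "showed_up": [0, 1, 1, 0, 0,      1, 0, 1, 1, 0],
})

# Exclude cancelled appointments: the question is how often the
# remaining (uncancelled) appointments end in a no-show
kept = trial[trial["cancelled"] == 0]
no_show_rate = 1 - kept.groupby("arm")["showed_up"].mean()
print(no_show_rate)
```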