#### Statistical Analysis Graded Project

Context - HealthCare Plus is a multi-specialty hospital that provides medical consultations,
treatments, and diagnostic services. The hospital management wants to use statistical
analysis to optimize operations, improve patient care, and make data-driven decisions.
To achieve this, HealthCare Plus has collected data on patient admission times,
recovery durations, patient satisfaction scores, effectiveness of different treatments,
hospital expenses, and staff efficiency. The goal is to analyze this data and provide
insights that can help improve hospital operations, enhance patient satisfaction, and
reduce unnecessary expenses.


In [36]:
## import necessary libraries

import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis

#### Section A

1. HealthCare Plus recorded the daily number of patient admissions for the past 10 days:
[32, 28, 35, 30, 29, 27, 31, 34, 33, 30]
● Compute the mean, median, and mode of patient admissions.
● Which measure best represents patient admissions?
● If the hospital increases its admission capacity by 10%, how will this affect the
measures of central tendency?


In [8]:
## Load the data

admissions_data = [32, 28, 35, 30, 29, 27, 31, 34, 33, 30]

## convert into a pandas Series

admissions_series = pd.Series(admissions_data)

## statistical calculations

mean_admissions = admissions_series.mean()
median_admissions = admissions_series.median()
mode_admissions = admissions_series.mode().tolist()  # mode() returns a Series; a list prints cleanly

In [10]:
## Display the results

print(f"The mean of the data series is: {mean_admissions}")
print(f"The median of the data series is: {median_admissions}")
print(f"The mode of the data series is: {mode_admissions}")

The mean of the data series is: 30.9
The median of the data series is: 30.5
The mode of the data series is: [30]


Q - Which measure best represents patient admissions?

Answer - The data has no outliers and is roughly symmetric (the mean of 30.9, median of 30.5, and mode of 30 are all close together), so the mean is the most representative measure in this case.

Q - If the hospital increases its admission capacity by 10%, how will this affect the
measures of central tendency?

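Answer - If capacity increases by 10% and daily admissions rise by the same 10%, every observation is multiplied by 1.1, so the mean, median, and mode each increase by 10% as well (to about 33.99, 33.55, and 33). If capacity rises but admissions do not, the measures of central tendency stay the same.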
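
The scaling argument can be checked with a short sketch. It assumes admissions rise by exactly 10% on every day, which the question does not guarantee; the names `scaled` and `summary` are introduced here for illustration only.

```python
import pandas as pd

admissions = pd.Series([32, 28, 35, 30, 29, 27, 31, 34, 33, 30])

# Assumption: each day's admissions grow by exactly 10%
scaled = admissions * 1.10

# Compare each measure of central tendency before and after scaling
summary = pd.DataFrame(
    {
        "original": [admissions.mean(), admissions.median(), admissions.mode().iloc[0]],
        "scaled": [scaled.mean(), scaled.median(), scaled.mode().iloc[0]],
    },
    index=["mean", "median", "mode"],
)
summary["ratio"] = summary["scaled"] / summary["original"]

print(summary.round(2))
```

Each ratio comes out at 1.1 because multiplying every observation by a constant multiplies the mean, median, and mode by that same constant.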

#### Question 2

2. The recovery duration (in days) of 10 patients who underwent the same surgery is
recorded as follows:
[5, 7, 6, 8, 9, 5, 6, 7, 8, 6]
● Calculate the range, variance, and standard deviation.
● What does the standard deviation indicate about variability in recovery times?
● If two new patients take 4 and 10 days to recover, how will this impact the
standard deviation?

In [22]:
# Load the data

recovery_duration = [5, 7, 6, 8, 9, 5, 6, 7, 8, 6]

## Create a pandas Series

recovery_series = pd.Series(recovery_duration)

## Calculate the range of data as the difference between max and min

recovery_range = recovery_series.max() - recovery_series.min()

## Calculate the sample variance (pandas uses ddof=1 by default)

recovery_variance = recovery_series.var()

## Calculate the Standard Deviation of the data

recovery_std = recovery_series.std()

In [26]:
## Display the data

print(f"The range is: {recovery_range}")
print(f"The variance is: {recovery_variance}")
print(f"The standard deviation is: {recovery_std}")

The range is: 4
The variance is: 1.788888888888889
The standard deviation is: 1.3374935098492586


##### Question - What does the standard deviation indicate about variability in recovery times?

Answer - The standard deviation is about 1.34 days, which means recovery times typically fall within roughly 1.34 days of the mean of 6.7 days. This is small relative to the mean (about 20%), so recovery durations for this surgery are fairly consistent, with no large variations.

##### Question - If two new patients take 4 and 10 days to recover, how will this impact the standard deviation?

In [31]:
new_data_set = [5, 7, 6, 8, 9, 5, 6, 7, 8, 6,4,10]

new_series = pd.Series(new_data_set)

new_std = new_series.std()

print(f"The new standard deviation is: {new_std}")

The new standard deviation is: 1.764549903980152


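Answer - The standard deviation increases from about 1.34 days to about 1.76 days. Both new values (4 and 10 days) lie outside the original range of 5 to 9 days, so each is farther from the mean than any existing observation, and both widen the spread.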
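
One way to see why the spread grows is to measure how much of the total squared deviation comes from the two new patients. The sketch below is illustrative only; `extended` and `share_from_new` are names introduced here, not part of the original analysis.

```python
import numpy as np

original = np.array([5, 7, 6, 8, 9, 5, 6, 7, 8, 6])
extended = np.append(original, [4, 10])  # add the two new patients

# Squared deviation of every observation from the new mean
squared_dev = (extended - extended.mean()) ** 2

# Fraction of the total squared deviation contributed by the two new values
share_from_new = squared_dev[-2:].sum() / squared_dev.sum()

print(f"Std before: {original.std(ddof=1):.2f}")
print(f"Std after:  {extended.std(ddof=1):.2f}")
print(f"Share of squared deviation from new patients: {share_from_new:.0%}")
```

The two new observations account for more than half of the total squared deviation, so the increase comes from both extremes rather than from the 10-day patient alone.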

#### Question 3

3. Patient satisfaction scores (on a scale of 1 to 10) collected from a hospital survey
are:
[8, 9, 7, 8, 10, 7, 9, 6, 10, 8, 7, 9]
● Compute skewness and kurtosis.
● Interpret the results—does the data suggest a normal distribution?
● If the hospital implements a new customer service initiative and satisfaction
scores shift higher, what type of skewness change would you expect?

In [40]:
##  Load the data

satisfaction_scores = [8, 9, 7, 8, 10, 7, 9, 6, 10, 8, 7, 9]

## Convert the list to a pandas Series

satisfaction_scores_series = pd.Series(satisfaction_scores)

# Compute skewness and excess kurtosis
# (results are stored under new names so the imported functions are not overwritten)

skewness_value = skew(satisfaction_scores_series)
kurtosis_value = kurtosis(satisfaction_scores_series)

## Display the results

print(f"The skewness is: {skewness_value}")
print(f"The kurtosis is: {kurtosis_value}")

The skewness is: -0.04146734005998014
The kurtosis is: -1.0145959416162336


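Answer - The skewness of about -0.04 is very close to zero, so the distribution is nearly symmetric. The excess kurtosis of about -1.01 is well below the normal value of 0, which indicates a flatter distribution with lighter tails (platykurtic). The data is therefore symmetric but not clearly normal, and 12 observations are too few to confirm normality. If a new customer service initiative shifts scores higher, they will cluster near the maximum of 10 with a tail toward lower scores, so the skewness should become more negative (left-skewed).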
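
As a complementary check, the sketch below compares the default (population) estimators from `scipy.stats` with their bias-corrected versions and runs a Shapiro-Wilk test. The Shapiro-Wilk test is not part of the original analysis; it is added here as one common normality check, and its result should be read cautiously with only 12 tied, integer-valued scores.

```python
import numpy as np
from scipy import stats

scores = np.array([8, 9, 7, 8, 10, 7, 9, 6, 10, 8, 7, 9])

# Default scipy estimators (biased, population-style)
pop_skew = stats.skew(scores)
pop_kurt = stats.kurtosis(scores)  # excess kurtosis: a normal distribution gives 0

# Bias-corrected estimators, more appropriate for a small sample
sample_skew = stats.skew(scores, bias=False)
sample_kurt = stats.kurtosis(scores, bias=False)

# Shapiro-Wilk test: the null hypothesis is that the data is normally distributed
shapiro_stat, shapiro_p = stats.shapiro(scores)

print(f"Skewness (biased / corrected): {pop_skew:.3f} / {sample_skew:.3f}")
print(f"Excess kurtosis (biased / corrected): {pop_kurt:.3f} / {sample_kurt:.3f}")
print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
```

Importing `stats` as a module also avoids any risk of overwriting the `skew` and `kurtosis` function names with result variables.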

#### Question 4

HealthCare Plus wants to analyze the relationship between nurse staffing levels and
patient recovery time. Data from 6 hospital departments is provided:

● Compute the correlation coefficient between nurse staffing and patient recovery
time.
● If the hospital increases the number of nurses by 5 per department, how will this
affect the recovery time based on the trend?

In [46]:
## Load the data

nurses = [10, 12, 15, 18, 20, 22]
recovery_days = [8, 7, 6, 5, 4, 3]

## create a pandas dataframe

df = pd.DataFrame({'number_nurses': nurses, 'recovery_days': recovery_days})

In [48]:
df

|   | number_nurses | recovery_days |
|---|---|---|
| 0 | 10 | 8 |
| 1 | 12 | 7 |
| 2 | 15 | 6 |
| 3 | 18 | 5 |
| 4 | 20 | 4 |
| 5 | 22 | 3 |


In [52]:
## Find correlation
df.corr()

|   | number_nurses | recovery_days |
|---|---|---|
| number_nurses | 1.000000 | -0.996757 |
| recovery_days | -0.996757 | 1.000000 |


The correlation coefficient is about -0.997, a very strong negative linear relationship: departments with more nurses have shorter patient recovery times.

Question - If the hospital increases the number of nurses by 5 per department, how will this
affect the recovery time based on the trend?

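Answer - Based on the trend, recovery time falls by about 0.4 days per additional nurse (the least-squares slope is roughly -0.40), so adding 5 nurses per department would be expected to shorten recovery by about 2 days. This assumes the linear trend continues to hold beyond the observed staffing levels; a correlation across 6 departments does not by itself prove that staffing causes the faster recovery.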
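
The size of the effect can be estimated by fitting a straight line to the six data points. This is a minimal sketch using `np.polyfit`; a simple linear model is an assumption made here for illustration, not something the original analysis specifies.

```python
import numpy as np

nurses = np.array([10, 12, 15, 18, 20, 22])
recovery_days = np.array([8, 7, 6, 5, 4, 3])

# Least-squares fit of recovery_days = slope * nurses + intercept
slope, intercept = np.polyfit(nurses, recovery_days, deg=1)

# Expected change in recovery time if each department adds 5 nurses
change_for_five_nurses = slope * 5

print(f"Slope: {slope:.3f} days per nurse")
print(f"Intercept: {intercept:.3f} days")
print(f"Expected change for +5 nurses: {change_for_five_nurses:.2f} days")
```

Applying the fitted line above 22 nurses is an extrapolation, so the estimate becomes less reliable the further staffing moves from the observed range.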

#### Section B

The hospital claims that the average patient wait time in the emergency department
is 30 minutes. A sample of 10 patient wait times (in minutes) is recorded:
[32, 29, 31, 34, 33, 27, 30, 28, 35, 26]
● Test whether the hospital’s claim is valid at a 5% significance level.
● State the null and alternative hypotheses.
● If the wait time significantly exceeds 30 minutes, what changes should the
hospital implement to reduce waiting time?


In [61]:
# Load the data

wait_times = [32, 29, 31, 34, 33, 27, 30, 28, 35, 26]

# Calculate the sample size

n = len(wait_times)

# Calculate the mean and the standard deviation

avg_wait_time = np.mean(wait_times)
std = np.std(wait_times, ddof=1)


# The mean from null hypothesis

null_hypothesis_mean = 30

# Calculate the t-statistic value

t_stat = (avg_wait_time - null_hypothesis_mean) / (std / np.sqrt(n))

print(t_stat)

0.5222329678670935
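
The t-statistic can be turned into a decision by comparing it with a critical value or a p-value. The sketch below assumes a two-sided test (H0: mean wait time = 30 minutes, H1: mean wait time ≠ 30 minutes) at the 5% level given in the question; the two-sided choice is an assumption made here, and `scipy.stats.ttest_1samp` is used as a cross-check on the manual calculation.

```python
import numpy as np
from scipy import stats

wait_times = np.array([32, 29, 31, 34, 33, 27, 30, 28, 35, 26])
alpha = 0.05  # significance level from the question

# One-sample t-test against the claimed mean of 30 minutes (two-sided by default)
result = stats.ttest_1samp(wait_times, popmean=30)
t_stat, p_value = result.statistic, result.pvalue

# Two-sided critical value with n - 1 degrees of freedom
critical_t = stats.t.ppf(1 - alpha / 2, df=len(wait_times) - 1)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Critical t (two-sided, alpha={alpha}): {critical_t:.3f}")

if p_value < alpha:
    print("Reject H0: the mean wait time differs from 30 minutes.")
else:
    print("Fail to reject H0: the data is consistent with a 30-minute mean.")
```

Under these assumptions the t-statistic of about 0.52 is well inside the critical value of about 2.26, so the sample does not give evidence against the hospital's claim.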
