<a href="https://colab.research.google.com/github/m-rafiul-islam/HealthCare-Analytics-Disease-Prediction/blob/main/sacb_survey_study_git.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Research Questions
Minority populations utilize joint replacement procedures 20-30% less than their white counterparts, even if they have health insurance. We conducted a survey of the community around the UIW School of Osteopathic Medicine to better understand the thinking around joint replacement procedures (Q17 block in our survey). Do they fear the procedure? Do they believe it is a good treatment for end-stage knee osteoarthritis? Do faith or financial considerations come into play? If a friend or family member has had a bad experience, does that affect their decision-making? We also asked one qualitative question related to survey question 17.4, which relates to fear of the joint replacement procedure.

Individuals were asked to fill out the first part of the survey and then to answer whether they had had knee or hip pain for a month or more. Those who answered yes were asked to fill out the Q17 block of questions, which covers the topic of joint replacement procedures.

Our research question was: Do Hispanic subjects demonstrate different attitudes towards knee replacement procedures than their white counterparts? And if so, are there any demographic, faith, educational, financial, or patient-doctor alignment issues (Q16 block and free response questions in our survey) that correlate with these attitudes?



Males vs. females: run the **same pairwise comparisons, correlations, and logistic regressions**. These analyses would be independent of ethnicity or race. We want to see how attitudes relate to sex. We have a sense of this but want to see your results.

Good evening. I found some mistakes in the results document that might need a minor clarification. A few of the correlation results are listed by question # with a short description of the question attached. In a couple of cases the question # does not match the description of the question, so I am left wondering whether the result is for the question # or for the question description.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
path = '/content/drive/My Drive/UIW_Research/SA Survey Data Analysis'
os.chdir(path)

In [None]:
import pandas as pd
# Load the Excel file
file_path = "UtilizationSurveyStudy_v15_CA_raw.xlsx"
xls = pd.ExcelFile(file_path)

# Load the main sheet
df = xls.parse('Sheet1')

# Define sex column and attitude variables (Q17 block)
sex_col = 'Q6'  # 1 = Male, 2 = Female
attitude_vars = [col for col in df.columns if col.startswith('Q17_')]

# Filter for valid male/female entries
df_sex = df[df[sex_col].isin([1.0, 2.0])].copy()

# Group by sex and calculate mean for each attitude item
desc_stats = df_sex.groupby(sex_col)[attitude_vars].mean().T
desc_stats.columns = ['Male (1)', 'Female (2)']
desc_stats


| Item | Male (1) | Female (2) |
|---|---|---|
| Q17_4 | 2.72 | 2.966387 |
| Q17_5 | 3.04 | 2.768595 |
| Q17_6 | 3.28 | 3.330579 |
| Q17_7 | 3.040816 | 3.1 |
| Q17_8 | 4.265306 | 4.45 |
| Q17_9 | 3.48 | 3.760331 |
| Q17_10 | 4.4 | 4.613445 |
| Q17_11 | 4.12 | 4.570248 |


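#### 1. **Group Means (Likert Scale: 1 = Strongly Disagree to 5 = Strongly Agree)**

* Females reported slightly more fear of surgery (Q17\_4: female 2.97 vs. male 2.72).
* Males were more likely to report being influenced by others' surgical experiences (Q17\_5: male 3.04 vs. female 2.77).
* Little to no difference in beliefs about financial barriers (Q17\_6: female 3.33 vs. male 3.28), surgery avoidance (Q17\_7: female 3.10 vs. male 3.04), or following doctor’s advice (Q17\_8: female 4.45 vs. male 4.27).
* The largest gap in means was on Q17\_11 (female 4.57 vs. male 4.12).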
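
To make the size of each gap explicit, the short sketch below recomputes the female-minus-male difference for each item from the group means in the table above. The values are copied by hand from that output rather than read from the survey file, and the names `group_means`, `diffs`, and `ranked` are illustrative, not part of the original analysis.

```python
# Group means copied by hand from the descriptive table: item -> (male, female).
group_means = {
    "Q17_4": (2.72, 2.966387),
    "Q17_5": (3.04, 2.768595),
    "Q17_6": (3.28, 3.330579),
    "Q17_7": (3.040816, 3.1),
    "Q17_8": (4.265306, 4.45),
    "Q17_9": (3.48, 3.760331),
    "Q17_10": (4.4, 4.613445),
    "Q17_11": (4.12, 4.570248),
}

# Positive difference = females agree more; negative = males agree more.
diffs = {item: female - male for item, (male, female) in group_means.items()}

# Rank items by the absolute size of the gap, largest first.
ranked = sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)

for item, diff in ranked:
    print(f"{item}: female - male = {diff:+.3f}")
```

These are raw differences on the 1-5 scale, not standardized effect sizes; computing something like Cohen's d would need the per-group standard deviations, which the table does not show.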




In [None]:
import pandas as pd
from scipy.stats import ttest_ind
# Initialize list to store t-test results
results = []

# Perform independent t-tests for each attitude variable
for var in attitude_vars:
    male_vals = df_sex[df_sex[sex_col] == 1.0][var].dropna()
    female_vals = df_sex[df_sex[sex_col] == 2.0][var].dropna()

    if len(male_vals) > 1 and len(female_vals) > 1:
        t_stat, p_val = ttest_ind(male_vals, female_vals, equal_var=False)  # Welch's t-test
        results.append({
            "Variable": var,
            "T-Statistic": round(t_stat, 4),
            "P-Value": round(p_val, 4),
            "Mean (Male)": round(male_vals.mean(), 3),
            "Mean (Female)": round(female_vals.mean(), 3),
            "N (Male)": len(male_vals),
            "N (Female)": len(female_vals)
        })

# Create a DataFrame with results
ttest_df = pd.DataFrame(results)
ttest_df

| Variable | T-Statistic | P-Value | Mean (Male) | Mean (Female) | N (Male) | N (Female) |
|---|---|---|---|---|---|---|
| Q17_4 | -0.7996 | 0.4259 | 2.72 | 2.966 | 50 | 119 |
| Q17_5 | 0.9093 | 0.3656 | 3.04 | 2.769 | 50 | 121 |
| Q17_6 | -0.1603 | 0.873 | 3.28 | 3.331 | 50 | 121 |
| Q17_7 | -0.1934 | 0.8471 | 3.041 | 3.1 | 49 | 120 |
| Q17_8 | -0.8846 | 0.379 | 4.265 | 4.45 | 49 | 120 |
| Q17_9 | -1.0118 | 0.3145 | 3.48 | 3.76 | 50 | 121 |
| Q17_10 | -1.0394 | 0.3019 | 4.4 | 4.613 | 50 | 119 |
| Q17_11 | -1.8981 | 0.0617 | 4.12 | 4.57 | 50 | 121 |


#### 2. **T-Tests**

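* None of the pairwise differences in attitude scores between males and females reached statistical significance in Welch's t-tests (all p-values > 0.05); the smallest was Q17\_11 (p = 0.0617).
* This suggests that while numerical differences exist, they are not large enough to rule out chance at these sample sizes (49-50 males and 119-121 females per item).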
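
Because eight items were tested at once, one optional follow-up is to adjust the p-values for multiple comparisons. The sketch below applies the Holm-Bonferroni step-down adjustment to the Welch p-values reported above. This adjustment is not part of the original analysis, the p-values are copied by hand from the output, and `holm_adjust` is a hypothetical helper written for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, keyed like the input dict."""
    m = len(p_values)
    # Sort from smallest to largest raw p-value.
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    adjusted = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value (rank = k - 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p)
        # Enforce monotonicity: adjusted values never decrease down the list.
        running_max = max(running_max, candidate)
        adjusted[name] = running_max
    return adjusted


# Welch t-test p-values copied by hand from the results table.
welch_p = {
    "Q17_4": 0.4259,
    "Q17_5": 0.3656,
    "Q17_6": 0.8730,
    "Q17_7": 0.8471,
    "Q17_8": 0.3790,
    "Q17_9": 0.3145,
    "Q17_10": 0.3019,
    "Q17_11": 0.0617,
}

holm_p = holm_adjust(welch_p)
for name in welch_p:
    print(f"{name}: raw p = {welch_p[name]:.4f}, Holm-adjusted p = {holm_p[name]:.4f}")
```

With these inputs, the smallest raw p-value (Q17\_11, 0.0617) becomes about 0.49 after adjustment, so the conclusion of no significant pairwise sex differences is unchanged.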

In [None]:
# @title Logistic Regression
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Load Excel file
file_path = "UtilizationSurveyStudy_v15_CA_raw.xlsx"
xls = pd.ExcelFile(file_path)
df = xls.parse('Sheet1')

# Define sex and attitude variables
sex_col = 'Q6'  # 1 = Male, 2 = Female
attitude_vars = [col for col in df.columns if col.startswith('Q17_')]

# Filter valid male/female entries and drop missing attitude data
df_sex = df[df[sex_col].isin([1.0, 2.0])].copy()
df_logit = df_sex[attitude_vars + [sex_col]].dropna()

# Convert sex to binary: Male = 1, Female = 0 (vectorized comparison)
df_logit['is_male'] = (df_logit[sex_col] == 1.0).astype(int)

# Build logistic regression formula
formula = 'is_male ~ ' + ' + '.join(attitude_vars)

# Fit logistic regression model
logit_model = smf.logit(formula=formula, data=df_logit).fit()

# Summarize model results: Odds Ratios and P-Values
logit_summary = pd.DataFrame({
    "Variable": logit_model.params.index,
    "Odds Ratio": logit_model.params.apply(lambda x: round(np.exp(x), 3)),
    "P-Value": logit_model.pvalues.round(4)
}).reset_index(drop=True)

# Output
print(logit_summary)


Optimization terminated successfully.
         Current function value: 0.581556
         Iterations 5
    Variable  Odds Ratio  P-Value
0  Intercept       2.522   0.3379
1      Q17_4       0.893   0.3332
2      Q17_5       1.186   0.1493
3      Q17_6       0.971   0.7634
4      Q17_7       1.005   0.9681
5      Q17_8       0.992   0.9606
6      Q17_9       0.913   0.4646
7     Q17_10       0.944   0.7278
8     Q17_11       0.750   0.0396


#### 3. **Logistic Regression**

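* One attitude item, Q17\_11, reached significance at the 0.05 level (odds ratio = 0.750, p = 0.0396): with the outcome coded Male = 1, higher agreement was associated with lower odds of being male. No other item significantly predicted sex (all other p ≥ 0.1493).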
* Directionally:

  * Higher fear of surgery (Q17\_4, odds ratio = 0.893) → slightly more likely to be female.
  * Influence by others’ experiences (Q17\_5, odds ratio = 1.186) → slightly more likely to be male.
  * Financial barrier beliefs (Q17\_6, odds ratio = 0.971) → not predictive.
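
Odds ratios are easier to read once converted to concrete changes. The sketch below takes the odds ratios printed above (copied by hand, with the outcome coded Male = 1) and shows the implied change in the odds of being male for a given increase on a Likert item, holding the other items fixed. The helper `odds_multiplier` is hypothetical and only illustrates the arithmetic.

```python
import math

# Odds ratios copied by hand from the logistic regression output (outcome: is_male).
odds_ratios = {
    "Q17_4": 0.893,
    "Q17_5": 1.186,
    "Q17_6": 0.971,
    "Q17_7": 1.005,
    "Q17_8": 0.992,
    "Q17_9": 0.913,
    "Q17_10": 0.944,
    "Q17_11": 0.750,
}


def odds_multiplier(odds_ratio, points=1):
    """Factor by which the odds of being male change after `points` Likert steps."""
    return odds_ratio ** points


for item, oratio in odds_ratios.items():
    coef = math.log(oratio)                    # log-odds coefficient behind the odds ratio
    pct = (odds_multiplier(oratio) - 1) * 100  # percent change in odds per one-point increase
    print(f"{item}: coef = {coef:+.3f}, odds change per point = {pct:+.1f}%")

# Moving from 1 (Strongly Disagree) to 5 (Strongly Agree) is a four-point increase.
full_scale_q17_11 = odds_multiplier(odds_ratios["Q17_11"], points=4)
print(f"Q17_11 across the full scale: odds multiplied by {full_scale_q17_11:.3f}")
```

For Q17\_11, an odds ratio of 0.750 means each one-point increase lowers the odds of being male by 25%, and moving from 1 to 5 on the scale multiplies those odds by about 0.32.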