# Lab 10 - Predicting Patient Experience LTR

Hospitals are required to survey a certain number of patients and gather feedback on the experience they had with their visit:
* Was the hospital clean? (1-10)
* Did the staff communicate well with you? (1-10)
* Were the staff friendly? (1-10)
* Did you receive care in a timely manner? (1-10)
* Were the instructions you were discharged with clear to you? (1-10)
* **Would you recommend this facility to friends and family?** <- The "likely to recommend" (LTR): Yes or No

That last part is the most important. It determines federal reimbursement rates, gets published by rating services, and drives future patient volume. Anything you can do to improve LTR is a good thing.

But what factors influence LTR the most? We have some more detailed data in the other questions. We can use logistic regression to take the detailed ratings and build a model that predicts the LTR outcome: Yes or No.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import streamlit as st


## 1. Read the data

In [None]:
px = pd.read_csv('patient_experience.csv')

## 2. Explore and Visualize the Data

Useful views include pairwise scatterplots (to spot correlations) and histograms or density plots of each rating, all split by LTR.

In [None]:
# Seaborn pairplot is perfect for this
#   * hue='LTR' to show the LTR data sets as separate colors
#   * diag_kind='kde' helps us see if the inputs are roughly normally distributed or need to be transformed

sns.pairplot(px, hue='LTR', diag_kind='kde')
plt.suptitle("Patient Experience Variables by LTR", y=1.02)
plt.show()

## 3. Split and Model

1. Scale the data
2. Split the data into training (70%) and test (30%)
3. Fit a Logistic Regression model

In [None]:
# separate independent and dependent variables
X = px[['cleanliness','communication','staff_friendliness','timeliness','discharge_clarity']]
y = px['LTR']

# apply a standard scaler to standardize each input variable (mean 0, standard deviation 1)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)


In [None]:
# Keep 30% for testing
# Pick any value you want for the random_state, but this keeps the work repeatable
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)

In [None]:
model = LogisticRegression()
model.fit(X_train, y_train)
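
One caveat on the steps above: fitting the scaler on every row before splitting lets the test rows influence the scaling statistics. A minimal sketch of one alternative, assuming synthetic data in place of `patient_experience.csv` (the names `X_demo`, `y_demo`, and `pipe` are illustrative and not part of the lab), splits first and lets a scikit-learn pipeline fit the scaler on the training rows only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey data (hypothetical, not the lab's CSV)
rng = np.random.default_rng(42)
X_demo = rng.integers(1, 11, size=(500, 5)).astype(float)  # ratings from 1 to 10

# Hypothetical rule: a higher average rating makes a "Yes" more likely
logit = 0.8 * (X_demo.mean(axis=1) - 5.5)
y_demo = (rng.random(500) < 1 / (1 + np.exp(-logit))).astype(int)

# Split first, so the test rows never influence the scaler
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# The pipeline fits the scaler on the training rows only, then fits the model
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(Xd_train, yd_train)

test_accuracy = pipe.score(Xd_test, yd_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

With a pipeline, `pipe.predict` and `pipe.predict_proba` accept raw 1-10 ratings directly, because the scaling step is applied internally.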

## 4. Review the Model

Take a minute to look at the model coefficients and odds ratios. What do they mean? Provide an interpretation.

In [None]:
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

# Create dataframe of coefficients
coef_df = pd.DataFrame({
    'Variable': X.columns,
    'Coefficient': model.coef_[0],
    'Odds_Ratio': np.exp(model.coef_[0])
})
print("\nCoefficient Summary:")
print(coef_df.sort_values('Coefficient', ascending=False))

What does this tell us?
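
One point that may help with the interpretation: because the inputs were standardized, each odds ratio describes a one-standard-deviation increase in a rating, not a one-point increase. The sketch below, which assumes synthetic data and hypothetical names (`demo_scaler`, `demo_model`, `odds_df`), divides each coefficient by the scaler's `scale_` to express the odds ratio per rating point:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey data (hypothetical values)
rng = np.random.default_rng(0)
cols = ['cleanliness', 'communication', 'staff_friendliness',
        'timeliness', 'discharge_clarity']
X_demo = pd.DataFrame(rng.integers(1, 11, size=(400, 5)), columns=cols)

# Hypothetical rule: communication and timeliness drive the outcome
logit = (0.5 * (X_demo['communication'] - 5.5)
         + 0.2 * (X_demo['timeliness'] - 5.5)).to_numpy()
y_demo = (rng.random(400) < 1 / (1 + np.exp(-logit))).astype(int)

demo_scaler = StandardScaler()
demo_model = LogisticRegression().fit(demo_scaler.fit_transform(X_demo), y_demo)

# Coefficients on the standardized scale (per one standard deviation)
std_coef = demo_model.coef_[0]
# Divide by each column's standard deviation to get per-point coefficients
per_point_coef = std_coef / demo_scaler.scale_

odds_df = pd.DataFrame({
    'Variable': cols,
    'OR_per_std_dev': np.exp(std_coef),
    'OR_per_rating_point': np.exp(per_point_coef),
})
print(odds_df.round(3))
```

An odds ratio above 1 means higher ratings raise the odds of a "Yes"; a value near 1 means the rating has little effect once the other ratings are accounted for.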

## 5. Evaluate the Model

Use tools like a confusion matrix and ROC/AUC curve to show how useful the model is.

In [None]:
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:,1]

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

In [None]:
fpr, tpr, _ = roc_curve(y_test, y_prob)
auc = roc_auc_score(y_test, y_prob)

plt.figure()
plt.plot(fpr, tpr, label=f"ROC curve (AUC = {auc:.2f})")
plt.plot([0,1], [0,1], linestyle='--', color='gray')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
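
The imports at the top include `classification_report`, which the cells above do not use. As a small sketch of how it might complement the confusion matrix, the example below uses hypothetical labels (not the lab's results) and shows how precision and recall follow from the four cells of the matrix:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical actual and predicted labels for ten patients (1 = Yes, 0 = No)
y_true_demo = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
y_pred_demo = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# For binary labels, scikit-learn orders the cells as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # of predicted "Yes", how many were "Yes"
recall = tp / (tp + fn)                      # of actual "Yes", how many were found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(classification_report(y_true_demo, y_pred_demo, target_names=['No', 'Yes']))
```

The same calls should work on the lab's `y_test` and `y_pred` in place of the hypothetical arrays.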

## 6. Predicted Probabilities

We can also visualize how each variable influences the predicted probability of LTR, one chart per variable, while holding the other variables constant.

In [None]:
px.describe()

In [None]:
# The value range for each input is 1 to 10
value_range = np.linspace(1, 10, 50)

# We'll provide a default value of 7 for the other variables being held constant
default_val = [7]*50
starter = pd.DataFrame({
    'cleanliness': default_val,
    'communication': default_val,
    'staff_friendliness': default_val,
    'timeliness': default_val,
    'discharge_clarity': default_val
})


# Loop through all fields and create predictions across the full range of each input
for field in starter.keys():
    example = pd.DataFrame()
    for c in starter.keys():
        if c == field:
            example[c] = value_range
        else:
            example[c] = default_val

    example_scaled = scaler.transform(example)
    example['predicted_prob'] = model.predict_proba(example_scaled)[:,1]

    sns.lineplot(x=field, y='predicted_prob', data=example)
    plt.ylim(0,1)
    plt.title(f"Effect of {field} on Probability of LTR")
    plt.ylabel("Predicted Probability (LTR=1)")
    plt.show()

## 7. Making Predictions for New Patients

Now that I have a trained model, I can use it to predict whether a new patient 
would recommend the hospital based on their survey responses.

**Business Question:** Would this patient recommend our facility? (Yes or No)

This is the ultimate goal - using the detailed ratings to predict the binary LTR outcome 
that drives federal reimbursement, public ratings, and patient volume.

In [None]:
#set up a sample new patient data frame
new_patient_ratings_df = pd.DataFrame({
    'cleanliness': [8],
    'communication': [9],
    'staff_friendliness': [7],
    'timeliness': [6],
    'discharge_clarity': [8]
})

#show it
st.write("### New Patient Survey Responses")
st.dataframe(new_patient_ratings_df)

### Prediction Process

To make a prediction, I need to:
1. Scale the new patient data using the SAME scaler from training
2. Apply the fitted logistic regression model
3. Get both the binary prediction (Yes/No) and the probability

In [None]:
#transform (scale) the new patient data ratings sample
new_patient_scaled = scaler.transform(new_patient_ratings_df)

#show
st.write("### Scaled Features")
st.write("The model requires standardized inputs (mean=0, std=1)")
scaled_df = pd.DataFrame(
    new_patient_scaled,
    columns=new_patient_ratings_df.columns
)
st.dataframe(scaled_df)

In [None]:
#prediction model
predicted_ltr = model.predict(new_patient_scaled)[0]
predicted_probability = model.predict_proba(new_patient_scaled)[0, 1]

#show
st.write("### Prediction Results")
st.metric(
    label="Will this patient recommend the hospital?",
    value="YES ✓" if predicted_ltr == 1 else "NO ✗"
)
st.metric(
    label="Predicted Probability of Yes",
    value=f"{predicted_probability:.1%}"
)

### Interpretation

My model predicts whether this patient will say **"Yes"** to recommending the facility 
based on their experience ratings.

**Key Insights:**
- **Binary Prediction**: The Yes/No answer that determines reimbursement impact
- **Probability**: The model's estimated chance of a "Yes" (values near 0 or 1 are confident predictions; values near 0.5 are uncertain)
- **Threshold**: By default, probability ≥ 0.5 → "Yes", < 0.5 → "No"

**Business Impact:**
- Predictions help identify at-risk patients before they submit final surveys
- Hospital can intervene with patients predicted to say "No"
- Track which factors are dragging down individual patient scores

In [None]:
#I'll explore contributing factors to wrap up our analysis

#start by building a contributing factors dataframe
contribution_df = pd.DataFrame({
    'Factor': new_patient_ratings_df.columns,
    'Patient_Rating': new_patient_ratings_df.iloc[0].values,
    'Model_Coefficient': model.coef_[0],
    'Scaled_Input': new_patient_scaled[0]
})

contribution_df['Contribution_to_Prediction'] = (
    contribution_df['Scaled_Input'] * contribution_df['Model_Coefficient']
)

#sort the factors from most positive to most negative contribution
contribution_df = contribution_df.sort_values(
    'Contribution_to_Prediction', 
    ascending=False
)

#show the contributing factors
st.write("### Factor Contributions to This Prediction")
st.write("Which ratings helped/hurt this patient's likelihood to recommend?")
st.dataframe(
    contribution_df[['Factor', 'Patient_Rating', 'Contribution_to_Prediction']]
    .style.background_gradient(subset=['Contribution_to_Prediction'], cmap='RdYlGn')
)
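
The contributions above are on the log-odds scale: adding them to the intercept gives the patient's log-odds, and passing that sum through the logistic function recovers the predicted probability. A minimal check of that relationship, assuming synthetic data and hypothetical names (`demo_scaler`, `demo_model`, `patient`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey data (hypothetical values)
rng = np.random.default_rng(1)
X_demo = rng.integers(1, 11, size=(300, 5)).astype(float)
logit = 0.6 * (X_demo.mean(axis=1) - 5.5)
y_demo = (rng.random(300) < 1 / (1 + np.exp(-logit))).astype(int)

demo_scaler = StandardScaler().fit(X_demo)
demo_model = LogisticRegression().fit(demo_scaler.transform(X_demo), y_demo)

# One hypothetical patient, scaled with the same scaler used for training
patient = np.array([[8, 9, 7, 6, 8]], dtype=float)
patient_scaled = demo_scaler.transform(patient)

# Per-factor contributions: scaled input times coefficient
contributions = patient_scaled[0] * demo_model.coef_[0]

# Intercept plus the summed contributions is the log-odds of "Yes"
log_odds = demo_model.intercept_[0] + contributions.sum()
prob_from_sum = 1 / (1 + np.exp(-log_odds))

# This should match what the model reports directly
prob_from_model = demo_model.predict_proba(patient_scaled)[0, 1]
print(f"{prob_from_sum:.4f} vs {prob_from_model:.4f}")
```

Because the scaled inputs are centered on the training mean, a positive contribution means the rating pushed this patient above what an average rating would give, and a negative one pulled them below it.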


### Multiple Patient Predictions

Let's predict for several different patient profiles to see how the model responds to different combinations of ratings.


In [None]:
#I'll try to predict for multiple patients

#start by building a multi-patient prediction dataframe with various survey results
multiple_patients_df = pd.DataFrame({
    'cleanliness': [9, 5, 7, 10, 3],
    'communication': [9, 4, 8, 9, 4],
    'staff_friendliness': [10, 6, 7, 10, 5],
    'timeliness': [8, 5, 6, 9, 2],
    'discharge_clarity': [9, 5, 8, 10, 4]
})

#transform and predict on those survey results
multiple_patients_scaled = scaler.transform(multiple_patients_df)
multiple_predictions = model.predict(multiple_patients_scaled)
multiple_probabilities = model.predict_proba(multiple_patients_scaled)[:, 1]

#build the results dataframe
results_df = multiple_patients_df.copy()
results_df['Predicted_LTR'] = ['Yes' if p == 1 else 'No' for p in multiple_predictions]
results_df['Probability'] = [f"{p:.1%}" for p in multiple_probabilities]

#show the recommendation prediction based on the survey results provided
st.write("### Batch Predictions for Multiple Patients")
st.dataframe(results_df)

In [None]:
#build our confidence plot
fig, ax = plt.subplots(figsize=(10, 6))

colors = ['green' if p == 1 else 'red' for p in multiple_predictions]
bars = ax.bar(range(len(multiple_probabilities)), multiple_probabilities, color=colors, alpha=0.6)

ax.axhline(y=0.5, color='black', linestyle='--', label='Decision Threshold (0.5)')
ax.set_xlabel('Patient ID')
ax.set_ylabel('Predicted Probability of LTR = Yes')
ax.set_title('Prediction Confidence for Multiple Patients')
ax.set_ylim(0, 1)
ax.legend()

#label each bar with its predicted probability
for i, prob in enumerate(multiple_probabilities):
    ax.text(i, prob + 0.02, f"{prob:.2f}", ha='center', fontsize=9)

#show the plot
st.pyplot(fig)


## Key Takeaways for Hospital Leadership

### Answering "Would This Patient Recommend Us?"

**Yes, this can be predicted!** The model uses the five experience ratings to forecast LTR.

### Practical Applications:

1. **Early Warning System**: Identify patients likely to say "No" before final survey
2. **Targeted Interventions**: Reach out to at-risk patients to address concerns
3. **Resource Allocation**: Focus on factors with highest coefficients for maximum impact
4. **Performance Monitoring**: Track predicted vs. actual LTR rates over time

### Recommendations:

- Deploy model to score patients in real-time after discharge
- Set up alerts when predicted probability < 0.5
- A/B test interventions on low-scoring patients
- Monitor model performance and retrain quarterly

### Final Thoughts:

- Logistic regression learns the relationship between those detailed ratings and the LTR outcome.
- Then it uses those patterns to predict whether a new patient will say "Yes" or "No" to recommending the hospital.
- Logistic regression is well suited to this analysis because the outcome is binary ("Yes" or "No").
- It gives interpretable coefficients that tell us which factors matter most.
- It provides probabilities that indicate how confident the model is in each prediction.
- Its odds ratios help quantify the impact of each factor.

✅ **YES, using Logistic Regression to transform those five detailed experience ratings into 
a recommendation prediction makes perfect sense!**