# How Resilience Relates to Life Satisfaction during COVID‑19

*EECS 398 Final Project Showcase*


### 1 — Why this project?

In April 2020, right after colleges closed for COVID‑19, **763** U.S. students answered a short anonymous survey about how they were feeling. Many of us were hit hard by the pandemic and the sudden disruption it brought.

The seven survey questions I use are simple 1‑to‑5 ratings:

| Question column | What it asks (1 = low, 5 = high) |
|---|---|
| **life_sat** | "How satisfied are you with your life right now?" |
| **resilient** | "I bounce back quickly after hard times." |
| **resilient_covid** | "I can cope with stress caused by COVID‑19." |
| **energy** | "I feel energetic today." |
| **happiness** | "I feel happy today." |
| **hopeful** | "I feel hopeful today." |
| **self_esteem** | "I feel good about myself today." |

**Main question:**  
If a student is more resilient, do they also report higher life satisfaction?


#### Quick look at the raw numbers

```python
import pandas as pd

# Load the survey and keep the rating columns explored below.
df = pd.read_csv('final_project_data.csv')[['life_sat', 'resilient', 'energy',
                                           'happiness', 'hopeful', 'self_esteem']]
df.head(10)
```

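| | life_sat | resilient | energy | happiness | hopeful | self_esteem |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 5.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 1 | 2.0 | 5.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 2 | 2.0 | 5.0 | 2.0 | 2.0 | 4.0 | 1.0 |
| 3 | 3.0 | 4.0 | 3.0 | 2.0 | 3.0 | 3.0 |
| 4 | 1.0 | 5.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 5 | 4.0 | 5.0 | 3.0 | 4.0 | 4.0 | 4.0 |
| 6 | 3.0 | 4.0 | 2.0 | 4.0 | 4.0 | 3.0 |
| 7 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 8 | 2.0 | 4.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 9 | 3.0 | 5.0 | 2.0 | 3.0 | 3.0 | 4.0 |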



### 2 — Cleaning the data

* Empty strings were turned into **NaN** (the marker pandas uses for a missing value).  
* Roughly **9%** to **14%** of answers are missing, depending on the question.

**Why I chose *median* imputation**

* The ratings are on a 1‑to‑5 scale and most students pick 4s.  
* Filling blanks with the *mean* would pull the data upward because of a few 5s.  
* Filling with the *mode* would turn many answers into 4s and hide real variety.  
* Dropping every row with a missing value would throw away at least 14% of students.

Using the median fills each blank with a typical "middle" answer and leaves the overall shape of the data largely intact, although it does add a small spike at the median value.


```python
# Percentage of missing answers in each column, rounded to one decimal.
missing_pct = (df.isna().mean() * 100).round(1)
missing_pct
```

| Column | % missing |
|---|---|
| life_sat | 14.0 |
| resilient | 9.4 |
| energy | 14.0 |
| happiness | 14.0 |
| hopeful | 13.9 |
| self_esteem | 14.2 |
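The cleaning steps described above can be sketched as follows. This is a minimal sketch on a tiny hypothetical DataFrame (`raw`, `clean`, and `imputed` are illustrative names, not the project's actual code), assuming blanks arrive as empty or whitespace‑only strings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for two survey columns (not the real data).
raw = pd.DataFrame({
    "life_sat":  ["4", "", "2", "3", " "],
    "resilient": ["5", "4", "", "4", "4"],
})

# 1. Turn empty or whitespace-only strings into NaN, then make the columns numeric.
clean = raw.replace(r"^\s*$", np.nan, regex=True).apply(pd.to_numeric)

# 2. Percent missing per column (same idea as missing_pct).
print((clean.isna().mean() * 100).round(1))

# 3. Fill each column's blanks with that column's own median.
imputed = clean.fillna(clean.median())
print(imputed)
```

Here `life_sat` blanks become 3.0 (the median of 4, 2, 3) and the `resilient` blank becomes 4.0. When the imputed data feed a model, computing the median on the training split only (for example with scikit‑learn's `SimpleImputer(strategy="median")` inside a pipeline) keeps test‑set information out of training.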

### 3 — What do the numbers look like?

```python
import plotly.express as px
import plotly.io as pio

pio.renderers.default = "iframe_connected"  # render figures as iframes in the notebook

# Histogram of the 1-to-5 life-satisfaction ratings (missing values are skipped).
fig_hist = px.histogram(df, x='life_sat', nbins=7,
                        title='How students rated their life satisfaction (1 = low, 5 = high)',
                        labels={'life_sat': 'life_sat'})
fig_hist
```

*Most students choose 4. Scores of 1 or 2 are rare, so predicting very low satisfaction may be tough.*

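```python
# Keep only students who answered both questions.
mask = df['resilient'].notna() & df['life_sat'].notna()

# Scatter plot with an ordinary-least-squares trend line (trendline='ols' needs statsmodels).
fig_scatter = px.scatter(df[mask], x='resilient', y='life_sat',
                         trendline='ols',
                         title='Do more resilient students feel more satisfied?',
                         labels={'resilient': 'resilient', 'life_sat': 'life_sat'},
                         opacity=0.6)
fig_scatter
```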

*The upward‑sloping line says **yes**: on average, students who score 1 point higher on resilience report about 0.16 points higher life satisfaction.*
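The slope of a trend line like this one can also be computed directly, since `trendline='ols'` draws an ordinary least‑squares line. A minimal sketch on hypothetical toy data (`toy` stands in for `df[mask]`; the numbers are made up, so the slope is not the project's 0.16):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; in the notebook this would be df[mask].
toy = pd.DataFrame({
    "resilient": [1, 2, 3, 4, 5, 5],
    "life_sat":  [2, 2, 3, 3, 3, 4],
})

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line,
# the same kind of line plotly's trendline='ols' draws.
slope, intercept = np.polyfit(toy["resilient"], toy["life_sat"], deg=1)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

For this toy data it prints `slope = 0.400, intercept = 1.500`. Plotly also exposes the fitted statsmodels results behind a figure through `px.get_trendline_results(fig_scatter)`.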

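```python
# Split students into three equal-sized groups by resilience.
# rank(method='first') breaks ties by row order so pd.qcut gets distinct values to cut.
df['resilience_group'] = pd.qcut(df['resilient'].rank(method='first'), 3,
                                 labels=['Low', 'Medium', 'High'])

# Average life satisfaction and number of answers in each group.
agg = df.groupby('resilience_group', observed=False)['life_sat'].agg(['mean', 'count']).round(2)
agg
```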

| resilience_group | mean | count |
|---|---|---|
| Low | 2.71 | 227 |
| Medium | 2.73 | 211 |
| High | 2.86 | 218 |


*Students in the **High** resilience group report the highest average life satisfaction (2.86 vs. 2.71 for **Low**). This is consistent with the trend, although the gap is small (about 0.15 points).*
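Because `rank(method='first')` breaks ties by row order, students with the same resilience score can land in different groups. A small hypothetical example (toy scores, not the survey data) makes the effect visible:

```python
import pandas as pd

# Hypothetical toy scores with many ties, as on a 1-to-5 scale.
toy = pd.DataFrame({"resilient": [4, 4, 4, 4, 5, 5]})

# Same recipe as the grouping cell: rank with method='first', then cut into 3 equal-sized groups.
toy["group"] = pd.qcut(toy["resilient"].rank(method="first"), 3,
                       labels=["Low", "Medium", "High"])

# Which raw scores ended up in each group?
print(toy.groupby("group", observed=False)["resilient"].unique())
```

Both **Low** and **Medium** contain only 4s here, so part of the difference between those groups comes from row order rather than resilience. Grouping on the raw score instead, for example `df.groupby('resilient')['life_sat'].mean()`, avoids splitting tied answers.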


### 4 — Turning the question into a prediction task

I treat **life_sat** as a number to predict (*regression*).

* **Baseline model** – uses only **resilient** and **energy**.  
* **Final model** – adds the other four "mood" questions **plus** an interaction term and an average mood score.

**Why Ridge Regression?**  
Think of Ridge as ordinary linear regression with a "seat‑belt" that keeps the coefficients from swinging wildly when the inputs are correlated (and they are!). One knob, called **alpha (α)**, tightens or loosens the belt; I pick the best α with cross‑validation (try a few values, keep the one that predicts best on unseen data).
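Picking α by cross‑validation can be sketched with scikit‑learn. This is a minimal sketch on synthetic data with two strongly correlated inputs; the standardization step, the α grid, and the 5 folds are my assumptions, not settings confirmed by the project.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: two highly correlated inputs, like overlapping mood questions.
rng = np.random.default_rng(0)
n = 200
mood = rng.normal(size=n)
X = np.column_stack([mood + 0.1 * rng.normal(size=n),   # stand-in for e.g. happiness
                     mood + 0.1 * rng.normal(size=n)])  # stand-in for e.g. hopeful
y = 3 + 0.5 * mood + 0.5 * rng.normal(size=n)

# Standardize so alpha penalizes every coefficient on the same footing, then fit Ridge.
pipe = make_pipeline(StandardScaler(), Ridge())

# Try several alphas with 5-fold cross-validation; keep the one with the lowest RMSE.
search = GridSearchCV(
    pipe,
    param_grid={"ridge__alpha": [0.01, 0.1, 1, 10, 100]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)

print("best alpha:", search.best_params_["ridge__alpha"])
print("cross-validated RMSE:", round(-search.best_score_, 3))
```

scikit‑learn reports errors as negative scores (higher is better), which is why the RMSE is `-search.best_score_`.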

**Metric** – *Root Mean Squared Error (RMSE)*: the square root of the average squared prediction error, in the same 1‑to‑5 units as the survey. I report RMSE rather than MSE because it is on the same scale as the answers, so "RMSE = 0.63" reads as "a typical prediction is off by roughly two‑thirds of a scale point." MSE squares the error, giving numbers in "points²" that are harder to interpret; RMSE keeps the same extra sensitivity to large errors but stays human‑readable.
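A worked example with four hypothetical ratings shows how MSE, RMSE, and the plain mean absolute error (MAE) relate:

```python
import numpy as np

# Hypothetical ratings and predictions on the 1-to-5 scale.
y_true = np.array([4, 3, 2, 5])
y_pred = np.array([3.5, 3, 3, 4])

errors = y_pred - y_true           # [-0.5, 0, 1, -1]
mse = np.mean(errors ** 2)         # in "points squared"
rmse = np.sqrt(mse)                # back in scale points
mae = np.mean(np.abs(errors))      # plain average miss

print(f"MSE  = {mse:.4f}")   # 0.5625
print(f"RMSE = {rmse:.4f}")  # 0.7500
print(f"MAE  = {mae:.4f}")   # 0.6250
```

RMSE is always at least as large as MAE, because squaring weights the two 1‑point misses more heavily than the 0.5‑point one. An RMSE of 0.63 therefore means the average absolute miss is at most 0.63 points.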


### 5 — How well do the models do?

| Model | RMSE (lower = better) | R² (higher = better) |
|---|---|---|
| Baseline | **0.82** | 0.18 |
| Final Ridge | **0.63** | 0.52 |

**Meaning**

* The final model's predictions are typically off by about **0.63 points**, roughly two‑thirds of a step on the 1‑to‑5 scale.  
* It explains **52%** of the variation in life satisfaction across students, almost three times the **18%** explained by the baseline.
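As a sanity check on the table, R² and RMSE are linked by R² = 1 − MSE / Var(y), so each row implies a variance for `life_sat`. A minimal sketch, assuming both models were scored on the same evaluation data (the inputs are only the rounded numbers from the table):

```python
# Reported (RMSE, R^2) pairs from the results table.
results = {"Baseline": (0.82, 0.18), "Final Ridge": (0.63, 0.52)}

for name, (rmse, r2) in results.items():
    # R^2 = 1 - MSE / Var(y)  =>  Var(y) = RMSE^2 / (1 - R^2)
    implied_var = rmse ** 2 / (1 - r2)
    print(f"{name}: implied Var(life_sat) = {implied_var:.2f}, SD = {implied_var ** 0.5:.2f}")

# Relative drop in RMSE from the baseline to the final model.
rmse_drop = 1 - results["Final Ridge"][0] / results["Baseline"][0]
print(f"RMSE reduction: {rmse_drop:.0%}")
```

Both rows imply a variance of about 0.82 to 0.83 (a standard deviation of about 0.91 points), so the two columns agree up to rounding. The same arithmetic shows the final model lowers RMSE by about 23%.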



#### Quick residual check

Plotting the prediction errors (residuals) against the predicted values (plot not shown here) reveals no strong pattern: the best‑fit line through the residuals stays flat and the points spread out evenly around zero. That suggests a straight‑line model is good enough for now.
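The same check can be expressed as numbers instead of a picture. This is a minimal sketch on simulated predictions (hypothetical data, not the project's model output); the three prediction bands are an illustrative choice.

```python
import numpy as np

# Hypothetical simulated predictions and 1-to-5 answers; in the notebook these would be
# the Ridge model's predictions on held-out students and their actual life_sat values.
rng = np.random.default_rng(1)
y_pred = rng.uniform(1.5, 4.5, size=300)
y_true = np.clip(np.round(y_pred + rng.normal(scale=0.6, size=300)), 1, 5)
resid = y_true - y_pred

# 1. Slope of the best-fit line through (predicted, residual): near 0 means "flat".
slope, _ = np.polyfit(y_pred, resid, deg=1)

# 2. Mean residual in low / middle / high prediction bands: similar values
#    across bands mean no obvious curve is left unexplained.
bands = np.digitize(y_pred, np.quantile(y_pred, [1 / 3, 2 / 3]))
band_means = [resid[bands == b].mean() for b in range(3)]

print(f"residual slope: {slope:.3f}")
print("mean residual per band:", np.round(band_means, 3))
```

To see the plot itself, `px.scatter(x=y_pred, y=resid, trendline='ols')` draws the residuals with their trend line.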



### 6 — Results

1. **Resilient students tend to feel better.** A 1‑point higher resilience score is associated with about 0.16 points higher life satisfaction.  
2. **Daily mood matters too.** Adding the other daily‑mood questions and the extra features cuts prediction error (RMSE) from 0.82 to 0.63, about 23% lower, and raises R² from 0.18 to 0.52.  

*Next steps:* Collect new data over the semester to see how these relationships change over time.
