# Wellbeing Analysis Project

This notebook examines the relationship between lifestyle factors and wellbeing through three research questions:
- Q1: Days without social media vs. Wellbeing Score
- Q2: Social media platform user demographics and screen time
- Q3: Predictors of stress levels

## Table of Contents

1. [Setup & Imports](#1-setup--imports)
2. [Data Loading](#2-data-loading)
3. [Data Understanding](#3-data-understanding)
4. [Data Cleaning](#4-data-cleaning)
5. [Question 1: Days Without Social Media and Wellbeing](#5-question-1)
6. [Question 2: Social Media Platform Demographics](#6-question-2)
7. [Question 3: Stress Level Predictors](#7-question-3)
8. [Summary and Conclusions](#8-summary-and-conclusions)

## 1. Setup & Imports

In [None]:
# Data manipulation
import pandas as pd
import numpy as np

# Visualisation
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning







## 2. Data Loading

In [None]:
# Load the dataset
#df = pd.read_csv('our_dataset.csv')



## 3. Data Understanding

### 3.1 First Look at the Data

In [None]:
# Display first few rows


### 3.2 Dataset Information

In [None]:
# Get basic information about the dataset


In [None]:
# Column names


### 3.3 Statistical Summary

In [None]:
# Statistical summary of numerical columns


**What This Check Tells Us:**

- `count`, `min`, `max`, `std`, etc. for each of our numerical columns

### 3.4 Check for Data Quality Issues

In [None]:
# Check for missing values


In [None]:
# Check for duplicate rows



Check value ranges (min and max) for the key variables:

- Days_Without_Social_Media:
- Daily_Screen_Time:
- Age:
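
The quality checks in this section could be sketched roughly as follows. The small DataFrame `df_demo` is made-up stand-in data (the real `df` comes from the loading step); only the three column names are taken from this notebook.

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows; replace with the real `df` from section 2.
df_demo = pd.DataFrame({
    "Days_Without_Social_Media": [0, 3, 7, 3, np.nan],
    "Daily_Screen_Time": [5.5, 2.0, 1.5, 2.0, 8.0],
    "Age": [21, 34, 45, 34, 29],
})

# Missing values per column
missing_counts = df_demo.isnull().sum()

# Number of fully duplicated rows (the first occurrence is not counted)
duplicate_count = df_demo.duplicated().sum()

# Min and max for the key variables
key_columns = ["Days_Without_Social_Media", "Daily_Screen_Time", "Age"]
value_ranges = df_demo[key_columns].agg(["min", "max"])

print(missing_counts)
print(f"Duplicate rows: {duplicate_count}")
print(value_ranges)
```

Implausible minimums or maximums (for example a negative screen time) are worth noting here so they can be handled in the cleaning section.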

## 4. Data Cleaning

### 4.1 Create a Working Copy

In [None]:
# Create a copy to preserve original data
df_clean = df.copy()


### 4.2 Create Wellbeing Score Composite

In [None]:
# Create composite Wellbeing Score
# Wellbeing Score = f(Happiness_Index, Sleep_Quality, Exercise_Frequency, Stress_Level)
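
The notebook does not yet fix the form of `f`, so the following is only one possible sketch. It assumes each component is min-max scaled to 0-1, that `Stress_Level` is inverted (higher stress means lower wellbeing), and that the four components are weighted equally. The rows in `df_demo` and the helper `min_max_scale` are hypothetical.

```python
import pandas as pd

# Made-up stand-in rows; the real columns live in `df_clean`.
df_demo = pd.DataFrame({
    "Happiness_Index": [8, 5, 10, 2],
    "Sleep_Quality": [7, 6, 9, 3],
    "Exercise_Frequency": [3, 1, 5, 0],
    "Stress_Level": [4, 7, 2, 10],
})


def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a numeric column to the 0-1 range (assumes it is not constant)."""
    return (series - series.min()) / (series.max() - series.min())


components = pd.DataFrame({
    "happiness": min_max_scale(df_demo["Happiness_Index"]),
    "sleep": min_max_scale(df_demo["Sleep_Quality"]),
    "exercise": min_max_scale(df_demo["Exercise_Frequency"]),
    # Higher stress means lower wellbeing, so invert the scaled value
    "low_stress": 1 - min_max_scale(df_demo["Stress_Level"]),
})

# Assumed weighting: simple unweighted mean, expressed on a 0-100 scale
df_demo["Wellbeing_Score"] = components.mean(axis=1) * 100
print(df_demo)
```

Scaling first keeps a component with a wider numeric range from dominating the composite; different weights could be substituted if the team agrees on them.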



### 4.6 Save Cleaned Data

In [None]:
# Save cleaned dataset
df_clean.to_csv('wellbeing_data_cleaned.csv', index=False)


## Data Cleaning - What We've Done So Far

- Missing values handled, if any
- Duplicates removed, if any
- Outliers handled, if any
- Wellbeing Score created
- Cleaned data saved

# Start From Here

*To resume work with the cleaned data, start here:*

In [None]:
# Load cleaned data
# df_clean = pd.read_csv('wellbeing_data_cleaned.csv')
# create a working copy
df_working = df_clean.copy() # so you don't mess up the cleaned data

## 5. Question 1: Days Without Social Media and Wellbeing

**Research Question:** To what extent is the number of Days_Without_Social_Media related to an individual's overall Wellbeing_Score?

### 5.1 Analysis

In [None]:
# Distribution of Days Without Social Media

In [None]:
# Visualise distributions


# Days Without Social Media


# Wellbeing Score




### 5.2 Analysis

In [None]:
# Calculate correlation
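
One way the correlation for Question 1 might be computed is sketched below, assuming `Wellbeing_Score` has already been created. The values in `df_demo` are made up for illustration and say nothing about the real data.

```python
import pandas as pd

# Made-up stand-in rows; in the notebook these columns come from `df_working`.
df_demo = pd.DataFrame({
    "Days_Without_Social_Media": [0, 1, 2, 3, 5, 7],
    "Wellbeing_Score": [42.0, 50.0, 47.0, 61.0, 66.0, 80.0],
})

# Pearson measures the strength of a linear association
pearson_r = df_demo["Days_Without_Social_Media"].corr(df_demo["Wellbeing_Score"])

# Spearman is Pearson computed on ranks, so it captures any monotonic association
ranked = df_demo.rank()
spearman_rho = ranked["Days_Without_Social_Media"].corr(ranked["Wellbeing_Score"])

print(f"Pearson r: {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```

Reporting both is a cheap check: a large gap between them suggests the relationship is monotonic but not linear, or that outliers are influencing Pearson.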


### 5.3 

In [None]:
# Whichever charts you want to plot


# Same format as 5.2




### 5.4 Analysis

In [None]:
# Analyse relationship with each wellbeing component


In [None]:
# Visualise 


### 5.5 Key Findings - Question 1

**Summary:**



## 6. Question 2: Social Media Platform Demographics

**Research Question:** What is the typical Age and Gender profile for users of different Social_Media_Platform, and how does their average Daily_Screen_Time compare?

### 6.1 Platform Overview

In [None]:
# Get unique platforms and user counts


### 6.2 Age Profile by Platform

In [None]:
# Calculate age statistics by platform



In [None]:
# Visualise age distribution by platform


### 6.3 Gender Profile by Platform

In [None]:
# Gender distribution by platform


In [None]:
# Visualise gender distribution


### 6.4 Daily Screen Time by Platform

In [None]:
# Calculate screen time statistics by platform


### 6.5 Visualisation: Screen Time Comparison

In [None]:
# Box plot of screen time by platform


In [None]:
# Bar chart of average screen time


### 6.6 Combined Demographics Summary

In [None]:
# 
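
A combined summary table could be built with a single `groupby`, as in this sketch. The column names come from the research question above; the platform names and all values in `df_demo` are made up.

```python
import pandas as pd

# Made-up stand-in rows; platform names and values are illustrative only.
df_demo = pd.DataFrame({
    "Social_Media_Platform": ["Instagram", "Instagram", "Instagram",
                              "LinkedIn", "LinkedIn", "TikTok"],
    "Age": [22, 25, 31, 40, 44, 19],
    "Gender": ["Female", "Female", "Male", "Male", "Male", "Female"],
    "Daily_Screen_Time": [6.0, 5.0, 4.0, 2.0, 3.0, 7.5],
})

# One row per platform: user count, age profile, average screen time
platform_summary = df_demo.groupby("Social_Media_Platform").agg(
    users=("Age", "size"),
    mean_age=("Age", "mean"),
    median_age=("Age", "median"),
    mean_screen_time=("Daily_Screen_Time", "mean"),
)

# Share of each gender within each platform (each row sums to 1)
gender_share = pd.crosstab(
    df_demo["Social_Media_Platform"], df_demo["Gender"], normalize="index"
)

print(platform_summary)
print(gender_share)
```

Keeping the user count next to the averages makes it easier to spot platforms whose profile rests on very few respondents.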

### 6.7 Key Findings - Question 2

**Summary:**



## 7. Question 3: Stress Level Predictors

**Research Question:** What are the most significant predictors of a user's Stress_Level, and can a model accurately predict whether a user falls into the high or low stress categories based on lifestyle factors?

### 7.1 Create Target Variable Categories

In [None]:
# Create binary stress categories (High/Low)
# Adjust threshold based on our data
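
A minimal sketch of the binary split, assuming the sample median is used as the threshold. That choice is an assumption; a fixed cut-off on the `Stress_Level` scale may suit the real data better. The `Low`/`High` labels match the mapping used later in section 7.5, and the values are made up.

```python
import numpy as np
import pandas as pd

# Made-up stress scores; the real column lives in the cleaned data.
df_demo = pd.DataFrame({"Stress_Level": [2, 9, 5, 7, 3, 8]})

# Assumed threshold: the sample median, which gives roughly balanced classes
threshold = df_demo["Stress_Level"].median()

# Scores strictly above the threshold count as "High", the rest as "Low"
df_demo["Stress_Category"] = np.where(
    df_demo["Stress_Level"] > threshold, "High", "Low"
)

print(f"Threshold: {threshold}")
print(df_demo["Stress_Category"].value_counts())
```

Checking the class counts straight away shows whether the chosen threshold leaves one category too small to model.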

### 7.2 Select Predictor Variables

In [None]:
# Define lifestyle factor predictors
predictors = [
    'Age',
    'Daily_Screen_Time',
    'Days_Without_Social_Media',
    'Exercise_Frequency',
    'Sleep_Quality',
    'Happiness_Index',
    # Add other relevant lifestyle factors
]


### 7.3 Exploratory Data Analysis for Stress

In [None]:
# Compare predictors across stress categories

In [None]:
# Visualise predictor distributions by stress category


### 7.4 Correlation Analysis

In [None]:
# Calculate correlations with stress level


In [None]:
# Visualise correlation heatmap


### 7.5 Prepare Data for Machine Learning

*Either add your imports here or at the top of the notebook.*

In [None]:
# Prepare feature matrix and target variable
from sklearn.model_selection import train_test_split

X = df_clean[predictors]
y = df_clean['Stress_Category'].map({'Low': 0, 'High': 1})

# Split into training and testing sets (80% train, 20% test, fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)



In [None]:
# Standardise features
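
Standardisation might look like the sketch below, using scikit-learn's `StandardScaler`. The two small arrays are made-up stand-ins for the train and test feature matrices. The point illustrated is that the scaler is fitted on the training rows only and then reused on the test rows, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up stand-ins for the train and test feature matrices (two features each)
X_train_demo = np.array([[20.0, 2.0], [30.0, 4.0], [40.0, 6.0]])
X_test_demo = np.array([[30.0, 8.0]])

scaler = StandardScaler()

# Learn the mean and standard deviation from the training rows only
X_train_scaled = scaler.fit_transform(X_train_demo)

# Reuse those training statistics on the test rows
X_test_scaled = scaler.transform(X_test_demo)

print(X_train_scaled)
print(X_test_scaled)
```

Scaling matters for logistic regression, whose coefficients depend on feature scale; tree-based models such as random forests are largely unaffected by it.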


### 7.6 Model Building - Logistic Regression

In [None]:
# Train logistic regression model


# Make predictions


In [None]:
# Evaluate chosen model
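
A possible shape for the training and evaluation cells is sketched here. It runs on synthetic data from `make_classification`, used only as a stand-in for the six lifestyle predictors and the 0/1 stress target, so the printed scores say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the lifestyle predictors and the 0/1 stress target
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Standardise using training statistics only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train logistic regression model
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train_scaled, y_train)

# Make predictions and evaluate on the held-out rows
y_pred = log_reg.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, target_names=["Low", "High"]))
```

The classification report adds precision, recall, and F1 per class, which is more informative than accuracy alone when the stress categories are imbalanced.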


### 7.7 Model Building - Random Forest

In [None]:
# Train random forest model


# Make predictions


In [None]:
# Evaluate random forest


### 7.8 Feature Importance Analysis

In [None]:
# Get feature importance from random forest
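
Feature importances can be read from a fitted random forest via its `feature_importances_` attribute, as in this sketch. The feature names are the predictors listed in section 7.2, but the data is synthetic, so the resulting ranking is meaningless and only shows the mechanics.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = [
    "Age", "Daily_Screen_Time", "Days_Without_Social_Media",
    "Exercise_Frequency", "Sleep_Quality", "Happiness_Index",
]

# Synthetic stand-in data; the ranking below does not reflect the real dataset
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=42
)
X_demo = pd.DataFrame(X_demo, columns=feature_names)

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_demo, y_demo)

# Impurity-based importances sum to 1; sort so the strongest predictor comes first
feature_importance = pd.Series(
    rf_model.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(feature_importance)
```

Impurity-based importances can overstate correlated or high-cardinality features, so they are best read as a rough ranking rather than as effect sizes.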

 

In [None]:
# Visualise feature importance


### 7.9 Model Visualisation - Confusion Matrix

In [None]:
# Plot confusion matrices


# Logistic Regression


# Random Forest


### 7.10 Model Visualisation - ROC Curves

In [None]:
# Plot ROC curves


# Logistic Regression ROC


# Random Forest ROC


# Diagonal reference line
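
The ROC cell could be filled in along these lines. The data is synthetic and both models use near-default settings chosen purely for illustration, so the curves and AUC values are not results for this project.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the predictors and the 0/1 stress target
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

fig, ax = plt.subplots(figsize=(6, 5))
auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC needs scores, so use the predicted probability of class 1 ("High")
    y_prob = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    auc_scores[name] = roc_auc_score(y_test, y_prob)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {auc_scores[name]:.2f})")

# Diagonal reference line: a classifier that guesses at random
ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Chance")
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("ROC Curves")
ax.legend()
```

A curve that hugs the top-left corner indicates better separation of high and low stress; a curve near the diagonal is no better than guessing.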


### 7.11 Model Comparison Summary

In [None]:
# Create comparison table


# Identify best model
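
A comparison table might be assembled as below. Choosing the best model by F1 score is an assumption made for this sketch; accuracy or AUC could be used instead. The data is synthetic, so the numbers are not findings.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the predictors and the 0/1 stress target
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Collect the same metrics for every model on the same held-out rows
rows = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    rows[name] = {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1": f1_score(y_test, y_pred),
    }

# Transpose so that each model is a row and each metric a column
comparison = pd.DataFrame(rows).T

# Assumed selection rule: highest F1 score
best_model = comparison["F1"].idxmax()

print(comparison.round(3))
print(f"Best model by F1: {best_model}")
```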


### 7.12 Key Findings - Question 3

**Summary:**



## 8. Summary and Conclusions

### 8.1 Project Overview

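This notebook examined the relationship between lifestyle factors and wellbeing through three research questions:
- Q1: Days without social media vs. Wellbeing Score
- Q2: Social media platform user demographics and screen time
- Q3: Predictors of stress levels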

### 8.2 Key Findings Summary

**Question 1: Days Without Social Media and Wellbeing**


**Question 2: Platform Demographics**


**Question 3: Stress Predictors**
-

### 8.3 Cross-Question Insights



### 8.4 Limitations



### 8.5 Recommendations

**For Individuals:**


**For Future Research:**


## Credits