The Health and Lifestyle Survey conducted by XYZ Corp collected a comprehensive dataset from 4,200 participants to analyze health behaviors and lifestyle choices. The survey aimed to understand correlations between physical activity, dietary habits, sleep quality, stress levels, and overall health assessments among diverse individuals. This analysis uses measures of central tendency and measures of dispersion to summarize and interpret the dataset, providing insight into general health trends within the population.

The dataset contains the following features:

* **Participant ID**: A unique identifier for each participant.
* **Physical Activity (hours/week)**: Represents the average number of hours the participant engages in weekly physical activity.
* **Fruits/Veg Servings (servings/day)**: Represents the average number of servings of fruits and vegetables the participant consumes daily.
* **Sleep Quality (score)**: Participants rated their average sleep quality on a scale from 1 (poor) to 10 (excellent).
* **Stress Level (score)**: Participants rated their average stress level on a scale from 1 (low) to 10 (high).
* **Overall Health (score)**: Participant's overall self-assessment of their health on a scale from 1 (poor) to 10 (excellent).
* **Diet Type**: Represents the type of diet the participant follows: {'Vegetarian', 'Vegan', 'Meat-Eater', 'Pescatarian', 'Flexitarian'}.
* **Exercise Type Preferred**: Represents the main type of physical activity the participant engages in: {'Cardio', 'Strength Training', 'Yoga/Pilates', 'Sports', 'None'}.
* **Work Environment**: Represents the type of work environment of the participant: {'Office', 'Remote', 'Fieldwork', 'Mixed', 'Student'}.
* **Living Area**: Represents the type of area the participant lives in: {'Urban', 'Suburban', 'Rural'}.


### **Q1: Load the Health and Lifestyle Survey dataset from "health_lifestyle.csv"**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/HealthLifestyle.csv')
df.head(5)

| | Participant ID | Physical Activity | Fruits/Veg Servings | Sleep Quality | Stress Level | Overall Health | Diet Type | Exercise Type Preferred | Work Environment | Living Area |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4.37 | 8.65 | 9 | 9 | 2 | Pescatarian | Strength Training | Mixed | Suburban |
| 1 | 2 | 9.56 | 2.77 | 2 | 3 | 9 | Meat-Eater | Sports | Mixed | Suburban |
| 2 | 3 | 7.59 | 2.13 | 5 | 4 | 8 | Vegetarian | Sports | Mixed | Rural |
| 3 | 4 | 6.39 | 9.67 | 9 | 5 | 5 | Pescatarian | Strength Training | Remote | Suburban |
| 4 | 5 | 2.4 | 1.97 | 2 | 10 | 1 | Pescatarian | | Fieldwork | Suburban |
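
One thing worth checking after loading: the `Exercise Type Preferred` column uses the literal string `None` as a category. Depending on the pandas version, `read_csv` may treat that string as a missing value (to my knowledge, pandas 2.0 added it to the default NA markers), which would silently turn that category into `NaN`. The sketch below uses a small in-memory CSV with hypothetical rows, not the survey file, and assumes that only empty fields should count as missing.

```python
import io

import pandas as pd

# Hypothetical three-row sample that mimics part of the survey's column layout.
sample_csv = io.StringIO(
    "Participant ID,Exercise Type Preferred,Living Area\n"
    "1,Cardio,Urban\n"
    "2,None,Rural\n"
    "3,,Suburban\n"
)

# keep_default_na=False turns off pandas' built-in NA markers;
# na_values=[""] then treats only empty fields as missing.
sample_df = pd.read_csv(sample_csv, keep_default_na=False, na_values=[""])

none_count = (sample_df["Exercise Type Preferred"] == "None").sum()
missing_count = sample_df["Exercise Type Preferred"].isna().sum()
print(f"'None' category rows: {none_count}, missing rows: {missing_count}")
```

Comparing `value_counts()` with `isna().sum()` on the real column is a quick way to confirm which behavior applies in a given environment.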


### **Q2: Calculate the mean and median of the "Physical Activity (hours/week)" column**

In [None]:
physical_mean = df['Physical Activity'].mean()
physical_median = df['Physical Activity'].median()

print(f'Mean of Physical Activity (hours/week): {physical_mean}\n\
Median of Physical Activity (hours/week): {physical_median}')

Mean of Physical Activity (hours/week): 5.475707142857143
Median of Physical Activity (hours/week): 5.51



### **Q3: Calculate the mean and median of the "Fruits/Veg Servings (servings/day)" column**

In [None]:
servings_mean = df['Fruits/Veg Servings'].mean()
servings_median = df['Fruits/Veg Servings'].median()

print(f'Mean of Fruits/Veg Servings (servings/day): {servings_mean}\n\
Median of Fruits/Veg Servings (servings/day): {servings_median}')

Mean of Fruits/Veg Servings (servings/day): 5.426183333333333
Median of Fruits/Veg Servings (servings/day): 5.37
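
Both numeric columns can also be summarized in one call. The sketch below uses a hypothetical toy DataFrame with the same column names as the survey data; `toy_df` and its values are illustrative assumptions only. In the survey results, the mean and median differ by less than 0.06 for both columns, which suggests that neither distribution is strongly skewed.

```python
import pandas as pd

# Hypothetical toy values; the column names match the survey DataFrame.
toy_df = pd.DataFrame({
    "Physical Activity": [1.0, 2.0, 6.0],
    "Fruits/Veg Servings": [2.0, 4.0, 9.0],
})

# agg() with a list of function names returns one row per statistic.
summary = toy_df[["Physical Activity", "Fruits/Veg Servings"]].agg(["mean", "median"])
print(summary)
```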



### **Q4: Identify the mode of the "Diet Type", "Exercise Type Preferred", "Work Environment", and "Living Area" columns**

In [None]:
diet_mode = df['Diet Type'].mode()[0]
exercise_mode = df['Exercise Type Preferred'].mode()[0]
work_mode = df['Work Environment'].mode()[0]
area_mode = df['Living Area'].mode()[0]

In [None]:
print(f'Mode of Diet Type: {diet_mode}\n\
Mode of Exercise Type Preferred: {exercise_mode}\n\
Mode of Work Environment: {work_mode}\n\
Mode of Living Area: {area_mode}')

Mode of Diet Type: Vegan
Mode of Exercise Type Preferred: None
Mode of Work Environment: Student
Mode of Living Area: Rural
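
Two details are easy to miss here. First, the mode `None` for exercise type is the category label "None" (no preferred exercise), not a missing value. Second, `Series.mode()` returns every tied value in sorted order, so `mode()[0]` picks the first of them when there is a tie. The sketch below illustrates the tie case with hypothetical responses rather than the survey data.

```python
import pandas as pd

# Hypothetical responses with a two-way tie for the most common diet.
diets = pd.Series(["Vegan", "Vegan", "Meat-Eater", "Meat-Eater", "Pescatarian"])

modes = diets.mode()                     # Series holding every tied mode, sorted
top_count = diets.value_counts().iloc[0]  # frequency of the most common value

print(list(modes), "each appear", top_count, "times")
```

Checking `len(modes)` before indexing with `[0]` shows whether a column has a single clear mode.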



### **Q5: Calculate the range, variance, and standard deviation for the "Physical Activity (hours/week)" column. Discuss what these measures tell you about participants' variability in physical activity levels**

In [None]:
physical_range = df['Physical Activity'].max() - df['Physical Activity'].min()
physical_var = df['Physical Activity'].var()
physical_std = df['Physical Activity'].std()

In [None]:
print(f'Range of Physical Activity (hours/week): {physical_range}\n\
Variance of Physical Activity (hours/week): {physical_var}\n\
Standard Deviation of Physical Activity (hours/week): {physical_std}')

Range of Physical Activity (hours/week): 9.0
Variance of Physical Activity (hours/week): 6.81453381752458
Standard Deviation of Physical Activity (hours/week): 2.6104662069302065
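
Read together, these values say that weekly activity spans 9 hours between the least and most active participants, and that a typical participant sits about 2.6 hours away from the mean of roughly 5.5 hours, so activity levels vary considerably. Note that `Series.var()` and `Series.std()` use the sample formulas (`ddof=1`) by default. The sketch below shows the difference from the population formula on hypothetical values chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical weekly activity hours, not taken from the survey.
hours = pd.Series([2.0, 4.0, 6.0, 8.0])

value_range = hours.max() - hours.min()
sample_var = hours.var()            # ddof=1: divides by n - 1
population_var = hours.var(ddof=0)  # divides by n
sample_std = hours.std()            # square root of the sample variance

print(value_range, sample_var, population_var, sample_std)
```

With 4,200 participants the two variance formulas give almost the same result, so the choice matters mainly for small samples.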



### **Q6: Calculate the range, variance, and standard deviation for the "Fruits/Veg Servings (servings/day)" column**

In [None]:
servings_range = df['Fruits/Veg Servings'].max() - df['Fruits/Veg Servings'].min()
servings_var = df['Fruits/Veg Servings'].var()
servings_std = df['Fruits/Veg Servings'].std()

In [None]:
print(f'Range of Fruits/Veg Servings (servings/day): {servings_range}\n\
Variance of Fruits/Veg Servings (servings/day): {servings_var}\n\
Standard Deviation of Fruits/Veg Servings (servings/day): {servings_std}')

Range of Fruits/Veg Servings (servings/day): 9.0
Variance of Fruits/Veg Servings (servings/day): 6.688632274073192
Standard Deviation of Fruits/Veg Servings (servings/day): 2.586239021063829



### **Q7: Compute the interquartile range (IQR) for the "Sleep Quality (score)" column**

In [None]:
sleep_q1 = df['Sleep Quality'].quantile(0.25)
sleep_q3 = df['Sleep Quality'].quantile(0.75)
sleep_iqr = sleep_q3 - sleep_q1

In [None]:
print(f'Interquartile Range (IQR) of Sleep Quality (score): {sleep_iqr}')

Interquartile Range (IQR) of Sleep Quality (score): 5.0
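
An IQR of 5.0 on a 1 to 10 scale means the middle half of participants spans more than half of the available range. A common follow-up is Tukey's 1.5 × IQR rule for flagging outliers, sketched below. The output above does not show the quartiles themselves, so the quartile values and the helper name `tukey_fences` are hypothetical. With an IQR of 5, both fences fall outside the 1 to 10 scale whatever the quartiles are, so this rule would flag no sleep score as an outlier.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical quartiles with the same IQR of 5 as the survey result.
lower, upper = tukey_fences(q1=3.0, q3=8.0)
print(f"Scores below {lower} or above {upper} would be flagged")
```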



### **Q8: Calculate the coefficient of variation (CV) for the "Overall Health (score)" column. Discuss how the CV provides insight into the relative variability of the participants' health assessments compared to their average health score**

In [None]:
health_mean = df['Overall Health'].mean()
health_std = df['Overall Health'].std()
health_coef_var = (health_std / health_mean) * 100

In [None]:
print(f'Coefficient of Variation (CV) of Overall Health (score): {health_coef_var}')

Coefficient of Variation (CV) of Overall Health (score): 55.75678807958001


In [None]:
# Insights

''' The CV of about 55.8% means the standard deviation is a little over half of the mean.
The std is lower than the mean, but for a score bounded between 1 and 10 this is a relatively high CV.
Self-assessed health scores are therefore widely dispersed around the mean rather than clustered near it.
Participants' [Overall Health] ratings differ considerably, so the average score alone describes a typical participant poorly.
'''
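
A CV is easier to judge against a reference point. As a hedged sketch, the block below computes the CV of a hypothetical baseline in which every score from 1 to 10 is equally common, using the population standard deviation. That baseline comes out at roughly 52%, so the survey's 55.76% corresponds to scores spread slightly more widely than an even spread over the whole scale.

```python
import statistics

# Every score from 1 to 10 exactly once: a perfectly even spread over the scale.
uniform_scores = list(range(1, 11))

baseline_mean = statistics.mean(uniform_scores)
baseline_std = statistics.pstdev(uniform_scores)  # population standard deviation
baseline_cv = baseline_std / baseline_mean * 100

print(f"Baseline CV for evenly spread 1-10 scores: {baseline_cv:.2f}%")
```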


### **Q9: Evaluate the diversity in the "Exercise Type Preferred" column by calculating the frequency of each exercise type and then determining the range between the most and least popular types. Discuss how this range helps in understanding the diversity of exercise preferences among participants**


In [None]:
exercise_freq = df['Exercise Type Preferred'].value_counts()
exercise_freq

None                 1354
Sports                858
Cardio                828
Strength Training     779
Yoga/Pilates          381
Name: Exercise Type Preferred, dtype: int64

In [None]:
activity_freq_range = exercise_freq.max() - exercise_freq.min()
activity_freq_range

973

In [None]:
# Insights

''' The larger the range between the most and least popular types, the less evenly preferences are spread.
In our case the range is 973, roughly 23% of the 4,200 participants, which is quite large.
We can conclude that preferences in [Exercise Type Preferred] are not evenly spread.
Far fewer people prefer Yoga/Pilates (381) than None (1354), while Sports, Cardio, and Strength Training are close to one another (779 to 858).
'''
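
The max-min range looks only at the two extreme categories. A complementary view, sketched below, expresses that range as a share of all participants and adds a normalized Shannon entropy (often called evenness). The entropy measure is not part of the original exercise; it is offered here only as one possible way to summarize how evenly all five categories are used.

```python
import math

# Counts copied from the value_counts() output above.
exercise_counts = {
    "None": 1354,
    "Sports": 858,
    "Cardio": 828,
    "Strength Training": 779,
    "Yoga/Pilates": 381,
}

total = sum(exercise_counts.values())
shares = {name: count / total for name, count in exercise_counts.items()}

# Range between the most and least popular types, as a share of all participants.
range_share = (max(exercise_counts.values()) - min(exercise_counts.values())) / total

# Shannon entropy divided by its maximum (log of the number of categories).
entropy = -sum(p * math.log(p) for p in shares.values())
evenness = entropy / math.log(len(shares))

print(f"Range as share of participants: {range_share:.1%}")
print(f"Evenness (0 = one type only, 1 = all equal): {evenness:.3f}")
```

Because evenness uses every category, it can stay fairly high even when the two extreme categories are far apart, so it is best read alongside the range rather than instead of it.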

### **Q10: Explore the variation in "Living Area" types among participants. Discuss how living in different areas (Urban, Suburban, Rural) might contribute to variations in other lifestyle factors like physical activity or dietary habits**


In [None]:
# frequency
area_freq = df['Living Area'].value_counts()
area_freq

Rural       1428
Urban       1395
Suburban    1377
Name: Living Area, dtype: int64

In [None]:
# grouping with physical activity
grouped0 = df.groupby('Living Area')['Physical Activity'].mean()
print(grouped0)

Living Area
Rural       5.509545
Suburban    5.462912
Urban       5.453699
Name: Physical Activity, dtype: float64


In [None]:
# grouping with diet
grouped1 = df.groupby('Living Area')['Fruits/Veg Servings'].mean()
grouped1

Living Area
Rural       5.421996
Suburban    5.469927
Urban       5.387290
Name: Fruits/Veg Servings, dtype: float64

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4200 entries, 0 to 4199
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Participant ID           4200 non-null   int64  
 1   Physical Activity        4200 non-null   float64
 2   Fruits/Veg Servings      4200 non-null   float64
 3   Sleep Quality            4200 non-null   int64  
 4   Stress Level             4200 non-null   int64  
 5   Overall Health           4200 non-null   int64  
 6   Diet Type                4200 non-null   object 
 7   Exercise Type Preferred  4200 non-null   object 
 8   Work Environment         4200 non-null   object 
 9   Living Area              4200 non-null   object 
dtypes: float64(2), int64(4), object(4)
memory usage: 328.2+ KB


In [None]:
# Insights

''' By viewing the average physical activity and fruit/veg servings per living area,
we can compare typical lifestyle habits across the areas.
All three areas, [Rural], [Suburban], and [Urban], share a very similar average
for physical activity (about 5.45 to 5.51 hours/week) AND servings per day (about 5.39 to 5.47).

This means that in this specific data set, living area does not appear to
separate participants on these two factors, as the group averages are incredibly close.

This may also indicate that the data has little diversity with respect to living area;
whether that reflects a sampling bias cannot be determined from these averages alone.
'''
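
To quantify how close the three areas are, the largest gap between group means can be compared with the overall standard deviation of each variable. The sketch below reuses the numbers printed earlier in this notebook; the helper name `gap_in_std_units` is a hypothetical one introduced for illustration. Both gaps amount to only a few percent of one standard deviation.

```python
# Group means and overall standard deviations copied from the outputs above.
activity_means = {"Rural": 5.509545, "Suburban": 5.462912, "Urban": 5.453699}
servings_means = {"Rural": 5.421996, "Suburban": 5.469927, "Urban": 5.387290}
activity_std = 2.6104662069302065
servings_std = 2.586239021063829


def gap_in_std_units(group_means: dict[str, float], overall_std: float) -> float:
    """Largest difference between group means, in units of the overall std."""
    return (max(group_means.values()) - min(group_means.values())) / overall_std


activity_gap = gap_in_std_units(activity_means, activity_std)
servings_gap = gap_in_std_units(servings_means, servings_std)
print(f"Physical activity gap: {activity_gap:.3f} standard deviations")
print(f"Fruits/veg servings gap: {servings_gap:.3f} standard deviations")
```

This is a descriptive comparison only; a formal test of group differences would need the per-participant data rather than the summary values used here.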