# Workout Data Analysis


In [27]:
# Not-exhaustive project checklist
# Note: this is not a writing assignment, but you should still incorporate comments or headers throughout
# -- This will help others who review your work, but will also help you keep track of what you're doing
# -- Just think about what would help you understand a peer's project when you have to grade theirs later


# 1) Dataset included
# 1a) Dataset explained/documented when used
# 2) Analysis performed
# 2a) Analysis steps documented, explained briefly
# 3) Visualizations/outputs
# 4) Conclusion
# ---Did you answer your questions?
# ---Did you find anything else?
# ---What problems did you run into?
# ---Any suggestions for future research?

# Critical checks before submission
# - Does the notebook run without error? (Kernel>restart & run all => no errors at all? No long periods of processing?)
# - Do you have your dataset included here? You shouldn't be connecting to any external data
# - - Even if external datasets work here, they WILL NOT work for peer-grading, and you may get a 0.

## The Data

This data set comes from [Kaggle](https://www.kaggle.com/datasets/valakhorasani/gym-members-exercise-dataset).


<blockquote>About the Dataset:


This dataset provides a detailed overview of gym members' exercise routines, physical attributes, and fitness metrics. It contains 973 samples of gym data, including key performance indicators such as heart rate, calories burned, and workout duration. Each entry also includes demographic data and experience levels, allowing for comprehensive analysis of fitness patterns, athlete progression, and health trends.


Key Features:

<ul>
    <li>Age: Age of the gym member.</li>
    <li>Gender: Gender of the gym member (Male or Female).</li>
    <li>Weight (kg): Member’s weight in kilograms.</li>
    <li>Height (m): Member’s height in meters.</li>
    <li>Max_BPM: Maximum heart rate (beats per minute) during workout sessions.</li>
    <li>Avg_BPM: Average heart rate during workout sessions.</li>
    <li>Resting_BPM: Heart rate at rest before workout.</li>
    <li>Session_Duration (hours): Duration of each workout session in hours.</li>
    <li>Calories_Burned: Total calories burned during each session.</li>
    <li>Workout_Type: Type of workout performed (e.g., Cardio, Strength, Yoga, HIIT).</li>
    <li>Fat_Percentage: Body fat percentage of the member.</li>
    <li>Water_Intake (liters): Daily water intake during workouts.</li>
    <li>Workout_Frequency (days/week): Number of workout sessions per week.</li>
    <li>Experience_Level: Level of experience, from beginner (1) to expert (3).</li>
    <li>BMI: Body Mass Index, calculated from height and weight.</li>
</ul>
</blockquote>
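
The description says BMI is calculated from height and weight but does not give the formula. As a quick sanity check, the sketch below assumes the standard definition (weight in kilograms divided by height in meters squared) and recomputes BMI for the first five rows of the data set. The `sample` frame and the `BMI_recomputed` column are names introduced here for illustration only.

```python
import pandas as pd

# First five rows of the data set (weight, height, stored BMI),
# copied from the df.head() output shown later in this notebook.
sample = pd.DataFrame({
    "Weight (kg)": [88.3, 74.9, 68.1, 53.2, 46.1],
    "Height (m)": [1.71, 1.53, 1.66, 1.70, 1.79],
    "BMI": [30.2, 32.0, 24.71, 18.41, 14.39],
})

# Assumption: BMI = weight (kg) / height (m) squared, the standard definition.
sample["BMI_recomputed"] = sample["Weight (kg)"] / sample["Height (m)"] ** 2

# Largest absolute gap between the stored and the recomputed values.
max_gap = (sample["BMI"] - sample["BMI_recomputed"]).abs().max()

print(sample.round(2))
print(f"Largest difference: {max_gap:.3f}")
```

For these five rows the stored and recomputed values agree to within rounding, which suggests the column follows the standard formula. The same check can be run on the full data frame once it is loaded.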

## Analysis Questions
1. Which workouts burn the most calories?
2. Which workout is most prevalent for each gender?
3. Do specific workouts lead to a better BMI?
4. Is there a correlation between calories burned and BMI?

## Exploring the Data

In [14]:
import pandas as pd
import numpy as np

In [11]:
# load the CSV file into a DataFrame
df = pd.read_csv('gym_members_exercise_tracking.csv')

In [12]:
# checking the top 5 entries of the data frame
df.head()

Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
0,56,Male,88.3,1.71,180,157,60,1.69,1313.0,Yoga,12.6,3.5,4,3,30.2
1,46,Female,74.9,1.53,179,151,66,1.3,883.0,HIIT,33.9,2.1,4,2,32.0
2,32,Female,68.1,1.66,167,122,54,1.11,677.0,Cardio,33.4,2.3,4,2,24.71
3,25,Male,53.2,1.7,190,164,56,0.59,532.0,Strength,28.8,2.1,3,1,18.41
4,38,Male,46.1,1.79,188,158,68,0.64,556.0,Strength,29.2,2.8,3,1,14.39


In [13]:
#checking the last 5 entries of the data frame
df.tail()

Unnamed: 0,Age,Gender,Weight (kg),Height (m),Max_BPM,Avg_BPM,Resting_BPM,Session_Duration (hours),Calories_Burned,Workout_Type,Fat_Percentage,Water_Intake (liters),Workout_Frequency (days/week),Experience_Level,BMI
968,24,Male,87.1,1.74,187,158,67,1.57,1364.0,Strength,10.0,3.5,4,3,28.77
969,25,Male,66.6,1.61,184,166,56,1.38,1260.0,Strength,25.0,3.0,2,1,25.69
970,59,Female,60.4,1.76,194,120,53,1.72,929.0,Cardio,18.8,2.7,5,3,19.5
971,32,Male,126.4,1.83,198,146,62,1.1,883.0,HIIT,28.2,2.1,3,2,37.74
972,46,Male,88.7,1.63,166,146,66,0.75,542.0,Strength,28.8,3.5,2,1,33.38


In [15]:
# check the DataFrame for missing entries; two empty arrays mean none were found
np.where(pd.isnull(df))

(array([], dtype=int64), array([], dtype=int64))
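
The two empty arrays above are the row and column positions of missing cells, so an empty result means nothing is missing. A per-column count is often easier to read. The sketch below uses a small hypothetical frame (`demo`, with one value deliberately set to `NaN`) to show what each check returns when data is actually missing.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing value,
# used only to illustrate the output of each check.
demo = pd.DataFrame({
    "Workout_Type": ["Yoga", "HIIT", "Cardio"],
    "Calories_Burned": [1313.0, np.nan, 677.0],
    "BMI": [30.2, 32.0, 24.71],
})

# np.where returns (row positions, column positions) of every missing cell.
rows, cols = np.where(pd.isnull(demo))

# Count of missing cells in each column.
missing_per_column = demo.isna().sum()

# Single yes/no answer: is anything missing anywhere in the frame?
any_missing = bool(demo.isna().any().any())

print(rows, cols)
print(missing_per_column)
print(any_missing)
```

On the real data, `df.isna().sum()` should report zero for every column, matching the empty arrays above.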

In [22]:
#checking the columns we're working with
# 1. Which workouts burn the most calories?
q_one_cols = df[["Workout_Type", "Calories_Burned"]]
display(q_one_cols)

Unnamed: 0,Workout_Type,Calories_Burned
0,Yoga,1313.0
1,HIIT,883.0
2,Cardio,677.0
3,Strength,532.0
4,Strength,556.0
...,...,...
968,Strength,1364.0
969,Strength,1260.0
970,Cardio,929.0
971,HIIT,883.0


In [23]:
# 2. Which workout is most prevalent for each gender?
q_two_cols = df[["Workout_Type", "Gender"]]
display(q_two_cols)

Unnamed: 0,Workout_Type,Gender
0,Yoga,Male
1,HIIT,Female
2,Cardio,Female
3,Strength,Male
4,Strength,Male
...,...,...
968,Strength,Male
969,Strength,Male
970,Cardio,Female
971,HIIT,Male


In [24]:
# 3. Do specific workouts lead to a better BMI?
q_three_cols = df[["Workout_Type", "BMI"]]
display(q_three_cols)

Unnamed: 0,Workout_Type,BMI
0,Yoga,30.20
1,HIIT,32.00
2,Cardio,24.71
3,Strength,18.41
4,Strength,14.39
...,...,...
968,Strength,28.77
969,Strength,25.69
970,Cardio,19.50
971,HIIT,37.74


In [26]:
# 4. Is there a correlation between calories burned and BMI?
# Workout type is included as well, since it is likely a main driver
# of calories burned. This lets us extend the original question and
# check whether one workout type stands out from the rest.
q_four_cols = df[["Workout_Type", "Calories_Burned", "BMI"]]
display(q_four_cols)

Unnamed: 0,Workout_Type,Calories_Burned,BMI
0,Yoga,1313.0,30.20
1,HIIT,883.0,32.00
2,Cardio,677.0,24.71
3,Strength,532.0,18.41
4,Strength,556.0,14.39
...,...,...,...
968,Strength,1364.0,28.77
969,Strength,1260.0,25.69
970,Cardio,929.0,19.50
971,HIIT,883.0,37.74


**NOTE:** Typically we would also check each column for outliers and for placeholder values that stand in for missing data (e.g. -99, "No Gender"). This data set came pre-cleaned from Kaggle, so we skip those checks here.
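
For reference, here is a minimal sketch of what those checks could look like on a data set that has not been cleaned. It runs on the five rows shown in `df.head()` (the `sample` frame is introduced here for illustration). The 1.5 × IQR rule is one common outlier convention, chosen here as an assumption rather than a requirement.

```python
import pandas as pd

# First five rows from the df.head() output; in the notebook, use df directly.
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "Workout_Type": ["Yoga", "HIIT", "Cardio", "Strength", "Strength"],
    "Calories_Burned": [1313.0, 883.0, 677.0, 532.0, 556.0],
    "BMI": [30.2, 32.0, 24.71, 18.41, 14.39],
})

# 1) Categorical columns: list the distinct labels to spot placeholders
#    such as "No Gender" or "Unknown".
for col in ["Gender", "Workout_Type"]:
    print(col, sorted(sample[col].unique()))

# 2) Numeric columns: negative values (e.g. -99) cannot be real
#    measurements for calories or BMI, so count them per column.
numeric = sample.select_dtypes(include="number")
negative_counts = (numeric < 0).sum()
print(negative_counts)

# 3) Outliers: flag values more than 1.5 * IQR outside the quartiles.
q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1
outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
outlier_counts = outlier_mask.sum()
print(outlier_counts)
```

Any nonzero count would be a prompt to inspect the flagged rows before deciding whether to drop, correct, or keep them.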

## Analysis