# Analysis of Meal Plan & Exercise Schedule (Gender, Goal, BMI)

### Author: Joanna Bieri

This notebook uses data from the Kaggle meal plan and exercise schedule data set:

https://www.kaggle.com/datasets/kavindavimukthi/meal-plan-and-exercise-schedule-gender-goal-bmi

It was accessed on 9/6/2025.

The dataset was created and curated by Kavinda Vimukthi for building a fitness recommendation demo and ML experiments. It is released under the Attribution-ShareAlike 4.0 International license: https://creativecommons.org/licenses/by-sa/4.0/

The original data set had 5 variables and 80000 observations.

## Understand the breakdown of Goals and BMI Categories across Genders

The goal of this notebook is to understand if or how the goals and BMI categories differ across genders. This is an initial data exploration. Because the Exercise Schedule and Meal Plan information is unnecessary for this specific inquiry, it has been dropped from the original data set. The data is saved in the file

    'my_exercise_data.csv'
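
The column-dropping step is not shown in this notebook. A minimal sketch of how it could be done follows, assuming the two dropped columns are named `Exercise Schedule` and `Meal Plan` (these names, the helper `keep_category_columns`, and the made-up example frame are assumptions for illustration, not taken from the Kaggle file):

```python
import pandas as pd

# Hypothetical names of the columns dropped from the original data set;
# the real column names in the Kaggle file may differ.
DROPPED_COLUMNS = ["Exercise Schedule", "Meal Plan"]


def keep_category_columns(raw_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of raw_df without the exercise and meal plan columns."""
    # errors="ignore" skips any listed column that is not present
    return raw_df.drop(columns=DROPPED_COLUMNS, errors="ignore")


# Tiny made-up frame standing in for the original file
raw_example = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "Goal": ["muscle_gain", "fat_burn"],
    "BMI Category": ["Normal weight", "Underweight"],
    "Exercise Schedule": ["placeholder", "placeholder"],
    "Meal Plan": ["placeholder", "placeholder"],
})
reduced_example = keep_category_columns(raw_example)
print(list(reduced_example.columns))
```

Calling `to_csv('my_exercise_data.csv')` on the reduced frame writes the index as the first column, which is consistent with reading the file back using `index_col=0`.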



In [1]:
import os
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = 'colab'

import kagglehub

In [2]:
df = pd.read_csv('my_exercise_data.csv', index_col=0)
df.shape

(80000, 3)

In [3]:
df.head()

|   | Gender | Goal | BMI Category |
|---|---|---|---|
| 0 | Female | muscle_gain | Normal weight |
| 1 | Male | fat_burn | Underweight |
| 2 | Male | muscle_gain | Normal weight |
| 3 | Male | muscle_gain | Overweight |
| 4 | Female | muscle_gain | Normal weight |


In [11]:
df.describe()

|   | Gender | Goal | BMI Category |
|---|---|---|---|
| count | 80000 | 80000 | 80000 |
| unique | 2 | 2 | 4 |
| top | Female | muscle_gain | Underweight |
| freq | 40680 | 41020 | 20940 |


## Basic EDA

We will begin by finding some value counts for each column to get a measure of how the overall data falls into different categories and to see what categories are available. We can also use bar plots to show the data.


In [4]:
for k in df.keys():
    print(df[k].value_counts())
    print('-------')

Gender
Female    40680
Male      39320
Name: count, dtype: int64
-------
Goal
muscle_gain    41020
fat_burn       38980
Name: count, dtype: int64
-------
BMI Category
Underweight      20940
Normal weight    19920
Overweight       19840
Obesity          19300
Name: count, dtype: int64
-------


In [5]:
for k in df.keys():
    df_counts = df[k].value_counts().reset_index()
    df_counts.columns = [k,'Count']
    fig = px.bar(df_counts, x=k, y='Count', title=k)
    # Show the plot
    fig.show()

### Analysis

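The data is well balanced between the Male and Female genders (39320 versus 40680 rows). It is also fairly balanced across goals (41020 muscle_gain versus 38980 fat_burn) and across BMI categories (each has between 19300 and 20940 rows). It will be interesting to see whether the graphs change when we focus on a single gender.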
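
To put numbers on "balanced", the counts printed above can be converted to percentages. The sketch below hard-codes those counts so it runs on its own; the variable names `counts` and `shares` are illustrative only.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = {
    "Gender": {"Female": 40680, "Male": 39320},
    "Goal": {"muscle_gain": 41020, "fat_burn": 38980},
    "BMI Category": {
        "Underweight": 20940,
        "Normal weight": 19920,
        "Overweight": 19840,
        "Obesity": 19300,
    },
}

shares = {}
for column, column_counts in counts.items():
    s = pd.Series(column_counts)
    # Percent of all rows that fall in each category
    shares[column] = s / s.sum() * 100
    print(column)
    print(shares[column].to_string(float_format="{:.2f}%".format))
    print('-------')
```

On the full DataFrame, `df[k].value_counts(normalize=True)` returns the same proportions directly (as fractions rather than percentages).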

## EDA grouped by gender

Next we will explore these counts to see if there is any difference by gender. We start by grouping the data by gender and plotting each of the remaining columns as bar plots. Here we have to loop over the groups in df_gender so that we can look at each gender independently.

In [6]:
# Group the data by gender
df_gender = df.groupby('Gender')
columns = ['Goal','BMI Category']
# Then look at the value counts for each gender
for gender, group_df in df_gender:
    print(f"Group: {gender}")
    for k in columns:
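        df_counts = group_df[k].value_counts().reset_index()
        # Name the columns explicitly so the plot does not depend on the pandas version
        df_counts.columns = [k, 'Count']
        fig = px.bar(df_counts, x=k, y='Count', title=k + ':  ' + gender)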
        # Show the plot
        fig.show()
        

Group: Female


Group: Male


### Analysis

These graphs are not very enlightening. The y-axis scales are the same and from visual inspection it is hard to see any difference. It might be better to look at the gender breakdown on the same graph so we can more easily compare. 

## Continued EDA - Gender Plots

The goal here is to plot the Goal and BMI Category counts, but with a breakdown by gender on the same plot for comparison. This time we will group in a different way: first by Goal and Gender, and then by BMI Category and Gender. Each time we will plot the results to see if any difference is apparent.

When these graphs were first created, it was hard to tell whether the differences reflected a real difference between the genders or simply the larger number of females in the dataset. So in each case I calculated the percent of each gender that falls into each group and added it to the bars.

In [7]:
total_by_gender = df.groupby("Gender").size().reset_index(name='Total')


In [8]:
df_counts = df.groupby(["Goal", "Gender"]).size().reset_index(name="Count")
df_counts = df_counts.merge(total_by_gender, on="Gender")
df_counts["Percent"] = df_counts["Count"] / df_counts["Total"] * 100

fig = px.bar(
    df_counts,
    x="Goal",        # x-axis categories
    y="Count",       # height of bars
    color="Gender",  # color bars by Gender
    barmode="group", # side-by-side bars for each Gender
    title="Goal Counts by Gender",
    text=df_counts["Percent"].apply(lambda x: f"{x:.1f}%")  # show % on bar
)

fig.show()

In [9]:
df_counts = df.groupby(["BMI Category", "Gender"]).size().reset_index(name="Count")
df_counts = df_counts.merge(total_by_gender, on="Gender")
df_counts["Percent"] = df_counts["Count"] / df_counts["Total"] * 100

fig = px.bar(
    df_counts,
    x="BMI Category",        # x-axis categories
    y="Count",       # height of bars
    color="Gender",  # color bars by Gender
    barmode="group", # side-by-side bars for each Gender
    title="BMI Category Counts by Gender",
    text=df_counts["Percent"].apply(lambda x: f"{x:.1f}%")  # show % on bar
)

fig.show()

### Analysis

Among the people with a goal of fat burning, slightly more are female, while the muscle gain group is very evenly split. For both genders, more people are interested in muscle gain than in fat burning. The BMI categories are also fairly balanced, with very little difference between the genders.
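
The within-gender percentages shown on the bars can also be tabulated in one step with `pd.crosstab`. The sketch below uses a small made-up sample (not the real data) with the same column names, so the numbers it prints are illustrative only.

```python
import pandas as pd

# Small made-up sample with the same column names as the notebook's df
sample = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female", "Male", "Male"],
    "Goal": ["fat_burn", "fat_burn", "fat_burn", "muscle_gain",
             "fat_burn", "muscle_gain"],
})

# normalize="index" divides each row by its row total, so after scaling
# every Gender row sums to 100
goal_share = pd.crosstab(sample["Gender"], sample["Goal"], normalize="index") * 100
print(goal_share)
```

Applied to the notebook's data, `pd.crosstab(df["Gender"], df["Goal"], normalize="index")` should give the same shares as the `Count / Total` calculation, since both divide by the number of rows for each gender.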

## How do the Goals change based on the BMI Category

As an additional question, I was wondering how the goal might depend on the BMI category. In this case I grouped by Goal and BMI Category and then plotted the outcomes.

In [10]:
df_counts = df.groupby(["Goal", "BMI Category"]).size().reset_index(name="Count")

fig = px.bar(
    df_counts,
    x="Goal",        # x-axis categories
    y="Count",       # height of bars
    color="BMI Category",  # color bars by BMI Category
    barmode="group", # side-by-side bars for each BMI Category
    title="Goal Counts by BMI Category"
)

fig.show()

### Analysis

Here we see that of the people interested in fat_burn, slightly more of them are underweight. Otherwise the data is pretty evenly split.
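
Because the BMI categories are not all the same size, raw counts can be slightly misleading. One possible follow-up, sketched here on a small made-up sample rather than the real data, is to compute the percent of each BMI category that chose each goal using a grouped `transform`:

```python
import pandas as pd

# Small made-up sample with the same column names as the notebook's df
sample = pd.DataFrame({
    "Goal": ["fat_burn", "fat_burn", "fat_burn", "muscle_gain",
             "fat_burn", "muscle_gain", "muscle_gain", "muscle_gain"],
    "BMI Category": ["Underweight"] * 4 + ["Obesity"] * 4,
})

goal_by_bmi = (
    sample.groupby(["BMI Category", "Goal"]).size().reset_index(name="Count")
)
# transform("sum") repeats each category's total on every row of that category,
# so the division gives the share of the category choosing each goal
category_total = goal_by_bmi.groupby("BMI Category")["Count"].transform("sum")
goal_by_bmi["Percent"] = goal_by_bmi["Count"] / category_total * 100
print(goal_by_bmi)
```

The resulting `Percent` column could be passed to `px.bar` as the `text` argument, in the same way as in the gender plots earlier in the notebook.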