##### Data Dictionary

| Column Name           | Description                                   |
| --------------------- | --------------------------------------------- |
| date                  | The specific day of data recording            |
| steps                 | Total daily step count                        |
| weight                | Body weight measurement (in kg)               |
| resting_heart_rate    | Heart beats per minute while at complete rest |
| sleep_hours           | Total daily sleep duration (in hours)         |
| active_minutes        | Total time spent in physical activity         |
| total_calories_burned | Total daily energy expenditure                |
| fat_burn_minutes      | Time in 50-69% of max heart rate zone         |
| cardio_minutes        | Time in 70-84% of max heart rate zone         |
| peak_minutes          | Time in 85%+ of max heart rate zone           |
| workout_type          | Category of exercise performed                |
| workout_duration      | Length of exercise session in minutes         |
| workout_calories      | Energy expended during workout                |
| workout_avg_hr        | Mean heart rate during exercise               |
| workout_max_hr        | Highest heart rate during exercise            |

Import the required Python libraries: ```pandas``` for data analysis and ```plotly.express``` for interactive data visualization.

In [1]:
import pandas as pd
import plotly.express as px

#### Exploratory Data Analysis (EDA)

Read in the ```fitness_data.csv``` file using pandas, and print the first 5 rows to see a sample of the data.

In [16]:
fitness_data = pd.read_csv('fitness_data.csv')
fitness_data.head()

Unnamed: 0,date,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
0,7/1/24,75.248357,6829.0,63.68,7.03,66.0,664.0,Strength Training,68.28,641.12,143.67,171.17,33.0,19.0,13.0
1,7/2/24,74.914564,7741.0,58.63,7.85,66.0,660.0,Rest,0.0,0.0,0.0,0.0,33.0,19.0,13.0
2,7/3/24,75.14,8227.0,58.47,7.53,87.0,873.0,Strength Training,34.34,366.564833,138.63,163.86,43.0,26.0,17.0
3,7/4/24,75.2,13262.0,72.51,4.63,111.0,1113.0,Running,62.74,662.04,152.04,168.87,55.0,33.0,22.0
4,7/5/24,75.27,9377.0,73.32,5.38,79.0,790.0,Strength Training,77.88,677.01,144.06,167.28,39.0,23.0,15.0


As a first step in cleaning the dataset, check for missing values.

In [17]:
# Check for missing values in the dataset
missing_values = fitness_data.isnull().sum()
missing_values

date                     0
weight                   6
steps                    6
resting_heart_rate       6
sleep_hours              6
active_minutes           6
total_calories_burned    6
workout_type             6
workout_duration         6
workout_calories         6
workout_avg_hr           6
workout_max_hr           6
fat_burn_minutes         6
cardio_minutes           6
peak_minutes             6
dtype: int64

Every column except ```date``` has 6 missing values. Check where they occur in the dataset to see whether there is a pattern.

In [18]:
# Show rows with missing values
rows_with_missing_values = fitness_data[fitness_data.isnull().any(axis=1)]
rows_with_missing_values

Unnamed: 0,date,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_type,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
150,11/28/24,,,,,,,,,,,,,,
151,11/29/24,,,,,,,,,,,,,,
152,11/30/24,,,,,,,,,,,,,,
176,12/24/24,,,,,,,,,,,,,,
177,12/25/24,,,,,,,,,,,,,,
178,12/26/24,,,,,,,,,,,,,,


The six affected rows contain a date and nothing else, so drop them to clean up the dataset.

In [19]:
fitness_data = fitness_data.dropna(axis=0)
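
Dropping whole rows is reasonable when each affected row is missing every measurement rather than only some of them. A minimal sketch of that check follows, run on a small stand-in frame: the `toy` frame and its values are invented for illustration, and only the column names come from the dataset.

```python
import pandas as pd

# Toy stand-in for fitness_data; the values are illustrative only
toy = pd.DataFrame({
    "date": ["11/27/24", "11/28/24", "11/29/24", "12/1/24"],
    "weight": [75.1, None, None, 74.9],
    "steps": [8000.0, None, None, 9100.0],
})

# Every column except the date holds a measurement
measure_cols = toy.columns.drop("date")

# Rows where all measurements are missing vs. rows where at least one is
fully_empty = toy[measure_cols].isnull().all(axis=1)
any_missing = toy[measure_cols].isnull().any(axis=1)

# If the two masks agree, no row is only partially missing
only_full_gaps = bool((fully_empty == any_missing).all())

# Drop the empty rows and renumber the index
cleaned = toy.dropna(axis=0).reset_index(drop=True)
print(only_full_gaps, len(cleaned))
```

If the two masks disagreed, some rows would hold partial data, and filling or interpolating might be preferable to dropping them.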

To explore the dataset further and look for outliers, print summary statistics for the numeric columns.

In [23]:
# Describe dataset
summary_statistics = fitness_data.describe()
summary_statistics

Unnamed: 0,weight,steps,resting_heart_rate,sleep_hours,active_minutes,total_calories_burned,workout_duration,workout_calories,workout_avg_hr,workout_max_hr,fat_burn_minutes,cardio_minutes,peak_minutes
count,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0
mean,74.838842,8476.927374,64.926538,7.511456,85.128492,853.396648,38.227039,384.143224,129.950615,148.408547,42.284916,25.100559,16.636872
std,1.817232,2350.241475,4.76561,1.061339,22.242287,222.456798,19.778319,202.061761,37.68899,42.916258,11.133124,6.703257,4.442768
min,71.11,3207.0,50.38,3.17,30.0,303.0,0.0,0.0,0.0,0.0,15.0,9.0,6.0
25%,73.13,6823.0,61.755,6.995,71.0,711.5,26.81,262.48,130.79,148.85,35.0,21.0,14.0
50%,75.269587,8524.0,64.71,7.59,84.0,841.0,34.34,350.91,140.42,159.56,42.0,25.0,16.0
75%,76.53,9704.5,67.61,8.055,98.5,988.0,46.85,493.37,146.915,168.08,49.0,29.0,19.0
max,78.17,15724.0,80.99,9.78,157.0,1573.0,94.29,1072.2,160.24,187.55,78.0,47.0,31.0
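
Summary statistics hint at outliers but do not flag them. One common convention, which this notebook does not itself use, marks values more than 1.5 interquartile ranges outside the quartiles. The sketch below applies it to a toy `steps` series: the first five values are taken from the sample rows above, and the last is an invented extreme value.

```python
import pandas as pd

# Five step counts from the sample rows, plus one invented extreme value
steps = pd.Series(
    [6829.0, 7741.0, 8227.0, 13262.0, 9377.0, 30000.0], name="steps"
)

# Interquartile range and the conventional 1.5 * IQR fences
q1 = steps.quantile(0.25)
q3 = steps.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values falling outside the fences are flagged as candidate outliers
outliers = steps[(steps < lower) | (steps > upper)]
print(outliers.tolist())
```

A flagged value is only a candidate: a long hike can legitimately produce a very high step count, so each one is worth inspecting before it is removed.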


Check the data type of each column.

In [24]:
# Print data types
data_types = fitness_data.dtypes
data_types

date                      object
weight                   float64
steps                    float64
resting_heart_rate       float64
sleep_hours              float64
active_minutes           float64
total_calories_burned    float64
workout_type              object
workout_duration         float64
workout_calories         float64
workout_avg_hr           float64
workout_max_hr           float64
fat_burn_minutes         float64
cardio_minutes           float64
peak_minutes             float64
dtype: object

Stored as an object (string), the date column does not support datetime operations, so convert it to a datetime type.

In [28]:
# Convert date column to datetime type
fitness_data['date'] = pd.to_datetime(fitness_data['date'], format='%m/%d/%y')
fitness_data.dtypes

date                     datetime64[ns]
weight                          float64
steps                           float64
resting_heart_rate              float64
sleep_hours                     float64
active_minutes                  float64
total_calories_burned           float64
workout_type                     object
workout_duration                float64
workout_calories                float64
workout_avg_hr                  float64
workout_max_hr                  float64
fat_burn_minutes                float64
cardio_minutes                  float64
peak_minutes                    float64
dtype: object
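
With the column converted, the `.dt` accessor exposes date components directly. A minimal sketch, using the first three dates and step counts from the sample rows above. The derived column names `day_of_week` and `month` are hypothetical and not part of the dataset.

```python
import pandas as pd

# First three rows of the sample output (date and steps only)
toy = pd.DataFrame({
    "date": ["7/1/24", "7/2/24", "7/3/24"],
    "steps": [6829.0, 7741.0, 8227.0],
})

# Same conversion as in the cell above
toy["date"] = pd.to_datetime(toy["date"], format="%m/%d/%y")

# Derived columns (hypothetical names) that the datetime type makes possible
toy["day_of_week"] = toy["date"].dt.day_name()
toy["month"] = toy["date"].dt.month
print(toy)
```

Columns like these could later support grouping, for example comparing average steps by weekday.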