<a href="https://colab.research.google.com/github/mannenamratha/Apple-Fitness-Watch/blob/main/Apple_Fitness.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fitness Watch Data Analysis using Python**



Fitness Watch Data Analysis involves analyzing the data collected by fitness wearables or smartwatches to gain insights into users’ health and activity patterns. These devices track metrics like steps taken, energy burned, walking speed, and more.

# **Process to follow**


Fitness Watch Data Analysis is a crucial tool for businesses in the health and wellness domain. By analyzing user data from fitness wearables, companies can understand user behaviour, offer personalized solutions, and contribute to improving users’ overall health and well-being.




Below is the process to follow while working on the problem of Fitness Watch Data Analysis:



- Collect data from fitness watches, ensuring it’s accurate and reliable.
- Perform EDA (exploratory data analysis) to gain initial insights into the data.
- Create new features from the raw data that might provide more meaningful insights.
- Create visual representations of the data to communicate insights effectively.
- Segment the user’s activity by time interval or by fitness-metric level, and analyze performance within each segment.

```python
# Import the necessary libraries
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px

# Use a white background for all Plotly figures
pio.templates.default = "plotly_white"
```


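```python
import warnings
warnings.filterwarnings('ignore')  # hide library warnings to keep the output tidy

# Mount Google Drive (Colab only) so the CSV stored there can be read
from google.colab import drive
drive.mount('/content/drive')
```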




```python
# Load the exported fitness data from Google Drive
file_path = '/content/drive/My Drive/Datasets/Apple-Fitness-Data.csv'
data = pd.read_csv(file_path)

# Preview the first five rows
data.head()
```

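|   | Date | Time | Step Count | Distance | Energy Burned | Flights Climbed | Walking Double Support Percentage | Walking Speed |
|---|------|------|-----------|----------|---------------|-----------------|-----------------------------------|---------------|
| 0 | 2023-03-21 | 16:01:23 | 46 | 0.02543 | 14.62 | 3 | 0.304 | 3.06 |
| 1 | 2023-03-21 | 16:18:37 | 645 | 0.40041 | 14.722 | 3 | 0.309 | 3.852 |
| 2 | 2023-03-21 | 16:31:38 | 14 | 0.00996 | 14.603 | 4 | 0.278 | 3.996 |
| 3 | 2023-03-21 | 16:45:37 | 13 | 0.00901 | 14.811 | 3 | 0.278 | 5.04 |
| 4 | 2023-03-21 | 17:10:30 | 17 | 0.00904 | 15.153 | 3 | 0.281 | 5.184 |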
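
The `Time` column holds only the time of day, while the records span several dates, so the "over time" plots that follow put only the time of day on the x-axis and cannot tell readings from different dates apart. A minimal sketch, assuming the `Date` and `Time` formats seen in the preview above, joins the two into one datetime column; the name `Timestamp` and the three-row `sample` frame are illustrative, not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the preview above, standing in for the full CSV
sample = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-21", "2023-03-21"],
    "Time": ["16:01:23", "16:18:37", "16:31:38"],
    "Step Count": [46, 645, 14],
})

# Join the date and time strings, then parse them with an explicit format
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%Y-%m-%d %H:%M:%S"
)

# Sorting keeps line plots chronological even if rows arrive out of order
sample = sample.sort_values("Timestamp").reset_index(drop=True)

print(sample[["Timestamp", "Step Count"]])
```

Applied to `data` in the notebook, the same steps would let `px.line(data, x="Timestamp", y="Step Count")` draw a continuous date-and-time axis.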




```python
# Count missing values in each column
data.isnull().sum()
```

| Column | Missing values |
|--------|----------------|
| Date | 0 |
| Time | 0 |
| Step Count | 0 |
| Distance | 0 |
| Energy Burned | 0 |
| Flights Climbed | 0 |
| Walking Double Support Percentage | 0 |
| Walking Speed | 0 |


***As we can see, the dataset has no missing values.***

```python
# Analyzing the step count over time
fig1 = px.line(data, x="Time", y="Step Count", title="Step Count over time")
fig1.show()
```

```python
# Analyzing the distance covered over time
fig2 = px.line(data, x="Time", y="Distance", title="Distance Covered over time")
fig2.show()
```

```python
# Analyzing energy burned over time
fig3 = px.line(data, x="Time", y="Energy Burned", title="Energy Burned over time")
fig3.show()
```

```python
# Analyzing walking speed over time
fig4 = px.line(data, x="Time", y="Walking Speed", title="Walking Speed over time")
fig4.show()
```

```python
# Average step count per recorded interval, grouped by day
average_step_count_per_day = data.groupby("Date")["Step Count"].mean().reset_index()

fig5 = px.bar(average_step_count_per_day, x="Date", y="Step Count", title="Average Step Count per Day")
fig5.update_xaxes(type='category')  # one bar per date label, no gaps for days without data
fig5.show()
```
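
Because each row is one recorded interval rather than a whole day, `mean()` above gives the average steps per recorded interval on each day, not the day's total. If daily totals are what you want, a sketch like the one below reports both; the `sample` rows are made up (only the first two step counts come from the preview).

```python
import pandas as pd

# Made-up rows: two recorded intervals on the first day, one on the second
sample = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-21", "2023-03-22"],
    "Step Count": [46, 645, 300],
})

# Per-day total steps next to the per-interval average
daily_steps = (
    sample.groupby("Date")["Step Count"]
    .agg(total="sum", mean_per_record="mean")
    .reset_index()
)
print(daily_steps)
```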

```python
# Walking efficiency: distance covered per step (Distance / Step Count)
data["Walking Efficiency"] = data["Distance"] / data["Step Count"]

fig6 = px.line(data, x="Time", y="Walking Efficiency", title="Walking Efficiency over time")
fig6.show()
```
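
`Distance / Step Count` is the distance covered per step, roughly the step length in whatever unit the watch exports for distance (the dataset does not state it). If an interval ever records zero steps, the ratio becomes `inf` (or `NaN` when the distance is also zero). One hedged way to guard against that is to treat zero-step rows as missing before dividing; the `sample` frame and its zero-step third row are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; the third simulates an interval with distance but no steps
sample = pd.DataFrame({
    "Step Count": [46, 645, 0],
    "Distance": [0.02543, 0.40041, 0.001],
})

# Naive ratio: the zero-step row produces inf
naive = sample["Distance"] / sample["Step Count"]

# Treat zero-step intervals as missing so they yield NaN rather than inf
steps = sample["Step Count"].replace(0, np.nan)
sample["Walking Efficiency"] = sample["Distance"] / steps

print(sample)
```

NaN values are simply left out of Plotly line traces, whereas a single `inf` can wreck the y-axis range.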

***Now, let’s have a look at the step count and walking speed variations by time intervals:***

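```python
# Create time intervals from the hour of each record:
# [0, 12) -> Morning, [12, 18) -> Afternoon, [18, 24) -> Evening
hours = pd.to_datetime(data["Time"], format="%H:%M:%S").dt.hour
time_intervals = pd.cut(hours, bins=[0, 12, 18, 24],
                        labels=["Morning", "Afternoon", "Evening"], right=False)
data["Time Interval"] = time_intervals
```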
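
With `right=False`, `pd.cut` uses the left-closed bins [0, 12), [12, 18) and [18, 24): hours 0 to 11 are labelled Morning (including the hours right after midnight), 12 to 17 Afternoon, and 18 to 23 Evening. A quick check on hand-picked hours at each bin edge:

```python
import pandas as pd

# Hours chosen to sit on either side of each bin edge
sample_hours = pd.Series([0, 11, 12, 17, 18, 23])

intervals = pd.cut(
    sample_hours,
    bins=[0, 12, 18, 24],
    labels=["Morning", "Afternoon", "Evening"],
    right=False,  # left-closed bins: [0, 12), [12, 18), [18, 24)
)
print(pd.DataFrame({"hour": sample_hours, "interval": intervals}))
```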

```python
# Variations in Step Count & Walking Speed by Time Interval
# (trendline='ols' requires the statsmodels package to be installed)
fig7 = px.scatter(data, x="Step Count", y="Walking Speed", color="Time Interval",
                  title="Step Count and Walking Speed Variations by Time Interval",
                  trendline='ols')
fig7.show()
```
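
The scatter plot can be paired with a numeric summary of the same comparison. A minimal sketch, using made-up rows and a hypothetical `Hour` column in place of the parsed time, averages both metrics within each time interval; `observed=False` keeps every interval in the result even if it has no rows, which matters because `Time Interval` is categorical.

```python
import pandas as pd

# Made-up rows; "Hour" is a hypothetical stand-in for the parsed time
sample = pd.DataFrame({
    "Step Count": [46, 645, 14, 300],
    "Walking Speed": [3.06, 3.852, 3.996, 4.5],
    "Hour": [9, 16, 16, 20],
})

# Same binning as the Time Interval cell above
sample["Time Interval"] = pd.cut(
    sample["Hour"], bins=[0, 12, 18, 24],
    labels=["Morning", "Afternoon", "Evening"], right=False,
)

# Mean step count and walking speed per interval
summary = (
    sample.groupby("Time Interval", observed=False)[["Step Count", "Walking Speed"]]
    .mean()
    .reset_index()
)
print(summary)
```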

***Now, let’s compare the daily average of all the health and fitness metrics:***

```python
# Daily mean of every fitness metric
daily_avg_metrics = data.groupby("Date").agg({
    "Step Count": "mean",
    "Distance": "mean",
    "Energy Burned": "mean",
    "Flights Climbed": "mean",
    "Walking Double Support Percentage": "mean",
    "Walking Speed": "mean"
}).reset_index()

# Reshape from wide (one column per metric) to long (variable/value pairs) for the treemap
daily_avg_metrics_melted = daily_avg_metrics.melt(
    id_vars=["Date"],
    value_vars=["Step Count", "Distance", "Energy Burned", "Flights Climbed",
                "Walking Double Support Percentage", "Walking Speed"],
)
```
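
`melt` converts the wide table (one column per metric) into a long one with `variable` and `value` columns, which is the layout that `px.treemap(..., path=["variable"], values="value")` expects. A small illustration with made-up numbers:

```python
import pandas as pd

# Wide format: one row per date, one column per metric (values are made up)
wide = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-22"],
    "Distance": [0.08, 0.23],
    "Walking Speed": [4.0, 4.2],
})

# Long format: one row per (date, metric) pair
long_df = wide.melt(id_vars=["Date"], value_vars=["Distance", "Walking Speed"])
print(long_df)
```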

```python
# Treemap of daily averages for different metrics over several weeks
fig = px.treemap(daily_avg_metrics_melted, path=["variable"], values="value", color="variable",
                 hover_data=["value"], title="Daily Averages for Different Metrics")
fig.show()
```

The treemap above represents each health and fitness metric as a rectangular tile. Because `path=["variable"]` groups the rows by metric, Plotly sums each metric's daily averages across all dates, so a tile's size reflects that total; since every metric has one row per date, tile size is proportional to the metric's overall mean daily value. The colour of each tile identifies the metric, and hovering over a tile shows its value.

The Step Count tile dominates the visualization because its values are far larger than those of the other metrics, which makes differences among the remaining metrics hard to see.
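
Plotly Express sums the `values` of rows that share the same `path` entry, so the tile sizes can be reproduced with a `groupby` sum. The sketch below uses made-up daily averages for two dates to show how one large-valued metric takes almost the whole area:

```python
import pandas as pd

# Long-format daily averages for two dates (numbers are made up)
melted = pd.DataFrame({
    "Date": ["2023-03-21", "2023-03-22"] * 3,
    "variable": ["Step Count"] * 2 + ["Energy Burned"] * 2 + ["Distance"] * 2,
    "value": [120.0, 180.0, 15.0, 16.0, 0.09, 0.23],
})

# Sum per metric (what each tile is sized by) and its share of the total area
tile_totals = melted.groupby("variable")["value"].sum()
share = (tile_totals / tile_totals.sum()).sort_values(ascending=False)
print(share.round(3))
```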

***Because Step Count values are much larger than those of all the other metrics, let’s look at this visualization again without Step Count.***

In [None]:
# Select metrics excluding Step Count

metrics_to_visualize = ["Distance", "Energy Burned", "Flights Climbed", "Walking Double Support Percentage", "Walking Speed"]
metrics_to_visualize

```
['Distance',
 'Energy Burned',
 'Flights Climbed',
 'Walking Double Support Percentage',
 'Walking Speed']
```
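
Instead of typing the list by hand, the same columns can be derived from the column index, which keeps the list in sync if metrics are added later. The sketch uses an empty `stand_in` frame with the same column names as `daily_avg_metrics`; in the notebook you would use `daily_avg_metrics.columns` directly.

```python
import pandas as pd

# Empty stand-in with the same columns as daily_avg_metrics
stand_in = pd.DataFrame(columns=[
    "Date", "Step Count", "Distance", "Energy Burned", "Flights Climbed",
    "Walking Double Support Percentage", "Walking Speed",
])

# Every metric column except the date key and Step Count
metrics_to_visualize = [
    col for col in stand_in.columns if col not in ("Date", "Step Count")
]
print(metrics_to_visualize)
```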

```python
# Re-shape data for the treemap
daily_avg_metrics_melted = daily_avg_metrics.melt(id_vars=["Date"], value_vars=metrics_to_visualize)
daily_avg_metrics_melted
```

|   | Date | variable | value |
|---|------|----------|-------|
| 0 | 2023-03-21 | Distance | 0.086225 |
| 1 | 2023-03-22 | Distance | 0.230261 |
| 2 | 2023-03-23 | Distance | 0.075796 |
| 3 | 2023-03-24 | Distance | 0.042067 |
| 4 | 2023-03-25 | Distance | 0.080747 |
| 5 | 2023-03-26 | Distance | 0.06876 |
| 6 | 2023-03-27 | Distance | 0.032664 |
| 7 | 2023-03-28 | Distance | 0.102727 |
| 8 | 2023-03-29 | Distance | 0.115884 |
| 9 | 2023-03-30 | Distance | 0.252494 |


```python
# Treemap of daily averages for different metrics over several weeks, excluding Step Count
fig = px.treemap(daily_avg_metrics_melted, path=["variable"], values="value", color="variable",
                 hover_data=["value"], title="Daily Averages for Different Metrics (Excluding Step Count)")
fig.show()
```

### **So, this is how we can analyze and work with fitness data using Python**