# Bellabeat Project: Smart Device Usage Analysis

## Business Task
Analyze FitBit smart device data to identify usage trends and provide marketing recommendations that strengthen the market strategy for Bellabeat's Leaf tracker.

## Key Questions
* What are the most common usage patterns?
* How do activity and sleep patterns vary over time?
* Are there distinct user segments based on activity levels?

## Stakeholders
* Urška Sršen (Chief Creative Officer)
* Sando Mur (Cofounder)
* Bellabeat Marketing Analytics Team

## Explore the Dataset
#### Load and inspect data

In [2]:
import sys
print(sys.executable)

/Users/johnathanduran/Library/jupyterlab-desktop/jlab_server/bin/python


In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
# Load datasets (both files live in the same Kaggle input folder)
data_dir = "/kaggle/input/fitbit/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16"
daily_activity = pd.read_csv(f"{data_dir}/dailyActivity_merged.csv")
sleep_data = pd.read_csv(f"{data_dir}/sleepDay_merged.csv")

daily_activity.columns = daily_activity.columns.str.strip().str.replace(" ", "_")
sleep_data.columns = sleep_data.columns.str.strip().str.replace(" ", "_")

# Inspect data, first 10 rows of Daily Activity
daily_activity.head(10).style.set_caption("Daily Activity").format({"TotalSteps": "{:,.0f}", "Calories": "{:,.0f}"})

FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/fitbit/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv'
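
The `FileNotFoundError` above indicates that the Kaggle input folder is not available in the environment where the notebook ran. One way to make the load step portable is to look for the file in several folders before reading it. This is a minimal sketch: the helper `find_data_file` and the local `data` folder are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path


def find_data_file(filename, search_dirs):
    """Return the first existing path to `filename` among `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


# Search order: the Kaggle folder used in the cell above, then a local
# "data" folder (hypothetical; adjust to wherever the CSV files are stored).
SEARCH_DIRS = [
    "/kaggle/input/fitbit/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16",
    "data",
]

activity_path = find_data_file("dailyActivity_merged.csv", SEARCH_DIRS)
if activity_path is None:
    print("dailyActivity_merged.csv not found; check SEARCH_DIRS")
else:
    print(f"Found {activity_path}")
```

The returned path can be passed straight to `pd.read_csv`, so the same notebook can run on Kaggle and locally.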

In [None]:
# Inspect data, first 10 rows of Sleep Data
sleep_data.head(10).style.set_caption("Sleep Data").format({"TotalTimeInBed": "{:,.0f} mins", "TotalMinutesAsleep": "{:.0f} mins"})

In [None]:
# Summary statistics 
daily_activity.describe()

In [None]:
sleep_data.describe()

In [None]:
# Correct Data Types
daily_activity['ActivityDate'] = pd.to_datetime(daily_activity['ActivityDate'])

fmt = "%m/%d/%Y %I:%M:%S %p"
sleep_data['SleepDay'] = pd.to_datetime(
    sleep_data['SleepDay'],
    format=fmt,             # tell pandas exactly what to expect
    errors='coerce'         # turn any errors into NaT
)
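
Because `errors='coerce'` silently turns unparseable strings into `NaT`, it can be worth counting how many values failed to parse. The sketch below uses two made-up strings (one valid, one invalid) rather than the real `SleepDay` column.

```python
import pandas as pd

fmt = "%m/%d/%Y %I:%M:%S %p"

# Made-up sample values: one in the expected format, one malformed
raw = pd.Series(["4/12/2016 12:00:00 AM", "not a date"])

parsed = pd.to_datetime(raw, format=fmt, errors="coerce")

# Count the values that could not be parsed and became NaT
n_failed = parsed.isna().sum()
print(f"{n_failed} of {len(raw)} values could not be parsed")
```

A non-zero count on the real data would suggest that some rows use a different date format and need separate handling.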

In [None]:
# Check corrected Data Type
daily_activity.head()

In [None]:
sleep_data.head()

In [None]:
# Check for missing values
print("Daily Activity\n",daily_activity.isnull().sum(),"\n")
print("Sleep Data\n", sleep_data.isnull().sum(),"\n")

In [None]:
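# Check for duplicates in both tables; duplicate sleep rows would multiply rows in the merge
print("Daily Activity duplicates:", daily_activity.duplicated().sum())
print("Sleep Data duplicates:", sleep_data.duplicated().sum())
sleep_data = sleep_data.drop_duplicates()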

In [None]:
# Merge datasets
merged_data = pd.merge(
    daily_activity,
    sleep_data,
    left_on=['Id', 'ActivityDate'],   # keys from daily_activity
    right_on=['Id', 'SleepDay'],      # keys from sleep_data
    how='left'                        # keep every activity record
)

merged_data.head(10)
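
A left merge keeps every activity row, so it helps to confirm how many rows actually found a sleep record and that no key matched more than once. The sketch below uses the `indicator` and `validate` options of `pd.merge` on small made-up tables; the values are illustrative and not taken from the FitBit data.

```python
import pandas as pd

# Made-up miniature versions of the two tables
activity = pd.DataFrame({
    "Id": [1, 1, 2],
    "ActivityDate": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-12"]),
    "TotalSteps": [9000, 4000, 12000],
})
sleep = pd.DataFrame({
    "Id": [1, 2],
    "SleepDay": pd.to_datetime(["2016-04-12", "2016-04-12"]),
    "TotalMinutesAsleep": [400, 380],
})

checked = pd.merge(
    activity,
    sleep,
    left_on=["Id", "ActivityDate"],
    right_on=["Id", "SleepDay"],
    how="left",
    validate="one_to_one",  # raises MergeError if a key repeats on either side
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

# 'both' = activity days with a sleep record, 'left_only' = days without one
match_counts = checked["_merge"].value_counts()
print(match_counts)
```

If `validate` raises an error on the real tables, duplicate keys need to be resolved before the merge.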

## Clean Data
#### Prepare the data for analysis by addressing inconsistencies

In [None]:
# Check for invalid values
merged_data.describe()

In [None]:
merged_data.isnull().sum()

In [None]:
# Rows with Sleep data
merged_data['has_sleep_data'] = merged_data['SleepDay'].notna().astype(int)

# Pivot Table
pivot = (merged_data.pivot_table(index='has_sleep_data',
                                 values=['Calories', 'TotalSteps'],
                                 aggfunc='mean').rename(index={0: 'no_sleep_data', 1:'with_sleep_data'}))

pivot

In [None]:
# Paired dataset
paired = pd.merge(daily_activity, sleep_data, left_on=['Id','ActivityDate'],
                                               right_on=['Id','SleepDay'],
                                               how='inner')
paired.shape

#### Summary of Data

In [None]:
print("Summary statistics for steps, calories, and sleep:")
merged_data[['TotalSteps', 'Calories', 'TotalMinutesAsleep']].describe()

In [None]:
print("\nCorrelation between TotalSteps and Calories:")
merged_data[['TotalSteps', 'Calories']].corr()

In [None]:
# Activity level based on steps; include_lowest=True keeps zero-step days in 'Low'
merged_data['ActivityLevel'] = pd.cut(merged_data['TotalSteps'], bins=[0, 5000, 10000, float('inf')],
                                     labels=['Low','Medium','High'], include_lowest=True)

# View averages by activity group
merged_data.groupby('ActivityLevel')[['TotalSteps', 'Calories', 'TotalMinutesAsleep']].mean()
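
By default `pd.cut` builds intervals that are open on the left, so a value equal to the lowest bin edge (here, a day with 0 steps) gets no label unless `include_lowest=True` is passed. The sketch below shows the difference on a few made-up step counts.

```python
import pandas as pd

# Made-up step counts, including a zero-step day and a value on a bin edge
steps = pd.Series([0, 3000, 5000, 7500, 12000])
bins = [0, 5000, 10000, float("inf")]
labels = ["Low", "Medium", "High"]

# Default: first interval is (0, 5000], so 0 falls outside every bin
default_cut = pd.cut(steps, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 5000]
inclusive_cut = pd.cut(steps, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({
    "steps": steps,
    "default": default_cut,
    "include_lowest": inclusive_cut,
}))
```

Rows left unlabeled by the default call are dropped from any `groupby` on the label column, which shifts the group averages.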

In [None]:
# Activity level split by whether sleep data was recorded
merged_data.groupby(['ActivityLevel', 'has_sleep_data'])[['Calories', 'TotalMinutesAsleep']].mean()

Replace NaNs with 0 in sleep columns

In [None]:
# Define the sleep columns to fill
sleep_cols = ['TotalSleepRecords', 'TotalMinutesAsleep', 'TotalTimeInBed']

# Replace NaN with 0 in those columns
merged_data[sleep_cols] = merged_data[sleep_cols].fillna(0)

# Check
merged_data[sleep_cols].isna().sum()  # should all be 0 now
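
Filling missing sleep values with 0 changes any average computed afterwards, because pandas skips `NaN` in `mean()` but counts zeros. The sketch below illustrates the effect on made-up numbers and shows that filtering out the zeros recovers the original mean.

```python
import numpy as np
import pandas as pd

# Made-up minutes asleep: three recorded nights and two missing ones
minutes = pd.Series([420.0, np.nan, 390.0, np.nan, 450.0])

# NaN values are skipped, so this averages the three recorded nights
mean_skip_missing = minutes.mean()

# After zero-filling, the two missing nights count as 0 minutes
filled = minutes.fillna(0)
mean_zero_filled = filled.mean()

# Filtering the zeros out again restores the recorded-nights average
mean_recorded_only = filled[filled > 0].mean()

print(mean_skip_missing, mean_zero_filled, mean_recorded_only)
```

For this reason, sleep averages taken after the fill should be restricted to rows where `TotalMinutesAsleep > 0`.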

Sleep duration on weekends?

In [None]:
# Keep only days with recorded sleep; a new name avoids overwriting the raw sleep_data table
recorded_sleep = merged_data[merged_data['TotalMinutesAsleep'] > 0].copy()

# Weekday column
recorded_sleep['DayOfWeek'] = recorded_sleep['ActivityDate'].dt.day_name()

# Average minutes asleep per day
avg_sleep_by_day = (recorded_sleep.groupby('DayOfWeek')['TotalMinutesAsleep'].mean()
    .reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']))

# Show in hours
avg_sleep_by_day / 60
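
To answer the weekend question directly, the per-day averages can be collapsed into a weekday versus weekend comparison. The sketch below runs on four made-up nights, and the `DayType` column is a name introduced here for illustration; on the real data the same steps would apply to the rows with recorded sleep.

```python
import pandas as pd

# Made-up nights: Friday, Saturday, Sunday, Monday
nights = pd.DataFrame({
    "ActivityDate": pd.to_datetime(
        ["2016-04-15", "2016-04-16", "2016-04-17", "2016-04-18"]
    ),
    "TotalMinutesAsleep": [400, 450, 470, 380],
})

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend
nights["DayType"] = nights["ActivityDate"].dt.dayofweek.map(
    lambda d: "Weekend" if d >= 5 else "Weekday"
)

# Average sleep per group, converted from minutes to hours
avg_hours = nights.groupby("DayType")["TotalMinutesAsleep"].mean() / 60
print(avg_hours.round(2))
```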

## Visualization

In [None]:
# Bar Chart of average steps by day
merged_data['DayOfWeek'] = merged_data['ActivityDate'].dt.day_name()
avg_steps_by_day = merged_data.groupby('DayOfWeek')['TotalSteps'].mean().reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
plt.figure(figsize=(10, 6))
sns.barplot(x=avg_steps_by_day.index, y=avg_steps_by_day.values)
plt.title('Average Steps by Day of Week')
plt.xlabel('Day of Week')
plt.ylabel('Average Steps')
plt.xticks(rotation=45)
plt.show()

In [None]:
# Scatter plot: Steps vs. Calories
plt.figure(figsize=(10, 6))
sns.scatterplot(x='TotalSteps', y='Calories', data=merged_data)
plt.title('Steps vs. Calories Burned')
plt.xlabel('Total Steps')
plt.ylabel('Calories')
plt.show()

In [None]:
# Missing-value heatmap (sleep columns were zero-filled above, so only SleepDay can still be null)
plt.figure(figsize=(10, 4))
sns.heatmap(merged_data.isnull(), cbar=False)
plt.title("Missing-value pattern in merged_data")
plt.show()

In [None]:
# Calories across groups
sns.set(style="whitegrid")

# Plot
plt.figure(figsize=(8, 5))
sns.barplot(
    data=merged_data,
    x='ActivityLevel',
    y='Calories',
    hue='has_sleep_data',
    palette='Set2'
)

# Labels and title
plt.title("Average Calories by Activity Level and Sleep Data Presence")
plt.xlabel("Activity Level")
plt.ylabel("Average Calories")
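# Reuse seaborn's legend handles so each label keeps its matching bar color
handles, _ = plt.gca().get_legend_handles_labels()
plt.legend(handles=handles, labels=["No", "Yes"], title="Sleep Data Recorded")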
plt.tight_layout()
plt.show()

Visualize the full distribution of sleep minutes including zeros

In [None]:
# Histogram with zeros included; the spike at 0 shows how many days lack sleep records
plt.figure(figsize=(8,4))
merged_data['TotalMinutesAsleep'].hist(bins=30)
plt.title("Distribution of Total Minutes Asleep (zeros included)")
plt.xlabel("Minutes asleep")
plt.ylabel("Count of days")
plt.show()

In [None]:
# Sleep by days of the week
plt.figure(figsize=(8,4))
# avg_sleep_by_day is in minutes; convert to hours to match the axis label
plt.bar(avg_sleep_by_day.index, avg_sleep_by_day.values / 60)
plt.ylabel("Average Sleep Duration (hours)")
plt.title("Average Sleep Duration by Day of Week (device-recorded nights)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## Insights

#### Key Findings


* Users are most active on Tuesdays and Saturdays, averaging 8,000 steps
* There is a moderate positive correlation (0.59) between steps and calories burned
* Sleep duration averages 6 hours, with more sleep on weekends


## Application to Bellabeat Leaf Tracker

* Promote activity tracking through weekday fitness challenges
* Highlight sleep tracking to encourage better weekday sleep habits

## Marketing Recommendations
1. Targeted Weekday Campaigns: Launch ads promoting the Leaf tracker for weekday fitness, as users average 8,000 steps on Tuesdays and Saturdays.
2. Sleep Improvement Features: Add sleep coaching tips to the Bellabeat app, addressing shorter weekday sleep (6.5 hours vs. 7 hours on weekends).
3. Partner with Fitness Apps: Collaborate with fitness apps to integrate Leaf data, building on the positive correlation between steps and calories burned.

## Conclusion
This analysis of FitBit data revealed key trends in user activity and sleep that can inform the marketing of Bellabeat's Leaf tracker. Weekday activity peaks and sleep patterns informed the targeted campaigns and feature enhancements recommended above.