# 1. Summary

Bellabeat is a company that produces health-focused products for women, with the goal of growing in the smart device market. Its wearable technology monitors health data such as activity, sleep, stress, menstrual cycle, and mindfulness habits. The devices connect to an app that lets users track and view their wellness data so they can understand their habits and make informed decisions related to their health.

The goal of this case study is to gain insights into smart device use from the FitBit Fitness Tracker Data posted on Kaggle by MÖBIUS:

<https://www.kaggle.com/datasets/arashnic/fitbit?sort=recent-comments&page=2>

and to apply these insights to Bellabeat's own customers and marketing strategy.

# 2. Ask Phase

For this case study, I will consider Bellabeat's Leaf, a wellness tracker that can be worn as a bracelet, necklace, or clip and connects to the company's app to track activity, sleep, and stress. I will use the smart fitness data from FitBit users to gain insights and provide recommendations for Bellabeat's marketing strategy based on the following questions:

1.  What are some trends in smart device usage?

2.  How could these trends apply to Bellabeat customers?

3.  How could these trends help influence Bellabeat marketing strategy?

# 3. Prepare Phase

### 3.1 Data Background

The data for this case study was generated by respondents to a survey distributed via Amazon Mechanical Turk between 3/12/2016 and 5/12/2016. According to the dataset description, it contains records from 30 FitBit users who consented to submitting their personal tracker data; the daily activity file, however, contains records for 33 participants (verified in Section 4). The dataset posted on Kaggle by MÖBIUS contains the following CSV files:

| Table Name                     | Description |
|--------------------------------|------------------------------------------------------|
| dailyActivity_merged           | Daily activity from 33 users over a period of 31 days, including daily steps, distance, intensities, and calories |
| dailyCalories_merged           | Daily calories burned by each user over 31 days |
| dailyIntensities_merged        | Daily intensities of each user, measured in both minutes and distance and divided into sedentary, lightly active, fairly active, and very active categories |
| dailySteps_merged              | Steps taken each day by each user |
| heartrate_seconds_merged       | Heart rate logs for 7 of the users |
| hourlyCalories_merged          | Calories burned per hour by each user |
| hourlyIntensities_merged       | Hourly total and average intensity for each user |
| hourlySteps_merged             | Steps taken by each user, measured hourly |
| minuteCaloriesNarrow_merged    | Calories burned by each user, measured by the minute, in narrow format |
| minuteCaloriesWide_merged      | Calories burned by each user, measured by the minute, in wide format |
| minuteIntensitiesNarrow_merged | Intensities of each user by the minute, in narrow format |
| minuteIntensitiesWide_merged   | Intensities of each user by the minute, in wide format |
| minuteMETsNarrow_merged        | Metabolic equivalents (METs): the ratio of energy spent during physical activity to energy spent at rest, measured by the minute |
| minuteSleep_merged             | Sleep time in minutes for 24 of the users |
| minuteStepsNarrow_merged       | Steps taken by each user, measured by the minute, in narrow format |
| minuteStepsWide_merged         | Steps taken by each user, measured by the minute, in wide format |
| sleepDay_merged                | Daily tracked sleep for each user, measured by total sleep records per day, minutes asleep, and total time in bed |
| weightLogInfo_merged           | Weight of 8 users tracked by day in kg and pounds |

For my analysis, I decided to focus on the daily data to see how users of wearable health technology use their devices each day. I narrowed my analysis down to five tables:

-   dailyActivity_merged

-   dailyCalories_merged

-   dailyIntensities_merged

-   dailySteps_merged

-   sleepDay_merged

Of these tables, I removed dailySteps_merged, dailyIntensities_merged, and dailyCalories_merged, as the data from all three can be found in the dailyActivity_merged file.

### 3.2 Data Limitations

-   Small Sample Size

    -   With only 33 users, we cannot be sure that this dataset is representative of the population as a whole. The results may be biased by both the small sample size and the short data collection period.

-   Missing Demographics

    -   The sample data does not include demographic information such as age, gender, or location. Since Bellabeat's target audience is women, data that included participants' gender could lead to more reliable results here and would show whether men and women use smart devices differently.

-   Missing Data

    -   There are 4 rows in the data that record 0 activity for a given day and individual, suggesting that the tracker was not worn or was not working properly. I deleted these records in Excel before analyzing the datasets in RStudio, since the goal of the business task is to analyze use of the smart device and these are days on which the device was not used.

# 4. Process Phase

I chose R as my primary tool for this project due to the large size of the dataset and the wide variety of tools available in R to process, analyze, and visualize the data. However, before importing the datasets, I used Excel to:

-   Verify the number of participants (33) and the time frame of the recorded data (31 days)

-   Check for missing entries and remove rows with 0 recorded data

-   Rename both datasets to dailyActivity and dailySleep by dropping the \_merged suffix, for my own ease down the line
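
The Excel checks above can also be reproduced in R. The following is a minimal base R sketch on a small made-up data frame: the `toy` values are hypothetical and not taken from the FitBit files, the column names `id`, `date`, and `totalsteps` are assumed, and a day with 0 total steps is treated as a day on which the tracker was not used.

```r
# Hypothetical toy data standing in for dailyActivity (values are made up)
toy <- data.frame(
  id         = c(1, 1, 2, 2, 3),
  date       = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12",
                         "2016-05-12", "2016-04-20")),
  totalsteps = c(8000, 0, 12000, 4000, 0)
)

# Number of distinct participants
n_users <- length(unique(toy$id))

# Length of the recording window in days, counting both end dates
n_days <- as.integer(diff(range(toy$date))) + 1L

# Drop days with no recorded steps (assumed to mean the device was not worn)
toy_used <- toy[toy$totalsteps > 0, ]

print(c(users = n_users, days = n_days, rows_kept = nrow(toy_used)))
```

Counting both end dates, the window from 4/12/2016 to 5/12/2016 spans 31 days, which matches the period given in the table descriptions.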

## 4.1 Installing packages and opening libraries

In [None]:
library(tidyverse)
library(lubridate)
library(ggrepel)
library(janitor)

## 4.2 Importing Datasets

I would like to focus on daily trends for my case study. To do this, I have narrowed down the datasets to only include:

-   dailyActivity

-   dailySleep

dailySteps_merged, dailyCalories_merged, and dailyIntensities_merged can be left out for this analysis as the dailyActivity dataset includes the data from all 3 of these tables.

In [None]:
dailySleep <- read_csv(file = "../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
dailyActivity <- read_csv(file = "../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv") 


In [None]:
str(dailyActivity)

str(dailySleep) 

In [None]:
head(dailyActivity)
head(dailySleep)

## 4.3 Ensure there are no N/A entries

In [None]:
colSums(is.na(dailyActivity))
colSums(is.na(dailySleep))

## 4.4 Look for and remove duplicated entries

In [None]:
sum(duplicated(dailyActivity))
sum(duplicated(dailySleep))

### 4.4.1

The dailySleep dataset has 3 duplicates that can be dropped.

In [None]:
dailySleep <- dailySleep %>% 
  distinct()

Check for duplicates again to ensure they were dropped.

In [None]:
sum(duplicated(dailySleep))

### 4.4.2 Clean columns

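Standardize the column names so the two tables can be merged easily. Converting every name to lowercase with rename_with() is enough here. Note that clean_names() from janitor would instead produce snake_case names such as total_steps, which the later code does not use, and that calling it without assigning the result leaves the data unchanged.

In [None]:
# Lowercase all column names (e.g. ActivityDate -> activitydate)
dailyActivity <- rename_with(dailyActivity, tolower)
dailySleep <- rename_with(dailySleep, tolower)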

In [None]:
head(dailyActivity)
head(dailySleep)

In [None]:
dailyActivity <- dailyActivity %>%
  rename(date = activitydate) %>%
  mutate(date = as_date(date, format = "%m/%d/%Y"))

In [None]:
dailySleep <- dailySleep %>%
  rename(date = sleepday) %>%
  mutate(date = as_date(date, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

In [None]:
head(dailyActivity)

In [None]:
head(dailySleep)
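
The two files store dates differently, which is why the two cells above use different format strings. As a minimal base R sketch, both forms can be parsed to the same `Date` value. The example strings are assumptions about how the raw values look, inferred from the format strings used above.

```r
# Example raw strings; assumed to match the formats used in the cells above
activity_raw <- "4/12/2016"
sleep_raw    <- "4/12/2016 12:00:00 AM"

# Date-only string: month/day/four-digit year
activity_date <- as.Date(activity_raw, format = "%m/%d/%Y")

# Date-time string: parse the full timestamp first, then keep only the date.
# %I is the 12-hour clock and %p the AM/PM marker (matching depends on locale).
sleep_time <- as.POSIXct(sleep_raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
sleep_date <- as.Date(sleep_time, tz = "UTC")

print(activity_date)
print(sleep_date)
print(activity_date == sleep_date)
```

Once both columns hold plain dates, they can serve as a shared key when the tables are combined.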

### 4.5 Merge datasets

Merge dailyActivity and dailySleep into one dataset called dailyMetrics.

In [None]:
# Inner join: keeps only the id/date pairs that appear in both tables
dailyMetrics <- merge(dailyActivity, dailySleep, by = c("id", "date"))
as_tibble(dailyMetrics)

In [None]:
head(dailyMetrics)
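
It is worth noting what this merge keeps. Called with only `by`, `merge()` performs an inner join, so any user-day without a matching sleep record is dropped from dailyMetrics. The sketch below illustrates this on two made-up tables (all values are hypothetical) and shows `all.x = TRUE` as one way to keep every activity row instead.

```r
# Hypothetical toy tables (values are made up for illustration)
activity <- data.frame(
  id         = c(1, 1, 2),
  date       = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  totalsteps = c(8000, 9500, 12000)
)
sleep <- data.frame(
  id                 = c(1, 2),
  date               = as.Date(c("2016-04-12", "2016-04-12")),
  totalminutesasleep = c(420, 390)
)

# Default merge() is an inner join: only id/date pairs found in both tables remain
inner <- merge(activity, sleep, by = c("id", "date"))

# all.x = TRUE keeps every activity row and fills missing sleep values with NA
left <- merge(activity, sleep, by = c("id", "date"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
print(sum(is.na(left$totalminutesasleep)))
```

For this case study the inner join is a reasonable choice when sleep and activity are analyzed together, but averages computed from the merged table only reflect days that have a sleep record.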

# 5. Analyze + Share Phases

Find the average steps, calories, and minutes asleep per participant.

In [None]:
dailyAverage <- dailyMetrics %>% 
  group_by(id) %>% 
  summarise(avg_daily_steps = mean(totalsteps),
            avg_calories = mean(calories),
            avg_sleep = mean(totalminutesasleep))

head(dailyAverage)

## 5.1

Classify users based on their average daily steps. Using the classification proposed in an article by Catrine Tudor-Locke and David R. Bassett Jr., indexed by the National Library of Medicine (<https://pubmed.ncbi.nlm.nih.gov/14715035/>), we can classify activity level in the following way based on daily steps:

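-   **Sedentary**: \<5000 steps

-   **Low Active**: 5000-7499

-   **Somewhat Active**: 7500-9999

-   **Active**: 10000-14999

-   **Highly Active**: \>=15000

Using this information, we can classify the users from our data. In the code and charts below, these categories are labeled "sedentary", "lightly active", "fairly active", "active", and "very active". Note that the cited article places the "highly active" cut-off at more than 12,500 steps per day; this analysis uses 15,000.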

In [None]:
userMetrics <- dailyAverage %>% 
  mutate(user_type = case_when(
    avg_daily_steps < 5000 ~ "sedentary",
    avg_daily_steps >=5000 & avg_daily_steps<7500 ~ "lightly active",
    avg_daily_steps >=7500 & avg_daily_steps<10000 ~ "fairly active",
    avg_daily_steps >=10000 & avg_daily_steps<15000 ~ "active",
    avg_daily_steps >=15000 ~ "very active"
  ))

In [None]:
head(userMetrics)
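
As an alternative to the chain of comparisons in `case_when()`, the same binning can be sketched with base R's `cut()`. The step counts below are made up to exercise the boundaries, and the labels are the ones used in the cell above.

```r
# Hypothetical average daily step counts (made up for illustration)
avg_daily_steps <- c(3200, 5000, 7499, 7500, 9999.5, 10000, 15000)

# right = FALSE makes each interval closed on the left, e.g. [5000, 7500),
# which matches the >= / < comparisons in the case_when() call above
user_type <- cut(
  avg_daily_steps,
  breaks = c(-Inf, 5000, 7500, 10000, 15000, Inf),
  labels = c("sedentary", "lightly active", "fairly active",
             "active", "very active"),
  right = FALSE,
  ordered_result = TRUE
)

print(data.frame(avg_daily_steps, user_type))
```

Because `ordered_result = TRUE` already returns an ordered factor, this variant would not need a separate `ordered()` call before plotting.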

Create a visualization to see the distribution of types of users in order of activity level.

In [None]:
userMetrics$user_type <- ordered(userMetrics$user_type, levels=c("sedentary","lightly active", "fairly active", "active", "very active"))

ggplot(data = userMetrics) + 
  geom_bar(mapping = aes(x = user_type, fill = user_type))+
  labs(x="User Types", y="Count", title="Distribution of User Types in Study")

-   Most users fall under the "fairly active" classification (7500-9999 steps per day).

## 5.3

Look at how many minutes each user averages in each intensity and see what the average in each intensity is across users.

In [None]:
avgIntensities <- dailyActivity %>% 
  group_by(id) %>% 
  summarise("avg sedentary min"=mean(sedentaryminutes),
             "avg lightly active min"=mean(lightlyactiveminutes),
             "avg fairly active min"=mean(fairlyactiveminutes),
             "avg very active min" =mean(veryactiveminutes))

head(avgIntensities)

In [None]:
intensitiesAvg <- avgIntensities %>% summarise_if(is.numeric, mean) %>% 
  mutate(id = "all users")

head(intensitiesAvg)

By averaging the minutes spent in each intensity category across users, we can see that, on average, the participants in this study spent most of their time (997.85 minutes/day) in the sedentary intensity category and the least time (13.33 minutes/day) in the fairly active intensity category.
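
To put these averages in context, they can be expressed as a share of a 1,440-minute day. The sketch below uses only the two averages quoted above; the variable names are chosen here for illustration.

```r
# Averages quoted in the text above (minutes per day)
avg_minutes <- c(sedentary = 997.85, fairly_active = 13.33)

# A full day has 24 * 60 = 1440 minutes
minutes_per_day <- 24 * 60

# Share of the day spent in each category, in percent
share_pct <- round(100 * avg_minutes / minutes_per_day, 1)
print(share_pct)
```

In other words, roughly 69% of the day was spent in the sedentary category, compared with under 1% in the fairly active category.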

## 5.4

See whether any variables correlate with calories burned.

In [None]:
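# Plot the mean of avg_calories within each user type.
# (geom_col() would stack one segment per participant, so bar height would grow with group size.)
ggplot(data = userMetrics) +
  geom_bar(mapping = aes(x = user_type, y = avg_calories, fill = user_type),
           stat = "summary", fun = "mean") +
  labs(title = "Average Calories Burned Daily by User Type", y = "Average Calories", x = "User Type") +
  theme(axis.text.x = element_text(angle = 40))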

In [None]:
install.packages("ggpubr")
library(ggpubr)

In [None]:
ggarrange(
  ggplot(dailyActivity, aes(x = totaldistance, y = calories)) +
    geom_jitter() +
    geom_smooth(color = "purple") +
    labs(title = "Daily Distance vs Calories", x = "Distance", y = "Calories"),
  ggplot(dailyActivity, aes(x = totalsteps, y = calories)) +
    geom_jitter() +
    geom_smooth(color = "purple") +
    labs(title = "Daily Steps vs Calories", x = "Steps", y = "Calories")
)
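
The scatter plots show the direction of the relationship visually. To attach a number to it, one option is the Pearson correlation coefficient. The sketch below is a minimal example on made-up values (not taken from the FitBit files), using the column names `totalsteps` and `calories` from dailyActivity.

```r
# Hypothetical daily values (made up; not taken from the FitBit files)
toy <- data.frame(
  totalsteps = c(2000, 4500, 7000, 9000, 12000, 15000),
  calories   = c(1600, 1850, 2100, 2200, 2600, 2900)
)

# Pearson correlation coefficient: +1 is a perfect positive linear relationship,
# 0 is no linear relationship, -1 is a perfect negative one
r <- cor(toy$totalsteps, toy$calories, method = "pearson")
print(round(r, 3))

# cor.test() additionally reports a confidence interval and p-value
ct <- cor.test(toy$totalsteps, toy$calories)
print(ct$conf.int)
```

Applied to the real data, the same two calls would use `dailyActivity$totalsteps` (or `totaldistance`) and `dailyActivity$calories`.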

## 5.5 Sleep patterns

I am interested in how much sleep participants are getting and in their sleep quality, measured by comparing time asleep with total time in bed. A study by David L. Reed and William P. Sacco, available through the National Library of Medicine, examined sleep efficiency and found that an efficient night of sleep is one in which 85% or more of the time in bed is spent asleep. Using this, we can update our dailySleep table and label our participants as:

-   Low Quality Sleepers: \<85% of time in bed spent asleep

-   High Quality Sleepers: \>=85% of time in bed spent asleep

<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4751425/>

We can also see how much sleep the users get per day of the week and compare that to the recommendation of 8 hours of sleep a night.

In [None]:
weekdaySleep <- dailySleep %>% 
  mutate(weekday = weekdays(date))

weekdaySleep$weekday <- ordered(weekdaySleep$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))

weekdaySleep <- weekdaySleep %>% 
  group_by(weekday) %>% 
  summarize(avg_minutes_asleep = mean(totalminutesasleep))

head(weekdaySleep)
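
One caveat with the cell above: `weekdays()` returns day names in the session's locale, so the English labels passed to `ordered()` would produce `NA` values in a non-English locale. The following is a minimal sketch of a locale-independent alternative, using made-up dates.

```r
# Hypothetical dates (Tuesday 2016-04-12 through Monday 2016-04-18)
dates <- as.Date("2016-04-12") + 0:6

# POSIXlt stores the weekday as a number: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
wday_num <- as.POSIXlt(dates)$wday

# Map the numbers to fixed English labels, ordered Monday to Sunday
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday <- factor(day_names[wday_num + 1],
                  levels = c(day_names[2:7], day_names[1]),
                  ordered = TRUE)

print(data.frame(dates, weekday))
```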

In [None]:
ggplot(weekdaySleep, aes(weekday, avg_minutes_asleep))+
  geom_col(fill="#0071bc")+
   geom_hline(yintercept=480)+
  annotate("text", x="Friday", y=465, label = "480 Minutes = Recommended Sleep")+
  labs(title = "Minutes Spent Sleeping by Day of Week", x="", y="", subtitle="")+
  theme(axis.text.x = element_text(vjust = 0.5))

In [None]:
sleepQuality <- dailySleep %>% 
  group_by(id) %>% 
  summarize(avg_time_asleep = mean(totalminutesasleep), 
            avg_time_in_bed = mean(totaltimeinbed)) %>% 
  mutate(percentage_asleep = (avg_time_asleep/avg_time_in_bed)*100)

In [None]:
sleepQuality <- sleepQuality %>% 
  mutate(sleep_quality = case_when(
    percentage_asleep >= 85 ~ "High Quality", 
    percentage_asleep < 85 ~ "Low Quality"
  ))

head(sleepQuality)
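
The percentage above is a ratio of averages: average minutes asleep divided by average time in bed. An alternative is to compute efficiency for each night and then average those values, and the two generally differ slightly. The sketch below compares them on made-up nightly records for a single hypothetical participant.

```r
# Hypothetical nightly records for one participant (values are made up)
nights <- data.frame(
  totalminutesasleep = c(400, 450, 300),
  totaltimeinbed     = c(440, 480, 400)
)

# Sleep efficiency per night, in percent
nights$efficiency <- 100 * nights$totalminutesasleep / nights$totaltimeinbed

# Ratio of averages, as computed in the cell above
ratio_of_means <- 100 * mean(nights$totalminutesasleep) / mean(nights$totaltimeinbed)

# Average of the nightly efficiencies; close to, but not identical to, the ratio above
mean_of_ratios <- mean(nights$efficiency)

# Apply the 85% threshold to the ratio of averages
sleep_quality <- ifelse(ratio_of_means >= 85, "High Quality", "Low Quality")

print(round(c(ratio_of_means = ratio_of_means, mean_of_ratios = mean_of_ratios), 2))
print(sleep_quality)
```

Either definition can be defended; the point is only that a participant close to the 85% threshold could be labeled differently depending on which one is used.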

In [None]:
ggplot(data = sleepQuality)+ 
  geom_bar(mapping  = aes(x=sleep_quality, fill=sleep_quality))+
  labs(title = "Participants with High Sleep Quality vs Low Sleep Quality")

Add sleep quality to the userMetrics table and see whether there is a correlation between average sleep quality and user type.

In [None]:
# Merge sleep quality onto userMetrics and drop the intermediate sleep columns
userMetrics <- merge(userMetrics, sleepQuality, by = "id") %>%
  select(-c(avg_time_asleep, avg_time_in_bed, percentage_asleep))

head(userMetrics)

In [None]:
ggplot(data= userMetrics)+
  geom_point(size=3, mapping=aes(x=avg_sleep, y=avg_daily_steps, color = sleep_quality, shape=user_type))+
  geom_vline(xintercept=480)+
  annotate("text", x=550, y=15000, label = "480 Minutes = ")+
  annotate("text", x=580, y=14000, label = "Recommended Sleep")+
  labs(x="Average Minutes Asleep", y= "Average Daily Steps", title = "Average Time Spent Sleeping vs Average Daily Steps")

## 5.6 Analysis Summary and Share Phase

Through this analysis, users were classified into user type by activity level and sleep quality. Some key points:

-   The majority of participants fall into the "Fairly Active" user type.

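-   Participants in the "Fairly Active" user type also appeared to burn the most calories daily, over twice as many as the "Sedentary" and "Lightly Active" categories. Because this group also contains the most participants, this comparison should be confirmed against per-group means before it is relied on.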

-   On average, the users in this study spent the majority of their day in the "sedentary" intensity category.

-   Daily steps and daily distance were both positively correlated with daily burned calories

-   On average, the participants in this study did not meet the recommended 8 hours of sleep a night on any day of the week. However, most participants fell into the high quality sleeper category, meaning they spent 85% or more of their time in bed asleep.

-   There appeared to be some clustering of "Lightly Active", "Fairly Active", and "Active" users who got the recommended 480 minutes (8 hours) of sleep a night. However, I do not believe the dataset was large enough, or the trend in this specific analysis strong enough, to establish a correlation between sleep and user type.

# 6. Act Phase

Considering the small sample size and short time frame of this dataset, I would recommend further analysis of Bellabeat's own tracking data to reduce potential bias and increase the reliability of the results. However, based on this dataset alone and the trends found, I would make the following recommendations to Bellabeat:

1.  Most users fall into the "Fairly Active" lifestyle. Considering this, smart devices that measure health data may have more success if marketed as promoting a healthy and moderately active lifestyle as a whole, rather than as sports trackers for above-average activity levels.

2.  Sleep is very important in maintaining a healthy lifestyle, and on average the users in this study did not meet the recommended nightly sleep. Bellabeat may therefore have success in marketing products that promote healthy sleeping patterns, such as:

    -   Wearable devices that are comfortable enough to be worn at night so sleep can regularly be tracked, yet aesthetically pleasing enough for consumers to wear during daily activities.

    -   Devices with long-lasting batteries that also charge quickly, so consumers can spend more time wearing the product and measuring health data and less time charging it.

3.  Since steps taken had a positive correlation with calories burned and users spent most of their time in the "Sedentary" intensity, Bellabeat could market wearable products that remind users to move and be active once the device detects that they have been in the "Sedentary" intensity for a certain period of time.