# **Bellabeat Case Study**

## 1st Phase: ASK
Case Study: Bellabeat

Business Task
Analyze smart device data and draw useful insights to help Bellabeat improve their marketing strategy.

We attempt to answer the following questions:

Are there any correlations and trends within the selected data?
To what extent do these apply to customers of Bellabeat?
How will these findings help the marketing team improve their strategy?

Data Source
We will be working with the [FitBit Fitness Tracker Data](https://www.kaggle.com/datasets/arashnic/fitbit), which was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to sharing their personal tracker data and minute-level information for physical activity, heart rate, and sleep monitoring.

## 2nd Phase: PREPARE

Download Datasets
We download the specific datasets we will be making use of:

    hourly_steps
    daily_activity
    daily_sleep

Reviewing Data with Google Sheets
The downloaded datasets are reviewed and cleaned in Google Sheets before further analysis. This gives us a clearer view of the data and confirms that it is organized in a way that suits the objectives of this analysis.

Organization of Data
The data is in a long format, where each row represents one time point for one user. Every user has a unique ID, and records are tracked for the most part by date and time.

Data ROCCC (reliable, original, comprehensive, current, and cited)
The data covers at most 31 days, comes from only 30 users, and carries no demographic information. It is therefore not entirely reliable and does not represent the larger population. The one-month window is also too short to reveal longer-term trends.

## 3rd Phase: PROCESS

In [None]:
# Loading in the necessary packages
library(tidyverse)
library(lubridate)
library(data.table)
library(scales)
library(janitor)
library(skimr)
library(plotly)
library(googlesheets4)  # provides read_sheet(), used to import the data

Importing the Selected Datasets
The datasets are imported directly from Google Sheets using the googlesheets4 package in R, which integrates easily with Google Sheets data.

In [None]:
hourly_steps <- read_sheet("Google_Sheet_URL/hourly_steps")
daily_activity <- read_sheet("Google_Sheet_URL/daily_activity")
daily_sleep <- read_sheet("Google_Sheet_URL/daily_sleep")

**Cleaning our data**

In [None]:
# Looking for Duplicates
sum(duplicated(hourly_steps))
sum(duplicated(daily_activity))
sum(duplicated(daily_sleep))

In [None]:
# Looking for Nulls
sum(is.na(hourly_steps))
sum(is.na(daily_activity))
sum(is.na(daily_sleep))
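
`sum(is.na(...))` returns a single total for the whole table, which does not show where the gaps are. A minimal sketch of a per-column count follows, using a small made-up data frame (`toy_sleep` and its values are illustrative, not taken from the Fitbit data):

```r
# Illustrative toy data; not from the Fitbit dataset
toy_sleep <- data.frame(
  id = c(1, 1, 2, 2),
  totalminutesasleep = c(420, NA, 390, 390),
  totaltimeinbed = c(450, 460, NA, 410)
)

# Total number of missing values across the whole table
total_na <- sum(is.na(toy_sleep))

# Missing values broken down by column
na_per_column <- colSums(is.na(toy_sleep))

print(total_na)
print(na_per_column)
```

The same `colSums(is.na(...))` call could be applied to `hourly_steps`, `daily_activity`, and `daily_sleep` to see which columns, if any, contain the missing values.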

In [None]:
# Removing duplicates and N/As
hourly_steps  <- hourly_steps  %>%
  distinct() %>%
  drop_na()

daily_activity  <- daily_activity  %>%
  distinct() %>%
  drop_na()

daily_sleep <- daily_sleep %>%
  distinct() %>%
  drop_na()

In [None]:
# Verifying duplicate removal
sum(duplicated(daily_sleep))

In [None]:
# Renaming columns to lower case
# clean_names() is not applied here: it would produce snake_case names
# (e.g. activity_date), which would not match the lower-case names
# (e.g. activitydate) used in the cells that follow.
hourly_steps <- rename_with(hourly_steps, tolower)
daily_activity <- rename_with(daily_activity, tolower)
daily_sleep <- rename_with(daily_sleep, tolower)

In [None]:
# Cleaning date-time format for daily_sleep and daily_activity for merging later
daily_activity <- daily_activity %>%
  rename(date = activitydate) %>%
  mutate(date = as_date(date, format = "%m/%d/%Y"))

daily_sleep <- daily_sleep %>%
  rename(date = sleepday) %>%
  mutate(date = as_date(date, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))
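
To make the format strings concrete, here is a small sketch that parses one made-up timestamp in the same `%m/%d/%Y %I:%M:%S %p` layout. The sample string and the fixed `"UTC"` time zone are assumptions for illustration (the notebook uses `Sys.timezone()`), and base R's `as.Date()` stands in for `as_date()` so the snippet runs without extra packages.

```r
# Illustrative timestamp in month/day/year, 12-hour format (not a real record)
sample_stamp <- "4/12/2016 1:00:00 PM"

# Parse to a date-time: %I is the hour on a 12-hour clock, %p the AM/PM marker
# (parsing of %p depends on the session locale; an English or C locale is assumed)
parsed_time <- as.POSIXct(sample_stamp,
                          format = "%m/%d/%Y %I:%M:%S %p",
                          tz = "UTC")

# Keep only the calendar date, as needed for the daily tables
parsed_date <- as.Date(parsed_time, tz = "UTC")

print(parsed_time)
print(parsed_date)
```

If the format string does not match the text, the result is `NA` rather than an error, so it is worth checking for `NA` values after the conversion.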

In [1]:
# Converting hourly_steps to date-time format
hourly_steps <- hourly_steps %>%
  rename(date_time = activityhour) %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

head(hourly_steps)

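ERROR: Error in hourly_steps %>% rename(date_time = activityhour) %>% mutate(date_time = as.POSIXct(date_time, : could not find function "%>%"

This error appears when the cell is run before the packages are loaded: `%>%` is provided by the tidyverse, so the package-loading cell must be run first in the same session.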


Review and merge the data.

In [None]:
head(daily_activity)
head(daily_sleep)

In [None]:
# Merging the dataset as promised :)
daily_activity_sleep <- merge(daily_activity, daily_sleep, by = c("id", "date"))
glimpse(daily_activity_sleep)
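
By default, `merge()` keeps only rows whose `id` and `date` appear in both tables (an inner join), so days with activity but no sleep record drop out of the merged data. A small sketch with made-up rows shows the difference (`toy_activity` and `toy_sleep` are illustrative, not Fitbit records):

```r
# Illustrative toy tables; values are made up
toy_activity <- data.frame(
  id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  totalsteps = c(8000, 10500, 4000)
)
toy_sleep <- data.frame(
  id = c(1, 2),
  date = as.Date(c("2016-04-12", "2016-04-12")),
  totalminutesasleep = c(420, 390)
)

# Default merge: only id/date pairs present in both tables survive
inner <- merge(toy_activity, toy_sleep, by = c("id", "date"))

# all.x = TRUE keeps every activity row and fills missing sleep values with NA
left <- merge(toy_activity, toy_sleep, by = c("id", "date"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
```

Comparing `nrow(daily_activity)` with `nrow(daily_activity_sleep)` after the merge would show how many activity days had no matching sleep record.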

## 4th Phase: ANALYSIS

In [None]:
Here, we aggregate the data and analyze the correlations to draw meaningful insights.

Creation of Sleep Efficiency Column
We calculate sleep efficiency as the ratio of time spent asleep to total time in bed, multiplied by 100:

daily_activity_sleep <- daily_activity_sleep %>% mutate(sleep_efficiency = (totalminutesasleep / totaltimeinbed) * 100)
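
As a worked example of the formula with made-up values: 420 minutes asleep out of 450 minutes in bed gives (420 / 450) * 100, or about 93.3%. The sketch below applies the same calculation to a toy data frame (`toy_sleep` is illustrative, and base R is used so the snippet runs without extra packages):

```r
# Illustrative toy data; values are made up
toy_sleep <- data.frame(
  totalminutesasleep = c(420, 300),
  totaltimeinbed = c(450, 400)
)

# Sleep efficiency: share of time in bed that was spent asleep, in percent
toy_sleep$sleep_efficiency <-
  (toy_sleep$totalminutesasleep / toy_sleep$totaltimeinbed) * 100

print(round(toy_sleep$sleep_efficiency, 1))
```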

Correlation Matrix
Here, a correlation matrix is drawn to understand the relationship between the variables.

activity_sleep_correlation <- daily_activity_sleep %>% select(totalsteps, calories, totalminutesasleep, sleep_efficiency)
correlation_matrix <- cor(activity_sleep_correlation, use = "complete.obs")
print(correlation_matrix)

From the matrix, we see that calories and total steps are most highly correlated.
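
The scatterplots in this phase use a per-user summary table, `aggregated_daily_activity_sleep`, with columns `avg_steps`, `avg_calories`, and `avg_efficiency`, but the notebook does not show how it is built. One plausible construction, assuming it holds per-user means of the merged daily data, is sketched below on made-up rows. The grouping by `id`, the names `toy_merged` and `toy_aggregated`, and the values are assumptions for illustration, not the confirmed method.

```r
library(dplyr)

# Illustrative stand-in for the merged daily table; values are made up
toy_merged <- data.frame(
  id = c(1, 1, 2, 2),
  totalsteps = c(8000, 10000, 4000, 6000),
  calories = c(2000, 2200, 1600, 1800),
  sleep_efficiency = c(90, 94, 80, 84)
)

# One row per user, holding that user's average of each metric
toy_aggregated <- toy_merged %>%
  group_by(id) %>%
  summarise(
    avg_steps = mean(totalsteps, na.rm = TRUE),
    avg_calories = mean(calories, na.rm = TRUE),
    avg_efficiency = mean(sleep_efficiency, na.rm = TRUE),
    .groups = "drop"
  )

print(toy_aggregated)
```

Under the same assumption, running this pipeline on `daily_activity_sleep` and assigning the result to `aggregated_daily_activity_sleep` would supply the table the plots expect.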

Visualizing Key Relationships
We want to gain more insight by visualizing the key relationships among average steps, calories, and sleep efficiency below:

Steps vs. Sleep Efficiency

# Scatterplot: Steps vs. Efficiency
ggplot(aggregated_daily_activity_sleep, aes(x = avg_steps, y = avg_efficiency)) +
  geom_point(color = "black") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Average Steps vs. Average Sleep Efficiency", x = "Average Steps", y = "Average Sleep Efficiency")
       
Calories vs. Sleep Efficiency

# Scatterplot: Calories vs. Efficiency
ggplot(aggregated_daily_activity_sleep, aes(x = avg_calories, y = avg_efficiency)) +
  geom_point(color = "black") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Average Calories vs. Average Sleep Efficiency", x = "Average Calories", y = "Average Sleep Efficiency (%)")


## 5th Phase: ACT

Here, we will share our findings and act on them based on the analysis done.

Exhibit A
Here is the correlation matrix showing the relationships between chosen variables:

library(ggcorrplot)
ggcorrplot(correlation_matrix, hc.order = TRUE, type = "lower", lab = TRUE)

Exhibit B
Two scatterplots plot average steps and average calories against average sleep efficiency. The fitted line for calories burned versus sleep efficiency has a positive slope.

Actionable Insights for Bellabeat's Customers

Sleep Efficiency: Users value holistic wellness, and improvements in sleep efficiency can directly reflect on general well-being.
Calories and Steps: Users value these metrics because they are closely tied to activity and fitness goals, and tracking them helps users optimize their daily routines.
Activity and Sleep: There is a positive relationship between calories burned and sleep efficiency, which suggests that more active users tend to sleep better. This insight can be used in the marketing of sleep-related products.

Recommendations for Bellabeat

Position Sleep Efficiency as a Driver for Wellness
Emphasize in marketing campaigns that sleep efficiency is a differentiating feature that allows users to improve their quality of sleep and therefore their general well-being.

Activity-Sleep Challenges
Run targeted activity-sleep challenge campaigns, for example by challenging users to reach step goals with the aim of improving sleep efficiency, and reward participants with insights or incentives.

Customer Education Campaigns
Create content that helps users understand how sleep efficiency and activity level relate to one another, so that they can optimize both for better health outcomes.