# <span style="color:#73378C"> Bellabeat Case Study in R </span>

### <span style="color:#73378C"> By Ifejianyi Grace Amarachi </span>

   
Bellabeat is a high-tech manufacturer of health-focused products for women. It is a successful small company, but it has the potential to become a larger player in the global smart device market. This case study focuses on one of Bellabeat’s products and analyzes smart device data to gain insight into how consumers are using their smart devices.




# <span style="color:#73378C"> Table of Contents </span>
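* [1. Summary](#1)
* [2. Ask Phase](#2)
    - [2.1 Business Task](#2)
* [3. Prepare Phase](#3)
    - [3.1 Data Used](#3_1)
    - [3.2 Data Privacy and Accessibility](#3_2)
    - [3.3 About the Data](#3_3)
    - [3.4 Data Organization and Verification](#3_4)
    - [3.5 Data Credibility and Integrity](#3_5)
* [4. Process Phase](#4)
    - [4.1 Setting up my environment](#4_1)
    - [4.2 Importing dataframes](#4_2)
    - [4.3 Previewing dataframes](#4_3)
    - [4.4 Cleaning and formatting](#4_4)
    - [4.5 Merging datasets](#4_5)
* [5. Analyze and Share Phase](#5)
    - [5.1 Category of users](#5_1)
    - [5.2 Daily use of smart device](#5_2)
    - [5.3 Average Steps](#5_3)
    - [5.4 Correlation](#5_4)
* [6. Recommendations (Act Phase)](#6)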

<a id="1"></a>
## 1. Summary
Bellabeat is the go-to wellness brand for women, with an ecosystem of products and services focused on women’s health. It develops smart devices that monitor biometric and lifestyle data such as daily activity, steps, calories, sleep, stress, and reproductive health, helping women better understand how their bodies work and make healthier choices.

The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions.

<a id="2"></a>
## 2. Ask Phase
### 2.1. Business Task
The objective of this analysis is to focus on one of Bellabeat’s products and analyze how consumers are using non-Bellabeat smart devices.

Using this information, we will provide high-level recommendations for how these insights can inform Bellabeat's marketing strategy.

<a id="3"></a>
## 3. Prepare Phase
<a id="3_1"></a>
### 3.1. Data Used
The data used for the case study is [Fitbit Fitness Tracker Data](https://www.kaggle.com/arashnic/fitbit) (CC0: Public Domain). It was made available through [Mobius](https://www.kaggle.com/arashnic).
<a id="3_2"></a>
### 3.2. Data Privacy and Accessibility
The metadata of the dataset confirms that it is open source. The owner has dedicated the work to the Public Domain, thereby waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights. The datasets can be copied, modified and distributed, even for commercial purposes, all without asking permission.
<a id="3_3"></a>
### 3.3. About the Data
This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors and preferences.
<a id="3_4"></a>
### 3.4. Data Organization and Verification
The dataset has 18 CSV files, each containing a different set of quantitative data tracked by Fitbit: 15 are in long format and 3 are in wide format. Together they cover a wide range of information, including activity metrics, calories, sleep records, steps and weight.

Several data frames are subsets of larger, more complete data frames, so they will not be used.
<a id="3_5"></a>
### 3.5. Data Credibility and Integrity
   1. The dataset was generated in 2016, so it is not current.
   2. The sample size is small.
   3. There is a lack of metadata and demographic information about the users (such as age, location and gender), so there is a high chance that the data isn't representative of the population as a whole.
   4. The dataset only contains about one month of activity. This period is short and might not be enough to show activity patterns that could be affected by seasons and weather.

<a id="4"></a>
## 4. Process Phase
For easy documentation and reproducibility, data cleaning, analysis and visualization will be done in R with RStudio.
 <a id="4_1"></a>
### 4.1.  Setting up my environment
These packages will help in the analysis process in R: `tidyverse`, `ggplot2`, `skimr`, `dplyr`, `lubridate`, `janitor`, `readr`, `RColorBrewer` and `ggpubr`.

In [2]:
library(tidyverse)
library(ggplot2)
library(lubridate)
library(skimr)
library(readr)
library(janitor)
library(dplyr)
library(RColorBrewer)
library(ggpubr)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1
✔ readr   2.1.2      ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union

Attaching package: ‘janitor’

The following objects are masked from ‘package:stats’:

    chisq.test, fisher.test




<a id="4_2"></a>
### 4.2. Importing dataframes

We will be working with these dataframes:

* `dailyActivity_merged`: daily activity over 31 days for 33 IDs, tracking daily steps, distance, intensities and calories
* `hourlyIntensities_merged`: hourly total and average intensity over 31 days for 33 IDs
* `hourlySteps_merged`: hourly steps over 31 days for 33 IDs
* `sleepDay_merged`: daily sleep logs for 23 IDs, tracking the total number of sleep records per day, total minutes asleep and total time in bed
* `hourlyCalories_merged`: hourly calories burned over 31 days for 33 IDs

All of these tables are CSV (comma-separated values) files.

In [3]:
daily_activity <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
hourly_calories <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
hourly_intensities <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
hourly_steps <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
sleep_day <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")


Rows: 940 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): ActivityDate
dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 22099 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): ActivityHour
dbl (2): Id, Calories

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 22099 Columns: 4
──

<a id="4_3"></a>
### 4.3. Preview Dataframes
Let's take a glimpse at the data we will be working with.

In [None]:
head(daily_activity,5)
head(hourly_calories,5)
head(hourly_intensities,5)
head(hourly_steps,5)
head(sleep_day,5)


<a id="4_4"></a>
### 4.4. Cleaning and Formatting
I observed some problems with the timestamp data, inconsistent column names, duplicates and missing (NA) values, so all of these need to be cleaned and formatted before analysis.
<a id="4_4_1"></a>
#### 4.4.1. Clean Column Names

In [None]:
hourly_calories <- clean_names(hourly_calories)
hourly_intensities <- clean_names(hourly_intensities)
daily_activity <- clean_names(daily_activity)
daily_sleep <- clean_names(sleep_day)
hourly_steps <- clean_names(hourly_steps)

# cross-check the cleaned column names

colnames(daily_activity)
colnames(hourly_calories)
colnames(hourly_intensities)
colnames(hourly_steps)
colnames(daily_sleep)
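To see what `clean_names()` does to the Fitbit-style CamelCase headers, here is a minimal sketch on a made-up one-row data frame (the values are invented; only the column-name style mirrors the CSV files). By default `janitor::clean_names()` converts names to snake_case, which is why `ActivityHour` becomes `activity_hour` in the rest of this notebook.

```r
library(janitor)

# A made-up row with column names in the same CamelCase style as the Fitbit exports
raw <- data.frame(Id = 1503960366, ActivityHour = "4/12/2016 12:00:00 AM", StepTotal = 373)

# clean_names() returns the same data with snake_case column names
cleaned <- clean_names(raw)
names(cleaned)  # "id" "activity_hour" "step_total"
```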

<a id="4_4_2"></a>
#### 4.4.2. Check for duplicates

In [None]:
# count fully duplicated rows in each dataframe
sum(duplicated(hourly_calories))
sum(duplicated(hourly_steps))
sum(duplicated(hourly_intensities))
sum(duplicated(daily_sleep))
sum(duplicated(daily_activity))
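It is also worth confirming how many distinct participants each dataframe contains. Below is a small sketch on invented data: `length(unique())` counts distinct participant ids (the same idea as `dplyr::n_distinct()`), and `duplicated()` flags rows that exactly repeat an earlier row. On the notebook's data the equivalent call would be, for example, `n_distinct(daily_activity$id)`.

```r
# Toy data: three made-up participant ids, with one row repeated exactly
toy <- data.frame(
  id    = c(1, 1, 2, 3, 3),
  steps = c(100, 100, 250, 300, 400)
)

# Number of distinct participants (same idea as dplyr::n_distinct(toy$id))
n_participants <- length(unique(toy$id))

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(toy))

n_participants  # 3
n_duplicates    # 1
```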

<a id="4_4_3"></a>
#### 4.4.3. Remove duplicates and NA values



In [None]:
# remove duplicate and null values 
hourly_calories <- hourly_calories %>%
  distinct() %>%
  drop_na()

hourly_steps <- hourly_steps %>%
  distinct() %>%
  drop_na()

hourly_intensities <- hourly_intensities %>%
  distinct() %>%
  drop_na()

daily_activity <- daily_activity %>%
  distinct() %>%
  drop_na()

daily_sleep <- daily_sleep %>%
  distinct() %>%
  drop_na()

<a id="4_4_4"></a>
#### 4.4.4. Verify duplicates and NA values


In [None]:

sum(duplicated(hourly_calories))
sum(duplicated(hourly_steps))
sum(duplicated(hourly_intensities))
sum(duplicated(daily_sleep))
sum(duplicated(daily_activity))




In [None]:
sum(is.na(hourly_calories))
sum(is.na(hourly_steps))
sum(is.na(hourly_intensities))
sum(is.na(daily_sleep))
sum(is.na(daily_activity))

<a id="4_4_5"></a>
#### 4.4.5. Formatting Date and Time columns
Earlier we observed that the date and time columns (`activity_hour`, `activity_date` and `sleep_day`) are stored as characters (`<chr>`). We need to convert them to the date-time `POSIXct` format, which is important for an hour-by-hour analysis of our data. We also rename these columns for easier reference.
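A minimal sketch of the conversion on a single made-up timestamp in the Fitbit export format. `%I` reads the hour on a 12-hour clock and `%p` reads the AM/PM marker; note that base R only recognizes `%p` in locales that define AM/PM (such as English or the C locale). The time zone is fixed to UTC here purely so the example is reproducible, whereas the notebook uses `Sys.timezone()`. `lubridate::mdy_hms()` is an alternative that parses the same format without an explicit format string.

```r
# One made-up timestamp in the same format as ActivityHour / SleepDay
raw_time <- "4/12/2016 1:00:00 PM"

# %m/%d/%Y = month/day/year, %I = 12-hour clock hour, %p = AM/PM marker
parsed <- as.POSIXct(raw_time, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

class(parsed)                     # "POSIXct" "POSIXt"
format(parsed, "%Y-%m-%d %H:%M")  # "2016-04-12 13:00"
```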


In [None]:

# rename the date and time columns for easier reference

hourly_calories <- hourly_calories %>%
  rename(date_time = activity_hour)

hourly_steps <- hourly_steps %>%
  rename(date_time = activity_hour)

hourly_intensities <- hourly_intensities %>%
  rename(date_time = activity_hour)

daily_activity <- daily_activity %>%
  rename(date = activity_date) 

daily_sleep <- daily_sleep %>% 
  rename(date = sleep_day)



In [None]:
# convert the character date/time columns to POSIXct,
# referring to each dataframe's own column inside mutate()
hourly_calories <- hourly_calories %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

hourly_intensities <- hourly_intensities %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

hourly_steps <- hourly_steps %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

daily_sleep <- daily_sleep %>%
  mutate(date = as.POSIXct(date, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone()))

# daily activity dates have no time part; use the same time zone so they line up with daily_sleep
daily_activity <- daily_activity %>%
  mutate(date = as.POSIXct(date, format = "%m/%d/%Y", tz = Sys.timezone()))



In [None]:
#verification
head(daily_activity,5)
head(hourly_calories,5)
head(hourly_intensities)
head(hourly_steps,5)
head(daily_sleep,5)

In [None]:

# keep only the id, date_time and measurement columns
hourly_calories <- subset(hourly_calories, select = c(1:3))
hourly_steps <- subset(hourly_steps, select = c(1:3))
hourly_intensities <- subset(hourly_intensities, select = c(1:4))


<a id="4_5"></a>
### 4.5. Merging Data


In [None]:
# merge the daily dataframes (daily activity and daily sleep) on participant id and date,
# so that each row describes one participant-day
daily_activity_sleep <- inner_join(daily_activity, daily_sleep, by = c("id", "date"))

head(daily_activity_sleep)
n_distinct(daily_activity_sleep$id)
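Why the join keys matter: a daily activity row and a daily sleep row describe the same participant-day only when both `id` and `date` match. The sketch below uses base R `merge()` on two made-up days for one participant (column names mirror the cleaned dataframes; the values are invented) and shows that joining on `id` alone pairs every activity day with every sleep day.

```r
# Made-up activity and sleep records for one participant over two days
activity <- data.frame(
  id = c(1, 1),
  date = as.Date(c("2016-04-12", "2016-04-13")),
  total_steps = c(8000, 6000)
)
sleep <- data.frame(
  id = c(1, 1),
  date = as.Date(c("2016-04-12", "2016-04-13")),
  total_minutes_asleep = c(420, 390)
)

# Joining on id alone pairs every activity day with every sleep day: 2 x 2 = 4 rows
by_id <- merge(activity, sleep, by = "id")

# Joining on id and date keeps one row per participant-day: 2 rows
by_id_date <- merge(activity, sleep, by = c("id", "date"))

nrow(by_id)       # 4
nrow(by_id_date)  # 2
```

With dplyr, the corresponding call is `inner_join(activity, sleep, by = c("id", "date"))`.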

<a id="5"></a>
## 5. Analyse 
The data has been stored appropriately and prepared for analysis, so let's analyse the trends.
<a id="5_1"></a>
### 5.1. Category of users

Users are classified by their daily average steps into five categories, following the [National Center for Biotechnology Information](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5488109/):

| Steps per day | Classification |
|---|---|
| < 5,000 | Sedentary lifestyle |
| 5,000–7,499 | Physically inactive |
| 7,500–9,999 | Moderately active |
| 10,000–12,499 | Physically active |
| ≥ 12,500 | Very active |

In the code below, these categories are labelled "Sedentary", "Low Active", "Fairly Active", "Active" and "Highly Active" respectively.
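The same thresholds can also be expressed with base R `cut()`. The sketch below uses made-up averages that sit exactly on the boundaries; `right = FALSE` makes each interval closed on the left, e.g. `[5000, 7500)`, so it assigns the same labels as the `case_when()` in the code cell.

```r
# Made-up daily-average step counts, chosen to sit on the category boundaries
avg_steps <- c(4999, 5000, 7500, 10000, 12500)

# Left-closed intervals: [-Inf,5000), [5000,7500), [7500,10000), [10000,12500), [12500,Inf)
user_type <- cut(avg_steps,
                 breaks = c(-Inf, 5000, 7500, 10000, 12500, Inf),
                 labels = c("Sedentary", "Low Active", "Fairly Active", "Active", "Highly Active"),
                 right = FALSE)

as.character(user_type)
# "Sedentary" "Low Active" "Fairly Active" "Active" "Highly Active"
```

`cut()` also returns a factor whose levels are already in ascending order of activity, so no separate `factor(..., levels = ...)` step is needed.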


In [None]:
#classify our users by the daily average steps

user_type_by_steps <- daily_activity_sleep   %>%  
  group_by(id) %>% 
  summarise(average_steps = mean(total_steps)) %>% 
  mutate(user_type = case_when(average_steps >= 12500 ~ "Highly Active",
                              average_steps >= 10000 ~ "Active",
                              average_steps >= 7500 ~ "Fairly Active" ,
                              average_steps >= 5000  ~ "Low Active", 
                              average_steps < 5000 ~ "Sedentary"),
        user_type = factor(user_type, levels = c("Sedentary","Low Active", "Fairly Active", "Active", "Highly Active")))

head(user_type_by_steps)

To visualize user types on a pie chart, we create a dataframe with the percentage of users in each type.
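As a compact alternative to the grouped `summarise()` pipeline, `table()` counts users per type and `prop.table()` turns the counts into shares. A sketch on six made-up users:

```r
type_levels <- c("Sedentary", "Low Active", "Fairly Active", "Active", "Highly Active")

# Made-up user types for six participants
user_type <- factor(c("Sedentary", "Active", "Active", "Low Active", "Active", "Sedentary"),
                    levels = type_levels)

# Counts per type (types with no users show up as 0), then shares of the total
shares <- prop.table(table(user_type))

# Whole-number percentage labels for the pie chart
labels <- sprintf("%.0f%%", 100 * as.numeric(shares))
labels  # "33%" "17%" "0%" "50%" "0%"
```

Unlike the grouped pipeline, types with no users appear here with a share of 0, which may or may not be wanted on a pie chart.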


In [None]:
user_type_by_steps_percent<- user_type_by_steps%>%
   group_by(user_type) %>% summarise(users =n())%>%
   mutate(total_users = sum(users))%>%  group_by(user_type)%>% 
   summarise(percent =(users/total_users))%>%
   mutate(labels = scales::percent(percent))

head(user_type_by_steps_percent)



Visualize on a pie chart


In [None]:
#plot for the steps percent
# set the plot size (in inches) before the plot is drawn
options(repr.plot.width = 8, repr.plot.height = 6)

user_type_by_steps_percent %>%
  ggplot(aes(x = "", y = percent, fill = user_type)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  scale_fill_brewer() +
  theme(axis.ticks = element_blank(),
        panel.grid = element_blank(),
        axis.text = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(),
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
  geom_text(aes(label = labels, x = 1.2),
            position = position_stack(vjust = 0.6),
            color = "black") +
  labs(title = "User Type by Steps") +
  guides(fill = guide_legend(title = "User Type"))

<a id="5_2"></a>
### 5.2. Daily use of smart device

Usage type distribution: how often the participants used their smart devices.


In [None]:

# minimum, maximum and mean of the daily total steps
min(daily_activity$total_steps)
max(daily_activity$total_steps)
mean(daily_activity$total_steps)


 
The average daily total is 7,637.91 steps. We assume that participants did not use their smart devices on days with 200 total steps or fewer, so those days are filtered out. We then assign the following usage types, based on the number of days each participant used the device:

  * Low use: 1 to 10 days
  * Moderate use: 11 to 20 days
  * High use: 21 to 31 days
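A minimal sketch of these rules on made-up data: participant 1 logs 25 days, participant 2 logs 12, and participant 3 logs 5 days above the threshold plus 2 days at 200 steps or fewer, which are dropped. Because day counts are whole numbers, the intervals `(0, 10]`, `(10, 20]` and `(20, 31]` produced by `cut()` match the 1 to 10, 11 to 20 and 21 to 31 day ranges.

```r
# Made-up daily step totals for three participants
daily <- data.frame(
  id          = rep(1:3, times = c(25, 12, 7)),
  total_steps = c(rep(6000, 25), rep(7000, 12), rep(4000, 5), 100, 150)
)

# Keep only days the device is assumed to have been worn (more than 200 steps)
worn_days <- daily[daily$total_steps > 200, ]

# Number of worn days per participant
days_used <- table(worn_days$id)

# Classify by day count: (0,10] low, (10,20] moderate, (20,31] high
usage <- cut(as.numeric(days_used),
             breaks = c(0, 10, 20, 31),
             labels = c("Low Use", "Moderate Use", "High Use"))

data.frame(id = names(days_used), days_used = as.numeric(days_used), usage = usage)
```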
   
   

In [None]:
#Create a dataframe showing the categories
# count the days on which each participant recorded more than 200 steps, then assign a usage type
daily_usage_group <- daily_activity %>%
  filter(total_steps > 200) %>%
  group_by(id) %>%
  summarize(daysused = n()) %>%
  mutate(usage = case_when(
           daysused >= 1 & daysused <= 10 ~ "Low Use",
           daysused >= 11 & daysused <= 20 ~ "Moderate Use",
           daysused >= 21 & daysused <= 31 ~ "High Use"),
         usage = factor(usage, levels = c("High Use", "Moderate Use", "Low Use")))

head(daily_usage_group)

In [None]:
# to visualize usage types on a pie chart, we create a dataframe of percentages

daily_usage_group_pc<- daily_usage_group%>%
     group_by(usage) %>%
     summarise(participants =n())%>%
     mutate(total_participants=sum(participants))%>%
    group_by(usage)%>%
     summarise(percent= participants/total_participants)%>% 
     arrange(percent)%>% 
     mutate(labels = scales::percent(percent))
  
head(daily_usage_group_pc)

In [None]:
#Visualizing on a pie chart

  ggplot(data= daily_usage_group_pc,aes(x="", y = percent, fill =usage)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
      theme(  axis.ticks = element_blank(), 
          panel.grid = element_blank(),
          axis.text = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.border = element_blank(),
          plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
  geom_text(aes(label = labels, x= 1.2),
            position = position_stack(vjust = 0.6)) +
 scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"),
                  labels = c("High use - 21 to 31 days",
                             "Moderate use - 11 to 20 days",
                             "Low use - 1 to 10 days"))+
  labs(title = "Daily Use of Smart Device")


**Observations**
  * 73% of users (24 out of 33 participants) used their devices frequently, on 21 to 31 days.
  * 21% of users (7 out of 33 participants) used their devices moderately, on 11 to 20 days.
  * 6% of users (2 out of 33 participants) used their devices least frequently, on 1 to 10 days.
  * A large majority of users therefore wore their device on most days of the tracking period.
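As a quick arithmetic check on these shares, using the participant counts stated in the observations (24, 7 and 2, which sum to the 33 participants):

```r
# Participant counts per usage group, taken from the observations
counts <- c(high = 24, moderate = 7, low = 2)

# Each group's share of all participants, in whole percent
percent <- round(100 * counts / sum(counts))
percent  # high 73, moderate 21, low 6
```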
​

<a id="5_3"></a>
### 5.3. Average steps
We will look at the average steps by day, by usage group, and through the hours of the day.


<a id="5_3_1"></a>
#### 5.3.1. Average steps by day

In [None]:
# left join daily activity with the usage groups dataframe
daily_activity_usage <- daily_activity %>%
  left_join(daily_usage_group, by = 'id') %>%
  # date is already a date-time, so format() can extract the abbreviated weekday directly
  mutate(day = format(date, format = '%a')) %>%
  mutate(total_minutes_worn = sedentary_minutes + lightly_active_minutes + fairly_active_minutes + very_active_minutes) %>%
  mutate(total_hours = seconds_to_period(total_minutes_worn * 60))

head(daily_activity_usage)

In [None]:

# average steps per day dataframe

steps_per_day<- daily_activity_usage%>%
  group_by(day)%>%
  summarise(average_steps = round(mean(total_steps)))%>%
  mutate(day= factor(day, levels = c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')))

head(steps_per_day)

In [None]:
# plots for avg steps by day

ggplot(steps_per_day, aes(x=day, y= average_steps, fill= average_steps))+
  geom_col(color="darkblue", size = 0.1) +  
  geom_hline(yintercept = 7500)+
  scale_fill_gradientn(limits=c(0,10000), breaks=seq(0,10000, by = 2500), 
                       colours = brewer.pal(9, "Blues")) +
  scale_y_continuous(limits=c(0,10000), breaks=seq(0, 10000, by = 2500))+ 
  labs(title= ("Average Steps"), subtitle = ('By Day'), x="" , y="Steps")+
  theme(plot.title=element_text(size = 16,hjust = 0))+
  theme(plot.subtitle=element_text(size = 14,hjust = 0))+
  theme(axis.text.y=element_text(size=14)) +
  theme(axis.text.x=element_text(size=14,hjust= 0.5))+
  theme(axis.title.x = element_text(margin = margin(t = 14, r = 0, b = 0, l = 0)))+
  theme(axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)))+
  theme(legend.position = "top")+
  theme(legend.title=element_text(size=12))+
  theme(legend.text=element_text(size=8))+
  guides(fill = guide_colourbar(barwidth = 12))
options(repr.plot.width = 10, repr.plot.height = 8)


**Observations**
  * Tuesday and Saturday are the most active days.
  * Sunday has the lowest average number of steps, possibly because it is commonly a rest day.
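The `day` column above is built with `format(..., format = '%a')`, whose abbreviations depend on the session locale (a non-English locale would not produce `Mon`, `Tue`, ...). A locale-independent sketch, assuming the same Monday-first ordering: `%u` returns the ISO weekday number (1 = Monday, 7 = Sunday), which can index a fixed vector of labels. The dates and step counts are made up; 12 April 2016 was a Tuesday.

```r
# Made-up dates and daily step totals
dates <- as.Date(c("2016-04-12", "2016-04-13", "2016-04-17", "2016-04-19"))
steps <- c(9000, 7000, 5000, 11000)

# %u is the ISO weekday number (1 = Monday ... 7 = Sunday) and does not depend on locale
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
day <- factor(day_names[as.integer(format(dates, "%u"))], levels = day_names)

# Mean steps per weekday; weekdays with no data come out as NA
avg_by_day <- tapply(steps, day, mean)
avg_by_day
```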
​

<a id="5_3_2"></a>
#### 5.3.2. Average steps by usage groups

In [None]:
#dataframe for average steps by usage groups

steps_by_ug <- daily_activity_usage %>%
  select(usage, total_steps, day) %>%
  mutate(day = factor(day, levels = c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')))

head(steps_by_ug)

In [None]:
# plot for the average steps per usage group

ggplot(steps_by_ug, aes(x = "", y = total_steps, fill = usage)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 40000), breaks = seq(0, 40000, by = 5000)) +
  labs(title = "Average Steps", subtitle = "By Usage Group", x = " ", y = "Steps") +
  theme(plot.title = element_text(size = 16, hjust = 0),
        plot.subtitle = element_text(size = 14, hjust = 0),
        axis.text.y = element_text(size = 14),
        axis.title.x = element_text(margin = margin(t = 14, r = 0, b = 0, l = 0)),
        axis.title.y = element_text(margin = margin(t = 0, r = 10, b = 0, l = 0)),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        legend.position = "top",
        legend.title = element_text(size = 12),
        legend.text = element_text(size = 8)) +
  facet_grid(~usage)
options(repr.plot.width = 5, repr.plot.height = 6)


**Observations**
  * The high use group has the highest step counts, averaging about 9,500 steps.
  * Step counts decrease from the high use group to the moderate and low use groups.

<a id="5_3_3"></a>
#### 5.3.3. Hourly steps through the day

In [None]:
# split date_time into separate date and time-of-day columns so we can work with the time
hourly_steps <- hourly_steps %>%
  mutate(date = as_date(date_time),
         time = format(date_time, format = "%H:%M:%S")) %>%
  select(id, date, time, step_total)

head(hourly_steps)

In [None]:
# plot hourly steps through the day
hourly_steps%>%
  group_by(time)%>%
  summarise(average_steps = mean(step_total))%>%
  ggplot()+
geom_col(mapping = aes(x=time, y= average_steps, fill= average_steps))+
  labs(title = "Hourly steps throughout the day", x="", y="")+
  scale_fill_gradient(low = "red", high = "green")+
  theme(axis.text.x= element_text(angle = 90))

**Observations**
  * The most active hours are between 8am and 7pm.
  * More steps are taken at lunchtime (12pm to 2pm) and in the evening (5pm to 7pm).
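A minimal sketch of the hourly profile on made-up data, extracting the hour of day directly from the date-time with `format(date_time, "%H")` and averaging across days with `tapply()`; `lubridate::hour()` would work equally well. Working from the date-time avoids depending on how the timestamp prints as text.

```r
# Made-up hourly step totals for two days, at 8am and 1pm
date_time <- as.POSIXct(c("2016-04-12 08:00:00", "2016-04-12 13:00:00",
                          "2016-04-13 08:00:00", "2016-04-13 13:00:00"), tz = "UTC")
step_total <- c(400, 900, 600, 1100)

# Hour of day as a two-digit string ("00" to "23")
hour <- format(date_time, "%H")

# Mean steps for each hour of day across all days
avg_by_hour <- tapply(step_total, hour, mean)
avg_by_hour  # "08" = 500, "13" = 1000
```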

<a id="5_3_4"></a>
#### 5.3.4. Daily use of smart device in minutes
We want to know how long each user wore their smart device per day. To do this, the daily activity and daily usage group dataframes are merged.

In [None]:
daily_usage_group_merged<- merge(daily_activity,daily_usage_group, by = c("id"))

head(daily_usage_group_merged)

There are 1,440 minutes in a day. The sum of `very_active_minutes`, `fairly_active_minutes`, `lightly_active_minutes` and `sedentary_minutes` is the total number of minutes the smart device was worn on a given day.
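A worked example for one made-up day: the four minute columns are summed, divided by 1,440 and multiplied by 100, and the percentage is classified with the same thresholds as the `case_when()` below (exactly 100% is "All day", 50% up to 100% is "More than half day", above 0% and below 50% is "Less than half day").

```r
# Made-up minute totals for one participant-day
very_active_minutes    <- 30
fairly_active_minutes  <- 20
lightly_active_minutes <- 250
sedentary_minutes      <- 700

total_minutes_worn <- very_active_minutes + fairly_active_minutes +
  lightly_active_minutes + sedentary_minutes             # 1000 minutes
minutes_worn_percent <- total_minutes_worn / 1440 * 100  # about 69.4%

# Same thresholds as the case_when() used in the notebook
worn <- if (minutes_worn_percent == 100) {
  "All day"
} else if (minutes_worn_percent >= 50) {
  "More than half day"
} else if (minutes_worn_percent > 0) {
  "Less than half day"
} else {
  NA_character_
}
worn  # "More than half day"
```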

In [None]:
# group each participant-day into a wear-time category

# total minutes worn and its percentage of the 1,440-minute day
minutes_worn <- daily_usage_group_merged %>% 
  mutate(total_minutes_worn = very_active_minutes+fairly_active_minutes+lightly_active_minutes+sedentary_minutes)%>%
  mutate (minutes_worn_percent = (total_minutes_worn/1440)*100) %>%
  mutate (worn = case_when(
    minutes_worn_percent == 100 ~ "All day",
    minutes_worn_percent < 100 & minutes_worn_percent >= 50~ "More than half day", 
    minutes_worn_percent < 50 & minutes_worn_percent > 0 ~ "Less than half day"
  ))

head(minutes_worn)

To visualize wear time on a pie chart, we create a dataframe that counts the participant-days in each wear-time category and calculates their percentages.

In [None]:
total_minutes_worn_percent<- minutes_worn%>%
  group_by(worn) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

total_minutes_worn_percent$worn <- factor(total_minutes_worn_percent$worn, levels = c("All day", "More than half day", "Less than half day"))

head(total_minutes_worn_percent)


We also want to know how long each usage group wore their smart devices per day, so we create a dataframe for each usage group.

In [None]:
# high use
minutes_worn_high_use <- minutes_worn%>%
  filter (usage == "High Use")%>%
  group_by(worn) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

minutes_worn_high_use$worn <- factor(minutes_worn_high_use$worn, levels = c("All day", "More than half day", "Less than half day"))

# moderate use
minutes_worn_moderate_use <- minutes_worn%>%
  filter(usage == "Moderate Use") %>%
  group_by(worn) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

minutes_worn_moderate_use$worn <- factor(minutes_worn_moderate_use$worn, levels = c("All day", "More than half day", "Less than half day"))

# low use
minutes_worn_low_use <- minutes_worn%>%
  filter (usage == "Low Use") %>%
  group_by(worn) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent = total / totals) %>%
  mutate(labels = scales::percent(total_percent))

minutes_worn_low_use$worn <- factor(minutes_worn_low_use$worn, levels = c("All day", "More than half day", "Less than half day"))


head(minutes_worn_high_use)
head(minutes_worn_moderate_use)
head(minutes_worn_low_use)
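The three per-group pipelines above repeat the same steps. A more compact sketch, assuming one row per participant-day with `usage` and `worn` columns as in `minutes_worn`, is a two-way table with `prop.table(..., margin = 1)`. This divides each row by its row total, so every usage group's shares sum to 1. The data below is made up.

```r
# Made-up participant-day records
usage <- c("High Use", "High Use", "High Use", "High Use", "Low Use", "Low Use")
worn  <- c("All day", "All day", "More than half day", "Less than half day",
           "All day", "More than half day")

# Rows are usage groups, columns are wear-time categories;
# margin = 1 turns each row's counts into shares of that row's total
worn_share <- prop.table(table(usage, worn), margin = 1)

round(100 * worn_share, 1)
```

On the notebook's data this would be `prop.table(table(minutes_worn$usage, minutes_worn$worn), margin = 1)`. Each row then matches one of the per-group dataframes.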

In [None]:
# plot for minutes worn 
ggarrange(
  ggplot(total_minutes_worn_percent, aes(x="",y=total_percent, fill=worn)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"))+
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5), size = 3.5)+
  labs(title="Time worn per day", subtitle = "Total Users"),
ggarrange(
    ggplot(minutes_worn_high_use, aes(x="",y=total_percent, fill=worn)) +
      geom_bar(stat = "identity", width = 1)+
      coord_polar("y", start=0)+
      theme_minimal()+
      theme(axis.title.x= element_blank(),
            axis.title.y = element_blank(),
            panel.border = element_blank(), 
            panel.grid = element_blank(), 
            axis.ticks = element_blank(),
            axis.text.x = element_blank(),
            plot.title = element_text(hjust = 0.5, size=14, face = "bold"),
            plot.subtitle = element_text(hjust = 0.5), 
            legend.position = "none")+
      scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"))+
      geom_text(aes(label = labels),
                      position = position_stack(vjust = 0.5), size = 3)+
      labs(title="", subtitle = "High use - Users"), 
    
    ggplot(minutes_worn_moderate_use, aes(x="",y=total_percent, fill=worn)) +
      geom_bar(stat = "identity", width = 1)+
      coord_polar("y", start=0)+
      theme_minimal()+
      theme(axis.title.x= element_blank(),
            axis.title.y = element_blank(),
            panel.border = element_blank(), 
            panel.grid = element_blank(), 
            axis.ticks = element_blank(),
            axis.text.x = element_blank(),
            plot.title = element_text(hjust = 0.5, size=14, face = "bold"), 
            plot.subtitle = element_text(hjust = 0.5),
            legend.position = "none") +
      scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"))+
      geom_text(aes(label = labels),
                position = position_stack(vjust = 0.5), size = 3)+
      labs(title="", subtitle = "Moderate use - Users"), 
    
    ggplot(minutes_worn_low_use, aes(x="",y=total_percent, fill=worn)) +
      geom_bar(stat = "identity", width = 1)+
      coord_polar("y", start=0)+
      theme_minimal()+
      theme(axis.title.x= element_blank(),
            axis.title.y = element_blank(),
            panel.border = element_blank(), 
            panel.grid = element_blank(), 
            axis.ticks = element_blank(),
            axis.text.x = element_blank(),
            plot.title = element_text(hjust = 0.5, size=14, face = "bold"), 
            plot.subtitle = element_text(hjust = 0.5),
            legend.position = "none") +
      scale_fill_manual(values = c("#004d99", "#3399ff", "#cce6ff"))+
      geom_text(aes(label = labels),
                position = position_stack(vjust = 0.5), size = 3)+
      labs(title="", subtitle = "Low use - Users"), 
    ncol = 3), 
  nrow = 2)

 **Observations**
 * High users (21 to 31 days of use) wore their smart devices all day on 45.8% of their recorded days and for more than half a day on 51.7%.
 * Moderate users wear the device more on a daily basis.
 * Low users mostly wear the device all day on the days they use it; the chart shows that they never wear it for less than half a day.
 * Across all users, the device was worn all day on about 50% of the recorded days, for more than half a day on 47%, and for less than half a day on 3%.




<a id="5_4"></a>
### 5.4. Correlation between daily steps and calories

In [None]:
# let's check for correlation between steps and calories
ggplot(data=daily_activity)+
  geom_smooth(mapping=aes(x= total_steps,y= calories))+
   geom_point(mapping=aes(x= total_steps,y= calories))+
  labs(title = "Daily steps vs Calories", x = "total_steps", y = "calories")+
      theme(panel.background = element_blank(), plot.title = element_text(size = 14))

**Observations** 
  * There is a positive relationship between total daily steps and calories burned.
  * Days with more steps tend to have more calories burned.
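The scatter plot shows the direction of the relationship but does not put a number on it. A minimal way to do so is Pearson's correlation coefficient via base R `cor()`, with `cor.test()` adding a confidence interval and p-value. The sketch below uses four made-up days. On the notebook's data, the call would be `cor(daily_activity$total_steps, daily_activity$calories)`.

```r
# Four made-up days of steps and calories burned
steps    <- c(1000, 5000, 8000, 12000)
calories <- c(1500, 1900, 2200, 2700)

# Pearson correlation: +1 is a perfect positive linear relationship
r <- cor(steps, calories)

# Confidence interval and p-value for the same coefficient
ct <- cor.test(steps, calories)

r  # about 0.998
```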
  

<a id="6"></a> 
## 6. Recommendations
 1. Real-time feedback: to enhance the user experience, give users real-time feedback. Regular daily (or even hourly) engagement helps users make quick, informed decisions to optimize their routine, and increases their involvement with the product.
 
 2. Device accuracy: accurate measurement is crucial for any fitness device. Improving features such as sensors, connectivity and charging will help prevent the loss of meaningful data.
 
 3. Product design: modify the smart device so that it can be worn all day without interfering with activities, for example by making it water resistant and fashionable, with a long-lasting battery.
 
 4. Further research: send out surveys at intervals to gain more in-depth knowledge of consumer behavior and preferences, such as:

 * Consumers' primary reasons for picking up a fitness-focused wearable device (e.g., preferred functions)
 * Whether there are issues with the ease of use of the device, e.g., syncing, integrating, charging, or comfort when sleeping with it on