In [None]:
# This R environment comes with many helpful analytics packages installed
# It is defined by the kaggle/rstats Docker image: https://github.com/kaggle/docker-rstats
# For example, here's a helpful package to load

library(tidyverse) # metapackage of all tidyverse packages

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

list.files(path = "../input")

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**Let's begin with the Fitbit fitness data.**

In [43]:
library('tidyverse')
library('janitor')
library('skimr')
library('here')
library('dplyr')
library('lubridate')
library('ggplot2')

# Importing Dataset

In [44]:
#Importing data
daily_activity <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
daily_calories <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
daily_intensities <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
daily_steps <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
heartrate_seconds <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
hourly_calories <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
hourly_intensities <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
hourly_steps <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
minute_calories_narrow <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteCaloriesNarrow_merged.csv")
minute_calories_wide <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteCaloriesWide_merged.csv")
minute_MET_narrow <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteMETsNarrow_merged.csv")
minute_sleep <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteSleep_merged.csv")
minute_step_narrow <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteStepsNarrow_merged.csv")
minute_step_wide <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/minuteStepsWide_merged.csv")
sleep_Day <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight_loginfo <- read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")

We will use the following three datasets for trend analysis:

* dailyActivity_merged.csv
* sleepDay_merged.csv
* weightLogInfo_merged.csv

# Cleaning the data

In [45]:
dailyActivity_merged_2 <- clean_names(daily_activity)
sleepDay_merged_2 <- clean_names(sleep_Day)
weightLogInfo_merged_2 <- clean_names(weight_loginfo)

View(dailyActivity_merged_2)
View(sleepDay_merged_2)
View(weightLogInfo_merged_2)

dailyActivity stores its dates in mdy format, whereas weightLogInfo and sleepDay store theirs in mdy_hms format. We need to convert all three to the same Date (ymd) format.

In [46]:
# Standardize date format: convert every date column to class Date
# activity_date is a plain month/day/year string
dailyActivity_merged_2$activity_date <- as.Date(dailyActivity_merged_2$activity_date, format = "%m/%d/%Y")
# date and sleep_day also carry a time of day: parse the full timestamp first,
# then keep only the date part (as.Date() on a parsed date-time needs no format string)
weightLogInfo_merged_2$date <- as.Date(parse_date_time(weightLogInfo_merged_2$date, orders = "mdy HMS"))
sleepDay_merged_2$sleep_day <- as.Date(parse_date_time(sleepDay_merged_2$sleep_day, orders = "mdy HMS"))
View(sleepDay_merged_2)
View(weightLogInfo_merged_2)
View(dailyActivity_merged_2)
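
As a small illustration of the date handling above, the sketch below uses base R only, with made-up strings in the same two layouts as the CSV files (the sample values are assumptions, not rows taken from the dataset). It shows that `as.Date()` on a character string ignores anything after the part the format describes, and that `%Y` (four-digit year) and `%y` (two-digit year) are not interchangeable.

```r
# Sample strings in the two layouts used by the CSV files (illustrative values)
activity_date <- "4/12/2016"
sleep_day     <- "4/12/2016 12:00:00 AM"

# %m/%d/%Y reads month/day/four-digit year; trailing characters are ignored,
# so the time of day in sleep_day is simply dropped
d1 <- as.Date(activity_date, format = "%m/%d/%Y")
d2 <- as.Date(sleep_day, format = "%m/%d/%Y")
print(d1)        # [1] "2016-04-12"
print(d1 == d2)  # [1] TRUE

# %y reads only two digits of the year, so "2016" is read as "20", i.e. 2020
d3 <- as.Date(activity_date, format = "%m/%d/%y")
print(d3)        # [1] "2020-04-12"
```

Checking a few converted values against the raw strings is a cheap way to catch a wrong format code before joining on the date column.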


In [47]:
#examining the structure of dataframes after formatting
str(sleepDay_merged_2)
str(weightLogInfo_merged_2)
str(dailyActivity_merged_2)

We use a left join to merge the sleep data onto the daily activity data, matching on id and date. Activity rows with no matching sleep record get NA in the sleep columns; these NAs are replaced with 0 here.

In [48]:
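# Left join: keep every daily activity row, attach sleep data where id and date match
daily_activity_sleep <- merge(x = dailyActivity_merged_2, y = sleepDay_merged_2,
                              by.x = c("id", "activity_date"), by.y = c("id", "sleep_day"), all.x = TRUE)
# Replace the NAs left by unmatched rows with 0
daily_activity_sleep[is.na(daily_activity_sleep)] <- 0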
View(daily_activity_sleep)
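
To make the effect of the left join and the NA replacement concrete, here is a minimal sketch on made-up data (the values and the `has_sleep_record` column are assumptions for illustration, not part of the original notebook). It uses the same `merge(..., all.x = TRUE)` call shape and the same `is.na()` replacement.

```r
# Toy inputs (made-up values): three activity days, one sleep record
activity <- data.frame(
  id            = c(1, 1, 2),
  activity_date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  calories      = c(1985, 1797, 2200)
)
sleep <- data.frame(
  id                   = 1,
  sleep_day            = as.Date("2016-04-12"),
  total_minutes_asleep = 327
)

# Left join: all.x = TRUE keeps every activity row, matched or not
joined <- merge(activity, sleep,
                by.x = c("id", "activity_date"),
                by.y = c("id", "sleep_day"),
                all.x = TRUE)

# Record which rows had a sleep entry before the NAs are overwritten
# (has_sleep_record is a hypothetical helper column)
joined$has_sleep_record <- !is.na(joined$total_minutes_asleep)

# Same replacement as in the notebook: unmatched rows become 0
joined[is.na(joined)] <- 0

print(joined)
```

After the replacement, a day without a sleep record is indistinguishable from a day with 0 minutes asleep, so any later binning on `total_minutes_asleep` places it in the lowest sleep bin. A flag column like the one above is one possible way to tell the two cases apart.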

# Creating Categories

In [49]:
daily_activity_sleep <- daily_activity_sleep %>% 
  mutate(sleep_categories = case_when(
    total_minutes_asleep > 360 & total_minutes_asleep <= 480 ~ "6h-8h",
    total_minutes_asleep > 480 ~ "> 8h",
    TRUE ~ "< 6h"
  )) %>% 
  mutate(calorie_categories = case_when(
    calories > 1500 & calories <= 2500 ~ "1.5k-2.5k",
    calories > 2500 ~ "> 2.5k",
    TRUE ~ "< 1.5k"
  )) %>% 
  mutate(distance_categories = case_when(
    total_distance > 5 & total_distance <= 10 ~ "5km-10km",
    total_distance > 10 ~ "> 10km",
    TRUE ~ "<5km"
  ))

View(daily_activity_sleep)
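
The cut points in `case_when()` are easy to misread, so the sketch below runs the same sleep conditions on a few made-up durations that sit on and around the boundaries (the input vector is an assumption chosen for illustration).

```r
library(dplyr)

# Made-up sleep durations in minutes, chosen to sit on and around the cut points
minutes <- c(0, 360, 361, 480, 481)

# Same conditions as the sleep_categories step above
sleep_categories <- case_when(
  minutes > 360 & minutes <= 480 ~ "6h-8h",
  minutes > 480                  ~ "> 8h",
  TRUE                           ~ "< 6h"
)

print(data.frame(minutes, sleep_categories))
```

Two things stand out: exactly 360 minutes (6h) falls into the "< 6h" bin because the middle bin starts strictly above 360, and a value of 0 also lands in "< 6h" through the `TRUE` fallback.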

# Creating Visualizations

In [50]:
# Calories burnt by distance category
# (geom_col() stacks one segment per row, so each bar's height is the category total)
ggplot(data = daily_activity_sleep) +
  geom_col(mapping = aes(x = distance_categories, y = calories, fill = distance_categories))
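
Because `geom_col()` on row-level data stacks every row, the bar heights are totals per category rather than averages. If per-category averages are what is wanted, one possible approach is to summarise first and plot the summary. The sketch below does this on made-up rows (the `toy` data frame and its values are assumptions, not results from the Fitbit data).

```r
library(dplyr)
library(ggplot2)

# Made-up rows standing in for daily_activity_sleep (illustrative values only)
toy <- data.frame(
  distance_categories = c("<5km", "<5km", "5km-10km", "> 10km"),
  calories            = c(1000, 2000, 3000, 4000)
)

# One row per category: total (what stacked bars on raw rows add up to),
# mean, and number of days
by_distance <- toy %>%
  group_by(distance_categories) %>%
  summarise(total_calories = sum(calories),
            mean_calories  = mean(calories),
            days           = n())

print(by_distance)

# Plotting the summarised means gives one bar per category at the average
p <- ggplot(by_distance) +
  geom_col(aes(x = distance_categories, y = mean_calories, fill = distance_categories))
```

In the toy data the "<5km" category has a total of 3000 but a mean of 1500, which is the gap between what a stacked bar shows and what an average describes.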

**Correlation between sleep & calories burnt**

In [51]:
# Calories burnt by sleep category, one panel per distance category
ggplot(data = daily_activity_sleep) +
  geom_col(mapping = aes(x = sleep_categories, y = calories, fill = sleep_categories)) +
  facet_wrap(~ distance_categories)
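
When comparing sleep categories within each distance panel, it can help to see how many days sit behind each bar. A minimal sketch, again on made-up rows (the `toy` data frame is an assumption for illustration), groups by both categories and reports the day count alongside the mean.

```r
library(dplyr)

# Made-up rows standing in for daily_activity_sleep (illustrative values only)
toy <- data.frame(
  distance_categories = c("<5km", "<5km", "<5km", "5km-10km"),
  sleep_categories    = c("< 6h", "< 6h", "6h-8h", "6h-8h"),
  calories            = c(1500, 1700, 2000, 2600)
)

# One row per (distance, sleep) cell with its size and average
cell_summary <- toy %>%
  group_by(distance_categories, sleep_categories) %>%
  summarise(days = n(), mean_calories = mean(calories), .groups = "drop")

print(cell_summary)
```

Cells with very few days are weak evidence on their own, so the `days` column is worth reading next to any difference in means.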

# Summary of Data Analysis

**Correlation between distance & calories burnt**

* The graph shows a direct correlation between the distance covered and the calories burnt: the greater the distance, the more calories burnt.
* A person who covers less than 5km burns around 1800 calories a day on average.
* A person who covers 5km-10km burns around 2400 calories a day on average.
* A person who covers more than 10km burns around 3100 calories a day on average.

**Correlation between sleep & calories burnt**
* People who sleep less than 6h a day and people who sleep more than 8h a day burn fewer calories than people who sleep 6h-8h while covering a similar distance.

# Recommendations for Bellabeat Based on the Analysis

1. There is a clear relationship between sleep and calories burnt. Bellabeat can use this to show customers the benefits of tracking sleep when pursuing weight loss goals.
2. A marketing strategy can explain how much sleep the body needs, how to achieve it, and how Bellabeat can help customers track and improve it.
3. One of the most beneficial features of smart wearable devices is motivating customers to lead healthier lifestyles. A peer comparison feature could be developed to encourage customers to raise their activity level and improve their health.
