# Google Capstone project: Bellabeat

## About the Company

**Urška Sršen** and **Sando Mur** founded **Bellabeat**, a high-tech company that manufactures health-focused smart products.
Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around
the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with
knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly
positioned itself as a tech-driven wellness company for women.

**PHASE 1: ASK**

**1. Identifying the business task**

   Help the company reach a wider audience by analyzing how customers use its smart devices. This in turn supports high-level recommendations on how trends in usage patterns can improve the marketing strategy.

**2. Who are the key stakeholders?**

   The key stakeholders are Urška Sršen and Sando Mur, the founders of Bellabeat, and the marketing and analytics team.

**3. Business Task**

   Identify patterns and trends in how users use Bellabeat's products, and use these insights to help the marketing team make data-driven decisions.

**PHASE 2: PREPARE**

*  **Credibility of Data**
  
   The data used is the FitBit Fitness Tracker Data from Kaggle. It was collected from thirty eligible Fitbit users who submitted their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. The data is available in both long and wide formats and is licensed under CC0: Public Domain via Mobius. With only thirty users the sample is small, so the findings below should be read as indicative rather than representative of a wider population.
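To make the long/wide distinction concrete, here is a minimal base-R sketch on made-up numbers (not the Fitbit data). The column names `Steps.Mon` and `Steps.Tue` are hypothetical: in wide format each day is a column, while in long format each user-day is a row.

```r
# Toy data (hypothetical): two users, steps on two days, in wide format
wide <- data.frame(
  Id = c(1, 2),
  Steps.Mon = c(100, 300),
  Steps.Tue = c(200, 400)
)

# Convert to long format: one row per user per day
long <- reshape(
  wide,
  direction = "long",
  varying   = c("Steps.Mon", "Steps.Tue"),
  v.names   = "Steps",
  timevar   = "Day",
  times     = c("Mon", "Tue"),
  idvar     = "Id"
)

print(long)
```

Long format is usually the more convenient shape for `ggplot2` and for joining tables by user and date.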
   
*  **Loading the Datasets**  

   All datasets required for the analysis are loaded and previewed. The daily activity dataset is the main one, so all further analysis focuses on daily usage of the device.
   
    


# Importing packages

In [133]:
library(tidyverse)  # attaches dplyr, ggplot2, tidyr and the other core packages
library(lubridate)  # date-time helpers

In [134]:
#importing datasets
dailyActivity<-read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
dailyCalories<-read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
dailySteps<-read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
sleep<-read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight<-read.csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
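Instead of one `read.csv()` call per file, every `*_merged.csv` file in a folder can be read into a named list. A minimal sketch: the helper `load_merged_csvs` is hypothetical, and the demo writes two tiny toy files to a temporary folder so it runs without the Kaggle input.

```r
# Hypothetical helper: read every "*_merged.csv" file in a folder into a named list
load_merged_csvs <- function(dir) {
  files <- list.files(dir, pattern = "_merged\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv)
  # name each table after its file, e.g. "dailySteps_merged.csv" -> "dailySteps"
  names(tables) <- sub("_merged\\.csv$", "", basename(files))
  tables
}

# Demo on throwaway files in a temporary folder (toy data, not the Fitbit files)
demo_dir <- file.path(tempdir(), "fitbit_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Id = 1, StepTotal = 5000),
          file.path(demo_dir, "dailySteps_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = 1, Calories = 1800),
          file.path(demo_dir, "dailyCalories_merged.csv"), row.names = FALSE)

fitbit <- load_merged_csvs(demo_dir)
str(fitbit)
```

Pointed at the Kaggle input folder used above, the same call would load every `*_merged.csv` there, including the large minute-level files, so the pattern may need narrowing.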


In [135]:
#View the data
head(dailyActivity)
head(dailyCalories)
head(dailySteps)
head(sleep)
head(weight)

**PHASE 3: PROCESS**

In this phase the data is cleaned and manipulated to make it ready for analysis. The date format has to be fixed, and a few of the tables can be merged.


# Cleaning the data

In [136]:
#formatting the date columns in all datasets into separate Date and Time strings
#ActivityDate/ActivityDay hold dates only ("4/12/2016"); SleepDay and weight$Date
#also carry a 12-hour clock time ("4/12/2016 12:00:00 AM"), so they need %I and %p
activity_dt <- as.POSIXct(dailyActivity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
dailyActivity$Date <- format(activity_dt, format = "%m/%d/%Y")
dailyActivity$Time <- format(activity_dt, format = "%H:%M:%S")

calories_dt <- as.POSIXct(dailyCalories$ActivityDay, format = "%m/%d/%Y", tz = Sys.timezone())
dailyCalories$Date <- format(calories_dt, format = "%m/%d/%Y")
dailyCalories$Time <- format(calories_dt, format = "%H:%M:%S")

steps_dt <- as.POSIXct(dailySteps$ActivityDay, format = "%m/%d/%Y", tz = Sys.timezone())
dailySteps$Date <- format(steps_dt, format = "%m/%d/%Y")
dailySteps$Time <- format(steps_dt, format = "%H:%M:%S")

weight_dt <- as.POSIXct(weight$Date, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
weight$Date <- format(weight_dt, format = "%m/%d/%Y")
weight$Time <- format(weight_dt, format = "%H:%M:%S")

sleep_dt <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
sleep$Date <- format(sleep_dt, format = "%m/%d/%Y")
sleep$Time <- format(sleep_dt, format = "%H:%M:%S")
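The two timestamp layouts can be checked in isolation before touching the real tables. A minimal sketch on sample strings, assuming the layouts seen in the Kaggle files (date only, and date plus 12-hour clock time). `tz = "UTC"` is an assumption for reproducibility, and `%p` expects English AM/PM markers, so it is locale-dependent.

```r
# Sample strings in the two layouts: date only, and date plus 12-hour time
date_only <- c("4/12/2016", "5/12/2016")
date_time <- c("4/12/2016 12:00:00 AM", "5/2/2016 11:59:59 PM")

# Parse; UTC avoids daylight-saving surprises in this sketch
d1 <- as.POSIXct(date_only, format = "%m/%d/%Y", tz = "UTC")
d2 <- as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Split into separate date and time strings
format(d1, "%m/%d/%Y")  # "04/12/2016" "05/12/2016"
format(d2, "%H:%M:%S")  # "00:00:00" "23:59:59"
```

If a format string does not match the input, `as.POSIXct()` returns `NA` rather than an error, so a quick `sum(is.na(...))` after parsing is a cheap sanity check.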

In [137]:
#counting distinct users (Id) in each dataset
n_distinct(dailyActivity$Id)
n_distinct(dailySteps$Id)
n_distinct(dailyCalories$Id)
n_distinct(weight$Id)
n_distinct(sleep$Id)

In [138]:
#checking for duplicate rows and missing values
sum(duplicated(dailyActivity))
sum(is.na(dailyActivity))
sum(duplicated(dailySteps))
sum(is.na(dailySteps))
sum(duplicated(dailyCalories))
sum(is.na(dailyCalories))
sum(duplicated(sleep))
sum(is.na(sleep))
#there are 3 duplicates in sleep

In [139]:
#remove 3 duplicates that were found
sleep <- sleep %>% 
  distinct()
sum(duplicated(sleep))
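How `duplicated()` and de-duplication behave can be seen on a toy table (hypothetical values): only the repeated row is flagged, and de-duplication keeps the first occurrence. Base R's `unique()` is used here as the counterpart of `dplyr::distinct()` for whole-row duplicates, so the sketch needs no packages.

```r
# Toy table (hypothetical) in which the second and third rows are identical
toy <- data.frame(
  Id = c(1, 2, 2),
  SleepDay = c("4/12/2016", "4/13/2016", "4/13/2016"),
  TotalMinutesAsleep = c(420, 380, 380)
)

duplicated(toy)         # FALSE FALSE  TRUE: only the repeat is flagged
sum(duplicated(toy))    # 1
deduped <- unique(toy)  # keeps the first occurrence of each row
nrow(deduped)           # 2
```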

In [175]:
#merging the daily tables into one table
#note: the second merge joins on Id only, so each activity day is paired with every sleep record of the same user
merged_activity_calories <- merge(dailyActivity, dailyCalories, by = c('Id', 'Calories'))
daily_data <- merge(merged_activity_calories, sleep, by = "Id")
#removing the repeated date and time columns
dailydata <- subset(daily_data, select = -c(Date, Date.x, Time, Time.x, Time.y, Date.y, ActivityDay, SleepDay))
head(dailydata)
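The second `merge()` in the cell above joins on `Id` only, so each activity day of a user is paired with every sleep record of that user. The toy sketch below (made-up numbers; the `Day` column is a hypothetical parsed date key) shows how that multiplies rows and can shift averages, compared with joining on `Id` plus the date.

```r
# Toy tables (hypothetical): user 1 has two days, user 2 has one day
activity <- data.frame(
  Id = c(1, 1, 2),
  Day = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(8000, 4000, 10000)
)
sleep_toy <- data.frame(
  Id = c(1, 1, 2),
  Day = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalMinutesAsleep = c(420, 360, 480)
)

# Join on Id alone: user 1 contributes 2 x 2 = 4 rows, user 2 contributes 1
by_id <- merge(activity, sleep_toy, by = "Id")
nrow(by_id)              # 5
mean(by_id$TotalSteps)   # 6800, pulled towards user 1's repeated days

# Join on Id and day: one row per user-day
by_id_day <- merge(activity, sleep_toy, by = c("Id", "Day"))
nrow(by_id_day)             # 3
mean(by_id_day$TotalSteps)  # 7333.333, same as mean(activity$TotalSteps)
```

Applying the per-day join to the real tables would need a shared date column built from `ActivityDate` and `SleepDay`.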




**PHASE 4: ANALYZE**
* Identifying trends and relationships in data
* Summarizing the information in the data

In [141]:
summary(dailydata)

**Some Insights from the data above:**

* The average total steps is only about 6368, which is somewhat active, but the commonly cited target is 10000 steps a day, which is said to help reduce the risk of certain health conditions, such as high blood pressure and heart disease.
* The mean total distance is about 4.4, which can still be improved; users could set a target of 5 miles.
* On average, only about 1 mile of that distance is very active; the rest is either fairly active or lightly active.
* The average sedentary time is 937 minutes (15.6 hours), which implies that users are inactive for most of the day.
* On average, users sleep about 7 hours, in line with the commonly recommended minimum for adults.
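The conversions behind these bullets follow directly from the quoted means (6368 steps and 937 sedentary minutes, taken from the summary above):

```r
# Averages quoted in the insights above (from summary(dailydata))
avg_steps <- 6368
avg_sedentary_min <- 937
step_goal <- 10000

avg_steps / step_goal * 100   # 63.68 (% of the 10,000-step target)
step_goal - avg_steps         # 3632 (steps short of the target per day)
avg_sedentary_min / 60        # 15.61667 (hours of sedentary time)
```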

**PHASE 5: SHARE**
* Analyzing various trends and visualizing the data.

# Visualizing the data

In [142]:
ggplot(data=sleep, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + 
 geom_point() + geom_smooth() + labs(title="Time in bed VS Total Time Sleeping")

cor(sleep$TotalMinutesAsleep, sleep$TotalTimeInBed)

The two variables are highly and positively correlated, with a correlation coefficient of 0.93: the more time people spend in bed, the more they sleep.
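By default `cor()` returns the Pearson coefficient, r = cov(x, y) / (sd(x) * sd(y)). A quick check on made-up sleep numbers (not the Fitbit values) confirms the built-in result matches the formula; the same call on `dailyActivity$TotalSteps` and `dailyActivity$Calories` would put a number on the steps-calories relationship plotted next.

```r
# Hypothetical minutes asleep and minutes in bed for five nights
asleep <- c(300, 360, 400, 420, 480)
in_bed <- c(330, 380, 430, 450, 520)

r_builtin <- cor(asleep, in_bed)  # Pearson by default
r_manual  <- cov(asleep, in_bed) / (sd(asleep) * sd(in_bed))

all.equal(r_builtin, r_manual)  # TRUE
r_builtin                       # close to 1: strong positive linear relationship
```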

In [143]:
ggplot(data=dailyActivity, aes(x=TotalSteps, y=Calories)) + 
 geom_point() + geom_smooth() + labs(title="Total Steps vs. Calories") 


The more steps users take, the more calories they burn: the graph above shows a positive correlation.

In [144]:
#bar graph of total minutes spent at each activity level
tinact<-sum(dailyActivity$SedentaryMinutes)
veryact<-sum(dailyActivity$VeryActiveMinutes)
fair<-sum(dailyActivity$FairlyActiveMinutes)
light<-sum(dailyActivity$LightlyActiveMinutes)

differentmin<- c(Sedentary=tinact,VeryActive=veryact,FairlyActive=fair,LightlyActive=light)
barplot(differentmin, main="Time spent in minutes", col=c("red2", "green3", "slateblue4", "yellow2"))


By far the most time is spent sedentary, which is not a healthy sign for these users.
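The bar chart shows absolute totals; expressing each category as a share of all recorded minutes puts a number on how dominant sedentary time is. A minimal sketch on made-up minutes; with the real data, the same `colSums()` over the four minute columns of `dailyActivity` would give the percentages.

```r
# Hypothetical per-day minutes for three users (not the Fitbit values)
minutes <- data.frame(
  SedentaryMinutes     = c(1000, 900, 800),
  LightlyActiveMinutes = c(200, 250, 300),
  FairlyActiveMinutes  = c(20, 10, 30),
  VeryActiveMinutes    = c(20, 40, 10)
)

# Total minutes per category, then each category's share of all recorded minutes
totals <- colSums(minutes)
shares <- round(100 * prop.table(totals), 1)
shares
```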

In [213]:
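#histogram of very active minutes per day (one bar per 10-minute bin)
ggplot(data = dailyActivity, aes(x = VeryActiveMinutes)) +
  geom_histogram(binwidth = 10) +
  labs(title = "Very Active Minutes per Day", x = "Very active minutes", y = "Number of days")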

The graph above shows that days with more than an hour of very active minutes are comparatively rare; on most days, people are very active for only about 0 to 20 minutes.
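A histogram simply counts values per bin, so the same counts, plus the share of days with at least an hour of very active minutes, can be computed directly. A sketch on made-up minutes; the 20-minute bins and the 60-minute cut-off are illustrative choices.

```r
# Hypothetical very-active minutes for ten user-days
very_active <- c(0, 0, 5, 12, 18, 25, 40, 0, 65, 90)

# Count days per 20-minute bin, mirroring what a histogram displays
bins <- cut(very_active, breaks = c(0, 20, 40, 60, Inf), right = FALSE)
table(bins)

# Share of days with at least one hour of very active minutes
mean(very_active >= 60)  # 0.2
```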

**PHASE 6: ACT**

Recommendations
* Most users follow a sedentary lifestyle, i.e., they are inactive or sitting for prolonged hours. Bellabeat devices could send these users notifications reminding them to move around.
* The more steps users take, the more calories they burn, which is a good sign. The device could set a daily calorie-burn target to motivate users to burn that amount of calories.
* The company should target more women who spend much of their time inactive (working in front of a computer, being at home, etc.). They should be the target audience since they are more susceptible to poor health conditions.
* The average number of steps per day is around 6000, which is fairly low compared with the recommended 10000 steps a day. The app could award points every day for reaching that step goal.
* The app could run a campaign to get more users to use the device and share their data for analysis. It could spread awareness about the benefits of a healthy lifestyle and about how activities we consider normal (such as sitting for long periods) might actually be a key reason for poor health.
* The app could also set milestones every 3 or 6 months for the user, who could then redeem accumulated points for a prize.
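The move-reminder recommendation could be prototyped on minute-level step data. A hypothetical rule, not an existing Bellabeat feature: flag a reminder when a user logs 60 or more consecutive minutes with zero steps. The function name `needs_move_reminder` and the 60-minute threshold are assumptions for illustration.

```r
# Hypothetical rule: flag a reminder once a user has logged
# `limit` consecutive minutes with zero steps
needs_move_reminder <- function(steps_per_minute, limit = 60) {
  runs <- rle(steps_per_minute == 0)
  # length of the longest run of zero-step minutes (0 if there is none)
  longest_idle <- max(c(0, runs$lengths[runs$values]))
  longest_idle >= limit
}

# Toy minute-level step counts: 30 active minutes, then 90 idle minutes
steps <- c(rep(50, 30), rep(0, 90))
needs_move_reminder(steps)         # TRUE
needs_move_reminder(rep(50, 120))  # FALSE
```

A real rule would also need to skip sleep periods, which the minute-level sleep data could supply; otherwise it would fire every night.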
