## About the Company
Bellabeat is a high-tech manufacturer of health-focused products for women. It is a successful small company with the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company.

## Questions for the analysis
1. What are some trends in smart device usage?
2. How could these trends apply to Bellabeat customers?
3. How could these trends help influence Bellabeat marketing strategy?

## Business task
Identify potential growth opportunities and recommend improvements to the Bellabeat marketing strategy, based on trends in smart device usage.

## Loading R packages

In [None]:
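library(tidyverse)  # core packages, including dplyr, ggplot2 and tidyr
library(lubridate)  # date-time helpers
library(dplyr)      # data manipulation
library(ggplot2)    # plotting
library(tidyr)      # tidying helpers such as drop_na()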

## Importing datasets
For this project, I will use FitBit Fitness Tracker Data.

In [None]:
activity <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
calories <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
intensities <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
sleep <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")

I already checked the data in Google Sheets. I just want to make sure that everything was imported correctly, using the View(), str() and head() functions.

In [None]:
View(activity)

In [None]:
str(activity)

str(calories)

str(intensities)

str(sleep)

str(weight)

The timestamp columns were not imported as date-time values. So before performing exploratory data analysis, I need to convert them to a date-time format and split them into date and time.

## Fixing Formatting

In [None]:
# intensities
intensities$ActivityHour <- as.POSIXct(intensities$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
intensities$time <- format(intensities$ActivityHour, format = "%H:%M:%S")
intensities$date <- format(intensities$ActivityHour, format = "%m/%d/%y")
# calories
calories$ActivityHour <- as.POSIXct(calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
calories$time <- format(calories$ActivityHour, format = "%H:%M:%S")
calories$date <- format(calories$ActivityHour, format = "%m/%d/%y")
# activity
activity$ActivityDate <- as.POSIXct(activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")
# sleep
sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")
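
As a quick sanity check of the format string used above, here is a minimal sketch that parses two made-up timestamps in the same layout and splits them into date and time. The sample values and the fixed `tz = "UTC"` are assumptions for illustration only; the notebook itself uses `Sys.timezone()`.

```r
# Hypothetical sample values in the same layout as ActivityHour
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 5:00:00 PM")

# %I is the 12-hour clock and %p the AM/PM marker (locale dependent)
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Split into separate time and date strings
time_part <- format(parsed, format = "%H:%M:%S")
date_part <- format(parsed, format = "%m/%d/%y")

# A failed parse returns NA, so this is a cheap way to catch a format mismatch
stopifnot(!anyNA(parsed))
print(data.frame(raw, date_part, time_part))
```

If the format string did not match the data, `parsed` would contain `NA` and the `stopifnot()` call would stop the cell.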

In [None]:
str(activity)

str(calories)

str(intensities)

str(sleep)

str(weight)

Now that everything is consistent, I can start exploring the data sets.

## Exploratory Data Analysis

In [None]:
# Number of unique IDs in each data set
n_distinct(activity$Id)
n_distinct(calories$Id)
n_distinct(intensities$Id)
n_distinct(sleep$Id)
n_distinct(weight$Id)

There are 33 participants in the activity, calories and intensities data sets, 24 in the sleep data set and only 8 in the weight data set. Eight participants are too few to support any recommendations or conclusions based on the weight data.
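
The same comparison can be done in one pass, and it is also possible to list which participants are missing from a table. This is a minimal sketch on tiny made-up tables (the `*_demo` names and values are hypothetical, not the Fitbit data); `length(unique(x))` is the base R counterpart of `n_distinct(x)`.

```r
# Hypothetical miniature tables standing in for the real data sets
activity_demo <- data.frame(Id = c(1, 1, 2, 3))
sleep_demo    <- data.frame(Id = c(1, 2, 2))
weight_demo   <- data.frame(Id = c(3))

# Count distinct participants in every table in one pass
tables <- list(activity = activity_demo, sleep = sleep_demo, weight = weight_demo)
id_counts <- sapply(tables, function(df) length(unique(df$Id)))
print(id_counts)

# Participants in the activity table that have no sleep records
missing_sleep <- setdiff(activity_demo$Id, sleep_demo$Id)
print(missing_sleep)
```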

Let’s explore the summary statistics of the data sets:

### Activity data sets

In [None]:
# activity
activity %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes, Calories) %>%
  summary()

# explore number of active minutes per category
activity %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  summary()

### Calories data sets

In [None]:
# calories
calories %>%
  select(Calories) %>%
  summary()

### Sleep data sets

In [None]:
# sleep
sleep %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()

### Weight data sets

In [None]:
# weight
weight %>%
  select(WeightKg, BMI) %>%
  summary()

Some interesting discoveries from this summary analysis:

Average sedentary time is 991 minutes, or about 16.5 hours, per day. It should be reduced!

The majority of the participants are lightly active.

On average, participants sleep once a day for about 7 hours.

Average total steps per day are 7638, which is a little below the level associated with health benefits. A 2011 study noted that healthy adults tend to take 4,000–18,000 steps per day, and that 10,000 steps per day is a reasonable target for healthy adults.
One study found that getting at least 15,000 steps per day is correlated with a lower risk of metabolic syndrome, a cluster of conditions linked to obesity and heart disease.

Getting to 10,000 steps may also help people lose weight and improve their mood.
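
The minute totals from `summary()` are easier to read as hours. Here is a minimal sketch of the conversion, using the 991-minute sedentary mean quoted above; the helper name `minutes_to_hours` is hypothetical.

```r
# Convert a duration in minutes to hours, rounded to one decimal place
minutes_to_hours <- function(minutes) round(minutes / 60, 1)

# Mean sedentary time reported in the summary above
sedentary_hours <- minutes_to_hours(991)
print(sedentary_hours)

# The same figure as a percentage of a 24-hour day
share_of_day <- round(100 * 991 / (24 * 60), 1)
print(share_of_day)
```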

## Merging the data sets

Before beginning to visualize the data, I need to merge two data sets. I’m going to merge (inner join) 'activity' and 'sleep' on the columns 'Id' and 'date' (which I created earlier when converting the timestamps to date-time format).

In [None]:
merged_data <- merge(sleep,activity,by = c('Id','date'))
head(merged_data)

In [None]:
str(merged_data)
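
To make the effect of the inner join concrete, here is a minimal sketch on two tiny made-up tables (the `*_demo` names and values are hypothetical). By default `merge()` keeps only the `Id`/`date` pairs that appear in both tables, so days with activity but no sleep record are dropped.

```r
# Hypothetical miniature versions of the two tables
sleep_demo <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16"),
  TotalMinutesAsleep = c(420, 390, 450)
)
activity_demo <- data.frame(
  Id = c(1, 1, 2, 3),
  date = c("04/12/16", "04/14/16", "04/12/16", "04/12/16"),
  TotalSteps = c(8000, 5000, 12000, 3000)
)

# Default merge() is an inner join: only pairs present in BOTH tables survive
inner <- merge(sleep_demo, activity_demo, by = c("Id", "date"))
print(inner)

# all.y = TRUE keeps every activity row and fills missing sleep values with NA
right <- merge(sleep_demo, activity_demo, by = c("Id", "date"), all.y = TRUE)
print(nrow(right))
```

Comparing `nrow()` before and after the merge is a simple way to see how many days were lost to the join.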

## Data Visualization

In [None]:
ggplot(data=activity, aes(x=TotalSteps, y=Calories)) + 
  geom_point(color='orange') + geom_smooth() + labs(title="Total Steps vs. Calories")

There is a positive correlation between Total Steps and Calories, which is expected: the more active we are, the more calories we burn.
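
The strength of a relationship like this can be put into a single number with `cor()`. This is a minimal sketch on made-up values (the `demo` data frame is hypothetical and is not the Fitbit data), so the result only illustrates the call, not the actual correlation in the data set.

```r
# Hypothetical daily records, for illustration only
demo <- data.frame(
  TotalSteps = c(2000, 5000, 7500, 10000, 15000),
  Calories   = c(1700, 2000, 2200, 2600, 3100)
)

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r <- cor(demo$TotalSteps, demo$Calories)
print(round(r, 3))
```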

In [None]:
ggplot(data=sleep, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + 
  geom_point(color = 'salmon')+ labs(title="Total Minutes Asleep vs. Total Time in Bed")

The relationship between Total Minutes Asleep and Total Time in Bed is very close to linear: more time in bed goes with more time asleep. So if Bellabeat users want to improve their sleep, we should consider using notifications that remind them to go to bed.
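
One way to look beyond the scatter plot is to compute how much of the time in bed is actually spent asleep. This is a minimal sketch on made-up nights; the `efficiency` and `awake_in_bed` columns are hypothetical derived measures and are not part of the original analysis.

```r
# Hypothetical nightly records, for illustration only
sleep_demo <- data.frame(
  TotalMinutesAsleep = c(400, 420, 300),
  TotalTimeInBed     = c(440, 450, 400)
)

# Share of time in bed spent asleep, and minutes spent awake in bed
sleep_demo$efficiency   <- sleep_demo$TotalMinutesAsleep / sleep_demo$TotalTimeInBed
sleep_demo$awake_in_bed <- sleep_demo$TotalTimeInBed - sleep_demo$TotalMinutesAsleep
print(sleep_demo)
```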

Now I'm going to look at intensities data over time (hourly).

In [None]:
intensity_new <- intensities %>%
  group_by(time) %>%
  drop_na() %>%
  summarise(mean_total_int = mean(TotalIntensity))

ggplot(data=intensity_new, aes(x=time, y=mean_total_int)) + geom_col(fill='lightseagreen') +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title="Average Total Intensity vs. Time")


After visualizing Total Intensity hourly, I found that people are more active between 6 am and 10 pm.

Most activity happens between 5 pm and 7 pm. I suppose that people go to the gym or for a walk after finishing work. We can use this time in the Bellabeat app to remind and motivate users to go for a run or a walk.
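
The busiest hours can also be read from the summarised table instead of the chart. This is a minimal sketch on made-up hourly means (the `intensity_demo` values are hypothetical), laid out like `intensity_new`.

```r
# Hypothetical hourly means, for illustration only
intensity_demo <- data.frame(
  time = c("06:00:00", "12:00:00", "17:00:00", "18:00:00", "23:00:00"),
  mean_total_int = c(8, 19, 21, 22, 5),
  stringsAsFactors = FALSE
)

# Sort from most to least active and keep the top three hours
top_hours <- intensity_demo[order(-intensity_demo$mean_total_int), ][1:3, ]
print(top_hours$time)
```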

In [None]:
ggplot(data=merged_data, aes(x=TotalMinutesAsleep, y=SedentaryMinutes)) + 
  geom_point(color='orange') + geom_smooth() +
  labs(title="Minutes Asleep vs. Sedentary Minutes")

Here we can clearly see a negative relationship between Sedentary Minutes and Minutes Asleep.

### Suggestion
If Bellabeat users want to improve their sleep, the Bellabeat app can recommend reducing sedentary time and send a notification after a long sedentary period.

Keep in mind that we need to support these insights with more data, because correlation does not imply causation.

In [None]:
colnames(merged_data)

In [None]:
ggplot(data = merged_data, aes(x = LightlyActiveMinutes, y = Calories)) + 
  geom_point(color = 'turquoise') + geom_smooth() +
  labs(title = 'LightlyActiveMinutes vs Calories')

In [None]:
ggplot(data = merged_data, aes(x = VeryActiveMinutes, y = Calories)) + 
  geom_point(color = 'brown') + geom_smooth() +
  labs(title = 'VeryActiveMinutes vs Calories')


As we can see from the plots of Calories vs. VeryActiveMinutes and Calories vs. LightlyActiveMinutes, calories burned rise more steeply with very active minutes than with lightly active minutes.
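
A comparison like this can be made numeric by fitting a linear model and reading off the slope for each activity type. This is a minimal sketch on made-up data that was constructed so that each very active minute is worth 10 calories and each lightly active minute 2 calories. These figures are purely hypothetical and chosen so the answer is known in advance; they say nothing about the Fitbit data.

```r
# Hypothetical daily records built as Calories = 1500 + 10 * Very + 2 * Light
demo <- data.frame(
  VeryActiveMinutes    = c(0, 10, 20, 40, 60),
  LightlyActiveMinutes = c(300, 180, 250, 120, 200)
)
demo$Calories <- 1500 + 10 * demo$VeryActiveMinutes + 2 * demo$LightlyActiveMinutes

# Multiple regression: each slope estimates the calories associated with one extra minute
fit <- lm(Calories ~ VeryActiveMinutes + LightlyActiveMinutes, data = demo)
print(round(coef(fit), 2))
```

On real data the slopes would come with uncertainty, so `summary(fit)` would be the next thing to look at.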

## Summarizing recommendations for the business

As we already know, collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

After analyzing FitBit Fitness Tracker Data, I found some insights that could help influence the Bellabeat marketing strategy.

![](https://images.pexels.com/photos/4775195/pexels-photo-4775195.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)

## Target audience

Women who work full-time jobs (according to the hourly intensity data) and spend a lot of time at a computer, in meetings, or focused on their work (according to the sedentary time data).

These women do some light activity to stay healthy (according to the activity type analysis), but they need to increase their everyday activity to see health benefits. They might need some knowledge about developing healthy habits, or motivation to keep going.

As there is no gender information about the participants, I assumed that all genders were represented and balanced in this data set.

### The key message for the Bellabeat online campaign

The Bellabeat app is not just another fitness activity app. It’s a guide (a friend) who empowers women to balance a full personal and professional life with healthy habits and routines by educating and motivating them through daily app recommendations.

### Ideas for the Bellabeat app

Participants average 7638 steps per day, which is below the 10,000 steps per day that a 2011 study described as a reasonable target for healthy adults. Bellabeat can encourage people to take at least 10,000 steps per day by explaining the health benefits, which may include weight loss and improved mood.

If users want to lose weight, it’s probably a good idea to control daily calorie consumption. Bellabeat can suggest some ideas for low-calorie lunch and dinner.

If users want to improve their sleep, Bellabeat should consider using app notifications to go to bed.

Most activity happens between 5 pm and 7 pm. I suppose that people go to the gym or for a walk after finishing work. Bellabeat can use this time to remind and motivate users to go for a run or a walk.

Calories burned rise more steeply with very active minutes than with lightly active minutes.

### Suggestion
If users want to improve their sleep, the Bellabeat app can recommend reducing sedentary time.
Bellabeat can also show notifications during lightly active periods so that users become more aware of their sedentary time.

Thank you for your interest in my Bellabeat Case Study!

This is my first project using R. I would appreciate any comments and recommendations for improvement!