# **Introduction**
### Bellabeat is a company that manufactures health-focused smart products for women. Its product line includes the Bellabeat app, Leaf, Time, and Spring. Bellabeat has recently been investing in Google Search, maintaining active Facebook and Instagram pages, and consistently engaging consumers on Twitter.

## **Business Task**
### The owner, Sršen, wants to use available consumer data to reveal more opportunities for growth. To that end, the marketing analytics team will focus on how to better market Bellabeat's smart devices. We will look for trends in how consumers use competing smart devices, how those trends apply to Bellabeat, and how they can inform Bellabeat's marketing strategy. Every cleaning or manipulation step is shown in this report so that the analysis can be replicated.

## **Preparation of Data**
### The data chosen for this analysis is the FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius). It contains information from thirty-three FitBit users on how they used the device. In this analysis we will look at daily activity, daily calories, and daily steps in order to find trends in usage.

### RStudio was used to complete this analysis because of the scope of the data and the ease of visualization.

## **Installing and Loading of Packages:**

#### First, I installed and loaded the packages needed to complete my analysis.

In [None]:

install.packages("tidyverse")
library("tidyverse")
install.packages("here")
library("here")
install.packages("skimr")
library("skimr")
install.packages("janitor")
library("janitor")
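Re-running `install.packages()` on every execution can be slow. As a minimal sketch, the hypothetical helper below (`install_if_missing` is not part of any package) installs only the packages that are not already present. The `installer` argument is an assumption added so the behavior can be checked without touching the network.

```r
# Hypothetical helper: install only the packages that are not yet installed.
# `installer` defaults to install.packages and can be swapped out for testing.
install_if_missing <- function(pkgs, installer = install.packages) {
  # Compare the requested names with the packages found in the R library
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    installer(missing_pkgs)
  }
  # Return the names that needed installing, without printing them
  invisible(missing_pkgs)
}

# Example call (not run here):
# install_if_missing(c("tidyverse", "here", "skimr", "janitor"))
```

The `library()` calls are still needed afterwards, because installing a package does not attach it.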

## **Importing Data**
### The data included 18 CSV files that broke usage down by day, hour, minute, and sometimes second. For the most part we will only look at the daily data in order to notice day-to-day trends. The pertinent files were imported and renamed for ease of use and consistency.

#### Below are the CSV files that I chose to import:

In [None]:
dailyActivity <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
dailyCalories <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
dailyIntensities <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
dailySteps <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
heartRate <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
dailySleep <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weightLog <- read.csv("/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
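Because every path above shares the same folder and the same `_merged.csv` suffix, the imports could be shortened with a small helper. This is only a sketch: `read_fitbit` and `data_dir` are hypothetical names introduced here, and the folder is copied from the paths above.

```r
# Folder copied from the import paths above; adjust for your own environment
data_dir <- "/kaggle/input/fitbit/Fitabase Data 4.12.16-5.12.16"

# Hypothetical helper: read "<name>_merged.csv" from the given folder
read_fitbit <- function(name, dir = data_dir) {
  read.csv(file.path(dir, paste0(name, "_merged.csv")))
}

# Example calls (not run here):
# dailyActivity <- read_fitbit("dailyActivity")
# dailySleep    <- read_fitbit("sleepDay")
```

Keeping the folder in one variable means only one line changes if the data is moved.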

## **Cleaning the Data**
### Next, the data was reviewed for consistency and checked for missing, mislabeled, or incorrect values. The head() function was used to preview each table, and length() combined with unique() was used to count the distinct user Ids and confirm that all users were accounted for.

## Daily Activity

In [None]:
head(dailyActivity)
length(unique(dailyActivity$Id))

## Daily Calories

In [None]:
head(dailyCalories)
length(unique(dailyCalories$Id))

## Daily Intensities

In [None]:
head(dailyIntensities)
length(unique(dailyIntensities$Id))

## Daily Steps

In [None]:
head(dailySteps)
length(unique(dailySteps$Id))

## Daily Sleep

In [None]:
head(dailySleep)
length(unique(dailySleep$Id))

## Heart Rate

In [None]:
head(heartRate)
length(unique(heartRate$Id))

## Weight Log

In [None]:
head(weightLog)
length(unique(weightLog$Id))
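The previews above rely on visual inspection. Missing values and duplicated rows can also be counted directly. The sketch below uses a small made-up table (`toy`) in place of the imported data, so the values are illustrative only.

```r
# Made-up table standing in for one of the imported data sets
toy <- data.frame(
  Id       = c(1, 1, 2, 2, 2),
  Calories = c(1800, 1800, 2100, NA, 1950)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Number of rows that exactly repeat an earlier row
n_duplicate_rows <- sum(duplicated(toy))

# Number of distinct users, as in the checks above
n_users <- length(unique(toy$Id))
```

The same three lines can be applied to any of the imported data frames by replacing `toy` with its name.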

## Preliminary Analysis:

#### After reviewing the data, it is clear that the data sets can be merged for analysis because they all share an "Id" column. It is also worth noting that not every data set includes all thirty-three users. This could reflect missing data, but given how FitBits operate, it is more likely that users did not log the data or opted out. For example, only twenty-four users appear in "dailySleep", possibly because some find the FitBit uncomfortable to wear while sleeping. Only fourteen users logged "heartRate". On a FitBit this option is not on by default and must be enabled before it records data, so some users may not have known about it. Finally, only eight users logged their weight. A FitBit cannot weigh users and log the result for them, so these users either did not want to log their weight or were not reminded to. While many of these points are speculative, they give us a good starting point for recommendations.

### Next we will combine the dataframes we wish to compare to begin our analysis:

In [None]:
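# Note: joining on "Id" alone pairs every row of one table with every row of
# the other table for the same user, so the results repeat day combinations
dailyIntensities_Sleep <- merge(dailyIntensities, dailySleep, by = "Id")
dailyIntensities_Sleep_Activity <- merge(dailyIntensities_Sleep, dailyActivity, by = "Id")
dailySteps_Calories <- merge(dailySteps, dailyCalories, by = "Id")

head(dailyIntensities_Sleep_Activity)
head(dailySteps_Calories)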
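When two daily tables are merged on "Id" alone, each of a user's rows in one table is matched with each of that user's rows in the other. The sketch below illustrates the effect on made-up data and shows one possible alternative, joining on the user and the day together. The `Day` column name is an assumption for illustration; the real files may name their date columns differently.

```r
# Made-up daily tables: two users, two days each
steps <- data.frame(
  Id    = c(1, 1, 2, 2),
  Day   = c("4/12/2016", "4/13/2016", "4/12/2016", "4/13/2016"),
  Steps = c(5000, 7000, 3000, 9000)
)
calories <- data.frame(
  Id       = c(1, 1, 2, 2),
  Day      = c("4/12/2016", "4/13/2016", "4/12/2016", "4/13/2016"),
  Calories = c(1800, 2000, 1600, 2300)
)

# Joining on Id only: every day is paired with every other day of the same user
by_id <- merge(steps, calories, by = "Id")

# Joining on Id and Day: one row per user per day
by_id_day <- merge(steps, calories, by = c("Id", "Day"))

nrow(by_id)      # number of rows after the Id-only join
nrow(by_id_day)  # number of rows after the Id-and-Day join
```

Comparing `nrow()` before and after a merge is a quick way to spot this kind of row multiplication.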

## **Analysis**
### After reviewing the data, a few observations can be made:

#### Below you can see the relationship between active minutes and calories burned. The more active minutes a user logged while wearing the FitBit, the more calories they burned. The points are colored by user Id to show individual usage.

In [None]:
ggplot(data = dailyActivity) +
  geom_point(mapping = aes(x = VeryActiveMinutes + FairlyActiveMinutes + LightlyActiveMinutes, y = Calories, color = as.character(Id))) +
  labs(title = "All Active Minutes and Total Daily Calories Burned", x = "Total Active Minutes")

### We can also look at the relationship between a user's total daily steps and the number of calories burned, which shows a positive correlation.

In [None]:
ggplot(data = dailyActivity) +
  geom_point(mapping = aes(x = TotalSteps, y = Calories, color = as.character(Id))) +
  labs(title="Total Daily Steps and Calories Burned")
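A scatter plot suggests the direction of a relationship, and a correlation coefficient can put a number on it. The sketch below uses made-up values in a table with the same `TotalSteps` and `Calories` column names. It is an illustration of the technique, not a result from the FitBit data.

```r
# Made-up values standing in for dailyActivity
toy <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 11000, 14000),
  Calories   = c(1500, 1800, 2100, 2300, 2700)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship
r <- cor(toy$TotalSteps, toy$Calories)

# A simple linear fit; the slope estimates the extra calories per additional step
fit <- lm(Calories ~ TotalSteps, data = toy)
slope <- coef(fit)[["TotalSteps"]]
```

On the real data, `cor(dailyActivity$TotalSteps, dailyActivity$Calories)` would give the corresponding figure.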

### Next we will look at the number of Ids again. There are thirty-three unique Ids in dailyActivity, while dailySleep contains only twenty-four, all of which also appear in dailyActivity. The same holds for heartRate, which contains only fourteen Ids, and weightLog, which contains eight; all of these also appear in dailyActivity. This means that while people were using the app to log activity, they were not using it for these extra features.

In [None]:
count(dailyActivity, Id)
count(dailySleep, Id)
count(heartRate, Id)
count(weightLog, Id)
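Whether every Id in a smaller data set also appears in dailyActivity can be checked in code rather than by comparing the printed lists. The sketch below uses made-up Id vectors to show the idea.

```r
# Made-up Ids standing in for unique(dailyActivity$Id) and unique(dailySleep$Id)
activity_ids <- c(101, 102, 103, 104)
sleep_ids    <- c(101, 103)

# TRUE when every sleep Id is also an activity Id
all_in_activity <- all(sleep_ids %in% activity_ids)

# Users who logged activity but never logged sleep
not_in_sleep <- setdiff(activity_ids, sleep_ids)
```

Applied to the real tables, the same check would read `all(unique(dailySleep$Id) %in% unique(dailyActivity$Id))`.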

#### Finally, we can look at intense activity per weekday. This shows when users are most active, which can tell us when to send reminders or notifications through our apps. The code below first converts the character date into a true date value. It then derives the weekday for each record and averages the very active minutes for each weekday. The result is plotted as a bar chart.

In [None]:
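library(lubridate)  # provides wday(); older tidyverse versions do not attach it

# Convert the character dates (month/day/year) into date-time values
dailyActivity$ActivityDate <- as.POSIXct(dailyActivity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())

# Label each record with its weekday
weekdays <- dailyActivity %>%
  mutate(Weekday = wday(ActivityDate, label = TRUE))
# Average the very active minutes for each weekday
weekdaysv2 <- weekdays %>% 
  group_by(Weekday) %>% 
  summarise(AvgVeryActiveMinutes = mean(VeryActiveMinutes))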

ggplot(data = weekdaysv2, mapping = aes(x = Weekday, y = AvgVeryActiveMinutes)) +
  geom_col(fill = "purple") +
  labs(title = "Average High Activity Throughout Week")
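The date conversion and weekday averaging can be followed on a tiny example. The sketch below uses base R only and made-up minutes; it numbers weekdays with `"%u"` (Monday is 1) so that the result does not depend on the system language.

```r
# Made-up records in the same month/day/year text format
toy <- data.frame(
  ActivityDate      = c("4/12/2016", "4/13/2016", "4/19/2016"),
  VeryActiveMinutes = c(10, 40, 30)
)

# Parse the text into Date values
dates <- as.Date(toy$ActivityDate, format = "%m/%d/%Y")

# ISO weekday number as text: "1" is Monday, "7" is Sunday
toy$WeekdayNum <- format(dates, "%u")

# Mean very active minutes for each weekday number
avg_by_weekday <- tapply(toy$VeryActiveMinutes, toy$WeekdayNum, mean)
```

Here the two Tuesdays (4/12 and 4/19) are averaged together, while the Wednesday keeps its single value.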

## **Recommendations**
#### Because Bellabeat's competitors are not able to segment their data by gender, Bellabeat could gain a competitive advantage by continuing to focus on women and collecting gender-specific user data that could be used in the future to cater more specifically to this market.

* Additionally, the data showed that of the thirty-three users, only twenty-four used the sleep function, fourteen used the heart rate function, and eight logged their weight. Daily activity and calorie tracking is therefore the feature that users of the competitor's product rely on most. Bellabeat should continue to add functionality to this main feature, since it is the one users come into contact with most often.
* Another approach would be to call attention to the less-used features that Bellabeat could also offer, so that users engage more with these valuable features. The more users interact with these features, the more they will see the value of the product, and the less likely they will be to churn and the more likely to promote the product. I would recommend building notifications and/or gamification to remind users to use these features, with in-app rewards when they complete certain milestones.
* The data also suggests that the most common user of the competing products is someone with a mid-level amount of activity. For example, users seem to maintain a consistent level of activity throughout the week, averaging around twenty minutes a day. The average user most likely has daily exercise built into their routine but is not a high-tier athlete who exercises several hours a day. Therefore, I would suggest that the main personas to market towards are women who already follow some type of exercise routine but would like to take their health routines to the 'next level', or who want to see how they can incorporate Bellabeat's device into a daily routine of moderate exercise.