
# How Can a Wellness Technology Company Play It Smart?

#### Author: Luiza Nazarkulova | Jan 2021

In [None]:
import os
from IPython.display import Image
Image(filename="../input/bellebeat/tg_image_3424401971.jpeg")

# 1. ASK
## Description
This is a case study for the capstone project I completed as part of the Google Data Analytics Certificate. In this project, I analyze customer data to help advance business goals for Bellabeat, a high-tech company that develops health-focused smart products for women. The data analysis process consists of six phases: ask, prepare, process, analyze, share, and act.

### Background:
Bellabeat is the go-to wellness brand for women, with an ecosystem of products and services focused on women’s health. The company is headquartered in San Francisco and also has offices in Zagreb and London. It was founded in 2013 by Sandro Mur and Urška Sršen.

### Business Task
Analyze FitBit Fitness Tracker Data to gain insights into how consumers use the FitBit app and to discover trends that can inform Bellabeat’s marketing strategy.
### Key Stakeholders: 
*  Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.
*  Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

### Deliverables:
*  A clear summary of the business task
*  A description of all data sources used
*  Documentation of any cleaning or manipulation of data
*  A summary of analysis
*  Supporting visualizations and key findings
*  High-level content recommendations based on the analysis

# 2. Prepare
## Data Source:
*  The data used in this analysis comes from [FitBit Fitness Tracker Data](https://www.kaggle.com/arashnic/fitbit), which is publicly available on Kaggle.
*  This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016.
* 30 eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. 



## Limitations of the given data set:
- The sample size (30 respondents) is small, is not necessarily representative of the entire target group, and might be insufficient for reliable statistical analysis.
- Time constraints: the data sets date back to 2016, so they might be outdated, since user behavior and other variables could have changed since then.
- The data was collected by a third party, so its reliability cannot be verified.

Thus, the data used in this analysis is of limited quality, and the results might not be fully accurate.


# 3. Process
*  In this case study, I am using Python to clean and analyze the data.
*  Importing Necessary Libraries with Aliases:


In [None]:
import pandas as pd  # for data analysis and manipulation
import numpy as np  # for numerical arrays
import matplotlib.pyplot as plt  # for data visualization
import datetime as dt  # for working with dates and times

## Importing Data Sets:

## 1. Daily Activity Data Analysis

In [None]:
#using the file with merged daily activities in this dataframe 
df_act = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")

In [None]:
#checking the first 10 rows
df_act.head(10)

### Checking for NA values:

In [None]:
df_act.isnull().values.any()

### Checking the structure of our dataframe:

In [None]:
df_act.info()

From the output above, we can see that ActivityDate is stored with the generic object (string) data type rather than as a date. Therefore, the ActivityDate column has to be converted to the datetime64 NumPy type:

In [None]:
df_act["ActivityDate"] = pd.to_datetime(df_act["ActivityDate"],format="%m/%d/%Y")
#confirming
df_act.info()


## Sanity Check
### Counting unique values in each column to find how many users (IDs) we are considering

In [None]:
df_act.nunique()
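
`nunique()` reports distinct values for every column. To focus on the participant count, a minimal sketch could count distinct `Id` values and the number of logged days per user. The helper name `records_per_user` and the toy frame are illustrative assumptions, not part of the original notebook; on the real data the calls would be `df_act["Id"].nunique()` and `records_per_user(df_act)`.

```python
import pandas as pd

def records_per_user(df: pd.DataFrame, id_col: str = "Id") -> pd.Series:
    """Return the number of rows (logged days) for each user Id."""
    return df.groupby(id_col).size()

# Toy data for illustration only (not taken from the Fitbit dataset).
toy = pd.DataFrame({"Id": [1, 1, 2, 3, 3, 3],
                    "TotalSteps": [5000, 8000, 12000, 3000, 4000, 9000]})

print(toy["Id"].nunique())    # number of distinct users
print(records_per_user(toy))  # days logged per user
```

Counting rows per user also shows whether some participants logged far fewer days than others, which matters when averaging across users.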

The Daily Activity dataset contains 33 unique user IDs, slightly more than the 30 users stated in the dataset description. In this part of the analysis, we determine the average number of steps a user takes per day:

In [None]:

df_act.describe()

From the summary above, we can see that the average user takes 7,638 steps a day, with 21 very active minutes and 991 sedentary minutes. Moreover, users burn about 2,303 calories a day on average.
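
The 7,638 figure from `describe()` is the mean over all user-days, so users who logged more days weigh more heavily. A sketch of an alternative definition of "the average user" (assumed here, not used in the original analysis) first averages within each user and then across users; the helper `pooled_and_per_user_mean` and the toy values are hypothetical.

```python
import pandas as pd

def pooled_and_per_user_mean(df: pd.DataFrame, col: str, id_col: str = "Id"):
    """Return (mean over all rows, mean of each user's own mean)."""
    pooled = df[col].mean()
    per_user = df.groupby(id_col)[col].mean().mean()
    return pooled, per_user

# Toy data: user 1 logged three active days, user 2 logged one inactive day.
toy = pd.DataFrame({"Id": [1, 1, 1, 2],
                    "TotalSteps": [10000, 10000, 10000, 2000]})
pooled, per_user = pooled_and_per_user_mean(toy, "TotalSteps")
print(pooled)    # 8000.0
print(per_user)  # 6000.0
```

When the two numbers differ noticeably on `df_act`, it suggests that heavy loggers and light loggers have different activity levels.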


## 2. Sleep Data Analysis

In [None]:
#using the file with merged sleep information in this dataframe 
df_sleep = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
#checking the first 10 rows
df_sleep.head(10)



In [None]:
#checking for null values
df_sleep.isnull().values.any()
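
Checking for nulls does not catch repeated records. A minimal sketch for counting and dropping exact duplicate rows uses `DataFrame.duplicated` and `DataFrame.drop_duplicates`; whether `df_sleep` actually contains duplicates is not shown in this notebook, and the toy frame below is illustrative only.

```python
import pandas as pd

# Toy sleep records (illustrative values); the first two rows are identical.
toy_sleep = pd.DataFrame({
    "Id": [1, 1, 2],
    "SleepDay": ["4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"],
    "TotalMinutesAsleep": [400, 400, 450],
})

n_dupes = toy_sleep.duplicated().sum()  # rows identical to an earlier row
deduped = toy_sleep.drop_duplicates().reset_index(drop=True)
print(n_dupes)       # 1
print(len(deduped))  # 2
```

On the notebook data, the same two calls would be `df_sleep.duplicated().sum()` and `df_sleep.drop_duplicates()`.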

In [None]:
df_sleep.info()

In [None]:
# Convert SleepDay from object (string) to datetime
df_sleep["SleepDay"] = pd.to_datetime(df_sleep["SleepDay"])
df_sleep.head(10)

In [None]:
df_sleep.describe()

We notice that, on average, a user sleeps for 420 minutes a day, which is about 7 hours. Moreover, participants spend 459 minutes in bed, or about 7.65 hours. This means participants spend around 39 minutes awake in bed.
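
The 39-minute figure is the difference of two means. Because the mean is linear, the same number comes out if the difference is computed per record first and then averaged, which also gives a per-night distribution to inspect. A sketch with a hypothetical `AwakeInBed` column (not in the original dataset) and toy values:

```python
import pandas as pd

# Toy sleep records (illustrative values only).
toy_sleep = pd.DataFrame({"TotalMinutesAsleep": [420, 380, 460],
                          "TotalTimeInBed": [450, 430, 480]})

# Minutes spent in bed but not asleep, per record.
toy_sleep["AwakeInBed"] = toy_sleep["TotalTimeInBed"] - toy_sleep["TotalMinutesAsleep"]

mean_of_differences = toy_sleep["AwakeInBed"].mean()
difference_of_means = toy_sleep["TotalTimeInBed"].mean() - toy_sleep["TotalMinutesAsleep"].mean()
print(mean_of_differences, difference_of_means)  # the two values agree
```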

Next, we want to know at what times of day users are most active. For this, we use the hourlySteps_merged.csv file.

In [None]:
#using the file with merged hourly step counts in this dataframe
df_hourlySteps = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
#checking the first 10 rows
df_hourlySteps.head(10)


Here we notice that the ActivityHour column contains both the date and the time. To analyze the time values on their own, we split this column into two: 'date' and 'time'.

In [None]:
ActivityHour=df_hourlySteps['ActivityHour']
df_hourlySteps[['date','time']] =  ActivityHour.str.split(" ", n=1, expand=True)
df_hourlySteps.head(10)
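
Splitting on the first space keeps `time` as a string such as `"1:00:00 PM"`. An alternative sketch parses the column with `pd.to_datetime` and extracts an integer hour (0 to 23) that sorts numerically. It assumes `ActivityHour` uses a `month/day/year hour:minute:second AM/PM` layout (for example `4/12/2016 1:00:00 PM`); the toy rows and the `hour` column name are illustrative.

```python
import pandas as pd

# Toy rows in the assumed ActivityHour layout (illustrative values).
toy_hourly = pd.DataFrame({
    "ActivityHour": ["4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM", "4/12/2016 6:00:00 PM"],
    "StepTotal": [0, 900, 1500],
})

# Parse the combined timestamp, then derive separate date and hour columns.
ts = pd.to_datetime(toy_hourly["ActivityHour"], format="%m/%d/%Y %I:%M:%S %p")
toy_hourly["date"] = ts.dt.date
toy_hourly["hour"] = ts.dt.hour  # 12 AM -> 0, 1 PM -> 13, 6 PM -> 18
print(toy_hourly[["date", "hour", "StepTotal"]])
```

An integer hour avoids relying on the order in which time strings happen to appear when plotting or grouping.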

In [None]:
df_hourlySteps.describe()

# 4. Analyze

## Interpreting key results:
*  On average, Fitbit users take 7,638 steps a day. According to the CDC's recommendation, most adults should aim for 10,000 steps a day in order to reduce the risk of common health problems, so participants fall short of this goal.
*   Participants average 21 very active minutes a day, or about 147 minutes a week. The Department of Health and Human Services recommends at least 150 minutes of moderate aerobic activity or 75 minutes of vigorous aerobic activity a week, or an equivalent combination of both. Assuming very active minutes correspond to vigorous activity, Fitbit users seem to be doing well in this area.
*   According to the U.S. Department of Health and Human Services, the average adult woman expends roughly 1,600 to 2,400 calories per day. The participants' average of 2,303 calories falls within this range, but we cannot interpret this finding further without age and weight data.
*   The average Fitbit user sleeps 7 hours a day. According to National Sleep Foundation guidelines, healthy adults need between 7 and 9 hours of sleep per night, so the participants seem to be getting enough sleep.
*   The average user spends 39 minutes awake in bed. According to Health Central, people should not spend more than 1 hour awake in bed. This prevents a mental link forming between being awake and being in bed, which can lead to insomnia.
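
The comparisons above are simple arithmetic against the thresholds quoted in these bullets. A sketch that reproduces them from the reported averages follows; the function `check_benchmarks` and its argument names are hypothetical, and treating very active minutes as vigorous activity is an assumption, not something the dataset states.

```python
def check_benchmarks(mean_steps, very_active_min_per_day, hours_asleep, awake_in_bed_min):
    """Compare average daily metrics with the guideline thresholds quoted above."""
    weekly_very_active = very_active_min_per_day * 7
    return {
        "meets_10k_steps": mean_steps >= 10_000,
        "weekly_very_active_minutes": weekly_very_active,
        # Assumption: very active minutes count as vigorous activity (75 min/week guideline).
        "meets_75_min_vigorous": weekly_very_active >= 75,
        "sleep_in_7_to_9_h": 7 <= hours_asleep <= 9,
        "awake_in_bed_at_most_1_h": awake_in_bed_min <= 60,
    }

# Averages reported in this analysis: 7,638 steps, 21 very active minutes,
# 420 minutes asleep, 39 minutes awake in bed.
print(check_benchmarks(7638, 21, 420 / 60, 39))
```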


# 5. Share

In this part of the analysis, we visualize our findings and the relationships that may exist between variables in the daily activity and sleep data frames.

# Figure 1

In [None]:
# import matplotlib package
import matplotlib.pyplot as plt

# plotting scatter plot
plt.style.use("default")
plt.figure(figsize=(8,8)) # specify size of the chart
plt.scatter(df_act.VeryActiveMinutes, df_act.Calories, 
            alpha = 0.8, c = df_act.Calories, 
            cmap = "magma")

# add annotations and visuals
mean_calories = df_act.Calories.mean()  # about 2303 (see describe() above)
mean_activeminutes = df_act.VeryActiveMinutes.mean()  # about 21

plt.colorbar(orientation = "vertical", label = "Calories")
plt.axvline(mean_activeminutes, color = "Purple", label = "Mean very active minutes")
plt.axhline(mean_calories, color = "Green", label = "Mean calories burned")
plt.xlabel("Very Active Minutes")
plt.ylabel("Calories")
plt.title("Calories Burned vs. Very Active Minutes")
plt.grid(True)
plt.legend()
plt.show()


Figure 1 shows a positive relationship between very active minutes and daily calories burned: users who spent more time in intense physical activity tended to burn more calories.
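
A scatter plot shows the direction of a relationship; a correlation coefficient puts a number on its strength. A minimal sketch uses `Series.corr`, which computes the Pearson coefficient by default (`method="spearman"` gives a rank-based alternative). On the notebook data the call would be `df_act["VeryActiveMinutes"].corr(df_act["Calories"])`, whose value the original analysis does not report; the toy values below are illustrative only.

```python
import pandas as pd

# Toy values that lie exactly on a straight line (illustrative only).
toy = pd.DataFrame({"VeryActiveMinutes": [0, 10, 20, 30, 40],
                    "Calories": [1800, 2000, 2200, 2400, 2600]})

r = toy["VeryActiveMinutes"].corr(toy["Calories"])  # Pearson by default
print(round(r, 3))  # 1.0 for perfectly linear data
```

A correlation describes association only; it does not show that activity causes the extra calories burned.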

# Figure 2

In [None]:
# import matplotlib package
import matplotlib.pyplot as plt

# plotting scatter plot
plt.style.use("default")
plt.figure(figsize=(8,8)) # specify size of the chart
plt.scatter(df_act.TotalSteps, df_act.Calories, 
            alpha = 0.8, c = df_act.Calories, 
            cmap = "PuBu")

# add annotations and visuals
mean_calories = df_act.Calories.mean()  # about 2303 (see describe() above)
mean_TotalSteps = df_act.TotalSteps.mean()  # about 7638

plt.colorbar(orientation = "vertical", label = "Calories")
plt.axvline(mean_TotalSteps, color = "Purple", label = "Mean steps taken")
plt.axhline(mean_calories, color = "Green", label = "Mean calories burned")
plt.xlabel("Total Steps Taken")
plt.ylabel("Calories")
plt.title("The Relationship between Calories Burned and Steps Taken")
plt.grid(True)
plt.legend()
plt.show()


Figure 2 shows another positive relationship, between steps taken and calories burned: the more steps participants took, the more calories they tended to burn.
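
To express this relationship as a rate, one option is an ordinary least-squares line fitted with `numpy.polyfit`; the slope is the estimated change in calories per additional step. The sketch below uses toy data; on the notebook data you would pass `df_act.TotalSteps` and `df_act.Calories`, and the fitted values there are not reported in the original analysis.

```python
import numpy as np

# Toy data constructed as calories = 1600 + 0.05 * steps (illustrative only).
steps = np.array([2000, 4000, 6000, 8000, 10000], dtype=float)
calories = np.array([1700, 1800, 1900, 2000, 2100], dtype=float)

slope, intercept = np.polyfit(steps, calories, deg=1)  # degree-1 (straight line) fit
print(slope * 1000)  # calories per 1,000 extra steps, about 50 here
print(intercept)     # about 1600 here
```

A single straight line ignores differences in body weight and age between participants, so the slope should be read as a rough summary rather than a per-person rate.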

# Figure 3

In [None]:
# import matplotlib package
import matplotlib.pyplot as plt

# plotting scatter plot
plt.style.use("default")
plt.figure(figsize=(8,8)) # specify size of the chart
plt.scatter(df_sleep.TotalMinutesAsleep, df_sleep.TotalTimeInBed, 
            alpha = 0.8, c = df_sleep.TotalMinutesAsleep, 
            cmap = "PuBu")

# add annotations and visuals
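mean_TotalMinutesAsleep = df_sleep.TotalMinutesAsleep.mean()  # about 420 (see describe() above)
mean_TotalTimeInBed = df_sleep.TotalTimeInBed.mean()  # about 459

plt.colorbar(orientation = "vertical", label = "Total Minutes Asleep")
# vertical line marks the x-axis variable (asleep), horizontal line the y-axis variable (in bed)
plt.axvline(mean_TotalMinutesAsleep, color = "Purple", label = "Mean minutes asleep")
plt.axhline(mean_TotalTimeInBed, color = "Green", label = "Mean time in bed")
plt.xlabel("Total Minutes Asleep")
plt.ylabel("Total Time in Bed (minutes)")
plt.title("Total Time in Bed vs. Total Minutes Asleep")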
plt.grid(True)
plt.legend()
plt.show()


Figure 3 demonstrates a positive relationship between Total Time in Bed and Total Minutes Asleep. On average, participants are asleep for about 420 of their 459 minutes in bed (roughly 91%), which means users spend most of their time in bed sleeping, which is healthy.
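
Sleep efficiency, minutes asleep divided by minutes in bed, is a common way to express this as a single ratio per night. A minimal sketch follows; the column name `SleepEfficiency` is introduced here for illustration, and the toy rows are not from the dataset (the first one simply reuses the averages of 420 and 459 minutes).

```python
import pandas as pd

# Toy sleep records (illustrative values only).
toy_sleep = pd.DataFrame({"TotalMinutesAsleep": [420, 360, 450],
                          "TotalTimeInBed": [459, 400, 500]})

# Fraction of time in bed actually spent asleep.
toy_sleep["SleepEfficiency"] = toy_sleep["TotalMinutesAsleep"] / toy_sleep["TotalTimeInBed"]
print(toy_sleep["SleepEfficiency"].round(3).tolist())  # [0.915, 0.9, 0.9]
```

On the notebook data, the same expression applied to `df_sleep` would show how efficiency varies from night to night instead of only on average.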

# Figure 4

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


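plt.figure(figsize = (15,8))
# bar height = mean StepTotal for each time of day (seaborn's default estimator)
ax = sns.barplot(x = 'time', y = 'StepTotal', data = df_hourlySteps)
ax.tick_params(axis = 'x', labelrotation = 30)
ax.set_xlabel("Time of day")
ax.set_ylabel("Average steps per hour")
ax.set_title("Average Steps Taken by Time of Day")
plt.show()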

As we can see, participants are most active from 5 PM to 7 PM and from 12 PM to 2 PM.
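
The peak hours read off the chart can be backed by a small table. A sketch groups by hour, takes the mean of `StepTotal`, and keeps the top hours with `Series.nlargest`. It assumes an integer `hour` column (a hypothetical column; grouping the notebook's `time` strings with `groupby("time")` works the same way), and the toy values are illustrative only.

```python
import pandas as pd

# Toy hourly records (illustrative values only).
toy_hourly = pd.DataFrame({
    "hour": [12, 12, 13, 17, 17, 18, 3, 3],
    "StepTotal": [600, 800, 500, 900, 1100, 1000, 0, 10],
})

mean_by_hour = toy_hourly.groupby("hour")["StepTotal"].mean()
top_hours = mean_by_hour.nlargest(3)  # the three hours with the highest average steps
print(top_hours)
```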

# 6. ACT

In this part of our data analysis, we provide some recommendations for Bellabeat based on insights from the Fitbit data. Considering that the data covers only a small sample of Fitbit users, I would advise Bellabeat to leverage its own database to make more conclusive decisions. Nevertheless, we were able to identify some trends in the given dataset, and the following recommendations could be applied to Bellabeat App users:

1. The average user takes 7,638 steps a day, which is below the recommended 10,000 steps. Bellabeat could encourage its customers to reach a goal of 10,000 steps a day through push notifications.

2. We discovered that people are most active from 5 PM to 7 PM. We assume that this is when people come home from work or go to the gym. Based on this information, we would advise Bellabeat to have its app send a notification around this time to remind users to get some physical exercise.

3. We observed a clear relationship between calories burned and active minutes. For users who want to lose weight, the Bellabeat App could promote fitness programs and provide a platform where they can track their calories.

4. The average number of daily sedentary minutes was about 991, or roughly 16.5 hours. This is a large amount of sedentary time and could potentially affect users' sleep quality. Therefore, Bellabeat could add regular reminders (vibrations) to its devices to encourage users to move around or go for a run, and could also add a feature for tracking sleep patterns.

5. Since Bellabeat's target group is women, expanding the tracking features for reproductive health could be highly beneficial for the company and would create a supportive platform that prioritizes women's health.
