# Bellabeat: How Can A Wellness Technology Company Play It Smart?

## STEP 1: ASK

**Background**
Bellabeat has been a high-tech manufacturer of beautifully designed, health-focused smart products for women since 2013. By inspiring and empowering women with knowledge about their own health and habits, Bellabeat has grown rapidly and positioned itself as a tech-driven wellness company for women.

The co-founder and Chief Creative Officer, Urška Sršen, is confident that an analysis of non-Bellabeat consumer data (i.e. FitBit fitness tracker usage data) would reveal more opportunities for growth.

**Business Task**
Analyze FitBit Fitness Tracker Data to understand how consumers are using the FitBit app and to discover trends that can inform Bellabeat's marketing strategy.

**Business Objectives:**

* What are the trends identified?
* How could these trends apply to Bellabeat customers?
* How could these trends help influence Bellabeat marketing strategy?

**Key Stakeholders:**
* Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
* Sando Mur: Mathematician, Bellabeat’s cofounder and key member of the Bellabeat executive team
* Bellabeat marketing analytics team: A team of data analysts guiding Bellabeat's marketing strategy.

## STEP 2: PREPARE

**Information on Data Source:**<br>
The data is publicly available on Kaggle as FitBit Fitness Tracker Data and is stored in 18 CSV files.
It was generated by respondents to a survey distributed via Amazon Mechanical Turk between 12 March 2016 and 12 May 2016.
The respondents were 30 FitBit users who consented to the submission of their personal tracker data.
Data collected includes:<br>
(1) physical activity recorded in minutes, <br>
(2) heart rate, <br>
(3) sleep monitoring,<br> 
(4) daily activity and <br>
(5) steps.<br>

**Limitations of Data Set:**<br>
The data was collected in 2016. Users' daily activity, fitness and sleeping habits, diet and food consumption may have changed since then, so the data may no longer be timely or relevant.
<br><br>
A sample of 30 female FitBit users is not representative of the entire female population.
<br><br>
Because the data was collected through a survey, its integrity and accuracy cannot be verified.
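
To put a rough number on the sample-size limitation, here is a minimal sketch of the normal-approximation margin of error for a proportion. It assumes a simple random sample, which this survey is not, so treat the result as an optimistic lower bound on the uncertainty. The function name `margin_of_error` is our own illustration and not part of the dataset or any library.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    n: sample size, p: assumed proportion, z: z-score (1.96 is roughly 95%).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) at roughly 95% confidence for 30 respondents
moe_30 = margin_of_error(30)
print(f"n = 30: +/- {moe_30 * 100:.1f} percentage points")
```

With only 30 respondents, any percentage reported about users carries an uncertainty of roughly 18 percentage points either way, even under these generous assumptions.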

**Is Data ROCCC?**<br>
A good data source is ROCCC which stands for Reliable, Original, Comprehensive, Current, and Cited.<br>

1) Reliable - LOW - Not reliable as it only has 30 respondents<br>
2) Original - LOW - Third party provider (Amazon Mechanical Turk)<br>
3) Comprehensive - MED - Parameters match most of Bellabeat's products' parameters<br>
4) Current - LOW - Data is 5 years old and is not relevant<br>
5) Cited - LOW - Data collected from third party, hence unknown<br>
6) Overall, the dataset is considered low-quality data, and basing business recommendations on it is not recommended.<br>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import math
import random

### Importing datasets 

In [None]:
activities = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
calories = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
intensities = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
sleep = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")

### Data cleaning 

In [None]:
activities.head()

In [None]:
# converting type from object -to-> datetime
# (the Fitabase files store dates month-first, e.g. 4/12/2016 is 12 April 2016, so dayfirst must not be set)
activities['ActivityDate'] = pd.to_datetime(activities['ActivityDate'])

In [None]:
# keep only the columns we need; copy so that later in-place edits do not act on a slice
activities = activities[['Id', 'TotalSteps', 'TotalDistance', 'SedentaryMinutes', 'Calories']].copy()

In [None]:
calories.head()

In [None]:
# dates are stored month-first, so dayfirst must not be set
calories['ActivityDay'] = pd.to_datetime(calories['ActivityDay'])

In [None]:
intensities.head()

In [None]:
# dates are stored month-first, so dayfirst must not be set
intensities['ActivityDay'] = pd.to_datetime(intensities['ActivityDay'])

In [None]:
sleep.head()

In [None]:
# every row of SleepDay has the same time (12:00:00 AM), so only the date part matters
# (dates are stored month-first, so dayfirst must not be set)
sleep['SleepDay'] = pd.to_datetime(sleep['SleepDay'])
sleep.head()

In [None]:
weight.head()

In [None]:
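# we keep only useful columns (copy so the new Date column is not written to a slice)
weight = weight[['Id', 'Date', 'WeightKg', 'BMI']].copy()
# dates are stored month-first, so dayfirst must not be set
weight['Date'] = pd.to_datetime(weight['Date'], utc = True)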
weight.head()

### Data Exploration 

In [None]:
# Number of participants in each dataset

datasets = {"activities": activities, "calories": calories, "intensities": intensities,
            "sleep": sleep, "weight": weight}
for name, frame in datasets.items():
    print(f"Number of people tested for {name} is: {frame['Id'].nunique()}")

In [None]:
activities[['TotalSteps', 'TotalDistance', 'SedentaryMinutes', 'Calories']].describe()

In [None]:
# Since the calories a human body burns throughout the day cannot be 0, we need to remove those rows
activities[activities['Calories'] == 0]

In [None]:
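# drop the zero-calorie rows by condition instead of by hard-coded index labels
activities.drop(labels = activities[activities['Calories'] == 0].index, axis = 0, inplace = True)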
activities[['TotalSteps', 'TotalDistance', 'SedentaryMinutes', 'Calories']].describe()

In [None]:
# Similarly, we collect rows where steps, distance or sedentary minutes are 0
# (one combined mask, so a row matching several conditions is counted once)
zero_mask = (activities['TotalSteps'] == 0) | (activities['TotalDistance'] == 0) \
                | (activities['SedentaryMinutes'] == 0)
empty_rows = list(activities[zero_mask].index)
len(empty_rows)

In [None]:
activities.drop(labels = empty_rows,axis = 0, inplace = True)
activities.describe()


In [None]:
# What percentage of daily records fall below the calorie target
# (each row is one participant-day, so this counts days, not people)

min_calories_burn = 2000
num_days_low_burn = len(activities[activities['Calories'] < min_calories_burn])
total_days = len(activities)

print(f"Percentage of daily records with fewer than {min_calories_burn} calories burned = {round(num_days_low_burn*100/total_days, 2)}")

In [None]:
# What percentage of daily records fall below the required number of steps
# (each row is one participant-day, so this counts days, not people)

min_steps = 5000
num_days_lacking = len(activities[activities['TotalSteps'] < min_steps])
total_days = len(activities)

print(f"Percentage of daily records with fewer than {min_steps} steps = {round(num_days_lacking*100/total_days, 2)}")

In [None]:
# Converting id type: int -to-> str
activities['Id'] = activities['Id'].astype(str)
activities.dtypes

In [None]:
# Converting id type: int -to-> str
calories['Id'] = calories['Id'].astype(str)

# remove rows with calories = 0
empty_rows = list(calories[calories['Calories'] == 0].index)
calories.drop(labels = empty_rows,axis = 0, inplace = True)


In [None]:
calories.describe()

In [None]:
# Converting id type: int -to-> str
sleep['Id'] = sleep['Id'].astype(str)

sleep.describe()

In [None]:
# Converting id type: int -to-> str
weight['Id'] = weight['Id'].astype(str)
weight.describe()

### Summary of some interesting descriptive statistics:

* Average sedentary time is 16 hours a day. This definitely needs to be reduced.
* On average, participants sleep once a day for 7 hours (appropriate).
* On almost 40% of recorded days, women do not burn enough calories (fewer than 2,000 calories).
* On almost 68% of recorded days, women fall short of an active lifestyle (10,000 steps each day is considered an active lifestyle. <br>
  This is recommended by the Centers for Disease Control and Prevention (CDC)).
* On almost 32% of recorded days, women are sedentary (fewer than 5,000 steps).
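
The percentages above all follow the same pattern: count the daily records below a threshold and divide by the total. A minimal sketch of that pattern follows; the helper `share_below` and the tiny `demo` frame are made up for illustration and are not the FitBit data.

```python
import pandas as pd

def share_below(series, threshold):
    """Percentage of values in `series` strictly below `threshold`."""
    return round((series < threshold).mean() * 100, 2)

# Made-up daily records, for illustration only (not the FitBit data)
demo = pd.DataFrame({
    "TotalSteps": [3000, 7500, 12000, 4800, 10000],
    "Calories": [1800, 2100, 2900, 1950, 2400],
    "SedentaryMinutes": [1100, 900, 700, 1000, 800],
})

pct_under_5000_steps = share_below(demo["TotalSteps"], 5000)    # sedentary days
pct_under_10000_steps = share_below(demo["TotalSteps"], 10000)  # days short of "active"
pct_under_2000_cal = share_below(demo["Calories"], 2000)        # low-burn days
avg_sedentary_hours = demo["SedentaryMinutes"].mean() / 60      # minutes to hours
```

Applied to the cleaned `activities` frame, the same helper would reproduce the step and calorie figures for any threshold without repeating the counting code.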

### Data Analysis and Visualization

In [None]:
weight['Month'] = weight['Date'].dt.month

In [None]:
weight.head()

In [None]:
# How the BMI of each participant has changed over time (in months)

people_wt = list(weight['Id'].unique())

for person in people_wt:
    df = weight[weight['Id'] == person]
    sns.lineplot(x='Month', y = 'BMI', data = df)
    plt.ylim([20,30])

In [None]:
# Comparison between total distance travelled and calories burnt

sns.set_theme(color_codes = True)
x = activities['Calories']
y = activities['TotalDistance']
plt.figure(figsize = (12, 8))
sns.regplot(x = x, y=y, data=activities, marker = "+")
plt.title("Calories burnt vs. Total distance travelled")
plt.show()

In [None]:
activities.head()

In [None]:
# Adding a column called weekday

activities = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
# the file stores dates month-first (e.g. 4/12/2016 is 12 April 2016), so dayfirst must not be set
activities['ActivityDate'] = pd.to_datetime(activities['ActivityDate'])

# day_name() returns 'Monday' ... 'Sunday' directly
activities["Weekday"] = activities["ActivityDate"].dt.day_name()

In [None]:
steps_by_weekday = activities.groupby(['Weekday'])['TotalSteps'].mean()
steps_by_weekday = steps_by_weekday[['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']]
steps_by_weekday

In [None]:

plt.figure(figsize = (12, 8))
# x and y are passed as vectors, so no data argument is needed
sns.lineplot(x = steps_by_weekday.index, y = steps_by_weekday.values)
plt.ylabel("Average No. of steps taken")
plt.title("Steps taken by weekday", fontsize = 16)

### What does the data say?
* Participants walked the most steps on Saturday and the fewest<br> 
  on Monday, followed by Friday, Tuesday and Sunday.<br>

### How can we use this data?
* We can add a feature in the Bellabeat app that notifies users in advance to walk more on their less active <br>
  days, while also quoting the highest number of steps they took in a day (mostly Saturday) **to motivate them**.
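
As a rough sketch of how such a reminder could pick its days, the snippet below flags weekdays whose average steps fall below the weekly mean and looks up the best day to quote. The function `low_activity_days`, the below-the-mean rule and the numbers are all assumptions made for illustration; they are not the notebook's output or an existing Bellabeat feature.

```python
import pandas as pd

def low_activity_days(steps_by_day):
    """Return the weekdays whose average steps fall below the weekly mean."""
    weekly_mean = steps_by_day.mean()
    return list(steps_by_day[steps_by_day < weekly_mean].index)

# Made-up averages, for illustration only (not the notebook's output)
demo_steps = pd.Series({
    "Monday": 6900, "Tuesday": 7200, "Wednesday": 7600, "Thursday": 7700,
    "Friday": 7100, "Saturday": 8300, "Sunday": 7300,
})

reminder_days = low_activity_days(demo_steps)  # days that would get a reminder
best_day = demo_steps.idxmax()                 # day to quote as motivation
print(reminder_days, best_day)
```

Passing the real `steps_by_weekday` series to the same function would give the days to target for an individual user or for the whole group.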

In [None]:
# Finding the No. of times users logged in app across the week

num_weekday = activities.groupby(["Weekday"])["Weekday"].count()
num_weekday = num_weekday[['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']]

plt.figure(figsize = (10, 7))
sns.barplot(x =num_weekday.index, y = num_weekday.values, color = 'teal')
plt.ylabel("Frequency")
plt.title("No. of times users logged in app across the week", fontsize = 16)

* People use the fitness app least on Monday and mid-week.
* Providing extra notifications on these days may motivate the users to be active.

In [None]:
activemins = activities.groupby(["Weekday"])[['VeryActiveMinutes', 'FairlyActiveMinutes', 'LightlyActiveMinutes', 'SedentaryMinutes']].sum()
activemins = activemins.reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
activemins

In [None]:
fig, axes = plt.subplots(3,2, figsize = (14, 16))

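# the 3x2 grid has six slots, so the last row (Sunday) is left out
for i, (idx, row) in enumerate(activemins[:-1].iterrows()):
    
    ax = axes[i//2, i%2]
    colors = ["lightcoral", "yellowgreen", "lightskyblue", "darkorange"]
    # hide slices below 1% of the total so their labels do not overlap
    row = row[row.gt(row.sum() * .01)]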
    ax.pie(row, colors = colors, autopct = "%1.1f%%", wedgeprops = {"edgecolor":"black"})
    ax.set_title(idx)
    ax.legend(row.index)
    
fig.subplots_adjust(wspace = 0.1)

* Above is a breakdown of how people's time is split across activity levels, as a percentage of the day,
  throughout the week.
  <br><br>
* Saturday has the least sedentary time while Monday has the most sedentary time.
<br><br>
* Similarly, Saturday has the most fairly active minutes (people spending time roaming around)
  while Monday has the fewest.
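
The pie charts show shares rather than raw minutes. A minimal sketch of that conversion is to divide each day's minutes by that day's total. The `demo_mins` values below are made up for illustration and are not the notebook's totals.

```python
import pandas as pd

# Made-up minute totals for two days, for illustration only
demo_mins = pd.DataFrame(
    {
        "VeryActiveMinutes": [30, 20],
        "FairlyActiveMinutes": [20, 10],
        "LightlyActiveMinutes": [200, 170],
        "SedentaryMinutes": [750, 800],
    },
    index=["Saturday", "Monday"],
)

# Divide each row by its own total so that every day sums to 100%
pct = demo_mins.div(demo_mins.sum(axis=1), axis=0) * 100

least_sedentary = pct["SedentaryMinutes"].idxmin()
most_sedentary = pct["SedentaryMinutes"].idxmax()
print(pct.round(1))
```

Running the same two lines on `activemins` would give the day-by-day percentages as a table, which is easier to compare across days than six separate pies.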

In [None]:
activities["Total_hours"] = (activities["VeryActiveMinutes"] + activities["FairlyActiveMinutes"] + 
activities["LightlyActiveMinutes"] + activities["SedentaryMinutes"]) / 60

In [None]:
plt.style.use("default")
plt.figure(figsize=(8,6)) # Specify size of the chart
plt.scatter(activities.Total_hours, activities.Calories, 
            alpha = 0.8, c = activities.Calories, 
            cmap = "Spectral")

# adding annotations and visuals
median_calories = 2303
median_hours = 20
median_sedentary = 991 / 60

plt.colorbar(orientation = "vertical")
# each label names the quantity its line marks
plt.axvline(median_hours, color = "Blue", label = "Median hours")
plt.axvline(median_sedentary, color = "Purple", label = "Median sedentary")
plt.axhline(median_calories, color = "Red", label = "Median calories")
plt.xlabel("Hours logged")
plt.ylabel("Calories burned")
plt.title("Calories burned for every hour logged")
plt.legend()
plt.grid(True)
plt.show()

**From the scatter plot, we discovered that:**

There is a positive correlation.

We observed that the intensity of calories burned increases when users are in the range of > 0 to 15,000 steps, with the calorie burn rate cooling down from 15,000 steps onwards.
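
A positive correlation can also be checked numerically instead of only by eye. The sketch below uses pandas' `Series.corr`, which computes the Pearson coefficient by default, on made-up values; the numbers are illustrative and not taken from the FitBit data.

```python
import pandas as pd

# Made-up daily records, for illustration only (not the FitBit data)
demo = pd.DataFrame({
    "TotalSteps": [0, 2500, 5000, 9000, 12000, 16000],
    "Calories": [1500, 1800, 2000, 2400, 2700, 2900],
})

# Pearson correlation: +1 is a perfect positive linear relationship, 0 is none
r = demo["TotalSteps"].corr(demo["Calories"])
print(f"Pearson r = {r:.2f}")
```

On the real data, the same call on any two numeric columns of `activities` gives a single number to report alongside the scatter plot.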

**Noted a few outliers:**

* Zero steps with zero to minimal calories burned.
* 1 observation of > 35,000 steps with < 3,000 calories burned.

We deduced that the outliers could be due to natural variation in the data, changes in users' usage or errors in data collection (i.e. miscalculations, data contamination or human error).
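
The outliers above were spotted visually. One common way to flag them programmatically is Tukey's 1.5 × IQR rule, sketched below. This rule is our own choice for illustration and was not used in the analysis; `iqr_outliers` and the sample values are made up.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]

# Made-up daily step counts with one extreme day, for illustration only
demo_steps = pd.Series([4000, 6000, 7000, 8000, 9000, 11000, 36000])

flagged = iqr_outliers(demo_steps)
print(flagged)
```

A rule like this only marks candidates. Whether a flagged day is a device error or a genuine long walk still needs a manual look.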

## Summarizing our Analysis 

In the final step, we will be delivering our insights and providing recommendations based on our analysis.

Here, we revisit our business questions and share with you our high-level business recommendations.

1. **What are the trends identified?**

The majority of the time (~80%), users are using the FitBit app while doing sedentary activities and not for tracking their health habits.

Users prefer to track their activities on weekdays rather than weekends - perhaps because they spend more time outside on weekdays and stay in on weekends.

* On almost 40% of recorded days, women do not burn enough calories (fewer than 2,000 calories).
* On almost 68% of recorded days, women fall short of an active lifestyle (10,000 steps each day is considered an active lifestyle. <br>
  This is recommended by the Centers for Disease Control and Prevention (CDC)).
* On almost 32% of recorded days, women are sedentary (fewer than 5,000 steps).

2. **How could these trends apply to Bellabeat customers?**

The company should focus on developing app features that help users stay motivated
to remain fit and active when their motivation is at its lowest.<br>
This can be done using a reward-points system, where a person is rewarded for burning more calories, staying consistent in their workout routine for the longest number of days, etc.
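
As a rough sketch of the consistency part of such a reward system, the snippet below finds the longest run of consecutive days that meet a step goal and turns it into points. The 10,000-step goal, the 10 points per day and both function names are hypothetical choices for illustration, not an existing Bellabeat feature.

```python
def longest_streak(daily_steps, goal=10000):
    """Length of the longest run of consecutive days meeting `goal`."""
    best = current = 0
    for steps in daily_steps:
        # extend the current run on a goal day, otherwise start over
        current = current + 1 if steps >= goal else 0
        best = max(best, current)
    return best

def reward_points(daily_steps, goal=10000, points_per_day=10):
    """Hypothetical scoring: fixed points for each day of the longest streak."""
    return longest_streak(daily_steps, goal) * points_per_day

# Made-up week of step counts, for illustration only
week = [12000, 10500, 8000, 11000, 10000, 13000, 9000]
print(longest_streak(week), reward_points(week))
```

Scoring the streak rather than the raw total rewards consistency, which is the behavior the recommendation is trying to encourage.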


3. **How could these trends help influence Bellabeat marketing strategy?**

The Bellabeat marketing team can encourage users by educating them about their own fitness and routine activity habits and the benefits of fitness, and by suggesting different types of exercise (e.g. a simple 10-minute exercise on weekdays, especially Monday, and a more intense exercise on weekends). The app can also show users their calorie intake and burn rate so that they stay vigilant about their habits.