# Case study: Bellabeat with Python

### Author- Mayan Derikoma

**BellaBeat**

**Case Study**
How Can a Wellness Technology Company Play It Smart?

**Introduction**
 
**Scenario**
 You are a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused
 products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the
 global smart device market.
 Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze the data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy.
 
 **Characters and products**
 
**Characters**
* Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
* Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
* Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy. You joined this team six months ago and have been busy learning about Bellabeat’s mission and business goals — as well as how you, as a junior data analyst, can help Bellabeat achieve them.
 
 **Products**
* Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
* Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
* Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
* Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
* Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
 
 **About the company**
 
 Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital marketing extensively. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.
 Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.

#### ASK PHASE
Ask
 Sršen asks you to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. She then wants you to select one Bellabeat product to apply these insights to in your presentation. These questions
 will guide your analysis:
 1. What are some trends in smart device usage?
 2. How could these trends apply to Bellabeat customers?
 3. How could these trends help influence Bellabeat marketing strategy?
 You will produce a report with the following deliverables:
 1. A clear summary of the business task
 2. A description of all data sources used
 3. Documentation of any cleaning or manipulation of data
 4. A summary of your analysis
 5. Supporting visualizations and key findings
 6. Your top high-level content recommendations based on your analysis
 
 
 **Case Study Roadmap**
 **Guiding questions**
 ● What is the problem you are trying to solve?
 ● How can your insights drive business decisions?
 **Key tasks**
 1. Identify the business task
 2. Consider key stakeholders
**Deliverable**
 A clear statement of the business task

# Prepare Phase
* The dataset used is the Fitbit Fitness Tracker Data, released under a CC0: Public Domain license and made available through Mobius.
1. The dataset is stored in 18 CSV files, which were downloaded and stored in a dedicated directory.
2. The data was generated by respondents to a survey distributed via Amazon Mechanical Turk between 12 March 2016 and 12 May 2016.
3. The data comes from 30 eligible Fitbit users who consented to their data being collected and used. It includes minute-level output for physical activity, heart rate, and sleep monitoring, along with information about daily activities, steps, and heart rate that can be used to explore users' habits.

**Limitations of the Dataset:**
1. The dataset was collected in 2016 and recorded users' daily activity, fitness, sleeping habits and diet. Given the time elapsed since then, its relevance may be questionable.
2. The sample size is too small to generalize the results to a wider population.
3. The dataset comes from a third-party open source, so its integrity and accuracy cannot be fully ascertained.

#### Process Phase

First, we load the packages needed for the analysis.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
#importing the daily activity dataset
df_daily_activity= pd.read_csv("/kaggle/input/fitbit/mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/dailyActivity_merged.csv")
df_daily_activity

In [None]:
#importing the daily heart rate dataset
df_daily_hr= pd.read_csv("/kaggle/input/fitbit/mturkfitbit_export_3.12.16-4.11.16/Fitabase Data 3.12.16-4.11.16/heartrate_seconds_merged.csv")
df_daily_hr

In [None]:
#importing the sleepDay merged dataset
df_daily_sleep= pd.read_csv("/kaggle/input/fitbit/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
df_daily_sleep

In [None]:
#viewing the structure of the imported datasets
df_daily_activity.info()
df_daily_hr.info()
df_daily_sleep.info()

#### The data cleaning and Manipulation process
* Check for null values
* Check for duplicate values
* Confirm datatypes and that each column has the correct datatype
* Data Manipulation
1. Create a new column "DayOfTheWeek" from Date, in the daily_activity dataset
2. Create a new column "TotalActiveMinutes" by adding all the columns with activity
3. Create a new column "TotalHour" for TotalActiveMinutes and TotalSleep in the appropriate datasets
* Split the Time column to get the TimeOnly column
* Rename the appropriate columns
* Reorganize the columns
* Merge the daily_activity dataset with the sleep dataset and get the data set for analysis
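
The merge step listed above could look roughly like the following. This is a minimal sketch on made-up rows, assuming both tables share the `Id` and `Date` columns after renaming; the toy DataFrames and their values are illustrative, not taken from the dataset.

```python
import pandas as pd

# Toy stand-ins for the cleaned activity and sleep tables (values are made up).
activity = pd.DataFrame({
    "Id": [1, 1, 2],
    "Date": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-12"]),
    "TotalSteps": [8000, 4500, 12000],
})
sleep = pd.DataFrame({
    "Id": [1, 2],
    "Date": pd.to_datetime(["2016-04-12", "2016-04-12"]),
    "TotalMinutesAsleep": [420, 390],
})

# An inner join keeps only the user-days that appear in both tables.
merged = activity.merge(sleep, on=["Id", "Date"], how="inner")
print(merged)
```

An inner join drops days without a sleep record; `how="left"` would keep every activity row and leave the sleep columns empty where no match exists.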

In [None]:
#checking for missing values in the daily activity dataset
df_daily_activity.isnull().sum()

In [None]:
#checking for missing values in the daily heart rate dataset
df_daily_hr.isnull().sum()

In [None]:
#checking for missing values in the daily sleep dataset
df_daily_sleep.isnull().sum()

Checking for duplicates in the imported datasets

In [None]:
print("df_daily_activity=", df_daily_activity.duplicated().sum())

In [None]:
print("df_daily_hr=", df_daily_hr.duplicated().sum())

In [None]:
print("df_daily_sleep=", df_daily_sleep.duplicated().sum())

The daily_sleep dataset has 3 duplicate rows, so we need to find and delete them.

In [None]:
#extracting the duplicated rows
df_daily_sleep.loc[df_daily_sleep.duplicated(), :]

In [None]:
#Dropping the duplicates and assigning the result back, since drop_duplicates returns a new DataFrame
df_daily_sleep= df_daily_sleep.drop_duplicates()

The number of rows in our daily_sleep DataFrame has reduced from 413 to 410, which means we have successfully deleted the 3 duplicate rows.
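
By default, `drop_duplicates` returns a new DataFrame rather than modifying the original, so the row count only changes if the result is assigned back. A minimal sketch on a made-up table (the `toy` frame and its values are illustrative):

```python
import pandas as pd

# Made-up sleep records; the first two rows are exact duplicates.
toy = pd.DataFrame({"Id": [1, 1, 2], "TotalMinutesAsleep": [400, 400, 380]})

toy.drop_duplicates()          # result is discarded, so toy is unchanged
rows_before = len(toy)

toy = toy.drop_duplicates()    # assigning the result keeps the de-duplicated rows
rows_after = len(toy)

print(rows_before, rows_after)
```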

*On inspection, all the dates in the imported datasets are stored with the wrong datatype, as object. We need to convert them to datetime, starting with the daily activity dataset.*

In [None]:
#converting ActivityDate from object to datetime
#the format assumes month/day/year strings such as "4/12/2016"; adjust it if your export differs
df_daily_activity["ActivityDate"]= pd.to_datetime(df_daily_activity["ActivityDate"],
                                                 format= "%m/%d/%Y")

df_daily_activity

In [None]:
#converting Time in heart rate from object to datetime
#the format assumes strings such as "4/12/2016 7:21:00 AM"; adjust it if your export differs
df_daily_hr["Time"]= pd.to_datetime(df_daily_hr["Time"],
                                   format = "%m/%d/%Y %I:%M:%S %p")
df_daily_hr

In [None]:
#converting SleepDay to datetime
#the format assumes strings such as "4/12/2016 12:00:00 AM"; adjust it if your export differs
df_daily_sleep["SleepDay"]= pd.to_datetime(df_daily_sleep["SleepDay"],
                                           format = "%m/%d/%Y %I:%M:%S %p")

df_daily_sleep
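
The `format` argument of `pd.to_datetime` has to match the layout of the strings exactly, including separators and the AM/PM marker. The sketch below parses a few sample strings; they are made up and assume a month/day/year layout with a 12-hour clock, which may not match every export.

```python
import pandas as pd

# Sample strings, assumed to be month/day/year with a 12-hour clock.
dates = pd.Series(["4/12/2016", "4/13/2016"])
stamps = pd.Series(["4/12/2016 7:21:00 AM", "4/12/2016 11:05:30 PM"])

# %m/%d/%Y reads month, day, four-digit year.
parsed_dates = pd.to_datetime(dates, format="%m/%d/%Y")

# %I is the 12-hour clock hour and %p the AM/PM marker.
parsed_stamps = pd.to_datetime(stamps, format="%m/%d/%Y %I:%M:%S %p")

print(parsed_dates.dt.day_name().tolist())
print(parsed_stamps.dt.hour.tolist())
```

If the format does not match the strings, `pd.to_datetime` raises an error instead of guessing, which makes a wrong format easy to spot.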

In [None]:
#finding the number of unique users in the daily activity dataset

unique_id= len(pd.unique(df_daily_activity["Id"]))
unique_id

In [None]:
#We need to extract the day of the week from the date to help our analysis
#and to determine whether certain days were more active than others

df_daily_activity["DayOfTheWeek"]= df_daily_activity["ActivityDate"].dt.day_name()
df_daily_activity["DayOfTheWeek"]

In [None]:
#creating a new column for the total logged minutes per day (active plus sedentary minutes)
df_daily_activity["TotalMinutes"]= df_daily_activity["VeryActiveMinutes"] + df_daily_activity["FairlyActiveMinutes"] +  df_daily_activity["LightlyActiveMinutes"] + df_daily_activity["SedentaryMinutes"]
df_daily_activity["TotalMinutes"].head()

In [None]:
#Creating a new column "TotalHour" by converting minutes to hours and rounding to the nearest whole hour
df_daily_activity["TotalHour"]= round(df_daily_activity["TotalMinutes"]/60)

In [None]:
#Creating a new column "TotalHourInBed" by converting minutes to hours and rounding to the nearest whole hour
df_daily_sleep["TotalHourInBed"]= round(df_daily_sleep["TotalTimeInBed"]/60)

In [None]:
#Creating a new column "TotalHourAsleep" by converting minutes to hours and rounding to the nearest whole hour
df_daily_sleep["TotalHourAsleep"]= round(df_daily_sleep["TotalMinutesAsleep"]/60)
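
The number of decimals kept when converting minutes to hours affects the summary statistics later on. A small sketch with made-up minute totals shows the difference between rounding to whole hours and rounding to two decimal places:

```python
import pandas as pd

# Made-up minute totals, for illustration only.
minutes = pd.Series([327, 407, 442])

hours_whole = (minutes / 60).round()    # nearest whole hour
hours_2dp = (minutes / 60).round(2)     # two decimal places

print(hours_whole.tolist())
print(hours_2dp.tolist())
```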

In [None]:
#splitting the Time column in the daily heart rate dataset to extract the time from the datetime
df_daily_hr["TimeOnly"]= df_daily_hr["Time"].dt.time
df_daily_hr

In [None]:
df_daily_activity.head()

In [None]:
df_daily_hr.head()

In [None]:
df_daily_sleep.head()

Renaming and Reordering of columns

In [None]:
#renaming the ActivityDate to Date in the daily_activity dataset
df_daily_activity= df_daily_activity.rename(columns={"ActivityDate":"Date", "TotalMinutes":"TotalExerciseMinutes"})
df_daily_activity

In [None]:
#reorganizing the columns
new_cols= ["Id", "Date", "DayOfTheWeek", "TotalSteps", "TotalDistance", "TrackerDistance", "LoggedActivitiesDistance", "VeryActiveDistance",
          "ModeratelyActiveDistance", "LightActiveDistance", "SedentaryActiveDistance", "VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes",
          "SedentaryMinutes", "TotalExerciseMinutes", "TotalHour","Calories"]
df_daily_activity= df_daily_activity.reindex(columns= new_cols)

df_daily_activity.head()

In [None]:
#renaming Time to Date and Value to HeartRate in the heart rate dataset
df_daily_hr= df_daily_hr.rename(columns={"Time":"Date","Value":"HeartRate"})
df_daily_hr.head()

In [None]:
#rearranging the columns
new_cols= ["Id", "Date", "TimeOnly", "HeartRate"]
df_daily_hr= df_daily_hr.reindex(columns= new_cols)
df_daily_hr.head()

In [None]:
#renaming the SleepDay to Date in the daily_sleep dataset
df_daily_sleep= df_daily_sleep.rename(columns={"SleepDay":"Date"})
df_daily_sleep

In [None]:
#rearranging the sleep Dataframe
new_cols= ["Id", "Date", "TotalSleepRecords", "TotalMinutesAsleep", "TotalHourAsleep",
           "TotalTimeInBed", "TotalHourInBed"]
df_daily_sleep= df_daily_sleep.reindex(columns= new_cols)
df_daily_sleep.head()

In [None]:
df_daily_hr

### Analyze Phase

**The statistical summary of the three DataFrames**

In [None]:
#summarize every column except Id and Date, leaving the DataFrame itself unchanged
df_daily_activity.drop(columns=["Id", "Date"]).describe()

In [None]:
#summarize every column except Id and Date, leaving the DataFrame itself unchanged
df_daily_sleep.drop(columns=["Id", "Date"]).describe()

In [None]:
#summarize every column except Id and Date, leaving the DataFrame itself unchanged
df_daily_hr.drop(columns=["Id", "Date"]).describe()

### Statistical Summary derived from the Dataset 
We examined the mean, minimum and maximum of the daily activity, sleep and heart rate datasets. Below are the findings:
 
* The activity dataset covers 35 distinct users. The mean daily step count was 6546.56, with a minimum of 0 and a maximum of 28,497.
* The average calories burned was 2189.45 cal. The minimum was 0 cal and the maximum was 4562 cal.
* The average time in bed was 7.61 hours. The minimum time in bed was 1 hour, while the maximum was 16 hours.
* The average exercise time was 20 hours, with a minimum of 5 hours and a maximum of 24 hours. Note that this column sums the very active, fairly active, lightly active and sedentary minutes, so it reflects total logged time rather than exercise alone.
* The average heart rate recorded was 79 bpm. The minimum heart rate was 36 bpm, while the maximum was 185 bpm.
* The average sedentary time for the users was 16.58 hours. The minimum sedentary time was 32 minutes and the maximum was 24 hours.

All of these figures are computed over the full timeframe of the analysis.

### SHARE PHASE

We create visualizations to represent the insights drawn from the datasets. This helps reveal trends.

In [None]:
#plotting the TotalHour values by day of the week (bars for the same day overlap, so the tallest value is what shows)
plt.figure(figsize= (8,6))
plt.bar(x= df_daily_activity["DayOfTheWeek"],
        height= df_daily_activity["TotalHour"],
        color= "#455A64")
plt.xticks(rotation= 45)
plt.title("Weekly Exercise", fontsize=14, weight= "bold")
plt.show()
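
Calling `plt.bar` with one row per user-day draws many bars at the same x position, so per-day totals are not shown directly. One way to get true totals is to aggregate before plotting. The sketch below does this on made-up rows; the `toy` frame and `day_order` list are illustrative names, not part of the notebook.

```python
import pandas as pd

# Made-up rows with the same column names as the activity DataFrame.
toy = pd.DataFrame({
    "DayOfTheWeek": ["Monday", "Tuesday", "Monday", "Sunday"],
    "TotalHour": [20, 18, 22, 24],
})
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Sum the hours per weekday, then put the days in calendar order.
# Days with no rows get a total of 0.
totals = (
    toy.groupby("DayOfTheWeek")["TotalHour"]
       .sum()
       .reindex(day_order, fill_value=0)
)
print(totals)
```

The resulting Series can be passed to `plt.bar(totals.index, totals.values)` to draw one bar per day. Replacing `.sum()` with `.mean()` gives a per-day average, which is easier to compare when some weekdays have more records than others.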

In [None]:
#plotting the calories burned by day of the week for the participants
plt.figure(figsize= (8,6))
plt.bar(x= df_daily_activity["DayOfTheWeek"],
        height= df_daily_activity["Calories"],
        color= "#455A64")
plt.xticks(rotation= 45)
plt.title("Weekly Calories", fontsize=14, weight= "bold")
plt.show()

In [None]:
#plotting the steps taken by day of the week for the participants
plt.figure(figsize= (8,6))
plt.bar(x= df_daily_activity["DayOfTheWeek"],
        height= df_daily_activity["TotalSteps"],
        color= "#455A64")
plt.xticks(rotation= 45)
plt.title("Weekly Steps", size=14, weight= "bold")
plt.show()

In [None]:
#distribution of the time spent in bed per sleep record
plt.figure(figsize= (8,6))
plt.hist(df_daily_sleep["TotalTimeInBed"],
         bins= 8,
         color= "#8BC34A")
plt.xticks(rotation= 45)
plt.title("Distribution of Time In Bed", fontsize=14, weight= "bold")
plt.xlabel("Time in Bed (minutes)", weight= "bold")
plt.ylabel("Number of sleep records", weight= "bold")
plt.show()

In [None]:
#plotting the sedentary minutes by day of the week for the participants
plt.figure(figsize= (8,6))
plt.bar(x= df_daily_activity["DayOfTheWeek"],
        height= df_daily_activity["SedentaryMinutes"],
        color= "#8BC34A")
plt.xticks(rotation= 45)
plt.title("Sedentary Time per week", fontsize=14, weight= "bold")
plt.show()

In [None]:
#Comparing the calories burnt with the total steps taken
plt.figure(figsize= (8,8))
scatter= plt.scatter(df_daily_activity["TotalSteps"],
                    df_daily_activity["Calories"],
                    alpha= 0.6,
                    color= "#E74C3C")
plt.title("Relationship between Total Steps and Calories Burnt", size= 14, weight= "bold")
plt.xlabel("Total Steps", weight= "bold")
plt.ylabel("Calories", weight= "bold")
plt.show()
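
A scatter plot shows the shape of the relationship; a correlation coefficient puts a number on it. The sketch below computes the Pearson correlation on made-up values with the same column names. It is an illustrative addition, and the result says nothing about the real dataset.

```python
import pandas as pd

# Made-up steps and calories, for illustration only.
toy = pd.DataFrame({
    "TotalSteps": [1000, 5000, 9000, 13000],
    "Calories": [1700, 2000, 2400, 2900],
})

# Series.corr uses the Pearson coefficient by default.
r = toy["TotalSteps"].corr(toy["Calories"])
print(round(r, 3))
```

A value close to 1 indicates a strong positive linear relationship, while a value near 0 indicates little linear relationship.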

#### SHARE

*Smart devices were used to track both the active and sedentary lifestyles of the users*


**Key findings include:**
* Weekdays were the most active days recorded by users. Their sedentary minutes were also concentrated on weekdays.
* This could mean that after a hectic week, the smart device users rested more on weekends. It also suggests that the steps recorded during the week came from active work rather than from exercise.
* The recommended daily step count for adults (adult women) is between 6,000 and 10,000 steps per day. The mean of 6,546.56 steps recorded by the users sits at the low end of this range, which suggests that many users are lacking beneficial physical activity.
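
A mean can hide how many individual days fall short of a target, so the step finding could be checked against the share of user-days below the lower bound. This is a minimal sketch on made-up step counts; `LOWER_BOUND` is a hypothetical name for the 6,000-step figure quoted above.

```python
import pandas as pd

# Made-up daily step counts, for illustration only.
steps = pd.Series([0, 3500, 6546, 9800, 12000])

LOWER_BOUND = 6000  # lower end of the recommended range quoted in the findings

# The mean of a boolean Series is the fraction of True values.
share_below = (steps < LOWER_BOUND).mean()
print(f"{share_below:.0%} of user-days are below {LOWER_BOUND} steps")
```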

**Recommendations**
* More personal information is needed, such as the age, mood and state of mind of participants, as these factors influence the willingness to be physically active. If this data becomes available, more insights can be drawn about users.
* The Bellabeat marketing team should increase campaigns for more weekend physical activity, with incentives for users who exercise more on weekdays. Personalized notifications can be made available to users to help prompt physical activity.
* These campaigns should encourage meaningful physical activities that benefit health and also reduce stress.