## Analyzing Data for Business Insight
### A case study of Bellabeat: How Can A Wellness Technology Company Play It Smart?
***
***
## Introduction

The purpose of this article is to derive insights from a given dataset using data analysis techniques such as exploration, manipulation and visualization.
The analysis follows the six data analysis phases: Ask, Prepare, Process, Analyze, Share and Act. Each phase and the processes involved are highlighted below. The analysis is carried out in Python, using the Pandas library for data exploration and manipulation and Matplotlib for visualization.


## PHASE 1: ASK
#### 1.1 Problem Definition:
Bellabeat, a high-tech manufacturer of health-focused products for women, wants smart device usage data analyzed to inform its marketing strategy.

#### 1.2 Business Objective:
By analyzing smart device fitness data, Bellabeat hopes to gain insight into how consumers use their smart devices, and then use that data-driven knowledge to shape new marketing strategies that unlock growth opportunities (gap analysis).

#### 1.3 Stakeholders:
- __Urška Sršen__: Cofounder and Chief Creative Officer at Bellabeat.
- __Sando Mur__: Bellabeat’s cofounder and executive team member.
- __Bellabeat’s data analytics team__: Group of data analysts responsible for Bellabeat’s marketing strategy.

#### 1.4 Analytical Tasks:
- Data cleaning and exploration
- Identification of trends
- Mapping trends to business goals
- Highlight data-driven marketing strategies based on data trends

#### 1.5 Deliverables:
Analytics report summary containing:
- Documentation of data cleaning and manipulation
- Summary of analysis 
- Supporting visualizations and key findings 
- Recommendations based on analysis 


## PHASE 2: PREPARE
The FitBit Fitness Tracker Data is a publicly available dataset that records users’ daily activity from their smart devices. The data is made available through Mobius and can be obtained [here](https://www.kaggle.com/arashnic/fitbit).

The dataset contains information on users’ physical activity, such as heart rate, sleep monitoring and daily steps, stored across several CSV files.
This analysis uses the `dailyActivity_merged.csv` file, which contains total steps, distance and activity-level information for 30 users of the fitness tracker.


#### 2.1 Data Limitation
- Outdated data: the data spans March 2016 to May 2016, making it about five years old.
- Unrepresentative sample: the data covers only 30 users, which is too small to represent a user base of thousands (sampling bias).
- Bias: since the data was not collected by Bellabeat (it is not first-party data), its credibility, integrity and reliability are not guaranteed.
- Not comprehensive: the data lacks features that could enrich the analysis, such as users’ occupation, location and motivation.

In short, the data does not meet the ROCCC standard: it is not reliable, original, comprehensive, current or cited.


#### Load Libraries

```python
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
```

#### Load Data

```python
daily_activity = pd.read_csv("../input/dailyactivity-merged/dailyActivity_merged.csv")
```

## PHASE 3: PROCESS

- Data exploration
- Data cleaning
- Data transformation

```python
daily_activity.head()
```

```python
daily_activity.tail()
```

```python
daily_activity.info()
```

#### Data Cleaning - Check for null values


```python
missing_count = daily_activity.isnull().sum()

missing_count
```
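The cell above only counts missing values. A common companion check, which is not part of the original notebook, is for fully duplicated rows. The sketch below runs both checks on a small hypothetical toy frame (not the Bellabeat file) so the output is easy to verify by eye:

```python
import pandas as pd

# Hypothetical toy frame standing in for daily_activity
toy = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "ActivityDate": ["4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"],
    "TotalSteps": [1000, 1000, None, 5000],
})

# Number of missing values per column, as in the cell above
missing_per_column = toy.isnull().sum()

# Number of rows that exactly repeat an earlier row
duplicate_rows = toy.duplicated().sum()

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
```

On the real data, `daily_activity.duplicated().sum()` would give the equivalent count.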

#### Data Cleaning - reformat column names to be more readable

```python
col = list(daily_activity.columns)

col
```

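```python
# Convert each CamelCase column name (e.g. "TotalSteps") to snake_case ("total_steps")
new_col = []

for name in col:
    snake_name = ""

    for position, char in enumerate(name):
        # Insert an underscore before every uppercase letter except the first character
        if position > 0 and char.isupper():
            snake_name += "_"
        snake_name += char.lower()

    new_col.append(snake_name)
```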

```python
daily_activity.rename(columns=dict(zip(col, new_col)), inplace=True)

daily_activity.head()
```
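The character loop used for renaming can also be written as a single regular expression. This is a sketch of an alternative, not the notebook's own method: the helper name `camel_to_snake` is hypothetical, and the column list is a sample of the file's CamelCase headers:

```python
import re

def camel_to_snake(name: str) -> str:
    """Convert a CamelCase name such as 'TotalSteps' to 'total_steps' (hypothetical helper)."""
    # Insert "_" before every uppercase letter that is not at the start, then lowercase
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

columns = ["Id", "ActivityDate", "TotalSteps", "VeryActiveDistance", "SedentaryMinutes"]
print([camel_to_snake(c) for c in columns])
# ['id', 'activity_date', 'total_steps', 'very_active_distance', 'sedentary_minutes']
```

Because `DataFrame.rename` accepts a function for `columns`, the same helper could be applied as `daily_activity.rename(columns=camel_to_snake)`. Like the loop, it splits every uppercase letter, so an all-caps name such as `ID` would become `i_d`.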

#### Data Exploration - get the number of unique users

```python
unique_id = len(pd.unique(daily_activity["id"]))

print("IDs: ", unique_id)
```

There are 33 unique users, three more than the 30 stated by the data provider: an example of the data’s unreliability.
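Since the ID count disagrees with the provider’s figure, it can help to see how many days each ID actually logged. A minimal sketch on hypothetical toy IDs and dates (not the Bellabeat file), using `nunique()` as an idiomatic equivalent of `len(pd.unique(...))`:

```python
import pandas as pd

# Hypothetical toy frame standing in for daily_activity
toy = pd.DataFrame({
    "id": [101, 101, 101, 202, 202, 303],
    "activity_date": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-14",
                                     "2016-04-12", "2016-04-13", "2016-04-12"]),
})

# Number of distinct user IDs
n_users = toy["id"].nunique()

# Number of logged days per user, fewest first
days_per_user = toy.groupby("id").size().sort_values()

print("IDs:", n_users)
print(days_per_user)
```

IDs with very few logged days are candidates for closer inspection before drawing conclusions about the user base.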


#### Data Cleaning - convert the `activity_date` column to datetime (`YYYY-MM-DD`)

```python
daily_activity["activity_date"] = pd.to_datetime(daily_activity["activity_date"], format="%m/%d/%Y")

daily_activity.head()
```

#### Data Manipulation - create `week_days` column from the `activity_date` column

```python
daily_activity["week_days"] = daily_activity["activity_date"].dt.day_name()

daily_activity["week_days"].head(7)
```
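The date conversion and weekday derivation can be checked on a few hypothetical date strings written in the same month/day/year layout as the raw column. Note that `%m` and `%d` also accept non-zero-padded values such as `4/12/2016`:

```python
import pandas as pd

# Hypothetical dates in the raw month/day/year layout
raw = pd.Series(["4/12/2016", "4/13/2016", "5/12/2016"])

dates = pd.to_datetime(raw, format="%m/%d/%Y")
week_days = dates.dt.day_name()

print(dates.dt.strftime("%Y-%m-%d").tolist())  # ['2016-04-12', '2016-04-13', '2016-05-12']
print(week_days.tolist())                       # ['Tuesday', 'Wednesday', 'Thursday']
```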

#### Data Manipulation - create a `total_minutes` column by summing the four activity-level minute columns

```python
daily_activity["total_minutes"] = (
    daily_activity["very_active_minutes"]
    + daily_activity["fairly_active_minutes"]
    + daily_activity["lightly_active_minutes"]
    + daily_activity["sedentary_minutes"]
)
```

#### Data Manipulation - create `total_hours` column from the summed minutes

```python
daily_activity["total_hours"] = round(daily_activity["total_minutes"] / 60, 2)
```

```python
daily_activity.head()
```
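The same totals can be computed with a row-wise `sum(axis=1)`, and because one day has 1,440 minutes, a simple bound check catches impossible totals. This is a sketch on two hypothetical toy rows, not the notebook’s data:

```python
import pandas as pd

minute_cols = ["very_active_minutes", "fairly_active_minutes",
               "lightly_active_minutes", "sedentary_minutes"]

# Hypothetical toy rows; the second row is a full 24-hour day
toy = pd.DataFrame([[30, 10, 200, 700],
                    [60, 20, 300, 1060]], columns=minute_cols)

# Row-wise sum over the four activity-level columns
toy["total_minutes"] = toy[minute_cols].sum(axis=1)
toy["total_hours"] = (toy["total_minutes"] / 60).round(2)

# A day has 24 * 60 = 1440 minutes, so larger totals would indicate a data problem
assert (toy["total_minutes"] <= 24 * 60).all()
print(toy[["total_minutes", "total_hours"]])
```

One design difference: `sum(axis=1)` skips missing values by default, whereas adding columns with `+` propagates them, so the two approaches only disagree when a minute column contains nulls.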

## PHASE 4: ANALYZE
- Identify patterns
- Draw conclusions and make data-driven decisions
- Obtain recommendations

#### Overall data statistics

```python
daily_activity.describe()
```

```python
daily_activity['week_days'].value_counts()
```
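These counts are logged records (one row per user per day), not distinct users. If the recording window happens to contain more occurrences of some weekdays than others, those weekdays collect more rows regardless of user behavior. A minimal sketch of normalizing for this, on hypothetical toy data rather than the Bellabeat file, divides each weekday’s record count by the number of distinct calendar dates falling on that weekday:

```python
import pandas as pd

# Hypothetical toy records: the window runs Tuesday to the following Tuesday,
# so Tuesday occurs on more calendar days than any other weekday
dates = pd.date_range("2016-04-12", "2016-04-19", freq="D")
toy = pd.DataFrame({"activity_date": dates.repeat(2)})  # two users log every day
toy["week_days"] = toy["activity_date"].dt.day_name()

# Raw record counts per weekday (what value_counts() reports)
records = toy["week_days"].value_counts()

# Number of distinct calendar dates that fall on each weekday in the window
calendar_days = toy.drop_duplicates("activity_date")["week_days"].value_counts()

# Average records per occurrence of each weekday
records_per_day = (records / calendar_days).sort_values(ascending=False)
print(records_per_day)
```

In this toy example Tuesday has twice as many raw records as the other days, yet every weekday averages the same two records per day. Whether the normalization changes the ranking for `daily_activity` depends on its actual date range.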

```python
daily_activity['total_hours'].value_counts(bins=7)
```
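`value_counts(bins=7)` splits the observed minimum-to-maximum range into seven equal-width intervals, so the bin edges usually fall on non-round numbers. A sketch using `pd.cut` with explicit 4-hour edges (six bins from 0 to 24, matching the histogram in Phase 5) gives easier-to-read ranges; the hour values here are hypothetical:

```python
import pandas as pd

# Hypothetical daily usage in hours
hours = pd.Series([2.5, 15.67, 18.0, 23.5, 24.0, 24.0])

# Explicit 4-hour bins (0, 4], (4, 8], ..., (20, 24]; include_lowest keeps 0 in the first bin
usage_bins = pd.cut(hours, bins=list(range(0, 25, 4)), include_lowest=True)
print(usage_bins.value_counts().sort_index())
```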

### Patterns identified

1.	Across the 33 users, the Bellabeat Fitness Tracker is used more on weekdays than on weekends: Tuesdays, Wednesdays and Thursdays have the most logged records, while Saturdays, Sundays and Mondays have the fewest.
2.	Over the recording period, the most common daily usage duration is about 20-24 hours, while the least common is 0-5 hours.
3.	Sedentary time dominates the logs, making up about 80% of all logged minutes, while fairly active minutes make up the smallest share at 1.1%.
4.	Light activity accounts for the largest share of distance covered, about 62%.
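The percentages in points 3 and 4 are shares of a column total: each activity-level column is summed over all rows and divided by the sum of all four columns, which is also what the pie charts in Phase 5 display. A sketch of that calculation on hypothetical toy rows:

```python
import pandas as pd

minute_cols = ["very_active_minutes", "fairly_active_minutes",
               "lightly_active_minutes", "sedentary_minutes"]

# Hypothetical toy rows
toy = pd.DataFrame([[20, 10, 170, 800],
                    [40, 10, 230, 720]], columns=minute_cols)

# Total minutes per activity level across all rows, then each level's share of the grand total
totals = toy[minute_cols].sum()
share_pct = (totals / totals.sum() * 100).round(1)
print(share_pct)
```

Replacing `toy` with `daily_activity` (or the column list with the four distance columns) would give the numeric values behind each pie chart.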

***
***

## PHASE 5: SHARE

- Create visuals to communicate the analytical findings.


#### Plotting a histogram of daily usage duration in hours


```python
plt.style.use("default")
plt.figure(figsize=(6, 4))
# Six bins over the daily usage hours; rwidth leaves a small gap between bars
plt.hist(daily_activity.total_hours, bins=6,
         rwidth=0.9, color="lime", edgecolor="black")
plt.xlabel("Hours per Day")
plt.ylabel("Frequency")
plt.title("Daily Usage Duration in Hours")
plt.show()
```



#### Plotting a pie chart of activity level share by minutes

```python
def display_pie(item, x, titles):
    """Draw a pie chart of the column totals listed in `item`.

    item: four column names, ordered very, fairly (moderately), lightly active, sedentary
    x: how far to pull out (explode) the sedentary slice
    titles: chart title
    """
    # Sum each column over all rows; each sum becomes one slice
    slices = [daily_activity[column].sum() for column in item]
    labels = ["Very active", "Fairly active", "Lightly active", "Sedentary"]
    colours = ["lime", "firebrick", "lightskyblue", "deeppink"]
    explode = [0, 0, 0, x]

    plt.style.use("default")
    plt.pie(slices, labels=labels,
            colors=colours, wedgeprops={"edgecolor": "black"},
            explode=explode, autopct="%1.1f%%")  # print each slice's percentage
    plt.title(titles)
    plt.tight_layout()
    plt.show()
```

```python
label = ["very_active_minutes", "fairly_active_minutes", "lightly_active_minutes", "sedentary_minutes"]

display_pie(label, 0.1, "Activity Level Share by Minutes")
```

#### Plotting a pie chart of activity level share by distance

```python
label = ["very_active_distance", "moderately_active_distance", "light_active_distance", "sedentary_active_distance"]

display_pie(label, 0, "Activity Level Share by Distance")
```

#### Plotting a bar chart of records per weekday


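```python
# Count logged records per weekday, ordered Monday to Sunday
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
day_counts = daily_activity["week_days"].value_counts().reindex(day_order)

plt.style.use("default")
plt.figure(figsize=(6, 4))
plt.bar(day_counts.index, day_counts.values, width=0.6, color="lime", edgecolor="black")
plt.xlabel("Weekday")
plt.ylabel("Frequency")
plt.title("Number of Logged Records per Weekday")
plt.xticks(rotation=45)
plt.show()
```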

#### Plotting a scatter plot of daily calories burned against total steps

```python
plt.style.use("default")
plt.figure(figsize=(8, 6))
plt.scatter(daily_activity.total_steps, daily_activity.calories,
            alpha=0.8, c=daily_activity.calories,
            cmap="seismic")

# Compute the medians from the data instead of hard-coding them
median_calories = daily_activity["calories"].median()
median_steps = daily_activity["total_steps"].median()

plt.colorbar(orientation="vertical", label="Calories")
plt.axvline(median_steps, color="blue", label="Median steps")
plt.axhline(median_calories, color="red", label="Median calories burned")
plt.xlabel("Total Steps")
plt.ylabel("Calories Burned")
plt.title("Daily Calories Burned vs. Total Steps")
plt.grid(True)
plt.legend()
plt.show()
```
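The scatter plot shows the steps-calories relationship visually; a correlation coefficient puts a number on it. A sketch on hypothetical toy values, computing Pearson’s r with `Series.corr` and Spearman’s rank correlation as Pearson’s r on the ranks (which avoids needing SciPy):

```python
import pandas as pd

# Hypothetical toy daily records
toy = pd.DataFrame({
    "total_steps": [1000, 4000, 7000, 10000, 13000],
    "calories":    [1600, 1900, 2200, 2500, 2800],
})

# Pearson correlation (linear relationship)
pearson = toy["total_steps"].corr(toy["calories"])

# Spearman correlation: Pearson correlation of the ranks (monotonic relationship)
spearman = toy["total_steps"].rank().corr(toy["calories"].rank())

print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
```

The toy values lie on a straight line, so both coefficients are 1. On `daily_activity`, values near 1 would support the claim that steps and calories burned are strongly correlated, though correlation alone does not show that steps cause the calorie difference.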

## PHASE 6: ACT

#### Identified Insights:
1. The Bellabeat Fitness Tracker is used most on Tuesdays, Wednesdays and Thursdays, and least on weekends and Mondays.
2. Sedentary time dominates the logs, making up about 80% of all logged minutes.
3. The more steps a user takes, the more calories they burn; users with more than 3,000 steps burned the most calories.

#### Solve Business Problem:
Recall that the business aim is to gain insights that unlock new marketing strategies for growth.
1. Users often forget to use the Bellabeat Fitness Tracker on weekends and Mondays.
2. Users rarely log activity while doing active chores; most of their logged time is spent seated.
3. Walking (step count) and calories burned are positively correlated.

#### Make Recommendations and Decisions
1.	Send reminders or notifications on weekends and Mondays to encourage users to wear the Bellabeat Fitness Tracker.
2.	Educate users on how to use the Bellabeat Fitness Tracker, especially during active chores, so that it can track their activity and recommend suitable routines.
3.	Since activity level is closely tied to calories burned, motivating and reminding users to use the app may raise both their activity level and their calorie burn, which in turn could encourage them to recommend the product to family and friends.


### All of this could lead to growth for Bellabeat!