# <span style = 'color: dimgray'> ***Bellabeat*** </span>

### <span style='color: navy'> **Key Questions:** </span>
### 1. What are some trends in smart device usage?
### 2. How could these trends apply to Bellabeat customers?
### 3. How could these trends help influence Bellabeat marketing strategy?

### <span style = 'color: brown'> Importing packages needed for manipulating data and visualizations </span>

In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### <span style ='color:brown'> Data Selected For Analysis </span>
#### dailyActivity_merged.csv, weightLogInfo_merged.csv, sleepDay_merged.csv


In [7]:
daily_act = pd.read_csv('/kaggle/input/fitbit/dailyActivity_merged.csv')
daily_sleep = pd.read_csv('/kaggle/input/fitbit/sleepDay_merged.csv')
weight_log = pd.read_csv('/kaggle/input/fitbit/weightLogInfo_merged.csv')

### <span style ='color:brown'> Inspect Data </span>

##### Daily Activity

In [8]:
daily_act.head()

Unnamed: 0,Id,ActivityDate,TotalSteps,TotalDistance,TrackerDistance,LoggedActivitiesDistance,VeryActiveDistance,ModeratelyActiveDistance,LightActiveDistance,SedentaryActiveDistance,VeryActiveMinutes,FairlyActiveMinutes,LightlyActiveMinutes,SedentaryMinutes,Calories
0,1503960366,4/12/2016,13162,8.5,8.5,0.0,1.88,0.55,6.06,0.0,25,13,328,728,1985
1,1503960366,4/13/2016,10735,6.97,6.97,0.0,1.57,0.69,4.71,0.0,21,19,217,776,1797
2,1503960366,4/14/2016,10460,6.74,6.74,0.0,2.44,0.4,3.91,0.0,30,11,181,1218,1776
3,1503960366,4/15/2016,9762,6.28,6.28,0.0,2.14,1.26,2.83,0.0,29,34,209,726,1745
4,1503960366,4/16/2016,12669,8.16,8.16,0.0,2.71,0.41,5.04,0.0,36,10,221,773,1863


In [9]:
daily_act.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Id                        940 non-null    int64  
 1   ActivityDate              940 non-null    object 
 2   TotalSteps                940 non-null    int64  
 3   TotalDistance             940 non-null    float64
 4   TrackerDistance           940 non-null    float64
 5   LoggedActivitiesDistance  940 non-null    float64
 6   VeryActiveDistance        940 non-null    float64
 7   ModeratelyActiveDistance  940 non-null    float64
 8   LightActiveDistance       940 non-null    float64
 9   SedentaryActiveDistance   940 non-null    float64
 10  VeryActiveMinutes         940 non-null    int64  
 11  FairlyActiveMinutes       940 non-null    int64  
 12  LightlyActiveMinutes      940 non-null    int64  
 13  SedentaryMinutes          940 non-null    int64  
 14  Calories                  940 non-null    int64  

In [10]:
daily_act.isnull().sum()

Id                          0
ActivityDate                0
TotalSteps                  0
TotalDistance               0
TrackerDistance             0
LoggedActivitiesDistance    0
VeryActiveDistance          0
ModeratelyActiveDistance    0
LightActiveDistance         0
SedentaryActiveDistance     0
VeryActiveMinutes           0
FairlyActiveMinutes         0
LightlyActiveMinutes        0
SedentaryMinutes            0
Calories                    0
dtype: int64

In [11]:
daily_act.shape

(940, 15)

In [12]:
daily_act.Id.nunique()

33

In [13]:
# Parse the text dates first; as raw strings, '5/9/2016' would sort after '5/12/2016'.
act_dates = pd.to_datetime(daily_act.ActivityDate, format='%m/%d/%Y')
print(f'Start Date : {act_dates.min().date()}')
print(f'End Date : {act_dates.max().date()}')

Start Date : 2016-04-12
End Date : 2016-05-12


###### **The daily activity dataset contains 15 columns and 33 unique users. Records run from 4/12/2016 to 5/12/2016. Features include daily total steps, daily total distance travelled, and minutes spent at each activity level.** 
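
Because `ActivityDate` is read in as text (dtype `object`), `min()` and `max()` on it compare strings character by character rather than chronologically. The sketch below illustrates the difference on three hand-written values (not read from the Fitbit files); the explicit `format` string is an assumption based on the month/day/year layout shown in the preview above.

```python
import pandas as pd

# Illustrative values only (not read from the Fitbit files).
dates = pd.Series(['4/12/2016', '5/9/2016', '5/12/2016'], name='ActivityDate')

# Compared as text, '5/9/2016' sorts after '5/12/2016' because '9' > '1'.
text_max = dates.max()

# Parsed with an explicit month/day/year format, the values compare chronologically.
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
parsed_min, parsed_max = parsed.min(), parsed.max()

print(f'max as text     : {text_max}')
print(f'max as datetime : {parsed_max.date()}')
```

Parsing once and reusing the parsed column keeps every later date comparison chronological.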

##### Daily Sleep

In [14]:
daily_sleep.head()

Unnamed: 0,Id,SleepDay,TotalSleepRecords,TotalMinutesAsleep,TotalTimeInBed
0,1503960366,4/12/2016 12:00:00 AM,1,327,346
1,1503960366,4/13/2016 12:00:00 AM,2,384,407
2,1503960366,4/15/2016 12:00:00 AM,1,412,442
3,1503960366,4/16/2016 12:00:00 AM,2,340,367
4,1503960366,4/17/2016 12:00:00 AM,1,700,712


In [15]:
daily_sleep.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 413 entries, 0 to 412
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Id                  413 non-null    int64 
 1   SleepDay            413 non-null    object
 2   TotalSleepRecords   413 non-null    int64 
 3   TotalMinutesAsleep  413 non-null    int64 
 4   TotalTimeInBed      413 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 16.3+ KB


In [16]:
daily_sleep.isnull().sum()

Id                    0
SleepDay              0
TotalSleepRecords     0
TotalMinutesAsleep    0
TotalTimeInBed        0
dtype: int64

In [17]:
daily_sleep.Id.nunique()

24

In [18]:
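# Parse the text timestamps first; raw strings compare character by character, not by date.
sleep_dates = pd.to_datetime(daily_sleep.SleepDay, format='%m/%d/%Y %I:%M:%S %p')
print(f'Start Date is {sleep_dates.min()}')
print(f'End Date is {sleep_dates.max()}')

Start Date is 2016-04-12 00:00:00
End Date is 2016-05-12 00:00:00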


###### **The daily sleep data covers only 24 of the 33 users. The recording period is the same as for the activity dataset. Because the two datasets cover different sets of users, each will be analyzed separately.**
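
One way to quantify how much the two datasets overlap is to compare their sets of user Ids. This is a minimal sketch on toy frames standing in for `daily_act` and `daily_sleep`; the Ids and values are made up, and the variable names `both`, `only_activity`, and `coverage` are hypothetical.

```python
import pandas as pd

# Toy frames standing in for daily_act and daily_sleep; the Ids are made up.
act = pd.DataFrame({'Id': [1, 1, 2, 3], 'TotalSteps': [100, 200, 300, 400]})
sleep = pd.DataFrame({'Id': [1, 3, 3], 'TotalMinutesAsleep': [400, 380, 410]})

act_ids = set(act['Id'])
sleep_ids = set(sleep['Id'])

both = act_ids & sleep_ids            # users present in both datasets
only_activity = act_ids - sleep_ids   # users with activity but no sleep records
coverage = len(both) / len(act_ids)   # share of activity users who also logged sleep

print(f'{len(both)} of {len(act_ids)} activity users also logged sleep ({coverage:.0%})')
print(f'Activity-only users: {sorted(only_activity)}')
```

Running the same comparison on the real frames would show which users would be lost if the datasets were merged rather than analyzed separately.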

##### Weight Log Info

In [19]:
weight_log.head()

Unnamed: 0,Id,Date,WeightKg,WeightPounds,Fat,BMI,IsManualReport,LogId
0,1503960366,5/2/2016 11:59:59 PM,52.599998,115.963147,22.0,22.65,True,1462233599000
1,1503960366,5/3/2016 11:59:59 PM,52.599998,115.963147,,22.65,True,1462319999000
2,1927972279,4/13/2016 1:08:52 AM,133.5,294.31712,,47.540001,False,1460509732000
3,2873212765,4/21/2016 11:59:59 PM,56.700001,125.002104,,21.450001,True,1461283199000
4,2873212765,5/12/2016 11:59:59 PM,57.299999,126.324875,,21.690001,True,1463097599000


In [20]:
weight_log.isnull().sum()

Id                 0
Date               0
WeightKg           0
WeightPounds       0
Fat               65
BMI                0
IsManualReport     0
LogId              0
dtype: int64

In [21]:
weight_log.Id.nunique()

8

###### **The weight data covers only 8 users, too few to draw meaningful conclusions. This dataset is excluded from the analysis.** 

### <span style = 'color: brown'> Checking Duplicates </span>

In [22]:
daily_act.duplicated().sum()

0

In [23]:
daily_sleep.duplicated().sum()

3

###### **Found 3 duplicate rows in the daily sleep data. They are dropped so that they do not bias the results.**

In [24]:
daily_sleep.drop_duplicates(inplace = True)
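
As a small illustration of the duplicate check and removal, the sketch below uses a made-up sleep log containing one exact duplicate row. The key-level check on `Id` and `SleepDay` is an extra assumption about what should be unique, not something the original analysis performs.

```python
import pandas as pd

# Toy sleep log with one exact duplicate row; values are made up.
sleep = pd.DataFrame({
    'Id': [1, 1, 1, 2],
    'SleepDay': ['4/12/2016', '4/13/2016', '4/13/2016', '4/12/2016'],
    'TotalMinutesAsleep': [327, 384, 384, 412],
})

# Rows identical to an earlier row in every column.
n_dupes = int(sleep.duplicated().sum())

# Rows repeating an (Id, SleepDay) pair, even if other columns differ.
n_key_dupes = int(sleep.duplicated(subset=['Id', 'SleepDay']).sum())

# Drop exact duplicates and renumber the index so it has no gaps.
deduped = sleep.drop_duplicates().reset_index(drop=True)

print(f'exact duplicates: {n_dupes}, key duplicates: {n_key_dupes}, rows kept: {len(deduped)}')
```

Resetting the index after dropping rows is optional, but it avoids gaps in the row labels when the frame is inspected or exported later.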

### <span style= 'color: brown'> Prepare for Analysis </span>

#### Daily Activity

In [25]:
daily_act.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Id                        940 non-null    int64  
 1   ActivityDate              940 non-null    object 
 2   TotalSteps                940 non-null    int64  
 3   TotalDistance             940 non-null    float64
 4   TrackerDistance           940 non-null    float64
 5   LoggedActivitiesDistance  940 non-null    float64
 6   VeryActiveDistance        940 non-null    float64
 7   ModeratelyActiveDistance  940 non-null    float64
 8   LightActiveDistance       940 non-null    float64
 9   SedentaryActiveDistance   940 non-null    float64
 10  VeryActiveMinutes         940 non-null    int64  
 11  FairlyActiveMinutes       940 non-null    int64  
 12  LightlyActiveMinutes      940 non-null    int64  
 13  SedentaryMinutes          940 non-null    int64  
 14  Calories                  940 non-null    int64  

##### Convert the ActivityDate column to datetime and rename it to Date

In [26]:
daily_act['ActivityDate'] = pd.to_datetime(daily_act['ActivityDate'])
daily_act = daily_act.rename(columns = {
    'ActivityDate' : 'Date'
})
daily_act.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Id                        940 non-null    int64         
 1   Date                      940 non-null    datetime64[ns]
 2   TotalSteps                940 non-null    int64         
 3   TotalDistance             940 non-null    float64       
 4   TrackerDistance           940 non-null    float64       
 5   LoggedActivitiesDistance  940 non-null    float64       
 6   VeryActiveDistance        940 non-null    float64       
 7   ModeratelyActiveDistance  940 non-null    float64       
 8   LightActiveDistance       940 non-null    float64       
 9   SedentaryActiveDistance   940 non-null    float64       
 10  VeryActiveMinutes         940 non-null    int64         
 11  FairlyActiveMinutes       940 non-null    int64         
 12  LightlyActiveMinutes      940 non-null    int64         

#### Create Day_of_Week and Is_Weekend variables to see how weekday activities compare to weekends.

In [27]:
# Make a copy of the dataset
daily_act_1 = daily_act.copy()

In [28]:
daily_act_1['Day_of_Week'] = daily_act_1['Date'].dt.day_of_week

# Create Is_Weekend to flag weekend days (Saturday = 5, Sunday = 6)
daily_act_1['Is_Weekend'] = daily_act_1['Day_of_Week'] > 4

# Map the numeric Day_of_Week codes to day names
daily_act_1['Day_of_Week'] = daily_act_1['Day_of_Week'].map({
    0: 'Monday',
    1: 'Tuesday',
    2: 'Wednesday',
    3: 'Thursday',
    4: 'Friday',
    5: 'Saturday',
    6: 'Sunday'
})

In [29]:
daily_act_1.head()

Unnamed: 0,Id,Date,TotalSteps,TotalDistance,TrackerDistance,LoggedActivitiesDistance,VeryActiveDistance,ModeratelyActiveDistance,LightActiveDistance,SedentaryActiveDistance,VeryActiveMinutes,FairlyActiveMinutes,LightlyActiveMinutes,SedentaryMinutes,Calories,Day_of_Week,Is_Weekend
0,1503960366,2016-04-12,13162,8.5,8.5,0.0,1.88,0.55,6.06,0.0,25,13,328,728,1985,Tuesday,False
1,1503960366,2016-04-13,10735,6.97,6.97,0.0,1.57,0.69,4.71,0.0,21,19,217,776,1797,Wednesday,False
2,1503960366,2016-04-14,10460,6.74,6.74,0.0,2.44,0.4,3.91,0.0,30,11,181,1218,1776,Thursday,False
3,1503960366,2016-04-15,9762,6.28,6.28,0.0,2.14,1.26,2.83,0.0,29,34,209,726,1745,Friday,False
4,1503960366,2016-04-16,12669,8.16,8.16,0.0,2.71,0.41,5.04,0.0,36,10,221,773,1863,Saturday,True
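
With `Day_of_Week` and `Is_Weekend` in place, the weekday and weekend comparison can be sketched in pandas before moving to a dashboard. The example below uses made-up dates and step counts; it also uses `dt.day_name()`, which is an alternative to the manual mapping above rather than the method used in this notebook.

```python
import pandas as pd

# Toy data; dates and step counts are made up for illustration.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2016-04-15', '2016-04-16', '2016-04-17', '2016-04-18']),
    'TotalSteps': [9000, 12000, 10000, 7000],
})

# day_name() returns 'Monday' ... 'Sunday' directly, without a manual mapping.
df['Day_of_Week'] = df['Date'].dt.day_name()

# dayofweek is 0 for Monday through 6 for Sunday, so values above 4 are weekend days.
df['Is_Weekend'] = df['Date'].dt.dayofweek > 4

# Average steps for weekdays (False) and weekend days (True).
mean_steps = df.groupby('Is_Weekend')['TotalSteps'].mean()
print(mean_steps)
```

The same `groupby` pattern applies to any other metric in the activity table, such as `Calories` or `SedentaryMinutes`.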


#### Export data to Tableau for analysis and visualizations

In [30]:
# daily_act_1.to_csv('Daily_Act_Analysis.csv')
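
By default `to_csv` also writes the DataFrame index, which shows up as an extra `Unnamed: 0` column when the file is read back or loaded into another tool. The sketch below shows the difference using in-memory buffers instead of real files; the toy values are made up, and passing `index=False` is a suggestion rather than what the commented-out export above does.

```python
import io
import pandas as pd

# Toy frame; values are made up.
df = pd.DataFrame({'Id': [1, 2], 'TotalSteps': [13162, 10735]})

# In-memory buffers stand in for the CSV file handed to Tableau.
with_index = io.StringIO()
df.to_csv(with_index)                   # default index=True writes an unnamed first column

without_index = io.StringIO()
df.to_csv(without_index, index=False)   # only the data columns are written

cols_with = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()

print(cols_with)
print(cols_without)
```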

### <span style= 'color: brown'> Tableau Dashboard </span>

In [31]:
%%html 
<div class='tableauPlaceholder' id='viz1687294367096' style='position: relative'><noscript><a href='#'><img alt='Daily Dashboard ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Co&#47;CourseraFitBitAnalysis&#47;DailyDashboard&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='CourseraFitBitAnalysis&#47;DailyDashboard' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Co&#47;CourseraFitBitAnalysis&#47;DailyDashboard&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1687294367096');                    var vizElement = divElement.getElementsByTagName('object')[0];                    if ( divElement.offsetWidth > 800 ) { vizElement.style.width='1366px';vizElement.style.height='795px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='1366px';vizElement.style.height='795px';} else { vizElement.style.width='100%';vizElement.style.height='1427px';}                     var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>