# **Bellabeat activity trends**
### A study in smart device user habits

## **Objective**
1. <font size="4">To identify trends and patterns among smart device users</font>
2. <font size="4">To discuss potential areas for further exploration</font>

## **Data Overview**

### **Data source:**

<font size="4">The data source for this analysis is public-domain Fitbit fitness tracker data collected via survey between 3/12/2016
and 5/12/2016. The datasets comprise personal tracker data from thirty consenting Fitbit users.</font>

### **Limitations:**

<font size="4">The sample of thirty users may not represent the broader population of users. The ages and genders of
the users are not recorded, so some context is missing from the data. In addition, activity types are unspecified and
activities were seldom logged by the users.</font>

## **Analysis**

In [None]:
# I began by importing the numpy, pandas, matplotlib and seaborn packages

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

In [None]:
# I imported the dataset and ran the head and describe functions to familiarize myself with the data

daily_activity = pd.read_csv('../input/bellabeat/dailyActivity_merged.csv')

daily_activity.head()

In [None]:
daily_activity.describe()

In [None]:
# Here I plotted a histogram to get a sense of the distribution of different categories of data

daily_activity.hist(figsize=(15, 12))
plt.show()

In [None]:
'''
I noticed that under sedentary minutes there were a number of records showing 1440 minutes (24 hours) of sedentary activity as well as 0 total steps throughout the day. 
I used boolean masks to filter out those records and saved the filtered results to a new dataframe.
'''
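The two filters described above can also be combined into a single mask. The sketch below is only an illustration: `sample` is a small made-up stand-in for the real dataset, and only the column names `TotalSteps` and `SedentaryMinutes` come from this notebook.

```python
import pandas as pd

# Toy stand-in for dailyActivity_merged.csv; the values are invented for illustration.
sample = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "TotalSteps": [0, 8500, 0, 120],
    "SedentaryMinutes": [1440, 700, 1300, 1440],
})

# Keep a record only if it shows less than a full sedentary day AND at least one step.
worn = (sample.SedentaryMinutes < 1440) & (sample.TotalSteps > 0)
filtered = sample[worn].copy()

print(f"Dropped {len(sample) - len(filtered)} of {len(sample)} records")
```

Taking a `.copy()` of the filtered rows avoids pandas chained-assignment warnings if new columns are added to the result later.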

In [None]:
daily_activity[daily_activity.SedentaryMinutes == 1440] 

In [None]:
daily_activity = daily_activity[daily_activity.SedentaryMinutes < 1440]

daily_activity

In [None]:
daily_activity[daily_activity.TotalSteps == 0] 

In [None]:
daily_activity = daily_activity[daily_activity.TotalSteps > 0] 

In [None]:
# I then plotted a histogram and ran the describe function with the revised data

daily_activity.hist(figsize=(15, 12))
plt.show()

In [None]:
daily_activity.describe()


In [None]:
# Here I applied some boolean masks to filter the data and examine it for notable trends

In [None]:
daily_activity[daily_activity.TotalSteps < 200] 

In [None]:
daily_activity[daily_activity.LoggedActivitiesDistance == 0] 

In [None]:
daily_activity[daily_activity.LoggedActivitiesDistance > 0]

In [None]:
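# count must be called with parentheses; without them the method object is printed instead of the number
print(daily_activity[daily_activity.LoggedActivitiesDistance > 0].Id.count())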

In [None]:
33/856

# NOTE: Only 3.9% of daily records (33 of 856) include a logged activity.
# Only 4 of 30 users logged any activities at all
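Both figures in the note can be computed from the dataframe rather than typed in by hand. This is a rough sketch on invented data: `sample` is a hypothetical stand-in, and only the column names `Id` and `LoggedActivitiesDistance` are taken from the dataset used above.

```python
import pandas as pd

# Toy stand-in for the filtered daily_activity dataframe; values are invented.
sample = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3],
    "LoggedActivitiesDistance": [0.0, 2.1, 0.0, 0.0, 0.0],
})

# Records where the user manually logged an activity distance.
logged = sample[sample.LoggedActivitiesDistance > 0]

share_of_records = len(logged) / len(sample)   # fraction of daily records with a log
users_logging = logged.Id.nunique()            # distinct users who logged at least once
total_users = sample.Id.nunique()

print(f"{share_of_records:.1%} of records; {users_logging} of {total_users} users")
```

Using `nunique()` on `Id` counts each user once, however many days they logged.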

In [None]:
daily_activity.groupby('ActivityDate').TotalSteps.mean().plot(figsize=(15, 12), kind='barh')
plt.show()


In [None]:
'''
I exported the revised dataset and added a day-of-week column in Excel using the =WEEKDAY function, where Sunday = 1 and Saturday = 7.
I then imported the revised dataset and continued the analysis, grouping by day of week to look for further trends.
'''
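The same day-of-week column could probably be derived in pandas without the round trip through Excel. The sketch below assumes `ActivityDate` holds month/day/year strings such as `4/12/2016`. The `sample` frame is made up for illustration, and the arithmetic converts pandas' Monday = 0 numbering to the Sunday = 1 through Saturday = 7 numbering that =WEEKDAY returns by default.

```python
import pandas as pd

# Toy stand-in; the real ActivityDate column is assumed to use month/day/year strings.
sample = pd.DataFrame({"ActivityDate": ["4/12/2016", "4/16/2016", "4/17/2016"]})

dates = pd.to_datetime(sample.ActivityDate, format="%m/%d/%Y")

# pandas numbers days Monday=0 ... Sunday=6; shift to Excel's Sunday=1 ... Saturday=7.
sample["DayofWeek"] = (dates.dt.dayofweek + 1) % 7 + 1

print(sample)
```

Keeping the Excel numbering means the later `groupby('DayofWeek')` cells would work unchanged on a column built this way.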

In [None]:
# daily_activity.to_csv('../input/bellabeat/daily_activity.csv', index = False) -- commented out due to read only

In [None]:
daily_activity = pd.read_csv('../input/bellabeat/csv_files/daily_activity2.csv')

daily_activity.head()

In [None]:
daily_activity.groupby('DayofWeek').TotalSteps.mean().plot(figsize=(10, 4), kind='barh')
plt.show()

print(daily_activity.groupby('DayofWeek').TotalSteps.mean())

In [None]:
daily_activity.groupby('DayofWeek').VeryActiveMinutes.mean().plot(figsize=(10, 4), kind='barh')
plt.show()

print(daily_activity.groupby('DayofWeek').VeryActiveMinutes.mean())

In [None]:
daily_activity.groupby('DayofWeek').LightlyActiveMinutes.mean().plot(figsize=(10, 4), kind='barh')
plt.show()

print(daily_activity.groupby('DayofWeek').LightlyActiveMinutes.mean())

In [None]:
daily_activity.groupby('DayofWeek').SedentaryMinutes.mean().plot(figsize=(10, 4), kind='barh')
plt.show()

print(daily_activity.groupby('DayofWeek').SedentaryMinutes.mean())

In [None]:
daily_activity.groupby('DayofWeek').VeryActiveMinutes.sum().plot(figsize=(10, 4), kind='barh')
plt.show()

print(daily_activity.groupby('DayofWeek').VeryActiveMinutes.sum())

In [None]:
'''
At this stage I was beginning to notice a few trends. Saturdays had the highest overall activity on average, measured by total steps, followed by Tuesdays.
Tuesday had the most very active minutes and calories burned, followed by Monday and Saturday respectively.
There was a declining trend in very active minutes from Tuesday to Friday, picking up again on Saturday. Saturday led all days in fairly and lightly active minutes, followed by Tuesday.
The most sedentary minutes were on Mondays, and users overall tended to be more sedentary on weekdays than on weekends. Only 4 of 30 users logged activity distance.
'''
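The per-day comparisons above can also be read from a single summary table instead of one cell per column. This is a minimal sketch on invented numbers: `sample` and `day_names` are hypothetical, and only the column names match the dataset.

```python
import pandas as pd

# Toy stand-in for daily_activity with the Excel-style DayofWeek column (Sunday = 1).
sample = pd.DataFrame({
    "DayofWeek": [1, 1, 3, 3, 7, 7],
    "TotalSteps": [6000, 8000, 8000, 9000, 9000, 10000],
    "VeryActiveMinutes": [15, 25, 30, 20, 20, 24],
    "SedentaryMinutes": [1000, 980, 1010, 1030, 950, 970],
})

# One groupby gives the mean of every metric for each day.
metrics = ["TotalSteps", "VeryActiveMinutes", "SedentaryMinutes"]
summary = sample.groupby("DayofWeek")[metrics].mean()

# Replace the numeric codes with names so the table is easier to read.
day_names = {1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
             5: "Thursday", 6: "Friday", 7: "Saturday"}
summary.index = summary.index.map(day_names)

print(summary)
print(summary.idxmax())  # day with the highest mean for each metric
```

`idxmax()` reports the leading day per column, which makes statements such as "Saturday had the most steps" straightforward to check.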

In [None]:
daily_activity

In [None]:
'''
Up to this point, I had been exploring daily patterns. I decided to explore hourly patterns to see if I could gain any additional insight. I added a DayofWeek column to the dailyIntensities and hourlyIntensities datasets in Excel and imported them for analysis.
'''

In [None]:
daily_intensity = pd.read_csv('../input/bellabeat/csv_files/dailyIntensities.csv')

hourly_intensity = pd.read_csv('../input/bellabeat/csv_files/hourlyIntensities.csv')

daily_intensity.head()


In [None]:
hourly_intensity.head()

In [None]:
daily_intensity.hist(figsize=(15, 12))
plt.show()


In [None]:
hourly_intensity.hist(figsize=(10, 8))
plt.show()


In [None]:
hourly_intensity.groupby('DayofWeek').AverageIntensity.mean().plot(figsize=(10, 4), kind='barh')
plt.show()

print(hourly_intensity.groupby('DayofWeek').AverageIntensity.mean())

In [None]:
hourly_intensity.describe()

<font size="4">I continued my analysis of hourly trends in Tableau, creating the following dashboard:</font>

[User activity dashboard](https://public.tableau.com/views/Bellabeat2/Dashboard1?:language=en-US&:display_count=n&:origin=viz_share_link)

## **Conclusion**

* <font size="4">The data revealed that the users were simultaneously the least sedentary and the most engaged in activities on Saturdays. Users tended to engage in activities as early as 12am on Saturday through 2am, carrying over from late night Friday activities.</font>
  
* <font size="4">The probable times of intense exercise per the heart rate data were primarily Saturday from 8am to 10am, Sunday around 7pm, Monday between 4pm and 5pm and Tuesday around 6pm.</font>

* <font size="4">Users were mostly sedentary and the highest intensity of activity was moderate, as indicated by the maximum recorded heart rate of 103.99. Only 3.9% of daily records included a logged activity, and these came from only 13% of the users.</font>


## **Suggestions**



* <font size="4">This study revealed that the users were generally sedentary, non-enthusiast types who probably engage in nightlife and are most active on weekends. This profile can be used to inform the marketing approach. Users were also generally passive in their use of the smart devices, so consider ways to encourage engagement with the application and devices. This could be done through prompts for feedback, or through notifications within the application and/or devices that offer recommendations based on the user's learned patterns or on average user data.</font>

* <font size="4">Further analysis into the types of activities users are generally engaged in should provide additional insights. This data might be obtained via a survey of the target market for each product.</font>
