# 1. About
Bellabeat is a high-tech company that manufactures health-focused smart products. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

# 2. Ask

## 2.1 Business tasks:
1. Identify some trends in smart device usage.
2. Understand how those trends apply to Bellabeat customers.
3. Find out how these trends could help influence Bellabeat marketing strategy.

## 2.2 Identify Stakeholders:

1. Urška Sršen - Bellabeat cofounder and Chief Creative Officer
2. Sando Mur - Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
3. Bellabeat marketing analytics team

# 3. Prepare


## 3.1 Import required libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
import seaborn as sns
import datetime as dt

sns.set()
sns.set_palette("Reds_d")

## 3.2 Load required data

In [None]:
daily_activity = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
calories = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
intensities = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
hourly_intensities = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
steps = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
sleep = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight = pd.read_csv("../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")

## 3.3 Take a quick look at the data

In [None]:
daily_activity.head()

In [None]:
daily_activity.describe()

In [None]:
daily_activity.info()

All rows are non-null. However, the min values show that some features contain rows with 0 values. We will exclude those rows later to avoid wrong assumptions.
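
One way to quantify the zero-valued rows before dropping them is to count them per column. The sketch below uses a small made-up frame (`toy_activity`, a hypothetical stand-in for `daily_activity`), so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for daily_activity; values are made up for illustration.
toy_activity = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "TotalSteps": [0, 8200, 12050, 0],
    "Calories": [0, 2100, 2600, 1800],
})

# Count how many rows hold a 0 in each feature column (Id is excluded).
zero_counts = (toy_activity.drop(columns="Id") == 0).sum()
print(zero_counts)
```

Running the same expression on the real frame would show which features are affected and by how much.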

In [None]:
sleep.head()

In [None]:
sleep.info()

In [None]:
sleep.describe()

In [None]:
# For each column, list the rows whose value is 0 (empty output means no zeros).
for i in sleep:
    print(sleep[sleep[i]==0][i].value_counts())

The sleep data seems to be clean and correct.

In [None]:
weight["Id"].nunique()

We will not include the weight dataset because it covers too few users.
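
To make the comparison explicit, the number of distinct users can be computed for every table at once. This is a minimal sketch on made-up frames; `toy_tables` and its contents are assumptions, not the real Fitbit data.

```python
import pandas as pd

# Hypothetical stand-ins for the loaded tables; only the Id column matters here.
toy_tables = {
    "daily_activity": pd.DataFrame({"Id": [1, 1, 2, 3, 3]}),
    "sleep": pd.DataFrame({"Id": [1, 2, 2]}),
    "weight": pd.DataFrame({"Id": [3]}),
}

# Number of distinct users per table, collected into one Series for comparison.
user_counts = pd.Series({name: df["Id"].nunique() for name, df in toy_tables.items()})
print(user_counts)
```

A table whose user count is far below the others is a reasonable candidate for exclusion.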

# 4. Process (Data Cleaning)

## 4.1 Change Datetime format

In [None]:
daily_activity["ActivityDate"] = pd.to_datetime(daily_activity["ActivityDate"])
daily_activity.rename(columns={"ActivityDate": "Date"}, inplace=True)

In [None]:
sleep["SleepDay"] = pd.to_datetime(sleep["SleepDay"])
sleep.rename(columns={"SleepDay": "Date"}, inplace=True)

In [None]:
sleep["TotalHoursAsleep"] = (sleep["TotalMinutesAsleep"] / 60).round(2)
sleep["TotalHoursInBed"] = (sleep["TotalTimeInBed"] / 60).round(2)

sleep.drop(columns=["TotalMinutesAsleep", "TotalTimeInBed"], inplace=True)

In [None]:
hourly_intensities["ActivityHours"] = pd.to_datetime(hourly_intensities["ActivityHour"], format='%m/%d/%Y %I:%M:%S %p')
hourly_intensities['ActivityHours'] = hourly_intensities['ActivityHours'].dt.hour

## 4.2 Drop rows with 0 values

According to several sources, the average person takes 3,000-4,000 steps per day. To avoid wrong assumptions, we will exclude days on which fewer than 1,000 steps were recorded.

In [None]:
cleaned_data = daily_activity.copy()
cleaned_data.drop(cleaned_data[cleaned_data.TotalSteps < 1000].index, inplace=True)

In [None]:
cleaned_data.info()
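
The same filter can be written as a boolean mask, which also makes it easy to report how many rows were removed. The sketch below assumes a made-up frame `toy_activity`; `MIN_STEPS` is a hypothetical name for the cutoff used in the cell above.

```python
import pandas as pd

MIN_STEPS = 1000  # assumed cutoff, matching the drop() call in the cell above

# Hypothetical stand-in for daily_activity; values are made up for illustration.
toy_activity = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3],
    "TotalSteps": [0, 8200, 950, 12050, 4300],
})

# Keep only the rows at or above the cutoff (equivalent to dropping rows below it).
kept = toy_activity[toy_activity["TotalSteps"] >= MIN_STEPS].copy()
dropped_rows = len(toy_activity) - len(kept)
print(f"dropped {dropped_rows} of {len(toy_activity)} rows")
```

Keeping the cutoff in a named constant makes it easier to keep the prose and the code in agreement.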

## 4.3 Merge data

In [None]:
cleaned_merged_data = pd.merge(cleaned_data, sleep, how="inner", on=["Id", "Date"])

In [None]:
cleaned_merged_data["NoSleepHours"] = cleaned_merged_data["TotalHoursInBed"] - cleaned_merged_data["TotalHoursAsleep"]

In [None]:
cleaned_merged_data.info()

In [None]:
cleaned_merged_data.describe()
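
An inner merge silently discards every `Id`/`Date` pair that appears in only one table. One way to see how much is lost is an outer merge with `indicator=True`. The frames below are made up and the names `toy_activity`, `toy_sleep`, and `audit` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for cleaned_data and sleep; values are made up.
toy_activity = pd.DataFrame({
    "Id": [1, 1, 2],
    "Date": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-12"]),
    "TotalSteps": [8200, 9100, 12050],
})
toy_sleep = pd.DataFrame({
    "Id": [1, 2, 2],
    "Date": pd.to_datetime(["2016-04-12", "2016-04-12", "2016-04-13"]),
    "TotalHoursAsleep": [6.5, 7.2, 8.0],
})

# The outer merge keeps all rows; the _merge column records where each came from.
audit = pd.merge(toy_activity, toy_sleep, how="outer", on=["Id", "Date"], indicator=True)
match_counts = audit["_merge"].value_counts()
print(match_counts)

# The inner merge keeps only the rows marked "both" in the audit.
inner = pd.merge(toy_activity, toy_sleep, how="inner", on=["Id", "Date"])
print(len(inner))
```

Rows marked `left_only` are activity days without a sleep record, and `right_only` rows are sleep records without a matching activity day.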

# 5. Analyze and Share

In [None]:
average_steps = pd.DataFrame(cleaned_data.groupby('Id')['TotalSteps'].agg("mean"))
average_steps.rename(columns={'TotalSteps':'Average_Steps'}, inplace=True)

In [None]:
sns.displot(average_steps, bins=10)

The average is around 7,500 steps per person per day.
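
The figure read off the histogram can be checked numerically by summarizing the per-user averages. This is a sketch on made-up values; `toy_activity` stands in for `cleaned_data`, and the result does not reproduce the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for cleaned_data; values are made up for illustration.
toy_activity = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 3],
    "TotalSteps": [7000, 9000, 3000, 5000, 11000, 13000],
})

# Average daily steps per user, then the mean and median across users.
per_user = toy_activity.groupby("Id")["TotalSteps"].mean()
overall_mean = per_user.mean()
overall_median = per_user.median()
print(overall_mean, overall_median)
```

Reporting the median next to the mean shows whether a few very active users are pulling the average up.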

In [None]:
plt.figure(figsize=(10,8), clear=True)
sns.set_context('paper', font_scale=1.4)

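# Restrict to numeric columns; recent pandas versions raise an error on the datetime "Date" column.
cleaned_mx = cleaned_merged_data.select_dtypes("number").corr()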

sns.heatmap(cleaned_mx, cmap="Reds_r", linewidths=1)

According to the heatmap of correlations, there is a strong negative correlation between sedentary activity and sleeping hours. Let's look at it more closely:

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12,10))
fig.suptitle('Relation between types of activities with number of sleep hours')


sns.regplot(data=cleaned_merged_data, x="TotalHoursAsleep", y="LightlyActiveMinutes", ax=axes[0,0])
sns.regplot(data=cleaned_merged_data, x="TotalHoursAsleep", y="FairlyActiveMinutes", ax=axes[0,1])
sns.regplot(data=cleaned_merged_data, x="TotalHoursAsleep", y="VeryActiveMinutes", ax=axes[1,0])
sns.regplot(data=cleaned_merged_data, x="TotalHoursAsleep", y="SedentaryMinutes", ax=axes[1,1])

The plots suggest that lightly, fairly, and very active minutes have little relation to sleeping hours, while sedentary minutes show a clear negative linear relation: the more time people spent sedentary, the fewer hours they slept.
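
The visual impression from the regression plots can be backed by the Pearson coefficients themselves. The sketch below uses `DataFrame.corrwith` on a made-up frame; `toy_merged` is a hypothetical stand-in for `cleaned_merged_data`, so the coefficients are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for cleaned_merged_data; values are made up for illustration.
toy_merged = pd.DataFrame({
    "SedentaryMinutes": [600, 700, 800, 900, 1000],
    "VeryActiveMinutes": [30, 10, 45, 20, 25],
    "TotalHoursAsleep": [8.0, 7.5, 7.0, 6.0, 5.5],
})

activity_cols = ["SedentaryMinutes", "VeryActiveMinutes"]

# Pearson correlation of each activity column with hours asleep.
corr_with_sleep = toy_merged[activity_cols].corrwith(toy_merged["TotalHoursAsleep"])
print(corr_with_sleep.round(2))
```

A coefficient near -1 or 1 indicates a strong linear relation, while values near 0 indicate little linear relation.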

In [None]:
average_sleep = pd.DataFrame(cleaned_merged_data.groupby('Id')['TotalHoursAsleep'].agg("mean"))
average_sleep.rename(columns={'TotalHoursAsleep':'Average_sleep'}, inplace=True)

In [None]:
plt.figure(figsize=(10,6))
sns.histplot(average_sleep, bins=15)

Healthy sleep should last 7 to 9 hours, and we can see that most people fall in this range. However, many people still do not get enough sleep.
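
The share of users inside and outside the 7-9 hour range can be computed by binning the per-user averages with `pd.cut`. The values below are made up, and `toy_average_sleep` is a hypothetical stand-in for the `Average_sleep` column.

```python
import pandas as pd

# Hypothetical per-user average sleep in hours; values are made up for illustration.
toy_average_sleep = pd.Series([5.2, 6.8, 7.1, 7.9, 8.4, 9.6], name="Average_sleep")

# Left-closed bins: [0, 7), [7, 9), [9, 24). An average of exactly 9 falls in the last bin.
bands = pd.cut(
    toy_average_sleep,
    bins=[0, 7, 9, 24],
    labels=["under 7 h", "7-9 h", "over 9 h"],
    right=False,
)

# Fraction of users in each band.
band_share = bands.value_counts(normalize=True).sort_index()
print(band_share)
```

The bin edges are a design choice; whether exactly 7 or 9 hours counts as healthy depends on the guideline being followed.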

In [None]:
hourly_intensities.info()

In [None]:
plt.figure(figsize=(10,6))
sns.barplot(data=hourly_intensities, x="ActivityHours", y="AverageIntensity", palette="plasma")

According to the data, the most active hours are **5-7 PM** and **12-2 PM**.
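
The peak hours can also be extracted directly rather than read off the bar chart. This is a minimal sketch on made-up data; `toy_hourly` is a hypothetical stand-in for `hourly_intensities`.

```python
import pandas as pd

# Hypothetical stand-in for hourly_intensities; values are made up for illustration.
toy_hourly = pd.DataFrame({
    "ActivityHours": [9, 9, 12, 12, 18, 18],
    "AverageIntensity": [0.2, 0.3, 0.5, 0.6, 0.7, 0.9],
})

# Mean intensity per hour of day, then the two hours with the highest mean.
hourly_mean = toy_hourly.groupby("ActivityHours")["AverageIntensity"].mean()
top_hours = hourly_mean.nlargest(2).index.tolist()
print(top_hours)
```

Increasing the argument of `nlargest` would show how quickly intensity falls off outside the peak hours.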

# 6. Act

### Here are some insights:

- The average is around 7,500 steps per day, which is above the threshold commonly used for a sedentary lifestyle (<5,000). A 2020 [study](https://www.medicalnewstoday.com/articles/how-many-steps-should-you-take-a-day) found that participants who took 8,000 steps per day had a 51% lower risk of dying from any cause than those who took 4,000 per day. The trend continued with higher step counts: participants who took 12,000 steps per day had a 65% lower risk of dying than those who took 4,000. This suggests that the benefits of walking increase with step count, and that people who cannot reach 10,000 steps a day still benefit from the activity.
- More sedentary time is associated with fewer hours of sleep. This is consistent with [this](https://pubmed.ncbi.nlm.nih.gov/27830446/) article, which found "sedentary behavior to be associated with an increased risk of insomnia".
- Most participants average a healthy 7-9 hours of sleep, but a significant number sleep less than that. Part of this may be an artifact of data collection, e.g. some people do not wear the device every night.
- The most active hours are 12-2 PM and 5-7 PM.

### Recommendations

- Send users notifications with their daily step count, adding recommendations when the count is low and congratulations when it is high.
- Add notifications reminding users to go to sleep at a set time.
- Send motivational messages during the most active hours (12-2 PM and 5-7 PM).
- Notify users when their sedentary time exceeds a normal amount and remind them that this is associated with a higher risk of insomnia.