# Introduction

### Bellabeat
Bellabeat is a high-tech company that makes health-focused smart products. Its products collect data on activity, sleep, stress, and reproductive health, giving women knowledge about their own health and habits.

### Products:
##### App
Provides users with health data related to their activity, so they can better understand their current habits and make healthy decisions.
##### Leaf
A tracker that can be worn as a bracelet, necklace, or clip. It connects to the app to track activity, sleep, and stress.
##### Time
A smartwatch that tracks user activity and daily wellness.
##### Spring
A water bottle that uses smart technology to track daily water intake, so that you stay appropriately hydrated throughout the day. The product connects to the app.
##### Membership
A subscription program that provides 24/7 fully personalized guidance on nutrition, activity, sleep, health, and beauty, based on lifestyle and goals.

# Business tasks

### The stakeholders:
•	Urška Sršen (Bellabeat’s cofounder and Chief Creative Officer)

•	Sando Mur (Mathematician and Bellabeat’s cofounder)

### The business tasks:
•	Analyze non-Bellabeat smart device usage data to find patterns and gain insight

•	Use those insights to discover growth opportunities for Bellabeat's marketing strategy


# Data Information

### Dataset
The dataset used for this analysis is from Kaggle: [FitBit Fitness Tracker Data](https://www.kaggle.com/arashnic/fitbit)

### License
CC0: Public Domain, dataset made available through [Mobius](https://www.kaggle.com/arashnic)

### Description
This dataset contains personal fitness tracker data from thirty Fitbit users. These users consented to the submission of their tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore habits and patterns.
Because the data was recorded by Fitbit trackers, a well-known product line, we treat it as credible for this analysis, although thirty users is a small sample.

The download contains 18 CSV files. Each of them shows data related to a different function of the device: calories, activity level, daily steps, etc.

In this study, we focus mainly on the daily-level data to simplify the analysis, since the trends of interest are visible at daily resolution. The hourly intensity data is used only to look at activity by time of day.

# Data Processing

The data sets are inspected with Python in a Jupyter Notebook. Cleaning and visualization are also done in Python, since it offers flexibility when working with large volumes of data.

### Importing datasets

In [None]:
import numpy as np
import pandas as pd

daily_activity = pd.read_csv('../input/fitbit/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv')

daily_sleep = pd.read_csv('../input/fitbit/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv')

weight = pd.read_csv('../input/fitbit/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv')

hourly_Intensities = pd.read_csv('../input/fitbit/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv')

### Viewing Dataset

In [None]:
daily_activity.head(4)

In [None]:
daily_activity.info()

In [None]:
daily_sleep.head(4)

In [None]:
daily_sleep.info()

In [None]:
weight.head(4)

In [None]:
weight.info()

In [None]:
hourly_Intensities.head(3)

In [None]:
hourly_Intensities.info()

# Data Cleaning

### Correct Formatting 

The 'Date' column of the 'weight' table contains both date and time. Since our analysis is not based on the time of day, we will keep only the date and remove the time.

In [None]:
#Removing time and only keeping the date
# pd.to_datetime parses the strings; dt.normalize() resets the time part to midnight
weight['Date'] = pd.to_datetime(weight['Date']).dt.normalize()
weight.head(3)
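As a small illustration of this step, the sketch below parses date-and-time strings and resets the time part to midnight. The sample rows are made up to resemble the 'Date' strings of the weight table, and the explicit format string is an assumption about how those strings are written.

```python
import pandas as pd

# Hypothetical rows that mimic the 'Date' strings in the weight table
sample = pd.DataFrame({
    "Id": [1, 1, 2],
    "Date": ["5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM", "4/13/2016 1:08:52 AM"],
})

# Parse the strings (assumed month/day/year with a 12-hour clock),
# then set the time part to midnight so only the date matters
parsed = pd.to_datetime(sample["Date"], format="%m/%d/%Y %I:%M:%S %p")
sample["Date"] = parsed.dt.normalize()

print(sample)
print(sample.dtypes)
```

Keeping the column as a datetime type (rather than plain Python dates) means it can be joined directly with the date columns of the other tables.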

In each of the 4 datasets, the date column is stored as text rather than as a date type, so we will convert it to the correct format.

In [None]:
# Formatting the date column in all the tables
# (pd.to_datetime is used because astype('datetime64') without a unit fails on recent pandas versions)
daily_activity['ActivityDate'] = pd.to_datetime(daily_activity['ActivityDate'])
daily_sleep['SleepDay'] = pd.to_datetime(daily_sleep['SleepDay'])
weight['Date'] = pd.to_datetime(weight['Date'])
hourly_Intensities['ActivityHour'] = pd.to_datetime(hourly_Intensities['ActivityHour'])

In [None]:
# Checking one table to confirm it is correctly formatted
weight.info()
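Instead of inspecting one table by eye, the check can be repeated for every table. The sketch below is one possible way to do that. The two small frames are hypothetical stand-ins for the real tables, and `is_datetime64_any_dtype` is a helper from `pandas.api.types`.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical stand-ins for the real tables: (frame, name of its date column)
tables = {
    "daily_activity": (pd.DataFrame({"ActivityDate": ["4/12/2016", "4/13/2016"]}), "ActivityDate"),
    "daily_sleep": (pd.DataFrame({"SleepDay": ["4/12/2016 12:00:00 AM"]}), "SleepDay"),
}

checked = {}
for name, (frame, date_col) in tables.items():
    # Convert the text column, then record whether it is now a datetime type
    frame[date_col] = pd.to_datetime(frame[date_col])
    checked[name] = is_datetime64_any_dtype(frame[date_col])

print(checked)
```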

### Renaming the column
We will rename the date column to 'Date' in every table that uses a different name for it. The 'Id' column already has the same name in all tables. Using the same column names lets us join the tables on 'Id' and 'Date' later if required.

In [None]:
daily_activity.rename({'ActivityDate':'Date'}, axis=1, inplace=True)

daily_sleep.rename({'SleepDay':'Date'}, axis=1, inplace=True)

hourly_Intensities.rename({'ActivityHour':'Date'}, axis=1, inplace=True)

### Merging the tables

Our data is spread across separate tables, but we need to find patterns between them. So we will join the first three tables (daily_activity, daily_sleep, weight) on 'Id' and 'Date'.

However, we will not merge the 'hourly_Intensities' table, as we need it separately for the hourly analysis.

In [None]:
#Joining the three tables
final_df = pd.merge(pd.merge(daily_activity, daily_sleep,how='outer', on =['Id','Date']),weight,how='outer',on =['Id','Date'])
final_df.head(3)

In [None]:
# Checking if the join correctly executed
final_df.info()
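One thing worth checking around an outer join is whether any table holds repeated rows for the same 'Id' and 'Date', because those would be duplicated in the result. The sketch below shows one way to check, using made-up frames rather than the real tables. The `indicator=True` argument of `merge` adds a `_merge` column that records which side each row came from.

```python
import pandas as pd

# Made-up activity and sleep records for illustration only
activity = pd.DataFrame({
    "Id": [1, 1, 2],
    "Date": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-12"]),
    "TotalSteps": [13162, 10735, 8000],
})
sleep = pd.DataFrame({
    "Id": [1, 1, 1],
    "Date": pd.to_datetime(["2016-04-12", "2016-04-13", "2016-04-13"]),
    "TotalMinutesAsleep": [327, 384, 384],  # the last row repeats the one before it
})

# Drop exact duplicate rows before joining
sleep = sleep.drop_duplicates()

# Outer join; '_merge' shows whether a row matched in both tables or only one
merged = activity.merge(sleep, how="outer", on=["Id", "Date"], indicator=True)
print(merged["_merge"].value_counts())
```

Rows marked `left_only` or `right_only` are the ones that will carry null values in the columns of the other table.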

### Handling null values
We will check how much data is missing in each column, so that we know which columns and rows can be used in each part of the analysis.

In [None]:
#Checking for null values in percentage
final_df.isna().sum()*100 / len(final_df)

We can see that more than 50% of the values in the last few columns are null. This is because we used an outer join on the three tables, and the sleep and weight tables have far fewer records than the activity table.

Later, we will use only the rows that have non-null values in the columns a given comparison needs.

For example, when we compare weight with total steps, we will keep only the rows where the 'WeightKg' column is not null. We will do the same wherever required.
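A minimal sketch of this approach, on a made-up frame: compute the percentage of nulls per column, then keep only the rows that have a value in the column a comparison needs. The column names follow the merged table, but the values are invented.

```python
import numpy as np
import pandas as pd

# Invented rows: only one of four records has a weight entry
df = pd.DataFrame({
    "Id": [1, 1, 2, 2],
    "TotalSteps": [13162, 10735, 8000, 9500],
    "WeightKg": [52.6, np.nan, np.nan, np.nan],
})

# Percentage of null values per column (mean of a boolean mask is the null share)
null_pct = df.isna().mean() * 100
print(null_pct)

# Keep only the rows usable for a weight comparison
weight_rows = df.dropna(subset=["WeightKg"])
print(weight_rows)
```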

### Checking for time consistency

To check time consistency, we will split the 'Date' column into 3 separate columns: 'Day' (day of week), 'Month', and 'Year'. These columns may also help later in the analysis.

In [None]:
#creating 3 new columns from the 'Date' column.
final_df['Day'] = pd.to_datetime(final_df['Date']).dt.dayofweek
final_df['Month'] = pd.to_datetime(final_df['Date']).dt.month
final_df['Year'] = pd.to_datetime(final_df['Date']).dt.year

In [None]:
#Checking if new columns were inserted
final_df.head(3)

We can see that the new columns have been created, but the 'Day' and 'Month' columns contain numbers instead of names. So we will convert these numbers into the corresponding names.

We will first check the values in the 'Day' and 'Month' columns and map them accordingly.

In [None]:
# checking the unique values in day column
final_df.Day.unique()

In [None]:
# checking the unique values in month column
final_df.Month.unique()

Since we have only two months of data, we will not use the 'Month' column.

Instead, we will convert the 'Day' column from numbers to day names.

In [None]:
#mapping 'Day' column to day names
day_map = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
final_df['Day'] = final_df['Day'].map(day_map)

In [None]:
#checking the changes
print(final_df.Day.unique())

In [None]:
#checking the changes
final_df.info()
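As an alternative to a hand-written mapping, pandas can produce day names directly with `dt.day_name()`. The sketch below uses three sample dates from the period covered by the data. Wrapping the result in an ordered categorical is an optional extra (not part of the original notebook) that keeps Monday-to-Sunday order when grouping or plotting.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2016-04-12", "2016-04-16", "2016-04-17"]))

# Full English day names, shortened to three letters to match the mapping above
day_names = dates.dt.day_name().str[:3]

# Optional: an ordered categorical so days sort Mon..Sun instead of alphabetically
order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_cat = pd.Categorical(day_names, categories=order, ordered=True)

print(list(day_names))
print(day_cat)
```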

We will repeat the same process for the 'hourly_Intensities' table.

In [None]:
#creating four new columns from the 'Date' column (formerly 'ActivityHour') in 'hourly_Intensities' table
hourly_Intensities['Hour'] = pd.to_datetime(hourly_Intensities['Date']).dt.hour
hourly_Intensities['Day'] = pd.to_datetime(hourly_Intensities['Date']).dt.dayofweek
hourly_Intensities['Month'] = pd.to_datetime(hourly_Intensities['Date']).dt.month
hourly_Intensities['Year'] = pd.to_datetime(hourly_Intensities['Date']).dt.year

In [None]:
hourly_Intensities.head(2)

In [None]:
# checking the unique values in day column
hourly_Intensities.Day.unique()

In [None]:
# checking the unique values in month column
hourly_Intensities.Month.unique()

In [None]:
#mapping 'Day' column to day names
day_map = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
hourly_Intensities['Day'] = hourly_Intensities['Day'].map(day_map)

In [None]:
#checking the changes
hourly_Intensities.head(3)

#### These are our 2 clean datasets, on which we will perform the rest of the analysis

# Analyze

### Walking habits are related to calories burned

In [None]:
#importing the necessary library for plotting
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Extracting the rows with non-null values in the 'Calories' column
calorie_df = final_df.loc[final_df['Calories'].notnull()]

In [None]:
#calorie VS steps plot
sns.lmplot(x='TotalSteps',y='Calories',data=calorie_df,palette='bright', aspect=4*.5)

##### We can clearly see that the regression line slopes upward. This means that the more steps taken in a day, the more calories burned.
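The strength of this relationship can also be expressed as a number. The sketch below computes the Pearson correlation coefficient with `Series.corr`. The values are invented for illustration, so the printed coefficient says nothing about the real data. On the notebook's data the same call could be applied to `calorie_df`.

```python
import pandas as pd

# Invented daily records, for illustration only
df = pd.DataFrame({
    "TotalSteps": [2000, 5000, 8000, 11000, 14000],
    "Calories": [1600, 1900, 2100, 2500, 2800],
})

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear relation
r = df["TotalSteps"].corr(df["Calories"])
print(f"Pearson r = {r:.3f}")
```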

### Level of activity during exercise
Here we will look at how users' tracked time is split across activity levels, from very active to sedentary (no activity).
##### What share of time is spent at each activity level?

In [None]:
#Extracting the necessary columns and adding up the values

activ = final_df[['VeryActiveMinutes', 'FairlyActiveMinutes', 'LightlyActiveMinutes','SedentaryMinutes']].copy()

df_sum = activ.sum()

In [None]:
#plotting the activity level

df_sum.plot(kind='pie', autopct='%1.1f%%',figsize=(15,10),label="",cmap='coolwarm')

##### We can clearly see that almost 81% of all tracked minutes are sedentary, so users are inactive for most of the day. (sedentary means inactive)
We can further relate activity to sleep, to see how sleep time varies with the different levels of activity.
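Because the pie chart is built from summed minutes, each slice is a share of tracked time rather than a share of people. The same percentages can be computed directly, as in the sketch below. The two rows are invented, so the numbers differ from the chart.

```python
import pandas as pd

cols = ["VeryActiveMinutes", "FairlyActiveMinutes",
        "LightlyActiveMinutes", "SedentaryMinutes"]

# Two invented daily records
df = pd.DataFrame({
    "VeryActiveMinutes": [25, 0],
    "FairlyActiveMinutes": [15, 0],
    "LightlyActiveMinutes": [200, 160],
    "SedentaryMinutes": [1200, 1280],
})

# Total minutes per activity level, as a percentage of all tracked minutes
totals = df[cols].sum()
share = totals / totals.sum() * 100
print(share.round(1))
```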

But first, let's find out how long users sleep on average.

In [None]:
# Sleeping pattern of users
sns.histplot(final_df['TotalMinutesAsleep'],bins=40)

##### From the graph, we can see that most sleep records fall between 400 and 550 minutes. A majority of users sleep approximately 420 to 480 minutes, i.e. 7 to 8 hours. Health guidance also commonly treats 7 to 8 hours of sleep a day as a healthy amount.

Now let's see how sleep differs across activity levels: light, moderate, heavy, and sedentary.
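The reading of the histogram can be backed by two simple numbers: the median sleep time and the share of records inside the 420 to 480 minute band. The sketch below uses invented values. With the notebook's data the same calls could be applied to `final_df['TotalMinutesAsleep'].dropna()`.

```python
import pandas as pd

# Invented sleep records, in minutes
sleep = pd.Series([300, 420, 450, 480, 520, 600], name="TotalMinutesAsleep")

median_sleep = sleep.median()

# between() includes both endpoints by default
in_range = sleep.between(420, 480)
share_in_range = in_range.mean() * 100

print(f"median: {median_sleep} min, share in 420-480 min: {share_in_range}%")
```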

In [None]:
# sleeping time at different activity level
sns.lmplot(x='LightlyActiveMinutes',y='TotalMinutesAsleep',data=final_df,palette='bright', aspect=3*.5)

sns.lmplot(x='FairlyActiveMinutes',y='TotalMinutesAsleep',data=final_df,palette='bright', aspect=3*.5)

sns.lmplot(x='VeryActiveMinutes',y='TotalMinutesAsleep',data=final_df,palette='bright', aspect=3*.5)

sns.lmplot(x='SedentaryMinutes',y='TotalMinutesAsleep',data=final_df,palette='bright', aspect=3*.5)

##### From the above plots we can see that users who do at least some physical activity tend to sleep better than users who are mostly sedentary.

If we can find the average number of steps taken by users who sleep between 420 and 480 minutes, we can suggest that every user aim for that many steps each day to stay healthy.

In [None]:
#sleep VS total_steps
sns.lmplot(x='TotalSteps',y='TotalMinutesAsleep',data=final_df,palette='bright', aspect=4*.5)

##### From this graph we can conclude that people who sleep between 420 and 480 minutes often walk close to 10,000 steps a day. Thus, for good health benefits, the total steps during the day should be close to 10,000.
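The average mentioned above can also be computed directly instead of being read off the plot. The sketch below filters records to the 420 to 480 minute band and averages their steps. The rows are invented, so the result is not a finding about the real users.

```python
import pandas as pd

# Invented daily records
df = pd.DataFrame({
    "TotalSteps": [4000, 8500, 10500, 12000, 7000],
    "TotalMinutesAsleep": [350, 430, 470, 520, 600],
})

# Records where sleep falls in the 7 to 8 hour band (endpoints included)
target = df[df["TotalMinutesAsleep"].between(420, 480)]
avg_steps_target = target["TotalSteps"].mean()

# Average for all other records, for comparison
others = df[~df["TotalMinutesAsleep"].between(420, 480)]
avg_steps_others = others["TotalSteps"].mean()

print(avg_steps_target, round(avg_steps_others, 1))
```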

### Activity level of users related to Weight

First we will create a new table that contains only the rows with a weight value, i.e. there should be no null values in the 'WeightKg' column.

In [None]:
# Extracting the rows with non-null weight values and creating a new table called 'weight_df'
weight_df = final_df.loc[final_df['WeightKg'].notnull()]
weight_df.head(3)

We will now look for patterns between users' weight and their level of activity.

In [None]:
#plotting different activity level vs weight

sns.lmplot(x='SedentaryMinutes',y='WeightKg',data=weight_df,palette='bright', aspect=3*.5)

sns.lmplot(x='LightlyActiveMinutes',y='WeightKg',data=weight_df,palette='bright', aspect=3*.5)

sns.lmplot(x='FairlyActiveMinutes',y='WeightKg',data=weight_df,palette='bright', aspect=3*.5)

sns.lmplot(x='VeryActiveMinutes',y='WeightKg',data=weight_df,palette='bright', aspect=3*.5)

###### It seems that users with light and moderate activity tend either to stay in shape or to lose weight. This is a good sign, because higher weight tends to come with more body fat, which can lead to obesity and then to other health problems.

###### However, users with no activity seem to gain weight quickly and unintentionally. They might face health issues if they remain inactive.

###### Also, users with high activity tend to gain weight. This may be because they want more weight for physical sports like bodybuilding or weightlifting, so they are gaining weight intentionally.
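Statements about gaining or losing weight describe change over time, which the scatter plots do not show directly. One way to look at it is to compare each user's first and last weight log, and to count how many logs each user has. The sketch below does this on invented rows. The column names follow the merged table.

```python
import pandas as pd

# Invented weight logs for three users
weight_logs = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2, 3],
    "Date": pd.to_datetime(["2016-04-12", "2016-04-20", "2016-05-01",
                            "2016-04-15", "2016-05-02", "2016-04-18"]),
    "WeightKg": [60.0, 59.5, 59.0, 80.0, 80.5, 70.0],
})

# Sort by time so 'first' and 'last' are the earliest and latest logs per user
ordered = weight_logs.sort_values(["Id", "Date"])
summary = ordered.groupby("Id")["WeightKg"].agg(logs="count", first="first", last="last")
summary["change_kg"] = summary["last"] - summary["first"]

print(summary)
```

Users with a single log (user 3 here) show a change of zero by construction, so they say nothing about weight gain or loss.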

### Hourly activity habits of users

In [None]:
hourly_Intensities.head(3)

In [None]:
hours_intensity = hourly_Intensities.groupby(['Hour']).agg({'AverageIntensity': 'sum'})

In [None]:
#Plotting a bar plot of the summed 'AverageIntensity' for each value of 'Hour' in the 'hourly_Intensities' table
hours_intensity.plot(y="AverageIntensity", kind="bar")

##### From the figure we can clearly see that activity intensity is highest from 5pm to 7pm. This is probably when the working day ends for most users, who then go to the gym or do some other physical activity after leaving the office.

##### We can also see a peak between 12pm and 2pm. Most users may be taking a break from work at this time, perhaps to go for a walk.

##### From both of the above statements, we may conclude that most of the users are probably office workers, as they are active during office break hours and after office hours.
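The peak hours can also be listed programmatically rather than read from the bar plot. The sketch below averages intensity per hour and picks the two busiest hours with `nlargest`. It uses the mean instead of the sum used above, which is a substitution on our part, and the rows are invented.

```python
import pandas as pd

# Invented hourly records over two days
hourly = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-04-12 08:00", "2016-04-12 12:00", "2016-04-12 18:00",
        "2016-04-13 08:00", "2016-04-13 12:00", "2016-04-13 18:00",
    ]),
    "AverageIntensity": [0.2, 0.4, 0.6, 0.1, 0.5, 0.7],
})

hourly["Hour"] = hourly["Date"].dt.hour

# Mean intensity per hour of day, then the two hours with the highest values
by_hour = hourly.groupby("Hour")["AverageIntensity"].mean()
top_hours = by_hour.nlargest(2)

print(top_hours)
```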

# Observation

### People who walked more burned more calories.

### Almost 81% of tracked time is sedentary: users are inactive for most of the day.

### Users who do at least some physical activity tend to sleep better than users who do no activity.

### People who sleep 7 to 8 hours a day often walk close to 10,000 steps a day. Health guidance also recommends about 7 hours of sleep a day. Thus, for good health, users should walk almost 10,000 steps per day.

### Users with light and moderate activity tend either to stay in shape or to lose weight. Users with no activity seem to gain weight quickly and unintentionally.

### Most of the users are probably office workers, as they are active during office break hours and after office hours.

# Recommendations


### To attract new users and gain more customers, the company should improve its notification system.
#### For users whose goal is to lose weight, Bellabeat could send notifications through the app recommending that they walk up to a certain number of steps each day.
#### For inactive users and users who have sleep problems, Bellabeat could recommend walking close to 10,000 steps each day.


### The marketing team could also focus on office workers.
#### Bellabeat could collaborate with companies to create special health events.
#### Bellabeat could create a new digital marketing campaign that advertises its products to working women.