# CardioGood Fitness Case Study

### Task:
Perform descriptive analytics to create a customer profile for each CardioGood Fitness treadmill product line.

### Data:
The team identifies the following customer variables to study: 
* product purchased, TM195, TM498, or TM798;
* gender;
* age, in years;
* education, in years;
* relationship status, single or partnered;
* annual household income ($);
* average number of times the customer plans to use the treadmill each week;
* average number of miles the customer expects to walk/run each week;
* self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

The data can be downloaded here: https://www.kaggle.com/saurav9786/cardiogoodfitness

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

### Explore Data

In [None]:
data = pd.read_csv('./CardioGoodFitness.csv')

In [None]:
data.head(10)

In [None]:
data.shape

In [None]:
data.info() # no null values

The data set is clean: `data.info()` reports no missing values in any column. Let's build some visualizations to get an intuitive feel for the data.
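
As a complement to `data.info()`, the missing-value check can be made explicit by counting nulls per column. The sketch below is only an illustration: `missing_value_report` is a hypothetical helper, and `sample` is a made-up toy frame that mimics a few column names from the notebook, not the real CardioGood Fitness data.

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column."""
    return df.isna().sum()


# Hypothetical toy rows (NOT the real data); one Age value is deliberately missing.
sample = pd.DataFrame({
    "Product": ["TM195", "TM498", "TM798"],
    "Age": [25, 31, None],
    "Income": [40000, 52000, 80000],
})

report = missing_value_report(sample)
print(report)
```

On the real data, `missing_value_report(data)` should show zero for every column if the data set is as clean as `data.info()` suggests.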

### Visualizations

In [None]:
sns.countplot(x = data.Product)
plt.title('Product Popularity', fontsize = 20)
plt.xlabel('Product', fontsize = 15)
plt.ylabel('Number of Customers', fontsize = 15)
plt.show()

In [None]:
sns.set_theme(style="whitegrid")
sns.barplot(data = data, x = 'Product', y = 'Age', linewidth=2.5)
plt.title('Customer Age per Product', fontsize = 20)
plt.xlabel('Product', fontsize = 15)
plt.ylabel('Age', fontsize = 15)
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(6,6))
sns.countplot(data = data, x = 'Product', hue = 'Gender', ax = ax, linewidth=2.5)
plt.title('Customer Gender per Product', fontsize = 20)
plt.xlabel('Product', fontsize = 15)
plt.ylabel('Number of Customers', fontsize = 15)
plt.show()

The **TM798 treadmill** is **bought mostly by men**: the plot shows only about 6 women among its customers.
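
Reading exact counts off a bar chart is error-prone, so the gender split can also be tabulated directly. The following is a minimal sketch using `pd.crosstab`; the `sample` frame is hypothetical toy data that only reuses the `Product` and `Gender` column names, and the numbers it produces say nothing about the real data set.

```python
import pandas as pd

# Hypothetical toy rows (NOT the real data), reusing the notebook's column names.
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM798", "TM798", "TM798"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
})

# Absolute number of customers per product and gender.
counts = pd.crosstab(sample["Product"], sample["Gender"])

# Share of each gender within a product (each row sums to 1).
shares = pd.crosstab(sample["Product"], sample["Gender"], normalize="index")

print(counts)
print(shares.round(2))
```

Replacing `sample` with `data` gives the exact number of men and women per product.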

In [None]:
sns.boxplot(data = data, x = 'Product', y = 'Education', linewidth=2.5)
plt.title('Customer Education per Product', fontsize = 20)
plt.xlabel('Product', fontsize = 15)
plt.ylabel('Education (in years)', fontsize = 15)
plt.show()

The **TM798 treadmill** attracts customers with **more years of education**. I am not sure how relevant that is to treadmill use.

In [None]:
fig, ax = plt.subplots(figsize=(6,6))
sns.countplot(data = data, x = 'Product', hue = 'MaritalStatus', ax = ax, linewidth=2.5)
plt.title('Customer Marital Status per Product', fontsize = 20)
plt.xlabel('Product', fontsize = 15)
plt.ylabel('Number of Customers', fontsize = 15)
plt.show()

In [None]:
sns.boxplot(data = data, x = 'Product', y = 'Usage', linewidth=2.5)
plt.title('Customer Usage per Product', fontsize = 20)
plt.xlabel('Product', fontsize = 15)
plt.ylabel('Planned Uses per Week', fontsize = 15)
plt.show()

* Customers of product **TM195** plan to use it 3-4 times per week
* Customers of product **TM498** plan to use it approximately 3 times per week, with a few outliers
* Although it has the fewest sales, product **TM798** has the highest planned usage
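
The ranges above are read from the box plot; a grouped summary gives the same information as numbers. This is a sketch under assumptions: `sample` is hypothetical toy data with the notebook's `Product` and `Usage` columns, where `Usage` is the planned number of uses per week.

```python
import pandas as pd

# Hypothetical toy rows (NOT the real data); Usage = planned uses per week.
sample = pd.DataFrame({
    "Product": ["TM195"] * 3 + ["TM498"] * 3 + ["TM798"] * 3,
    "Usage": [3, 4, 3, 3, 3, 2, 5, 4, 6],
})

# Minimum, median, and maximum planned usage for each product.
usage_summary = sample.groupby("Product")["Usage"].agg(["min", "median", "max"])
print(usage_summary)
```

Running the same `groupby` on `data` shows the typical planned usage per product without relying on the plot.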

In [None]:
sns.barplot(data = data, x = 'Product', y = 'Income', linewidth=2.5)
plt.title('Customer Income per Product', fontsize = 20)
plt.xlabel('Product', fontsize = 15)
plt.ylabel('Income ($)', fontsize = 15)
plt.show()

In [None]:
sns.barplot(data = data, x = 'Product', y = 'Miles', linewidth=2.5)
plt.title('Customer Expected Mileage per Product', fontsize = 20)
plt.xlabel('Product', fontsize = 15)
plt.ylabel('Miles', fontsize = 15)
plt.show()

### Conclusion


**Customer Profiles**:
<br>  
* **TM195** - this treadmill is the most popular of the company's products. The average customer is in their late twenties, with an average income of about $45,000. The product appeals to both men and women. Customers plan to use it 3-4 times per week and expect to cover about 75 miles, so they are fairly active.


* **TM498** - this treadmill is the company's second most popular product. The average customer is in their late twenties, with an average income of just under $50,000. The product appeals to both men and women. Customers plan to use it around 3 times per week and are fairly active. There is no major difference between the TM498 and TM195 customer profiles.


* **TM798** - this treadmill is the least popular of the products, and it has a very different audience. The average customer is also in their late twenties, yet their average income is around $75,000, considerably higher than for the other two products. These customers tend to be highly educated and active. They plan to use the product 4-5 times per week, with an expected mileage of about 165 miles. The main audience of the TM798 is men.
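
The averages quoted in these profiles are read from the bar plots. A grouped aggregation can collect them in one table instead. The sketch below assumes the column names used throughout the notebook (`Product`, `Age`, `Income`, `Usage`, `Miles`); the `sample` frame is hypothetical toy data, so its output only illustrates the shape of the result, not the real figures.

```python
import pandas as pd

# Hypothetical toy rows (NOT the real data), reusing the notebook's column names.
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Age": [26, 28, 27, 29, 28, 30],
    "Income": [40000, 50000, 45000, 55000, 70000, 80000],
    "Usage": [3, 4, 3, 3, 5, 4],
    "Miles": [70, 80, 80, 90, 160, 170],
})

# One row per product: customer count plus the statistics used in the profiles.
profile = sample.groupby("Product").agg(
    customers=("Age", "size"),
    mean_age=("Age", "mean"),
    mean_income=("Income", "mean"),
    median_usage=("Usage", "median"),
    mean_miles=("Miles", "mean"),
)
print(profile)
```

Swapping `sample` for `data` yields the actual per-product figures, which can be checked against the numbers quoted in the profiles.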