# Customer Profiles on Treadmill Models

[A. Data Set Information](#A.-Data-Set-Information)

[1. Imported Packages and Data](#1.-Imported-Packages-and-Data)

[2. Examine Data Set](#2.-Examine-Data-Set)

[3. Univariate Analysis](#3.-Univariate-Analysis)

[4. Bivariate Analysis](#4.-Bivariate-Analysis)

[5. Multivariate Analysis](#5.-Multivariate-Analysis)

[6. Discussion](#6.-Discussion)

# A. Data Set Information


Columns:
1. Product, treadmill model
2. Age, in years
3. Gender
4. Education, in years
5. MaritalStatus, single or partnered
6. Usage, average number of times customer plans to use the treadmill each week
7. Fitness, self-rated fitness 1-to-5 scale, 1 is poor, 5 is excellent shape
8. Income, US dollars
9. Miles, average number of miles customer expects to travel each week

## 1. Imported Packages and Data

#### 1.1 Packages

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt 

import seaborn as sns
sns.set_palette("colorblind")

import warnings
warnings.filterwarnings('ignore')

def col_stats(value, index):
    # value: name of the numeric column to summarize, as a string
    # index: name of the column to group by, as a string
    # Returns the mean, min, max and median of `value` for each group in `index`
    return pd.pivot_table(
        CGF_df,
        values=[value],
        index=[index],
        aggfunc={value: ["mean", "min", "max", "median"]},
    )

#### 1.2 Treadmill customer data

In [None]:
CGF_df = pd.read_csv("CardioGoodFitness.csv")

## 2. Examine Data Set

#### 2.1 First 5 rows of treadmill data set

In [None]:
CGF_df.head()

#### 2.2 Check for missing values

In [None]:
CGF_df.isna().any()
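
`isna().any()` only reports whether a column contains at least one missing value. A minimal sketch of counting missing values per column instead, using a small hypothetical frame (`toy_df` and its values are made up for illustration and are not rows from CardioGoodFitness.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from CardioGoodFitness.csv
toy_df = pd.DataFrame({
    "Age": [18, 25, np.nan, 40],
    "Income": [30000, np.nan, np.nan, 52000],
    "Product": ["TM195", "TM498", "TM798", "TM195"],
})

# Number of missing values in each column
missing_counts = toy_df.isna().sum()
print(missing_counts)
```

The same call on the real data would be `CGF_df.isna().sum()`.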

#### 2.3 Check data types

In [None]:
CGF_df.info()

#### 2.4 Check data set size

In [None]:
CGF_df.shape

- Data set has 180 rows, 9 columns

#### 2.5 Split data set by model for future analysis

In [None]:
CGF_195 = CGF_df[CGF_df["Product"] == "TM195"]
CGF_498 = CGF_df[CGF_df["Product"] == "TM498"]
CGF_798 = CGF_df[CGF_df["Product"] == "TM798"]

CGF_list = [CGF_195,CGF_498,CGF_798]

## 3. Univariate Analysis

#### 3.1 Summary statistics of all customers

In [None]:
CGF_df.describe(include='all')

- Income
    - Mean income is around 54k, median income is around 51k
    - Max reported income is just over 100k

- Age
    - Customer age ranges from 18 to 50 years
    - Mean and median age are 28.79 and 26, respectively

- Education
    - Customer education ranges from 12 to 21 years
    - Mean and median education are 15.57 and 16 years, respectively

- Usage
    - Expected usage is around 3 times a week

- Miles
    - Mean expected miles traveled is 103.19, median is 94

#### 3.1 Which treadmill models have been sold?

In [None]:
models_df = CGF_df["Product"].value_counts()
models_df

<b> 3.2 What is the model composition of our sales? </b>

In [None]:
models_df.plot.pie(autopct='%.1f%%',figsize=(8,8),explode=[0.03]*3)
plt.title("Treadmill Model Composition")

- Treadmill TM195 is the highest selling model, making up 44.4% of sales
- Treadmill TM798 is the lowest selling model, making up 22.2% of sales
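
The percentages on the pie chart can also be computed directly. The sketch below assumes per-model counts of 80, 60 and 40; these are reconstructed from the reported shares of the 180 rows, not read from the file, so treat them as an assumption.

```python
import pandas as pd

# Assumed counts, back-calculated from the reported shares of 180 rows
model_counts = pd.Series({"TM195": 80, "TM498": 60, "TM798": 40}, name="Product")

# Convert counts to percentage shares, rounded to one decimal place
model_share = (model_counts / model_counts.sum() * 100).round(1)
print(model_share)
```

On the real data, `CGF_df["Product"].value_counts(normalize=True)` returns the same shares as fractions.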

<b> 3.3 Which gender bought more treadmills? </b>

In [None]:
gender_df = CGF_df["Gender"].value_counts()

In [None]:
gender_df.plot.pie(autopct='%.1f%%',figsize=(8,8),explode=[0.03]*2)
plt.title("Treadmill Gender Composition")

<b> 3.4 Did partnered or single people buy more treadmills? </b>

In [None]:
partner_df = CGF_df["MaritalStatus"].value_counts()

In [None]:
partner_df.plot.pie(autopct='%.1f%%',figsize=(8,8),explode=[0.03]*2)
plt.title("Treadmill Marital Status Composition")

<b>3.5 Distribution of continuous data groups</b>

In [None]:
CGF_df.hist(figsize=(25,25))

## 4. Bivariate Analysis

<b> 4.1 Product purchases by gender </b>

In [None]:
sns.countplot(x='Gender',hue="Product",data=CGF_df)

- Far fewer females than males purchased the TM798
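
The count plot can be backed by a table. A minimal sketch using `pd.crosstab` on a hypothetical frame (`toy_df` is invented for illustration; the real call would use `CGF_df`):

```python
import pandas as pd

# Hypothetical toy data; not taken from CardioGoodFitness.csv
toy_df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Product": ["TM798", "TM798", "TM195", "TM195", "TM498", "TM798"],
})

# Raw purchase counts for each gender/model pair
counts = pd.crosstab(toy_df["Gender"], toy_df["Product"])

# Share of each model's buyers by gender (each column sums to 1)
shares = pd.crosstab(toy_df["Gender"], toy_df["Product"], normalize="columns")

print(counts)
print(shares.round(2))
```

Normalizing by column makes models with different sales volumes directly comparable.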

In [None]:
sns.countplot(x='MaritalStatus',hue="Product",data=CGF_df)

<b> 4.2 Age distribution by gender </b>

In [None]:
sns.violinplot(x="Gender",y="Age", data=CGF_df)

<b> 4.3 Age distribution by model </b>

In [None]:
sns.histplot(CGF_df, x="Age", hue="Product", element = "step",multiple="stack")

plt.title('Age distributions for each Treadmill Model')

In [None]:
sns.violinplot(x="Product",y="Age", data=CGF_df)

In [None]:
col_stats("Age","Product")

<b> 4.4 Years of education distribution by model </b>

In [None]:
sns.violinplot(x="Product",y="Education", data=CGF_df)

In [None]:
col_stats("Education","Product")

<b> 4.5 Income distribution by model </b>

In [None]:
sns.violinplot(x="Product",y="Income", data=CGF_df)

In [None]:
col_stats("Income","Product")

<b> 4.6 Expected miles traveled distribution by model </b>

In [None]:
sns.violinplot(x="Product",y="Miles", data=CGF_df)

In [None]:
col_stats("Miles","Product")

<b> 4.7 Expected weekly usage distribution by model </b>

In [None]:
sns.violinplot(x="Product",y="Usage", data=CGF_df)

In [None]:
col_stats("Usage","Product")

<b> 4.8 Self-rated fitness distribution by model </b>

In [None]:
sns.violinplot(x="Product",y="Fitness", data=CGF_df)

In [None]:
col_stats("Fitness","Product")

- Compared with the other two models, TM798 customers tend to have:
    - higher education
    - higher income
    - higher expected miles traveled
    - higher expected weekly usage
    - higher self-rated fitness
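
The per-column `col_stats` calls above can be condensed into one table. A minimal sketch with `groupby` and `agg`, run on a hypothetical frame (`toy_df` and its values are illustrative only):

```python
import pandas as pd

# Hypothetical toy data; not taken from CardioGoodFitness.csv
toy_df = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM798", "TM798"],
    "Income": [40000, 50000, 70000, 90000],
    "Miles": [60, 100, 150, 170],
})

# Mean and median of several columns at once, one row per model
summary = toy_df.groupby("Product")[["Income", "Miles"]].agg(["mean", "median"])
print(summary)
```

The result has a two-level column index, so a single value is read as `summary.loc["TM798", ("Income", "mean")]`.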

## 5. Multivariate Analysis

#### 5.1 Heatmap

In [None]:
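# Correlate numeric columns only; a boolean mask hides the upper triangle and diagonal
corr = CGF_df.corr(numeric_only=True)
mask = np.triu(np.ones(corr.shape, dtype=bool))
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, center=0, linewidths=2, mask=mask)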
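
A correlation matrix is symmetric, so the heatmap hides the upper triangle to avoid showing every pair twice. The sketch below shows what such a mask keeps, using a hypothetical frame (`toy_df` is made up) and selecting numeric columns explicitly so that the string column `Product` does not enter the correlation:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from CardioGoodFitness.csv
toy_df = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "Income": [30, 45, 50, 80],
    "Miles": [100, 80, 90, 60],
    "Product": ["TM195", "TM498", "TM798", "TM195"],
})

# Correlation of the numeric columns only
corr = toy_df.select_dtypes(include="number").corr()

# True on and above the diagonal: the cells to hide
mask = np.triu(np.ones(corr.shape, dtype=bool))

# Masked cells become NaN, leaving one value per pair of columns
lower = corr.mask(mask)
print(lower)
```

A boolean mask built from the matrix shape hides cells by position, so a correlation that happens to be exactly 0 is treated like any other.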

#### 5.2 Swarm plot of fitness, miles, and product

In [None]:
sns.swarmplot(data=CGF_df,x='Fitness',y='Miles',hue='Product')

Customers with higher expected miles and higher self-rated fitness favored model TM798.

#### 5.3 Swarm plot of education, income, and product

In [None]:
sns.swarmplot(data=CGF_df,x='Education',y='Income',hue='Product')

Customers with higher education and income favored model TM798 as well.

## 6. Discussion

#### 6.1 TM195 Customer Profile

- TM195 is the most popular model, accounting for 44.4% of sales
- Partnered customers purchased this model more often than single customers
- Mean age is 28.55 (the lowest of the three models, but not by much), median is 26, and the oldest customer is 50
- Mean years of education is 15, median is 16 (probably a bachelor's degree)
- Mean income is 46.4k, max no higher than 70k
- Mean expected miles traveled is 83 miles a week
- Average fitness rating and usage

#### 6.2 TM498 Customer Profile

- TM498 is the second best-selling model, accounting for 33.3% of sales
- Mean age is just under 29, median is 26
- Mean years of education is 15, median is 16 (probably a bachelor's degree as well)
- Mean income is 49k, max no higher than 70k
- Mean expected miles traveled is 88 miles a week, with a minimum of 21
- Average fitness rating and usage

#### 6.3 TM798 Customer Profile

- TM798 is the lowest selling model, accounting for 22.2% of sales
- More popular among males
- Mean age is just over 29, median is 27, and its youngest customer is noticeably older than those of the other models (22 vs 19 and 18)
- Mean years of education is 17, median is 18 (probably a bachelor's degree plus a master's or another advanced degree)
- Mean income is 75k, max is a bit over 100k
- Mean expected miles traveled is 166 miles a week, with a minimum of 80
- Higher expected usage (about 5 times per week) and higher fitness rating (4-5)

#### 6.4 Suggestions

- Market the TM195 and TM498 models as consumer-friendly, budget, beginner treadmills
- TM195 sales are lower among single customers; find a way to tap into that market
- TM798 is clearly favored by customers with high income; market it as a premium product
- Find a way to boost TM798 sales, for example through celebrity or athlete endorsements or sponsorship of athletic events
- Most treadmills are purchased by adults under 40; find a way to market to the older population (walking focus, recapturing youth, indoor training)
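
The under-40 observation can be quantified before acting on it. A minimal sketch on a hypothetical age column (`toy_ages` is invented; the band edges are an assumption chosen to cover the 18 to 50 range reported above):

```python
import pandas as pd

# Hypothetical ages; not taken from CardioGoodFitness.csv
toy_ages = pd.Series([22, 25, 31, 38, 44, 50], name="Age")

# Fraction of customers younger than 40
under_40_share = (toy_ages < 40).mean()

# Customers per age band; bins are right-inclusive: (17, 29], (29, 39], (39, 50]
age_bands = pd.cut(toy_ages, bins=[17, 29, 39, 50], labels=["18-29", "30-39", "40-50"])
band_counts = age_bands.value_counts().sort_index()

print(round(under_40_share, 2))
print(band_counts)
```

On the real data the same two steps would use `CGF_df["Age"]` in place of `toy_ages`.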