# CardioGood Fitness Case Study - Descriptive Statistics
The market research team at AdRight has been asked to identify the profile of the typical customer for each treadmill product offered by CardioGood Fitness. To find out whether customer characteristics differ across the product lines, the team collects data on individuals who purchased a treadmill at a CardioGood Fitness retail store during the prior three months. The data are stored in the CardioGoodFitness.csv file.

### The team identifies the following customer variables to study: 
  - product purchased, TM195, TM498, or TM798.
  - gender.
  - age, in years. 
  - education, in years.
  - relationship status, single or partnered.
  - annual household income.
  - average number of times the customer plans to use the treadmill each week. 
  - average number of miles the customer expects to walk/run each week.
  - and self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

### Perform descriptive analytics to create a customer profile for each CardioGood Fitness treadmill product line.

In [1]:
# Load the necessary packages
import numpy as np
import pandas as pd

In [None]:
# Google Drive is not allowed on my employee laptop, so I load the data from a local file in a Jupyter Notebook instead.
# from google.colab import drive
# drive.mount('/content/drive')

In [2]:
# Load the Cardio Dataset
# mydata = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/CardioGoodFitness.csv')
mydata = pd.read_csv('./CardioGoodFitness.csv')

In [3]:
mydata.head()

Unnamed: 0,Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles
0,TM195,18,Male,14,Single,3,4,29562,112
1,TM195,19,Male,15,Single,2,3,31836,75
2,TM195,19,Female,14,Partnered,4,3,30699,66
3,TM195,19,Male,12,Single,3,3,32973,85
4,TM195,20,Male,13,Partnered,4,2,35247,47


In [None]:
mydata.describe(include="all")

In [None]:
mydata.info()

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

mydata.hist(figsize=(20,30))

In [None]:
import seaborn as sns #importing seaborn library

sns.boxplot(x="Gender", y="Age", data=mydata)

In [None]:
sns.boxplot(x="Product", y="Age", data=mydata)
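
The box in each boxplot is drawn from the quartiles of the plotted variable, so the same comparison can also be read as numbers. The following is a minimal sketch that assumes only the five rows printed by `mydata.head()`; the name `sample` is illustrative, and with the full dataset `mydata` would take its place.

```python
import pandas as pd

# First five rows from mydata.head(); an illustrative sample, not the full dataset
sample = pd.DataFrame({
    "Product": ["TM195"] * 5,
    "Age": [18, 19, 19, 19, 20],
})

# count, mean, std, min, quartiles and max of Age for each product
age_summary = sample.groupby("Product")["Age"].describe()
print(age_summary)
```

The `25%`, `50%` and `75%` columns correspond to the bottom, middle line and top of each box.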

In [None]:
pd.crosstab(mydata['Product'],mydata['Gender'] )

In [None]:
pd.crosstab(mydata['Product'],mydata['MaritalStatus'] )
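
Raw counts are hard to compare when the products sell in different volumes. One option, sketched here on the five rows printed by `mydata.head()` (the name `sample` is illustrative), is to ask `pd.crosstab` for row proportions with `normalize="index"`.

```python
import pandas as pd

# First five rows from mydata.head(); an illustrative sample, not the full dataset
sample = pd.DataFrame({
    "Product": ["TM195"] * 5,
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
})

# Share of each gender within every product (each row sums to 1)
gender_share = pd.crosstab(sample["Product"], sample["Gender"], normalize="index")
print(gender_share)
```

Applied to `mydata`, each row would show the gender mix of one treadmill model.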

In [None]:
sns.countplot(x="Product", hue="Gender", data=mydata)

In [None]:
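# Count customers in each Product/Gender/MaritalStatus cell.
# With no `values` given, len is applied to every remaining column, so the same count repeats across them.
pd.pivot_table(mydata, index=['Product', 'Gender'],
               columns=['MaritalStatus'], aggfunc=len)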

In [None]:
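# Mean income by product and gender, split by marital status (pivot_table averages by default)
pd.pivot_table(mydata, values='Income', index=['Product', 'Gender'],
               columns=['MaritalStatus'])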

In [None]:
# Mean expected weekly miles by product and gender, split by marital status
pd.pivot_table(mydata, values='Miles', index=['Product', 'Gender'],
               columns=['MaritalStatus'])
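
To make the pivot tables above easier to read, here is a small sketch on the five rows printed by `mydata.head()` (the name `sample` is illustrative). It shows how a single cell is computed and that a combination with no customers appears as `NaN`.

```python
import pandas as pd

# First five rows from mydata.head(); an illustrative sample, not the full dataset
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Male"],
    "MaritalStatus": ["Single", "Single", "Partnered", "Single", "Partnered"],
    "Income": [29562, 31836, 30699, 32973, 35247],
})

# Mean income for each Gender x MaritalStatus combination
income_pivot = pd.pivot_table(sample, values="Income", index="Gender",
                              columns="MaritalStatus", aggfunc="mean")
print(income_pivot)

# The Male/Single cell is the average of the three single male customers
male_single = income_pivot.loc["Male", "Single"]
```

There is no single female customer in these five rows, so that cell is empty rather than zero.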

In [None]:
sns.pairplot(mydata)

In [None]:
mydata['Age'].std()

In [None]:
mydata['Age'].mean()
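
Note that pandas computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`). The sketch below illustrates the difference on the five ages printed by `mydata.head()`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Ages from the first five rows of mydata.head(); illustrative sample only
ages = pd.Series([18, 19, 19, 19, 20])

mean_age = ages.mean()                     # arithmetic mean
sample_std = ages.std()                    # pandas default: ddof=1 (divide by n - 1)
population_std = np.std(ages.to_numpy())   # NumPy default: ddof=0 (divide by n)

print(f"mean={mean_age:.2f} sample_std={sample_std:.4f} population_std={population_std:.4f}")
```

On a sample this small the two values differ noticeably; the gap shrinks as the number of rows grows.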

In [None]:
sns.displot(data=mydata, x='Age', kde=True)

In [None]:
mydata.hist(by='Gender',column = 'Age')

In [None]:
mydata.hist(by='Gender',column = 'Income')

In [None]:
mydata.hist(by='Gender',column = 'Miles')

In [None]:
mydata.hist(by='Product',column = 'Miles', figsize=(20,30))

In [None]:
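# Restrict to numeric columns; pandas 2.x raises an error on string columns such as Product and Gender
corr = mydata.corr(numeric_only=True)
corr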

In [None]:
sns.heatmap(corr, annot=True)

In [None]:
# Multiple linear regression: predict Miles from Usage and Fitness

# Load the linear models module from scikit-learn
from sklearn import linear_model

# Create the linear regression object
regr = linear_model.LinearRegression()

y = mydata['Miles']               # response
x = mydata[['Usage', 'Fitness']]  # predictors

# Fit the model on the full dataset (no train/test split is made here)
regr.fit(x, y)

In [None]:
regr.coef_

In [None]:
regr.intercept_

In [None]:
# MilesPredicted = -56.74 + 20.21*Usage + 27.20*Fitness
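
As a worked example of the fitted equation, the sketch below plugs in a customer who plans to use the treadmill three times a week and rates their fitness as 3. The function name `predict_miles` and the constants are illustrative; the coefficients are the rounded values quoted in the comment above, not values read from `regr`.

```python
# Rounded coefficients copied from the fitted equation above
INTERCEPT = -56.74
COEF_USAGE = 20.21
COEF_FITNESS = 27.20


def predict_miles(usage, fitness):
    """Expected weekly miles from the fitted linear equation (hypothetical helper)."""
    return INTERCEPT + COEF_USAGE * usage + COEF_FITNESS * fitness


predicted = predict_miles(usage=3, fitness=3)
print(f"{predicted:.2f}")  # prints 85.49
```

Each additional weekly session adds about 20 expected miles and each fitness point about 27. The negative intercept has no practical meaning because Usage and Fitness are never zero in this survey scale.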