# Cardio Good Fitness Case Study
* The market research team at AdRight is tasked with identifying the profile of the typical customer for each treadmill product offered by CardioGood Fitness.

* The market research team decides to investigate whether there are differences across the product lines with respect to customer characteristics.

* The team decides to collect data on individuals who purchased a treadmill at a CardioGood Fitness retail store during the prior three months.

* The data are stored in the `CardioGoodFitness.csv` file.

# The team identifies the following customer variables to study:

1. product purchased: TM195, TM498, or TM798;
2. gender;
3. age, in years;
4. education, in years;
5. relationship status, single or partnered;
6. annual household income;
7. average number of times the customer plans to use the treadmill each week;
8. average number of miles the customer expects to walk/run each week;
9. self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

# Importing libraries

In [None]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Loading the dataset

In [None]:
cardiodata = pd.read_csv('../input/cardiogoodfitness/CardioGoodFitness.csv')

In [None]:
cardiodata.head()

In [None]:
cardiodata.info()

**Shape of the data**

In [None]:
cardiodata.shape

**Checking for null values**

In [None]:
cardiodata.isna().any()
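
`isna().any()` only reports whether a column contains at least one missing value. A minimal sketch of how the count per column could be obtained with `isna().sum()` follows; the `toy` frame and its values are made up for illustration and are not taken from the CardioGoodFitness data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration only.
toy = pd.DataFrame({
    "Age": [18, 25, np.nan, 40],
    "Gender": ["Male", None, "Female", "Female"],
    "Income": [29562, 45480, 52302, 61006],
})

# Number of missing values in each column.
null_counts = toy.isna().sum()

# Total number of missing values in the whole frame.
total_nulls = int(null_counts.sum())

print(null_counts)
print("total missing:", total_nulls)
```

The same call on `cardiodata` would show how many entries, if any, are missing in each column.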

**With `include='all'`, all columns of the input are included in the output.**

In [None]:
cardiodata.describe(include='all')

# Histograms

A histogram represents the distribution of the data by forming bins along the range of the data and then drawing bars to show the number of observations that fall in each bin.

In [None]:
cardiodata.hist(figsize=(20,20))
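
The binning step described above can be sketched without any plotting by using `np.histogram`. The `ages` array, the bin count, and the range below are assumptions chosen for illustration, not values from the dataset.

```python
import numpy as np

# Hypothetical ages, made up for illustration only.
ages = np.array([18, 19, 22, 23, 24, 25, 26, 31, 33, 38, 40, 47])

# Split the range 10-50 into 4 equal-width bins and count the observations
# that fall in each bin. Every bin is half-open [left, right) except the
# last one, which also includes its right edge.
counts, edges = np.histogram(ages, bins=4, range=(10, 50))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n} observations")
```

Each printed count corresponds to the height of one bar in the histogram.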

# Boxplots

**Which is the most popular model by gender?**

In [None]:
pd.crosstab(cardiodata['Product'],cardiodata['Gender'] )
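
One possible way to read the answer directly off a cross-tabulation is `idxmax`, which returns the row label with the largest count in each column. The sketch below uses a made-up `toy` frame; only the column names `Product` and `Gender` come from the notebook.

```python
import pandas as pd

# Hypothetical purchases, made up for illustration only.
toy = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM798",
                "TM195", "TM498", "TM798", "TM798"],
    "Gender": ["Female", "Female", "Female", "Female",
               "Male", "Male", "Male", "Male"],
})

# Rows are products, columns are genders, cells are purchase counts.
table = pd.crosstab(toy["Product"], toy["Gender"])

# For each gender (column), the product (row label) with the highest count.
most_popular = table.idxmax()

print(table)
print(most_popular)
```

Note that `idxmax` returns only the first label when two products tie, so the full table is still worth inspecting.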

In [None]:
sns.boxenplot(x='Gender',y='Age',data=cardiodata)

In [None]:
pd.crosstab(cardiodata['Product'],cardiodata['MaritalStatus'] )


**Income separated by gender**

In [None]:
cardiodata.hist(by='Gender',column = 'Income')

**Average age**

In [None]:
cardiodata['Age'].mean()

In [None]:
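# distplot is deprecated in recent seaborn releases; histplot with kde=True
# draws a comparable density histogram with a smoothed curve on top
sns.histplot(cardiodata['Age'], kde=True, stat='density')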

# Count Plot
* x = product, so each group of bars shows the number of purchases of that product
* hue = gender, so the counts are separated by gender

In [None]:
sns.countplot(x='Product',hue='Gender',data=cardiodata)

# pairplot
**Quick overview of the data**

In [None]:
sns.pairplot(cardiodata)

# Correlation Heat Map

In [None]:
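# Keep only the numeric columns: recent pandas versions raise an error
# when corr() is called on a frame that still contains string columns
corr = cardiodata.select_dtypes(include='number').corr()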
corr
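
By default, `DataFrame.corr` computes the Pearson correlation coefficient for each pair of columns. As a sketch of what a single entry of that matrix means, the coefficient can be reproduced by hand on a small frame. The column names `Usage` and `Miles` and all values below are assumptions for illustration, not taken from the notebook output.

```python
import numpy as np
import pandas as pd

# Hypothetical values, made up for illustration only.
toy = pd.DataFrame({
    "Usage": [2, 3, 3, 4, 5],
    "Miles": [40, 60, 65, 85, 100],
})

# Pearson r by hand: sum of products of the deviations from the mean,
# divided by the square root of the product of the summed squared deviations.
x = toy["Usage"] - toy["Usage"].mean()
y = toy["Miles"] - toy["Miles"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# The same quantity as computed by pandas.
r_pandas = toy.corr().loc["Usage", "Miles"]

print(r_manual, r_pandas)
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.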

# Heatmap
**Overview of the correlation of the different variables**

In [None]:
sns.heatmap(corr,annot=True)

# How do income and age affect the decision of which model is bought?
From the scatter plot of income against age, colored by product, we can infer that TM798 is the more expensive model.

In [None]:
sns.scatterplot(x='Age', y='Income',data=cardiodata, hue = 'Product')
plt.show()
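
A numeric summary can complement the scatter plot. The sketch below shows one way to compare the median age and income per product with `groupby`; the `toy` frame is entirely made up, so its numbers say nothing about the real customers.

```python
import pandas as pd

# Hypothetical customers, made up for illustration only.
toy = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Age": [22, 28, 26, 34, 27, 31],
    "Income": [30000, 46000, 48000, 52000, 75000, 90000],
})

# Median age and income of the buyers of each product.
summary = toy.groupby("Product")[["Age", "Income"]].median()

print(summary)
```

Running the same `groupby` on `cardiodata` would show whether buyers of one model tend to be older or to have higher incomes than buyers of the others.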