**ABOUT THE DATASET**

* The market research team at AdRight has been tasked with identifying the profile of the typical customer for each treadmill product offered by CardioGood Fitness.

* The team decides to investigate whether there are differences across the product lines with respect to customer characteristics.

* The team decides to collect data on individuals who purchased a treadmill at a CardioGood Fitness retail store during the prior three months. The data are stored in the `CardioGoodFitness.csv` file.

> The team identifies the following customer variables to study:

* product purchased: TM195, TM498, or TM798;
* gender;
* age, in years;
* education, in years;
* relationship status, single or partnered;
* annual household income;
* average number of times the customer plans to use the treadmill each week;
* average number of miles the customer expects to walk/run each week;
* and self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

In [None]:
#IMPORT NECESSARY LIBRARIES

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
df = pd.read_csv('../input/cardiogoodfitness/CardioGoodFitness.csv')


In [None]:
df.head()

The data has only a handful of columns. The goal is to find out which treadmill a customer is likely to choose based on age, gender, education, marital status, and the other recorded characteristics.

In [None]:
df.describe()

In [None]:
df.info()

There are no null values in the dataset, which makes the analysis easier.
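The absence of nulls can also be checked directly rather than read off `df.info()`. A minimal sketch follows; `missing_summary` is a hypothetical helper name, and the three-row `sample` frame contains made-up stand-in values so the cell runs on its own. In the notebook, call `missing_summary(df)` instead.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column."""
    return frame.isnull().sum()

# Hypothetical stand-in rows (not taken from the real file).
sample = pd.DataFrame({
    "Product": ["TM195", "TM498", "TM798"],
    "Age": [18, 26, 27],
    "Income": [30000, 50000, 90000],
})

# Every count should be zero when no values are missing.
print(missing_summary(sample))
```

A total of zero across all columns confirms that no imputation or row dropping is needed before the analysis.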

In [None]:
# Convert the object-dtype columns to category dtype to save memory
df.Product=df["Product"].astype("category")
df.Gender=df["Gender"].astype("category")
df.MaritalStatus=df["MaritalStatus"].astype("category")
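To see why the conversion saves memory, the footprint of the same labels can be compared under both dtypes. This is a rough sketch on a hypothetical stand-in column, not the notebook's `df`; the repeat count of 1000 is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in column with a few repeated labels, like Product in df.
labels = pd.Series(["TM195", "TM498", "TM798"] * 1000)

# deep=True counts the Python string objects, not just the pointers to them.
as_object = labels.astype("object").memory_usage(deep=True)
as_category = labels.astype("category").memory_usage(deep=True)

print(f"object: {as_object} bytes, category: {as_category} bytes")
```

The category dtype stores each distinct label once plus a small integer code per row, so the saving grows with the number of rows. On a small dataset the absolute saving is modest.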

In [None]:
df[df['Product']=='TM195'].describe().T

**OBSERVATIONS**

Some of the important conclusions that can be drawn by looking at our results are:

* Exactly 80 customers purchased the TM195 model.
* The average age of customers who bought the TM195 is 28.5; the minimum and maximum ages are 18 and 50, respectively.
* Average education is 15 years, and the median is 16.
* Expected usage is about 3 days a week.

In [None]:
df[df['Product']=='TM498'].describe().T

**OBSERVATIONS**

* 60 customers purchased the TM498 model.
* The average age of customers who bought the TM498 is 28.9 and the median is 26, so age is right-skewed. Most customers are between 24 and 33 years old.
* Average education is 15 years, and the median is 16.
* Expected usage is about 3 days a week.

In [None]:
df[df['Product']=='TM798'].describe().T

**OBSERVATIONS**

* The average age of customers who bought the TM798 is 29, and the median is 27.
* Average education is 17 years, and the median is 18.
* Expected usage is 4-5 days a week.
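The three per-product `describe()` calls can be condensed into one side-by-side table, which makes the comparison across models easier to read. A minimal sketch, assuming the column names `Product`, `Age`, and `Usage` used elsewhere in this notebook; `summarize_by_product` is a hypothetical helper and the `sample` rows are made up so the cell runs on its own.

```python
import pandas as pd

def summarize_by_product(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Mean and median of the given numeric columns for each product."""
    return frame.groupby("Product", observed=True)[columns].agg(["mean", "median"])

# Hypothetical stand-in rows; in the notebook, pass df instead.
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Age": [20, 30, 24, 34, 25, 29],
    "Usage": [3, 3, 2, 4, 4, 6],
})

summary = summarize_by_product(sample, ["Age", "Usage"])
print(summary)
```

A mean noticeably above the median in any row is a quick hint of right skew for that product.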

In [None]:
pd.crosstab(df['Gender'], df['Product'])

* The TM195 was purchased by equal numbers of male and female customers.
* More males than females purchased the TM498.
* Considerably more males than females purchased the TM798.
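Raw counts are harder to compare when the products have different numbers of buyers. One option is to normalize the crosstab by column so that each product's gender split sums to 1. The sketch below uses a hypothetical helper name and made-up stand-in rows; in the notebook, pass `df`.

```python
import pandas as pd

def gender_share_by_product(frame: pd.DataFrame) -> pd.DataFrame:
    """Share of each gender within every product (columns sum to 1)."""
    return pd.crosstab(frame["Gender"], frame["Product"], normalize="columns")

# Hypothetical stand-in rows (not taken from the real file).
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male", "Male", "Male"],
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM498", "TM798", "TM798", "TM798"],
})

shares = gender_share_by_product(sample)
print(shares.round(2))
```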

In [None]:
pd.crosstab(df['Fitness'], df['Product'])

Now let's visualize the data.

In [None]:
sns.countplot(x='Product', hue='Gender', data=df)

In [None]:
sns.pairplot(df)

In [None]:
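# Restrict to numeric columns; recent pandas versions raise an error on categorical columns
df.select_dtypes(include='number').corr()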

In [None]:
plt.figure(figsize=(12,6))
# Correlations are only defined for the numeric columns
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True)

In [None]:
# Univariate analysis of a categorical variable: share of sales per product
# (figsize is passed to pandas directly, so a separate plt.figure() call is not needed)
df['Product'].value_counts().plot.pie(autopct='%1.1f%%', figsize=(8,8))
plt.title("Pie chart of Product Sales")
plt.show()

In [None]:
sns.scatterplot(x='Age', y='Income', hue='Product', data=df)

The scatter plot suggests that customers with incomes of 50,000 or more tend to buy the TM798.
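A visual impression like this can be checked numerically by splitting customers at the income threshold and looking at the product mix on each side. This is a sketch under stated assumptions: `product_share_by_income` and the `IncomeGroup` labels are hypothetical names, and the `sample` rows are made up so the cell runs on its own. In the notebook, call `product_share_by_income(df, 50000)`.

```python
import pandas as pd

def product_share_by_income(frame: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Product mix (rows sum to 1) for customers below vs. at or above a threshold."""
    group = (
        frame["Income"]
        .ge(threshold)
        .map({True: "at or above", False: "below"})
        .rename("IncomeGroup")
    )
    return pd.crosstab(group, frame["Product"], normalize="index")

# Hypothetical stand-in rows (not taken from the real file).
sample = pd.DataFrame({
    "Income": [30000, 45000, 48000, 52000, 60000, 90000],
    "Product": ["TM195", "TM498", "TM195", "TM498", "TM798", "TM798"],
})

shares = product_share_by_income(sample, 50000)
print(shares.round(2))
```

Trying a few thresholds shows where the TM798 share starts to dominate, which is more precise than reading a cutoff off the plot.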

In [None]:
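plt.figure(figsize=(7,7))
# Pass column names as keywords; recent seaborn versions treat the first positional argument as data
sns.countplot(x='Gender', hue='MaritalStatus', data=df).set(title='MARITAL STATUS BY GENDER')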

In [None]:
# catplot is a figure-level function that creates its own figure, so no plt.figure() call is needed
sns.catplot(x='Usage', y='Income', col='Gender', hue='Product', kind="bar", data=df)

In [None]:
# Scatter plot of income against age, colored by product and sized by usage
sns.relplot(x="Age", y="Income", hue="Product", size="Usage",
            sizes=(40, 400), alpha=.5,
            height=6, data=df).set(title='INCOME BY AGE, PRODUCT AND USAGE');

**OBSERVATIONS**

* The TM195 and TM498 are bought by customers with incomes below 70K, with ages concentrated in the 23-35 range.
* The TM798 is mainly bought by customers with incomes above 70K, with ages in the 23-30 range.
* Most customers who buy the TM798 expect to run more, on average, than buyers of the other two products.