
# Cardio Good Fitness

Cardio Good Fitness is a retail store, and this dataset describes customers who bought its various treadmill models.

# About the Dataset

* The market research team at AdRight has been tasked with identifying the profile of the typical customer for each treadmill product offered by CardioGood Fitness.

* The team decides to investigate whether there are differences across the product lines with respect to customer characteristics.

* The team decides to collect data on individuals who purchased a treadmill at a CardioGoodFitness retail store during the prior three months. The data are stored in the CardioGoodFitness.csv file.

# Dataset Information

CardioGoodFitness.csv: The CSV contains data on customers who have purchased different models from Cardio Good Fitness:

 - Product - the model number of the treadmill
 - Age - age of the customer, in years
 - Gender - gender of the customer
 - Education - education of the customer, in years
 - Marital Status - marital status of the customer
 - Usage - average number of times per week the customer wants to use the treadmill
 - Fitness - self-rated fitness score of the customer (5 = very fit, 1 = very unfit)
 - Income - income of the customer
 - Miles - number of miles the customer expects to run

# Objective

Analyse the targeted customers and their profiles. Based on that analysis, make suggestions and inferences that help the company target more customers.

# Based on this, we have a set of questions:

 - How many models does the store have?
 - Which model sells the most?
 - Are male customers buying more treadmills than female customers?
 - What are the income, age, and education of people buying treadmills?
 - How many days and miles do customers expect to run on the treadmill?
 - What is the self-rated fitness of customers buying treadmills?
 - Are partnered customers buying more treadmills than single customers?
 - Is there any relation between income and model?
 - Is there any relation between age and model?
 - Is there any relation between self-rated fitness and model?
 - Is there any relation between education and model?
 - Does gender have any effect on the model a customer buys?
 - Does marital status have any effect on the model a customer buys?
 - Are different age groups buying different models?
 - What is the relation between age, income, education, and the model bought?

In [None]:
# import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns 
import warnings
warnings.filterwarnings('ignore') # suppress warnings
sns.set(style="whitegrid") # set the background for the graphs

In [None]:
# Read CardioGoodFitness.csv into a DataFrame and work on a copy
data=pd.read_csv("../input/cardiogoodfitness/CardioGoodFitness.csv")
data1=data.copy()

In [None]:
data1.head()

In [None]:
data1.tail()

In [None]:
data1.shape

In [None]:
data1.dtypes

In [None]:
data1.info()

# Data Preprocessing

In [None]:
# Convert object-dtype columns to category to save memory

data1["Product"]=data1["Product"].astype("category")
data1.Gender=data1["Gender"].astype("category")
data1.MaritalStatus=data1["MaritalStatus"].astype("category")

In [None]:
data1.info()
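
The memory saving from the category conversion can be checked directly. The following is a minimal sketch on a made-up column (the labels and row count are illustrative assumptions, not the real data) that compares `memory_usage(deep=True)` before and after the conversion.

```python
import pandas as pd

# Toy column (made-up rows, not the real data): three labels repeated many times
labels = pd.Series(["TM195", "TM498", "TM798"] * 1000, dtype="object")
as_category = labels.astype("category")

# deep=True counts the Python string objects, not just the pointers to them
object_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes} bytes")
print(f"category: {category_bytes} bytes")
```

The saving comes from storing each distinct label once and keeping only small integer codes per row, so it grows with the number of repeated values.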

In [None]:
data1.columns

In [None]:
data1.isnull().sum()

In [None]:
data1.duplicated().sum()

**Observation:** There are no missing or duplicate values in the dataset
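
The two checks above can be wrapped in one small helper. This is a sketch only: `quality_report` is a hypothetical name, and the toy frame holds made-up values with one deliberate gap and one deliberate repeat so that both counts are non-zero.

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Return the number of missing cells and of fully duplicated rows."""
    return {
        "missing_cells": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Toy frame (made-up values, not the real data)
toy = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM798"],
    "Age": [18, 18, 25, None],
})

report = quality_report(toy)
print(report)
```

Calling `quality_report(data1)` on the notebook's frame should report zero for both counts if the observation above holds.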

# Examine Data

In [None]:
list_col=['Product','MaritalStatus','Usage','Fitness','Education','Age']
for i in list_col:
    print('{}   :    {}'.format(i.upper(),data1[i].unique()))

**Observation:**

* There are 3 different treadmill products.
* There are both Partnered and Single customers.
* Customer age ranges from 18 to 50.
* Education ranges from 12 to 21 years.
* Usage ranges from 2 to 7 days a week.
* Customer fitness level ranges from 1 to 5.

In [None]:
data1.describe()

**Observation:**

* Customer age ranges from 18 to 50. The average age is 28.78 and the median is 26.
* The maximum income of a treadmill customer is 100K. The average income is approx. 54K, while the median is approx. 51K.
* Expected treadmill usage is at least twice a week, at most 7 times a week, and 3 times a week on average.
* Customer education ranges from 12 to 21 years, with an average and median of 16 years.
* Customers expect to run an average of 103.19 miles per week, with a median of 94 miles per week.
* Average self-rated fitness is 3.
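
A mean that sits above the median, as with age here, usually signals a right-skewed distribution. The following is a minimal sketch of that comparison. `skew_summary` is a hypothetical helper, and the ages in the toy frame are made up for illustration.

```python
import pandas as pd

def skew_summary(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Tabulate mean, median and sample skewness for the given numeric columns."""
    rows = {}
    for col in columns:
        s = df[col]
        rows[col] = {"mean": s.mean(), "median": s.median(), "skew": s.skew()}
    return pd.DataFrame.from_dict(rows, orient="index")

# Toy ages (made-up values, not the real data) with a long right tail
toy = pd.DataFrame({"Age": [22, 23, 24, 25, 26, 40, 48]})

summary = skew_summary(toy, ["Age"])
print(summary)
```

On the notebook's frame, `skew_summary(data1, ['Age', 'Income', 'Miles'])` would give the same three statistics per column. A positive `skew` value agrees with a mean above the median.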

In [None]:
data1['Product'].value_counts()

**Observation:** TM195 is the best-selling treadmill model.

In [None]:
data1.Gender.value_counts()

**Observation:** There are 76 female and 104 male customers. More male customers than female customers are buying treadmills.

In [None]:
data1.MaritalStatus.value_counts()

**Observation:** There are 107 Partnered and 73 Single customers. Partnered customers are buying more treadmills than Single customers.
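
Raw counts are easier to compare as shares. The sketch below rebuilds the two columns from the counts reported above (104/76 and 107/73) and uses `value_counts(normalize=True)` to turn them into proportions. The series are stand-ins built for illustration, not the notebook's frame.

```python
import pandas as pd

# Rebuild the two columns from the counts reported in the observations above
gender = pd.Series(["Male"] * 104 + ["Female"] * 76, name="Gender")
marital = pd.Series(["Partnered"] * 107 + ["Single"] * 73, name="MaritalStatus")

# normalize=True returns fractions of the total instead of raw counts
gender_share = gender.value_counts(normalize=True)
marital_share = marital.value_counts(normalize=True)

print(gender_share.round(3))
print(marital_share.round(3))
```

In the notebook itself, `data1.Gender.value_counts(normalize=True)` should produce the same kind of table directly.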

In [None]:
data1[data1['Product']=='TM195'].describe().T

**Observation**

* 80 customers bought the TM195 model.
* The average age of customers who purchased the TM195 is 28.5 and the median is 26. Age is right-skewed.
* Average education is 15 years and the median is 16.
* Expected usage is 3 days a week.
* Expected miles to run average 82.78 miles per week, with a median of 85.
* Self-rated fitness is 3, that is, an average fitness level.
* Mean and median income are both around $46K.

In [None]:
data1[data1['Product'] == 'TM498'].describe().T

**Observations**

* 60 customers purchased the TM498 model.
* The average age of customers who purchased the TM498 is 28.9 and the median is 26. Age is right-skewed. Most customers are between 24 and 33 years old.
* Average education is 15 years and the median is 16.
* Expected usage is 3 days a week.
* Expected miles to run average 60 miles per week, with a median of 85.
* Average income is 48973.
* Median income is 49459.

In [None]:
data1[data1['Product'] == 'TM798'].describe().T

**Observations**

* The average age of customers who purchased the TM798 is 29 and the median is 27.
* Average education is 17 years and the median is 18.
* Expected usage is 4-5 days a week.
* Expected miles to run average 166 miles per week, with a median of 160.
* Average income is 75K and the median is 76K.
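
The three per-product `describe()` calls can also be condensed into one side-by-side table with `groupby` and `agg`. This is a minimal sketch on made-up rows. The column names match the dataset, but the values are illustrative assumptions.

```python
import pandas as pd

# Toy rows (made-up values, not the real data)
toy = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Income":  [40000,   50000,   45000,   55000,   70000,   90000],
    "Miles":   [70,      90,      80,      100,     150,     170],
})

# One row per product, with mean and median for each numeric column
summary = toy.groupby("Product")[["Income", "Miles"]].agg(["mean", "median"])
print(summary)
```

The result has a two-level column index, so a single cell is read as `summary.loc["TM798", ("Miles", "mean")]`. On the notebook's frame the same call would take `data1` in place of `toy`.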

In [None]:
pd.crosstab(data1['Gender'],data1['Product'])

**Observations**

* Equal numbers of male and female customers purchased the TM195.
* More males than females purchased the TM498.
* Considerably more males than females purchased the TM798.
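
Because the three models sold in different volumes, the gender split is easier to compare as within-product shares. The sketch below uses `pd.crosstab` with `normalize="columns"` on made-up rows (not the real data), so each product column sums to 1.

```python
import pandas as pd

# Toy rows (made-up values, not the real data)
toy = pd.DataFrame({
    "Gender":  ["Male", "Female", "Male", "Female", "Male",
                "Male", "Female", "Male", "Male", "Male"],
    "Product": ["TM195", "TM195", "TM195", "TM195", "TM498",
                "TM498", "TM498", "TM798", "TM798", "TM798"],
})

# normalize="columns" divides each cell by its column (product) total
shares = pd.crosstab(toy["Gender"], toy["Product"], normalize="columns")
print(shares.round(2))
```

In the notebook, the equivalent call would be `pd.crosstab(data1['Gender'], data1['Product'], normalize='columns')`.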

In [None]:
pd.crosstab(data1['Fitness'], data1['Product'])

# Data Visualisation

In [None]:
sns.countplot(x='Product',hue='Gender',data=data1)

In [None]:
sns.pairplot(data1)

In [None]:
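# numeric_only=True skips the categorical columns, which recent pandas versions refuse to correlate
data1.corr(numeric_only=True)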

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(data1.corr(numeric_only=True), annot=True)

In [None]:
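# Univariate analysis of a categorical variable: share of sales per product
# (the figsize argument already creates the figure, so no separate plt.figure() call is needed)
data1['Product'].value_counts().plot.pie(autopct='%1.1f%%', figsize=(8,8))
plt.title("Pie chart of Product Sales")
plt.show()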

In [None]:
sns.scatterplot(x='Age', y='Income', hue='Product', data=data1)

In [None]:
plt.figure(figsize=(7,7))
sns.countplot(x='Gender', hue='MaritalStatus', data=data1).set(title='MARITAL STATUS BY GENDER')

In [None]:
# catplot is a figure-level function and creates its own figure, so plt.figure() is not needed
sns.catplot(x='Usage', y='Income', col='Gender', hue='Product', kind="bar", data=data1)

In [None]:
# Scatter plot of income against age, coloured by product and sized by usage
sns.relplot(x="Age", y="Income", hue="Product", size="Usage",
            sizes=(40, 400), alpha=.5,
            height=6, data=data1).set(title='INCOME BY AGE ,PRODUCT AND USAGE');

**OBSERVATION**

* Products TM195 and TM498 are bought by people with incomes below 70K, with ages concentrated in the 23-35 range.

* Product TM798 is mainly bought by people with incomes above 70K, with ages in the 23-30 range.

* Most TM798 buyers expect to run more, on average, than buyers of the other two products.
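
The 70K income split read off the plot can be quantified as the share of buyers above that cutoff for each product. The following is a minimal sketch on made-up rows. `INCOME_CUTOFF` and the `HighIncome` column are hypothetical names introduced for illustration.

```python
import pandas as pd

# Toy rows (made-up values, not the real data)
toy = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Income":  [40000,   52000,   48000,   71000,   75000,   90000],
})

INCOME_CUTOFF = 70_000  # threshold taken from the observation above

# The mean of a boolean column is the fraction of True values
high_income_share = (
    toy.assign(HighIncome=toy["Income"] > INCOME_CUTOFF)
       .groupby("Product")["HighIncome"]
       .mean()
)
print(high_income_share)
```

Run against `data1`, the same expression would show how sharply the share of high-income buyers differs between TM798 and the other two models.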