# Cardio Good Fitness Project

## 1. Objective
Preliminary data analysis: explore the dataset and practice extracting basic observations from the data.

### 1.1 Expectations
- Build a customer profile (the characteristics of a typical customer) for each product
- Perform univariate and multivariate analyses
- Generate a set of insights and recommendations that will help the company target new customers


### 1.2 Context
The data describes customers of the treadmill product(s) sold by a retail store called Cardio Good Fitness. It contains the following variables:

**Product** - the model number of the treadmill

**Age** - the customer's age, in years

**Gender** - the customer's gender

**Education** - the customer's education, in years

**MaritalStatus** - the customer's marital status

**Usage** - the average number of times per week the customer wants to use the treadmill

**Fitness** - the customer's self-rated fitness score (5 = very fit, 1 = very unfit)

**Income** - the customer's income

**Miles** - the number of miles the customer expects to run

Explore the dataset to identify differences between customers of each product. 
You can also explore relationships between the different attributes of customers. 
You can approach it from any other line of questioning that you feel could be relevant for the business.
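One possible starting point for the per-product comparison is to group the rows by `Product` and summarise the numeric attributes. The sketch below is not part of the original notebook: it uses six rows copied from the `head()` and `tail()` output in section 2.3 purely for illustration, and the names `sample` and `profile` are hypothetical.

```python
import pandas as pd

# Six rows copied from the head()/tail() output of the dataset (illustration only).
sample = pd.DataFrame(
    {
        "Product": ["TM195", "TM195", "TM195", "TM798", "TM798", "TM798"],
        "Age": [18, 19, 19, 40, 42, 45],
        "Income": [29562, 31836, 30699, 83416, 89641, 90886],
        "Miles": [112, 75, 66, 200, 200, 160],
    }
)

# Mean of each selected numeric attribute per treadmill model.
profile = sample.groupby("Product")[["Age", "Income", "Miles"]].mean()
print(profile.round(1))
```

On the full dataset the same call would presumably be applied to `cardio_dataset` instead of `sample`; six rows are far too few to draw conclusions from.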



## 2. Analysis of dataset  

### 2.1 Loading the libraries and some standard initializations

In [26]:
import warnings
warnings.filterwarnings('ignore')  # suppress library warnings to keep the output readable
# Disable jedi-based autocompletion (see https://tinyurl.com/nx5d58ez)
%config Completer.use_jedi = False
import numpy as np
import pandas as pd
import seaborn as sns

In [27]:
from matplotlib import pyplot as plt
# To enable plotting graphs in Jupyter notebook
%matplotlib inline
from scipy.stats import skew, norm, probplot, boxcox, f_oneway
sns.set(style="darkgrid")

### 2.2 Import the dataset

In [28]:
#Load Dataset 
cardio_dataset = pd.read_csv('CardioGoodFitness.csv')

### 2.3 Check the dataset head() and tail()

In [29]:
cardio_dataset.tail()

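| | Product | Age | Gender | Education | MaritalStatus | Usage | Fitness | Income | Miles |
|---|---|---|---|---|---|---|---|---|---|
| 175 | TM798 | 40 | Male | 21 | Single | 6 | 5 | 83416 | 200 |
| 176 | TM798 | 42 | Male | 18 | Single | 5 | 4 | 89641 | 200 |
| 177 | TM798 | 45 | Male | 16 | Single | 5 | 5 | 90886 | 160 |
| 178 | TM798 | 47 | Male | 18 | Partnered | 4 | 5 | 104581 | 120 |
| 179 | TM798 | 48 | Male | 18 | Partnered | 4 | 5 | 95508 | 180 |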


In [30]:
cardio_dataset.head()

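| | Product | Age | Gender | Education | MaritalStatus | Usage | Fitness | Income | Miles |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TM195 | 18 | Male | 14 | Single | 3 | 4 | 29562 | 112 |
| 1 | TM195 | 19 | Male | 15 | Single | 2 | 3 | 31836 | 75 |
| 2 | TM195 | 19 | Female | 14 | Partnered | 4 | 3 | 30699 | 66 |
| 3 | TM195 | 19 | Male | 12 | Single | 3 | 3 | 32973 | 85 |
| 4 | TM195 | 20 | Male | 13 | Partnered | 4 | 2 | 35247 | 47 |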


### 2.4 Check the dataset shape

In [31]:
# Exploring the dataset - length / shape
cardio_dataset.shape

(180, 9)

### 2.5 Check the dataset columns

In [32]:
cardio_dataset.columns

Index(['Product', 'Age', 'Gender', 'Education', 'MaritalStatus', 'Usage',
       'Fitness', 'Income', 'Miles'],
      dtype='object')

### 2.6 Check the dataset describe()

In [33]:
cardio_dataset.describe()

| | Age | Education | Usage | Fitness | Income | Miles |
|---|---|---|---|---|---|---|
| count | 180.0 | 180.0 | 180.0 | 180.0 | 180.0 | 180.0 |
| mean | 28.788889 | 15.572222 | 3.455556 | 3.311111 | 53719.577778 | 103.194444 |
| std | 6.943498 | 1.617055 | 1.084797 | 0.958869 | 16506.684226 | 51.863605 |
| min | 18.0 | 12.0 | 2.0 | 1.0 | 29562.0 | 21.0 |
| 25% | 24.0 | 14.0 | 3.0 | 3.0 | 44058.75 | 66.0 |
| 50% | 26.0 | 16.0 | 3.0 | 3.0 | 50596.5 | 94.0 |
| 75% | 33.0 | 16.0 | 4.0 | 4.0 | 58668.0 | 114.75 |
| max | 50.0 | 21.0 | 7.0 | 5.0 | 104581.0 | 360.0 |


### 2.7 Check dataset info()

In [34]:
# Exploring the dataset - info
cardio_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Product        180 non-null    object
 1   Age            180 non-null    int64 
 2   Gender         180 non-null    object
 3   Education      180 non-null    int64 
 4   MaritalStatus  180 non-null    object
 5   Usage          180 non-null    int64 
 6   Fitness        180 non-null    int64 
 7   Income         180 non-null    int64 
 8   Miles          180 non-null    int64 
dtypes: int64(6), object(3)
memory usage: 12.8+ KB


In [18]:
# Exploring the dataset: include='all' adds the non-numeric columns (count, unique, top, freq) to the descriptive statistics
cardio_dataset.describe(include='all')

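| | Product | Age | Gender | Education | MaritalStatus | Usage | Fitness | Income | Miles |
|---|---|---|---|---|---|---|---|---|---|
| count | 180 | 180.0 | 180 | 180.0 | 180 | 180.0 | 180.0 | 180.0 | 180.0 |
| unique | 3 | NaN | 2 | NaN | 2 | NaN | NaN | NaN | NaN |
| top | TM195 | NaN | Male | NaN | Partnered | NaN | NaN | NaN | NaN |
| freq | 80 | NaN | 104 | NaN | 107 | NaN | NaN | NaN | NaN |
| mean | NaN | 28.788889 | NaN | 15.572222 | NaN | 3.455556 | 3.311111 | 53719.577778 | 103.194444 |
| std | NaN | 6.943498 | NaN | 1.617055 | NaN | 1.084797 | 0.958869 | 16506.684226 | 51.863605 |
| min | NaN | 18.0 | NaN | 12.0 | NaN | 2.0 | 1.0 | 29562.0 | 21.0 |
| 25% | NaN | 24.0 | NaN | 14.0 | NaN | 3.0 | 3.0 | 44058.75 | 66.0 |
| 50% | NaN | 26.0 | NaN | 16.0 | NaN | 3.0 | 3.0 | 50596.5 | 94.0 |
| 75% | NaN | 33.0 | NaN | 16.0 | NaN | 4.0 | 4.0 | 58668.0 | 114.75 |
| max | NaN | 50.0 | NaN | 21.0 | NaN | 7.0 | 5.0 | 104581.0 | 360.0 |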


### 2.8 List / count the datatypes

In [16]:
cardio_dataset.dtypes

Product          object
Age               int64
Gender           object
Education         int64
MaritalStatus    object
Usage             int64
Fitness           int64
Income            int64
Miles             int64
dtype: object

In [17]:
cardio_dataset.dtypes.value_counts()

int64     6
object    3
dtype: int64
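The split between numeric and non-numeric columns seen above can also be obtained programmatically with `select_dtypes`, which is handy when later steps treat the two groups differently. This is a minimal sketch rather than part of the original notebook: `sample` is a hypothetical two-row frame copied from the `head()` output, and `exclude="number"` is used for the non-numeric columns as an assumption that everything non-numeric here is a text label.

```python
import pandas as pd

# Two rows copied from the head() output, for illustration only.
sample = pd.DataFrame(
    {
        "Product": ["TM195", "TM195"],
        "Age": [18, 19],
        "Gender": ["Male", "Male"],
        "Education": [14, 15],
        "MaritalStatus": ["Single", "Single"],
        "Usage": [3, 2],
        "Fitness": [4, 3],
        "Income": [29562, 31836],
        "Miles": [112, 75],
    }
)

# Column names grouped by whether their dtype is numeric.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
non_numeric_cols = sample.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)
print(non_numeric_cols)
```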

### 2.9 Checking missing values in dataset

In [69]:
def missing_check(df):
    total = df.isnull().sum().sort_values(ascending=False)   # number of null values per column
    percent = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)  # fraction of values that are null
    missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])  # combine both into one table
    return missing_data  # return the summary table, not the input dataframe
# missing_check(cardio_dataset)

In [90]:
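# count() tallies every row of the boolean mask (null or not), so this gives the total number of rows per column
cardio_dataset.isnull().count()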

Product          180
Age              180
Gender           180
Education        180
MaritalStatus    180
Usage            180
Fitness          180
Income           180
Miles            180
dtype: int64

In [92]:
cardio_dataset.isnull().sum() 

Product          0
Age              0
Gender           0
Education        0
MaritalStatus    0
Usage            0
Fitness          0
Income           0
Miles            0
dtype: int64
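Because the real dataset has no gaps, the null check is easier to see on a frame that does have one. The sketch below is only an illustration under stated assumptions: `missing_summary` is a hypothetical helper (not the notebook's own function), and `toy` is made-up data with one deliberately injected `NaN`.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return per-column null counts and the fraction of rows that are null."""
    is_null = df.isnull()
    summary = pd.DataFrame(
        {
            "Total": is_null.sum(),      # number of nulls per column
            "Fraction": is_null.mean(),  # mean of booleans = share of nulls
        }
    )
    return summary.sort_values("Total", ascending=False)


# Toy frame with one injected gap; these values are not from the real dataset.
toy = pd.DataFrame({"Age": [18, 19, np.nan, 20], "Miles": [112, 75, 66, 47]})
summary = missing_summary(toy)
print(summary)
```

Sorting by `Total` puts the columns with the most gaps first, which is where any cleaning effort would normally start.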

### 2.10 Observations from the exploratory checks
1. The dataset has no null values.
2. Product, Gender, and MaritalStatus have dtype object
   (I will convert them to categorical for further analysis).
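The conversion to categorical mentioned above could look roughly like the following. It is a sketch, not the notebook's own code: `sample` is a hypothetical three-row frame built from rows shown in section 2.3, and the list `categorical_cols` is an assumption about which columns to convert.

```python
import pandas as pd

# Three rows taken from the head()/tail() output, for illustration only.
sample = pd.DataFrame(
    {
        "Product": ["TM195", "TM195", "TM798"],
        "Gender": ["Male", "Female", "Male"],
        "MaritalStatus": ["Single", "Partnered", "Single"],
        "Age": [18, 19, 40],
    }
)

# Convert only the label-like columns; numeric columns keep their dtype.
categorical_cols = ["Product", "Gender", "MaritalStatus"]
sample = sample.astype({col: "category" for col in categorical_cols})
print(sample.dtypes)
```

Categorical columns store each distinct label once, which reduces memory use and makes the set of valid levels explicit for grouping and plotting.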

## 3. EDA - Univariate Analysis

### 3.1 Histogram