# Cardio Good Fitness Case Study - Simple Linear Regression

### The market research team at AdRight is tasked with identifying the profile of the typical customer for each treadmill product offered by CardioGood Fitness. The team decides to investigate whether customer characteristics differ across the product lines, and collects data on individuals who purchased a treadmill at a CardioGood Fitness retail store during the prior three months. The data are stored in the CardioGoodFitness.csv file.

## The team identifies the following customer variables to study:

* product purchased: TM195, TM498, or TM798;
* gender;
* age, in years;
* education, in years;
* relationship status, single or partnered;
* annual household income;
* average number of times the customer plans to use the treadmill each week;
* average number of miles the customer expects to walk/run each week;
* self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

In [None]:
# importing libraries

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


In [None]:
# loading the Dataset

cardiodata = pd.read_csv('/kaggle/input/cardiogoodfitness/CardioGoodFitness.csv')

In [None]:
cardiodata.head()

In [None]:
cardiodata.describe(include='all')

In [None]:
cardiodata.info()

## Histograms

### A histogram represents the distribution of data by forming bins along the range of the data and then drawing bars to show the number of observations that fall in each bin. The pandas DataFrame.hist() method draws one histogram for each numeric column, using matplotlib for the plotting.

In [None]:
cardiodata.hist(figsize=(20,20))
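The binning step described above can be sketched with numpy.histogram. The ages below are made-up values for illustration, not rows from CardioGoodFitness.csv, and the choice of three bins is arbitrary.

```python
import numpy as np

# Hypothetical ages, for illustration only (not taken from the dataset)
ages = np.array([20, 22, 23, 24, 25, 26, 28, 31, 33, 35, 38, 40, 45, 50])

# Split the range of the data into 3 equal-width bins and count
# how many observations fall into each one
counts, bin_edges = np.histogram(ages, bins=3)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {count} observations")
```

Each bar in a histogram has the height of one of these counts. Every bin includes its left edge and excludes its right edge, except the last bin, which includes both.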

## Boxplots

### This kind of plot shows the three quartile values of the distribution along with extreme values. The “whiskers” extend to points that lie within 1.5 IQRs of the lower and upper quartile, and then observations that fall outside this range are displayed independently. Importantly, this means that each value in the boxplot corresponds to an actual observation in the data:

In [None]:
sns.boxplot(x='Gender', y='Age', data=cardiodata)
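The quartiles and the 1.5 IQR whisker rule described above can be computed by hand. This is a minimal sketch on made-up ages, assuming numpy's default (linear interpolation) percentile method; other quartile conventions give slightly different values.

```python
import numpy as np

# Hypothetical ages, for illustration only (not taken from the dataset)
ages = np.array([19, 21, 22, 23, 24, 25, 26, 27, 29, 31, 33, 48])

# The three quartile values drawn as the box and its center line
q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1

# Limits that lie 1.5 IQRs beyond the lower and upper quartile
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the limits
inside = ages[(ages >= lower_limit) & (ages <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Observations outside the limits are drawn as individual points
outliers = ages[(ages < lower_limit) | (ages > upper_limit)]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```

The whisker ends are taken from the data itself, which is why each value shown in the plot corresponds to an actual observation.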

## Count Plot

### A special case for the bar plot is when you want to show the number of observations in each category rather than computing a statistic for a second variable. This is similar to a histogram over a categorical, rather than quantitative, variable. In seaborn, it’s easy to do so with the countplot() function:

In [None]:
sns.countplot(x='Product', hue = 'Gender', data = cardiodata)
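The numbers behind a count plot can be inspected as a table. The sketch below uses pd.crosstab on a small made-up sample; the column names follow the notebook, but the rows are hypothetical.

```python
import pandas as pd

# Hypothetical purchases, for illustration only (not taken from the dataset)
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM798", "TM195", "TM498"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Male"],
})

# Rows are products, columns are genders, cells are observation counts
counts = pd.crosstab(sample["Product"], sample["Gender"])
print(counts)
```

Each bar in the count plot has the height of one cell in this table.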

## Visualizing pairwise relationships in a dataset

### To plot multiple pairwise bivariate distributions in a dataset, you can use the pairplot() function. This creates a matrix of axes and shows the relationship for each pair of columns in a DataFrame. By default, it also draws the univariate distribution of each variable on the diagonal axes:

In [None]:
sns.pairplot(cardiodata)

## Correlation Heat Map

### A heatmap colors each cell of the correlation matrix by its value, so strongly and weakly correlated pairs of variables stand out at a glance. This is why heatmaps are widely used to visualize correlation matrices.

In [None]:
# numeric_only=True (pandas 1.5+) skips text columns such as Product and Gender
corr = cardiodata.corr(numeric_only=True)
corr

In [None]:
sns.heatmap(corr, annot=True)
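Each cell of the correlation matrix is a Pearson correlation coefficient, which is the default method of DataFrame.corr(). The sketch below computes one such coefficient by hand on made-up values; pearson_r is a helper name introduced here for illustration.

```python
import numpy as np

# Hypothetical weekly usage and miles, for illustration only
usage = np.array([2, 3, 3, 4, 5, 6], dtype=float)
miles = np.array([40, 60, 75, 90, 120, 150], dtype=float)

def pearson_r(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return numerator / denominator

r = pearson_r(usage, miles)
print(r)
```

The result always lies between -1 and 1. Values near 1 mean the two variables rise together, values near -1 mean one falls as the other rises, and values near 0 mean there is little linear relationship.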

# Simple Linear Regression

In [None]:
# loading model from sklearn library

from sklearn import linear_model

In [None]:
# creating the model and selecting the target (y) and the predictors (x)

regr = linear_model.LinearRegression()

y = cardiodata['Miles']
x = cardiodata[['Usage','Fitness']]


In [None]:
# training the model

regr.fit(x,y)
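LinearRegression fits the model by ordinary least squares. As a rough sketch of what that fit computes, the same kind of solution can be obtained with numpy.linalg.lstsq. The data here is synthetic, built from a known relationship so the recovered values can be checked; it is not the treadmill dataset.

```python
import numpy as np

# Synthetic data built from a known relationship: y = 5 + 2*x1 + 3*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y_known = 5 + 2 * x1 + 3 * x2

# Design matrix with a leading column of ones for the intercept
design = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares solution: the values that minimize the squared residuals
solution, *_ = np.linalg.lstsq(design, y_known, rcond=None)
intercept, coef_x1, coef_x2 = solution

print(intercept, coef_x1, coef_x2)
```

Because the synthetic data has no noise, the fit recovers the intercept 5 and the coefficients 2 and 3 up to floating-point error. On real data the fit returns the best-fitting values instead of exact ones.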

In [None]:
# regression coefficients

regr.coef_

In [None]:
# regression intercept

regr.intercept_

In [None]:
# Fitted equation, with coefficients rounded to two decimals:
# Miles Predicted = -56.74 + 20.21*Usage + 27.20*Fitness
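The equation above can be applied directly to a new customer. This sketch hard-codes the rounded values from the equation; predict_miles is a helper name introduced here for illustration, and the exact fitted values live in regr.intercept_ and regr.coef_.

```python
# Rounded values copied from the fitted equation above
INTERCEPT = -56.74
COEF_USAGE = 20.21
COEF_FITNESS = 27.20

def predict_miles(usage, fitness):
    """Predicted weekly miles from planned weekly usage and self-rated fitness."""
    return INTERCEPT + COEF_USAGE * usage + COEF_FITNESS * fitness

# A customer who plans to use the treadmill 3 times a week and rates fitness as 3
example = predict_miles(usage=3, fitness=3)
print(round(example, 2))  # 85.49
```

Each coefficient is the change in predicted miles for a one-unit increase in that variable while the other is held fixed. Because the intercept is negative, the equation can give negative miles for low inputs (for example, Usage 1 and Fitness 1), so predictions at the edges of the data should be read with care.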