<h1>Calories Burned Prediction</h1>

<h2>Importing Libraries</h2>

In [98]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn import metrics
import pickle

%matplotlib inline
from matplotlib import style
style.use("seaborn-v0_8")  # the plain "seaborn" style name was deprecated in Matplotlib 3.6



import warnings
warnings.filterwarnings('ignore')

In [99]:
calories = pd.read_csv("calories.csv")
exercise = pd.read_csv("exercise.csv")

## General Overview of Dataset

In [100]:
calories.head()

Unnamed: 0,User_ID,Calories
0,14733363,231.0
1,14861698,66.0
2,11179863,26.0
3,16180408,71.0
4,17771927,35.0


In [101]:
exercise.head()

Unnamed: 0,User_ID,Gender,Age,Height,Weight,Duration,Heart_Rate,Body_Temp
0,14733363,male,68,190.0,94.0,29.0,105.0,40.8
1,14861698,female,20,166.0,60.0,14.0,94.0,40.3
2,11179863,male,69,179.0,79.0,5.0,88.0,38.7
3,16180408,female,34,179.0,71.0,13.0,100.0,40.5
4,17771927,female,27,154.0,58.0,10.0,81.0,39.8


In [102]:
exercise_df = exercise.merge(calories , on = "User_ID")
exercise_df.head()

Unnamed: 0,User_ID,Gender,Age,Height,Weight,Duration,Heart_Rate,Body_Temp,Calories
0,14733363,male,68,190.0,94.0,29.0,105.0,40.8,231.0
1,14861698,female,20,166.0,60.0,14.0,94.0,40.3,66.0
2,11179863,male,69,179.0,79.0,5.0,88.0,38.7,26.0
3,16180408,female,34,179.0,71.0,13.0,100.0,40.5,71.0
4,17771927,female,27,154.0,58.0,10.0,81.0,39.8,35.0
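
The merge above relies on `User_ID` being unique in both files. A minimal sketch of how that assumption could be checked is shown below. The toy frames are stand-ins built from the previews above (not the real CSV files), and `validate="one_to_one"` is an optional safeguard that the original cell does not use.

```python
import pandas as pd

# Toy stand-ins for exercise.csv and calories.csv (values copied from the previews above).
exercise = pd.DataFrame({
    "User_ID": [14733363, 14861698, 11179863],
    "Duration": [29.0, 14.0, 5.0],
})
calories = pd.DataFrame({
    "User_ID": [14733363, 14861698, 11179863],
    "Calories": [231.0, 66.0, 26.0],
})

# validate="one_to_one" raises pandas.errors.MergeError if User_ID repeats on either side.
merged = exercise.merge(calories, on="User_ID", how="inner", validate="one_to_one")

# An inner join silently drops unmatched users, so compare the row counts.
assert len(merged) == len(exercise) == len(calories)
print(merged)
```

If the row counts differ, some users exist in only one of the two files and would be missing from the merged frame.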


1. **User_ID**: The unique ID of the person.
2. **Gender**: Gender of the person.
3. **Age**: Age of the person.
4. **Height**: Height of the person in $cm$.
5. **Weight**: Weight of the person in $kg$.
6. **Duration**: Duration of the person's exercise/activity.
7. **Heart_Rate**: Heart rate of the person in beats per $min$.
8. **Body_Temp**: Body temperature of the person in $^{\circ}C$.
9. **Calories**: Calories burned, in kilocalories.

### Overall Statistics of the Dataset

In [103]:
exercise_df.describe()

Unnamed: 0,User_ID,Age,Height,Weight,Duration,Heart_Rate,Body_Temp,Calories
count,15000.0,15000.0,15000.0,15000.0,15000.0,15000.0,15000.0,15000.0
mean,14977360.0,42.7898,174.465133,74.966867,15.5306,95.518533,40.025453,89.539533
std,2872851.0,16.980264,14.258114,15.035657,8.319203,9.583328,0.77923,62.456978
min,10001160.0,20.0,123.0,36.0,1.0,67.0,37.1,1.0
25%,12474190.0,28.0,164.0,63.0,8.0,88.0,39.6,35.0
50%,14997280.0,39.0,175.0,74.0,16.0,96.0,40.2,79.0
75%,17449280.0,56.0,185.0,87.0,23.0,103.0,40.6,138.0
max,19999650.0,79.0,222.0,132.0,30.0,128.0,41.5,314.0


In [104]:
exercise_df.drop(columns = "User_ID" , inplace = True)

* To avoid `data leakage` in our model, let's split the data into a training set and a test set before doing any `feature engineering`.

In [105]:
exercise_train_data , exercise_test_data = train_test_split(exercise_df , test_size = 0.2 , random_state = 1)
print("Shape of training data : " , exercise_train_data.shape)
print("Shape of test data : " , exercise_test_data.shape)

Shape of training data :  (12000, 8)
Shape of test data :  (3000, 8)
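
To illustrate why the split comes first, here is a hedged sketch of a transformation that learns statistics from the data. The `StandardScaler` step and the synthetic feature are assumptions made for illustration; the notebook itself does not scale its features. The point is only that anything fitted (means, variances, encodings) should be fitted on the training rows and then reused on the test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=40.0, scale=0.8, size=(100, 1))  # synthetic feature, not the real data

X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics learned from training rows only
X_test_scaled = scaler.transform(X_test)        # test rows reuse the training statistics

print(X_train_scaled.mean(), X_test_scaled.mean())
```

A purely row-wise feature such as BMI uses no information from other rows, so it cannot leak either way; splitting first is simply the safer habit.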



In this section, our purpose is to combine the `Weight` and `Height` columns into a simple BMI value, which can then be used to classify the individuals in this dataset into groups.

* The BMI (Body Mass Index) formula:


$BMI = \frac{Weight(kg)}{Height(m)^2}$

OR

$BMI = 703 \times \frac{Weight(lb)}{Height(in)^2}$


* The first formula will be used because `Weight` is recorded in `kg` and `Height` in `cm`; dividing `Height` by 100 converts it to meters.

* According to [this page](https://en.wikipedia.org/wiki/Body_mass_index) we will classify instances according to below table:


|   Category                                       | from          | to    |
| -------------------------------------------------|:-------------:| -----:|
| Very severely underweight                        | --            |  15   |
| Severely underweight                             | 15            |  16   |
| Underweight                                      | 16            |  18.5 |
| Normal (healthy weight)                          | 18.5          |  25   |
| Overweight                                       | 25            |  30   |
| Obese Class I (Moderately obese)                 | 30            |  35   |
| Obese Class II (Severely obese)                  | 35            |  40   |
| Obese Class III (Very severely obese)            | 40            |       |
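
As a worked example of the metric formula, take the first row of `exercise.head()` (`Weight` = 94 kg, `Height` = 190 cm, i.e. 1.90 m):

$BMI = \frac{94}{1.90^2} = \frac{94}{3.61} \approx 26.04$

This value lies between 25 and 30, so that individual would fall into the Overweight row of the table above.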

* To classify examples according to the categories above, we first add a `BMI` column to both sets:

In [106]:
for data in [exercise_train_data, exercise_test_data]:  # add a BMI column to both the training and test sets
    # Height is in cm, so divide by 100 to get meters before squaring
    data["BMI"] = (data["Weight"] / (data["Height"] / 100) ** 2).round(2)
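
The cell above only computes the BMI value; the grouping step described in the table is not shown in the notebook. One possible way to do it is sketched below with `pd.cut`. The column name `BMI_Category` is hypothetical, and treating each interval as `[from, to)` is an assumption, since the table does not say which boundary is inclusive.

```python
import pandas as pd

# Bin edges follow the table above; the outer edges are open-ended.
bmi_bins = [0, 15, 16, 18.5, 25, 30, 35, 40, float("inf")]
bmi_labels = [
    "Very severely underweight",
    "Severely underweight",
    "Underweight",
    "Normal (healthy weight)",
    "Overweight",
    "Obese Class I (Moderately obese)",
    "Obese Class II (Severely obese)",
    "Obese Class III (Very severely obese)",
]

# Three rows taken from the exercise.head() preview.
df = pd.DataFrame({"Weight": [94.0, 60.0, 58.0], "Height": [190.0, 166.0, 154.0]})
df["BMI"] = (df["Weight"] / (df["Height"] / 100) ** 2).round(2)

# right=False makes every interval [from, to); this is an assumption, not taken from the source.
df["BMI_Category"] = pd.cut(df["BMI"], bins=bmi_bins, labels=bmi_labels, right=False)
print(df)
```

Since the model below uses the numeric `BMI` column directly, a category column like this would mainly be useful for exploration or plotting.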

Before we feed our data to the model, we first have to convert the `categorical` columns (like `Gender`) into `numerical` columns.

In [108]:
exercise_train_data = exercise_train_data[["Gender" , "Age" , "BMI" , "Duration" , "Heart_Rate" , "Body_Temp" , "Calories"]]
exercise_test_data = exercise_test_data[["Gender" , "Age" , "BMI"  , "Duration" , "Heart_Rate" , "Body_Temp" , "Calories"]]
exercise_train_data = pd.get_dummies(exercise_train_data, drop_first = True)
exercise_test_data = pd.get_dummies(exercise_test_data, drop_first = True)
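
Calling `pd.get_dummies` separately on each set works here because both sets contain both genders. As a hedged illustration of what could go wrong otherwise, the toy example below uses a test frame with no `male` rows and then aligns its columns to the training frame with `reindex`. The frames are made up for the example; the alignment step is a suggestion, not part of the original notebook.

```python
import pandas as pd

train = pd.DataFrame({"Gender": ["male", "female", "male"], "Age": [68, 20, 69]})
test = pd.DataFrame({"Gender": ["female", "female"], "Age": [34, 27]})  # no "male" rows

train_enc = pd.get_dummies(train, drop_first=True)
test_enc = pd.get_dummies(test, drop_first=True)

# With a single category present, drop_first removes the only dummy column.
print(list(train_enc.columns), list(test_enc.columns))

# Align the test columns to the training columns; missing dummies become False.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=False)
print(test_enc)
```

With 3,000 test rows this situation is unlikely for `Gender`, but the same alignment guards against rare categories in other datasets.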

* Now let's separate X and y for the training set and the test set.

In [109]:
X_train = exercise_train_data.drop("Calories" , axis = 1)
y_train = exercise_train_data["Calories"]

X_test = exercise_test_data.drop("Calories" , axis = 1)
y_test = exercise_test_data["Calories"]

In [110]:
X_train

Unnamed: 0,Age,BMI,Duration,Heart_Rate,Body_Temp,Gender_male
2643,62,27.38,14.0,88.0,40.5,True
13352,77,25.06,28.0,108.0,40.8,True
13117,73,24.57,16.0,91.0,40.2,False
2560,76,26.15,24.0,94.0,40.7,True
14297,42,22.99,7.0,93.0,39.8,True
...,...,...,...,...,...,...
905,25,23.62,19.0,99.0,40.6,False
5192,24,26.02,6.0,84.0,39.0,True
12172,52,23.74,15.0,99.0,39.9,True
235,70,24.16,9.0,79.0,40.0,False


### Building Regression Model

In [88]:
model = LinearRegression()
model.fit(X_train , y_train)
predictions = model.predict(X_test)
model.score(X_test,y_test)

0.9651112627454046
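
The value returned by `model.score` for a regressor is the coefficient of determination, $R^2$. The sketch below checks this on synthetic data (the feature and target are invented for the example, not taken from the dataset) by comparing `score`, `sklearn.metrics.r2_score`, and a manual computation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(1, 30, size=(200, 1))            # synthetic "Duration"-like feature
y = 7.0 * X[:, 0] + rng.normal(0, 10, size=200)  # synthetic target, not the real Calories

model = LinearRegression().fit(X[:160], y[:160])
X_test, y_test = X[160:], y[160:]
pred = model.predict(X_test)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(model.score(X_test, y_test), r2_score(y_test, pred), manual_r2)
```

Read this way, the score of about 0.965 above means the model accounts for roughly 96.5% of the variance of `Calories` in the test set.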

In [92]:
with open("model.pkl", "wb") as f:  # the context manager closes the file after writing
    pickle.dump(model, f)
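
For completeness, here is a minimal sketch of the round trip: saving a fitted model and loading it back for prediction. The tiny training set and the temporary file path are assumptions for the example; in the notebook the file is `model.pkl` in the working directory.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up training set, only to have a fitted model to serialize.
X = np.array([[5.0], [10.0], [20.0], [30.0]])
y = np.array([26.0, 60.0, 130.0, 200.0])
model = LinearRegression().fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

with open(path, "rb") as f:
    restored = pickle.load(f)  # only unpickle files from a trusted source

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

A pickled scikit-learn model is best loaded with the same scikit-learn version that created it, and new inputs must have the same columns in the same order as `X_train`.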

In [44]:
# `predictions` holds the test-set predictions computed in the model-building cell
print("Linear Regression Mean Absolute Error(MAE) : " , round(metrics.mean_absolute_error(y_test , predictions) , 2))
print("Linear Regression Mean Squared Error(MSE) : " , round(metrics.mean_squared_error(y_test , predictions) , 2))
print("Linear Regression Root Mean Squared Error(RMSE) : " , round(np.sqrt(metrics.mean_squared_error(y_test , predictions)) , 2))

Linear Regression Mean Absolute Error(MAE) :  8.52
Linear Regression Mean Squared Error(MSE) :  140.08
Linear Regression Root Mean Squared Error(RMSE) :  11.84
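
The three metrics are simple functions of the prediction errors. The sketch below computes them by hand with NumPy on five actual/predicted pairs copied from the first rows of the notebook's comparison table, so the numbers it prints describe only those five rows, not the full test set.

```python
import numpy as np

# Five (actual, predicted) pairs from the notebook's comparison table.
actual = np.array([198.0, 72.0, 195.0, 17.0, 74.0])
predicted = np.array([198.368798, 79.126885, 194.693093, 18.058599, 79.230953])

errors = actual - predicted

mae = np.mean(np.abs(errors))  # average absolute error, in kcal
mse = np.mean(errors ** 2)     # average squared error, in kcal^2
rmse = np.sqrt(mse)            # back in kcal, penalizes large errors more than MAE

print(round(mae, 2), round(mse, 2), round(rmse, 2))
```

Because RMSE squares the errors before averaging, it is never smaller than MAE; a large gap between the two suggests that a few predictions are far off.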


In [97]:
results = X_test.copy()  # copy so the extra columns do not end up in the model's feature matrix
results["Actual"] = y_test
results["Predicted"] = predictions
results

Unnamed: 0,Age,BMI,Duration,Heart_Rate,Body_Temp,Gender_male,Actual,Predicted
7576,74,24.98,29.0,106.0,41.0,False,198.0,198.368798
10509,43,26.88,13.0,97.0,39.9,True,72.0,79.126885
4253,43,23.74,29.0,108.0,40.5,False,195.0,194.693093
5150,62,25.36,4.0,83.0,38.9,True,17.0,18.058599
506,37,22.39,15.0,94.0,40.0,False,74.0,79.230953
...,...,...,...,...,...,...,...,...
9533,34,25.96,25.0,101.0,41.3,True,139.0,138.319534
13457,77,26.40,4.0,91.0,38.9,False,21.0,40.036150
14764,24,23.88,6.0,94.0,39.5,False,28.0,21.543295
8375,35,25.35,22.0,98.0,40.9,True,108.0,119.622697
