# Models
## Creating General and Individual models to predict Rating of Perceived Exertion (RPE)
Using the `data/initial_features.csv` dataset and the PyCaret library, we will create models to predict the Rating of Perceived Exertion (RPE) of our subjects. We will create a general model that predicts the RPE of any subject, and an individual model that predicts the RPE of one specific subject.

#### What is Rating of Perceived Exertion?
The Borg Rating of Perceived Exertion (RPE) is a way of measuring physical activity intensity level. Perceived exertion is how hard you feel like your body is working.

An image of the RPE scale for reference:

![RPE Scale](https://www.researchgate.net/publication/327632653/figure/tbl2/AS:670492033830922@1536869166661/Rating-of-Perceived-Exertion-RPE-Scale-Borg-1962.png)
<sub>From researchgate.net</sub>

#### First, we will import necessary libraries

In [13]:
import matplotlib.pyplot as plt
from pycaret.datasets import get_data
from pycaret.regression import *

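#### Next, we clean the data
The new columns *weight* and *pace* are parsed from the `experimental_condition` column, whose values look like `Condition 1.5-15`: the number before the hyphen becomes *weight* and the number after it becomes *pace*.
The RPE column has several NA values, so we forward-fill each one with the most recent preceding value.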

In [2]:
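# Load data/initial_features.csv
df = get_data('data/initial_features')

# 'Condition 1.5-15' -> weight 'Condition 1.5', pace '15'
df[['weight', 'pace']] = df['experimental_condition'].str.split('-', expand=True)

# Strip the 'Condition ' prefix and convert both columns to numbers
df['weight'] = df['weight'].str.replace('Condition ', '').astype(float)
df['pace'] = df['pace'].astype(int)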
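
The split above relies on every label having the form `Condition 1.5-15`. As a minimal sketch of the same parsing in plain Python (the helper name `parse_condition` is hypothetical and not part of the notebook), the following shows what the two new columns should contain for a single label:

```python
def parse_condition(label: str) -> tuple[float, int]:
    """Split a label such as 'Condition 1.5-15' into (weight, pace).

    Hypothetical helper for illustration; the notebook does this with
    pandas string methods instead.
    """
    # Drop the 'Condition ' prefix, then split on the single hyphen.
    weight_text, pace_text = label.replace("Condition ", "").split("-")
    return float(weight_text), int(pace_text)


weight, pace = parse_condition("Condition 1.5-15")
print(weight, pace)  # 1.5 15
```

A label that does not contain exactly one hyphen would raise a `ValueError` in this sketch, which is a useful early warning if the column ever contains an unexpected format.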

In [3]:
# Forward-fill missing RPE values (Series.ffill() replaces the deprecated fillna(method='ffill'))
df['rpe'] = df['rpe'].ffill()
df['rpe'] = df['rpe'].astype(int)       # Change this to float if necessary
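
To make the forward fill concrete, here is a minimal sketch of the same idea in plain Python. The function `forward_fill` and the sample readings are made up for illustration; the notebook itself uses pandas for this step.

```python
def forward_fill(values):
    """Replace each None with the most recent non-None value before it.

    Leading None values stay None because nothing precedes them,
    which mirrors how a pandas forward fill behaves.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Toy RPE readings (made-up numbers, not taken from the dataset).
rpe_readings = [None, 2, None, None, 3, None]
print(forward_fill(rpe_readings))  # [None, 2, 2, 2, 3, 3]
```

Note the leading `None` in the result: if the real column starts with a missing value, a forward fill leaves it missing, and the subsequent `astype(int)` would then raise an error.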

First, I will create a general model restricted to a single condition. In this example, I will use Condition 1.5-15.

In [42]:
# Boolean masks: rows for Condition 1.5-15, and rows for subject 1 (used in later cells)
filt_condition = df['experimental_condition'] == 'Condition 1.5-15'
filt_subject = df['subject'] == 1

df_condition = df[filt_condition]

In [41]:
# Train a general model on a single condition (here, Condition 1.5-15)
# Subjects 1-12 are used for training; subjects 13-15 are held out as unseen subjects
train = df_condition.loc[df_condition['subject'].isin(range(1, 13))]
test = df_condition.loc[df_condition['subject'].isin(range(13, 16))]

reg = setup(data=train, target='rpe')
model_condition = compare_models()      # returns the best model by cross-validated score
predictions = predict_model(model_condition, data=test)

| Description | Value |
| --- | --- |
| Session id | 4330 |
| Target | rpe |
| Target type | Regression |
| Original data shape | (637, 61) |
| Transformed data shape | (637, 61) |
| Transformed train set shape | (445, 61) |
| Transformed test set shape | (192, 61) |
| Numeric features | 59 |
| Categorical features | 1 |
| Preprocess | True |


| Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
| --- | --- | --- | --- | --- | --- | --- |
| K Neighbors Regressor | 2.6768 | 12.8598 | 3.5861 | -0.5473 | 1.1076 | 0.6974 |
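
The train/test split in this cell is by subject rather than by row, so the held-out subjects are people the model has never seen. A minimal sketch of that idea in plain Python, using made-up rows and a hypothetical helper `split_by_subject` (the notebook does the equivalent with `DataFrame.loc` and `isin`):

```python
def split_by_subject(rows, test_subjects):
    """Partition rows into (train, test) by subject id.

    `rows` is a list of dicts with a 'subject' key; this is a
    hypothetical stand-in for the notebook's DataFrame filtering.
    """
    test_subjects = set(test_subjects)
    train = [row for row in rows if row["subject"] not in test_subjects]
    test = [row for row in rows if row["subject"] in test_subjects]
    return train, test


# Made-up rows: two readings for each of subjects 1 to 15.
rows = [{"subject": s, "rpe": 3} for s in range(1, 16) for _ in range(2)]
train, test = split_by_subject(rows, test_subjects=range(13, 16))

train_ids = {row["subject"] for row in train}
test_ids = {row["subject"] for row in test}
print(sorted(test_ids))      # [13, 14, 15]
print(train_ids & test_ids)  # set()
```

The empty intersection is the property that matters: no subject contributes rows to both sides of the split.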



Now, I'll create a general model for all conditions.

In [38]:
# Train a general model on all conditions
# Subjects 1-12 are used for training; subjects 13-15 are held out as unseen subjects
train = df.loc[df['subject'].isin(range(1, 13))]
test = df.loc[df['subject'].isin(range(13, 16))]

reg = setup(data=train, target='rpe')
model_general = compare_models()        # returns the best model by cross-validated score
predictions = predict_model(model_general, data=test)

| Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
| --- | --- | --- | --- | --- | --- | --- |
| K Neighbors Regressor | 3.0292 | 14.2658 | 3.777 | -1.0989 | 1.1613 | 0.9938 |



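We can see that the model trained on all conditions (R2 = -1.0989, MAE = 3.0292) performs worse than the single-condition model (R2 = -0.5473, MAE = 2.6768), although neither is good: a negative R2 means the predictions are worse than always predicting the mean RPE.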
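
The sketch below computes R2 by hand on made-up numbers to show how the score reaches 0 for a mean-only baseline and drops below 0 for predictions that are worse than that baseline. The function `r2_score` here is a local helper written for illustration, not an import from any library.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up RPE values for illustration only.
y_true = [1, 2, 3, 4, 5]
mean_baseline = [3, 3, 3, 3, 3]   # always predict the mean
poor_model = [5, 4, 3, 2, 1]      # predictions trend the wrong way

print(r2_score(y_true, mean_baseline))  # 0.0
print(r2_score(y_true, poor_model))     # -3.0
```

Because R2 has no lower bound, the values of -0.5473 and -1.0989 above indicate models that add error relative to a constant guess rather than remove it.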

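Now, we'll create an individual model for a single subject. For this example, we use subject 1. Because there is only one subject, the train/test split is a random 80/20 split of that subject's rows rather than a split by subject.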

In [44]:
df_subject = df[filt_subject]

train = df_subject.sample(frac=0.8, random_state=42)
test = df_subject.drop(train.index)

reg = setup(data=train, target='rpe')
model_subject = compare_models()
predictions = predict_model(model_subject, data=test)

| Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
| --- | --- | --- | --- | --- | --- | --- |
| K Neighbors Regressor | 0.1393 | 0.0764 | 0.2765 | 0.9222 | 0.1628 | 0.1258 |



So far, this model has been the most accurate. But what if we create an individual model for subject 1 and also restrict it to one condition? In this next example, we will create an individual model for subject 1 under Condition 2.5-15.

In [49]:
df_subject_condition = df[(df['experimental_condition'] == 'Condition 2.5-15') & filt_subject]

train = df_subject_condition.sample(frac=0.8, random_state=42)
test = df_subject_condition.drop(train.index)

reg = setup(data=train, target='rpe')
model_subject_condition = compare_models()
predictions = predict_model(model_subject_condition, data=test)

| Description | Value |
| --- | --- |
| Session id | 7209 |
| Target | rpe |
| Target type | Regression |
| Original data shape | (78, 61) |
| Transformed data shape | (78, 61) |
| Transformed train set shape | (54, 61) |
| Transformed test set shape | (24, 61) |
| Numeric features | 59 |
| Categorical features | 1 |
| Preprocess | True |


| Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
| --- | --- | --- | --- | --- | --- | --- |
| Extra Trees Regressor | 0.165 | 0.0404 | 0.2011 | 0.9648 | 0.1133 | 0.0862 |
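
Compared with the subject model, this model has a lower RMSE (0.2011 versus 0.2765) but a higher MAE (0.165 versus 0.1393). One way the two metrics can disagree, though not necessarily the reason here, is that RMSE penalizes a few large misses more heavily than MAE does. A minimal sketch with made-up prediction errors (the helpers `mae` and `rmse` are written for illustration):

```python
import math


def mae(errors):
    """Mean absolute error of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up prediction errors, not taken from the notebook's models.
one_large_miss = [0.0, 0.0, 0.0, 0.8]
even_small_misses = [0.25, 0.25, 0.25, 0.25]

for name, errors in [("one large miss", one_large_miss),
                     ("even small misses", even_small_misses)]:
    print(f"{name}: MAE={mae(errors):.2f} RMSE={rmse(errors):.2f}")
```

In this toy case the evenly spread errors have the higher MAE (0.25 versus 0.20) but the lower RMSE (0.25 versus 0.40), so it is worth checking more than one metric before ranking models.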



By R2 (0.9648) and RMSE (0.2011), this model is the most accurate so far, although its MAE (0.165) is slightly higher than the subject model's (0.1393). However, it is important to note that this model relies on only 98 rows of data, 78 of which are in the 80% training sample, so its scores may not be reliable. We can see the size of each of the datasets used below:

In [67]:
print("--= Number of rows =--")
print("General model: \t\t" + str(df.shape[0]))
print("Subject model: \t\t" + str(df_subject.shape[0]))
print("Condition model: \t" + str(df_condition.shape[0]))
print("Subject-condition model: " + str(df_subject_condition.shape[0]))

--= Number of rows =--
General model: 		2895
Subject model: 		280
Condition model: 	899
Subject-condition model: 98


We can see that the subject-condition model has by far the smallest dataset: at 98 rows, it is only 35% of the size of the next smallest dataset (the subject model, with 280 rows).
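
As a quick check of that comparison, the following sketch recomputes the ratios from the row counts printed above. The dictionary is typed in by hand from that output rather than read from the DataFrames.

```python
# Row counts copied from the printed output above.
row_counts = {
    "general": 2895,
    "condition": 899,
    "subject": 280,
    "subject-condition": 98,
}

smallest = row_counts["subject-condition"]
for name, count in row_counts.items():
    # Express the smallest dataset as a percentage of each dataset.
    print(f"{name}: {count} rows, smallest is {smallest / count:.1%} of it")
```

The subject line reports 35.0%, matching the figure quoted above, while the general dataset is roughly thirty times larger than the subject-condition one.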