Instructions
Problem statement: Use multiple linear regression for calibrating robot control.
Dataset description: Download the robot data file named RobotKinematics_Dataset.csv to understand the robot dynamics.
The data is obtained from a three-link robot.

Task:

1. Read the dataset.
2. Store the independent variables in X and the dependent variable (Y1 only) in y. Use a 70/30 train-test split, and also use stratified K-fold cross-validation with K=5.
3. Build regression models using Linear Regression, Polynomial Regression, Support Vector Machine Regression, and Random Forest Regression, and train each on the above data to predict the dependent variable Y1 only.
4. Compare the performance of all the models using the R² score and write an inference about the best ML regression model.




In [54]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor


In [55]:
data = pd.read_csv(r"/home/aaliya/Downloads/RobotKinematics_LabTaskRegression.csv", delimiter=',')
data.head()

Unnamed: 0,Time (secs),J1 Joint ang (Rads),J2 Joint angle (Rads),J3 Joint angIe (Rads),J1 Joint Velocity (rads/sec),J2 Joint Velocity (rads/sec),J3 Joint Velocity (rads/sec),J1 Motor current I1 (A),J2 Motor current I2 (A),J3 Motor current I3 (A),Strain gauge messurement for J0-J1,Strain gauge messurement for J1-J2,Strain gauge messurement for J2-J3,Strain gauge messurement J3,J1 Joint accelerations (ran/sec^2),J2 Joint accelerations (ran/sec^2),J3 Joint accelerations (ran/sec^2)
0,0.0,-7e-06,2.4958,-1.1345,-7.879999999999999e-21,-4.94e-321,3.91e-29,-0.081623,-0.40812,-0.30609,-269.25,-113.2,3.5918,1.5786,-9.9e-19,-6.2103e-319,4.92e-27
1,0.01,-7e-06,2.4958,-1.1345,-2.26e-21,-4.94e-321,2.63e-31,-0.037411,-0.37241,-0.26698,-270.91,-116.05,1.4585,-1.7398,4.2499999999999995e-19,-1.7669e-319,-1.38e-27
2,0.02,-7e-06,2.4958,-1.1345,-6.47e-22,-4.94e-321,1.7600000000000002e-33,-0.066319,-0.40302,-0.31459,-269.25,-112.97,3.5918,0.86753,3.23e-19,-4.9906e-320,-4.120000000000001e-28
3,0.03,-7e-06,2.4958,-1.1345,-1.85e-22,-4.94e-321,1.18e-35,-0.06802,-0.43703,-0.28398,-269.97,-114.39,1.6956,-0.08059,1.5e-19,-1.3943e-320,-1.17e-28
4,0.04,-7e-06,2.4958,-1.1345,-5.31e-23,-4.94e-321,-0.0052709,-0.052715,-0.40472,-0.30779,-269.97,-114.15,3.1177,0.86753,5.93e-20,-3.58e-321,-0.37708


In [56]:
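#INDEPENDENT VAR = X
#DEPENDENT VAR   = Y = J1 Joint accelerations (ran/sec^2)
# The column name must match the CSV header exactly, including the "ran/sec^2" spelling.
dependent_var = 'J1 Joint accelerations (ran/sec^2)'
data.columns = data.columns.str.strip()  # drop stray whitespace around header names
# Every remaining column is used as a feature. Note that this keeps the row-index
# column ('Unnamed: 0'), the time stamp, and the J2/J3 acceleration columns in X.
independent_vars = [col for col in data.columns if col != dependent_var]
y = data[[dependent_var]]  # single-column DataFrame
x = data[independent_vars]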

In [57]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
#Total=8000; 70%=5600=train; 30%=2400=test
print(f"Training set size: {x_train.shape[0]} samples")
print(f"Testing set size: {x_test.shape[0]} samples")

Training set size: 5600 samples
Testing set size: 2400 samples


In [58]:
# Convert continuous target variable into discrete bins for stratification

n_bins = 10  # Number of bins
binning = KBinsDiscretizer(n_bins=n_bins, encode='ordinal', strategy='uniform', subsample=None)
y_binned = binning.fit_transform(y.values.reshape(-1, 1)).ravel()

# Repeat the 70/30 split; with the same random_state as above, the partition is identical
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

# Initialize Stratified K-Fold Cross-Validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
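
Note on the binning step: StratifiedKFold expects class labels, which is why the continuous target is discretized first. The sketch below illustrates the idea on synthetic data (the `y_demo` values are an assumption, not the robot dataset). It uses `strategy="quantile"` instead of the `uniform` strategy in the cell above, because quantile bins are roughly equally populated; with uniform bins, sparse bins in the tails can hold fewer samples than the number of folds, in which case scikit-learn issues a warning.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic continuous target (assumption: stands in for the real y)
rng = np.random.default_rng(0)
y_demo = rng.normal(size=1000)

# Discretize the target into 10 roughly equally populated bins
binner = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")
y_demo_binned = binner.fit_transform(y_demo.reshape(-1, 1)).ravel().astype(int)

# Stratify on the bin labels; the features are irrelevant for the split itself
skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
X_placeholder = np.zeros((len(y_demo), 1))

# Count how many samples of each bin land in each validation fold
fold_bin_counts = np.array([
    np.bincount(y_demo_binned[val_idx], minlength=10)
    for _, val_idx in skf_demo.split(X_placeholder, y_demo_binned)
])
print(fold_bin_counts)  # one row per fold, one column per bin
```

Each row of the printed array should show similar counts across bins, meaning every validation fold covers the full range of the target.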

In [66]:
#LINEAR REGRESSION
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import StratifiedKFold, train_test_split

print("Linear Regression:")
linear_model = LinearRegression()
mse_scores_linear = []
r2_scores_linear = []

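# Perform Stratified K-Fold Cross-Validation
# y_binned covers the full dataset, so the training rows are looked up by their original
# row labels; this works because `data` has the default 0..n-1 integer index.
for train_index, val_index in skf.split(x_train, y_binned[x_train.index]):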
    X_train_fold, X_val_fold = x_train.iloc[train_index], x_train.iloc[val_index]
    y_train_fold, y_val_fold = y_train.iloc[train_index], y_train.iloc[val_index]
    
    # Train the model
    linear_model.fit(X_train_fold, y_train_fold)
    
    # Predict and evaluate
    y_val_pred = linear_model.predict(X_val_fold)
    mse = mean_squared_error(y_val_fold, y_val_pred)
    r2 = r2_score(y_val_fold, y_val_pred)
    
    mse_scores_linear.append(mse)
    r2_scores_linear.append(r2)
    
    print(f"Fold Mean Squared Error: {mse}")
    print(f"Fold R^2 Score: {r2}")

# Print the average scores across all folds
avg_mse_linear = np.mean(mse_scores_linear)
avg_r2_linear = np.mean(r2_scores_linear)
print(f"Average Mean Squared Error: {avg_mse_linear}")
print(f"Average R^2 Score: {avg_r2_linear}")

# Evaluate on the test set
linear_model.fit(x_train, y_train)
y_test_pred_linear = linear_model.predict(x_test)
test_mse_linear = mean_squared_error(y_test, y_test_pred_linear)
test_r2_linear = r2_score(y_test, y_test_pred_linear)
print(f"Test Mean Squared Error: {test_mse_linear}")
print(f"Test R^2 Score: {test_r2_linear}")


Linear Regression:
Fold Mean Squared Error: 0.0031888689063741893
Fold R^2 Score: 0.8071279937392073
Fold Mean Squared Error: 0.00268879127243764
Fold R^2 Score: 0.833909229564493
Fold Mean Squared Error: 0.0028841573261441495
Fold R^2 Score: 0.82219054206689
Fold Mean Squared Error: 0.0029735873051123505
Fold R^2 Score: 0.8174713438733499
Fold Mean Squared Error: 0.0027247124013735423
Fold R^2 Score: 0.8337993422769798
Average Mean Squared Error: 0.0028920234422883744
Average R^2 Score: 0.8228996903041841
Test Mean Squared Error: 0.002875399042630222
Test R^2 Score: 0.8213428492570343
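
The manual fold loop above is repeated for every model. As a possible shortcut, scikit-learn's `cross_validate` accepts a precomputed list of (train, validation) index pairs, so the stratified splits built from the binned target can be reused directly. The sketch below runs on synthetic data; `X_demo`, `y_demo`, and the coefficient vector are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic regression problem (assumption: not the robot dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

# Bin the continuous target, as in the notebook, to get stratification labels
bins = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
y_bins = bins.fit_transform(y_demo.reshape(-1, 1)).ravel()

# Build the stratified splits once so every model sees the same folds
skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
splits = list(skf_demo.split(X_demo, y_bins))

# Score the model on the continuous target using the precomputed splits
cv_results = cross_validate(
    LinearRegression(), X_demo, y_demo,
    cv=splits, scoring=("r2", "neg_mean_squared_error"),
)
mean_r2 = cv_results["test_r2"].mean()
mean_mse = -cv_results["test_neg_mean_squared_error"].mean()  # flip sign back to MSE
print(f"Average R^2: {mean_r2:.4f}, average MSE: {mean_mse:.4f}")
```

Passing the same `splits` list to each estimator keeps the comparison between models on identical folds.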


In [67]:
#POLYNOMIAL REGRESSION - DEGREE 2

from sklearn.preprocessing import PolynomialFeatures

print("Polynomial Regression (Degree 2):")
poly = PolynomialFeatures(degree=2)
x_poly_train = poly.fit_transform(x_train)
x_poly_test = poly.transform(x_test)

poly_model = LinearRegression()
mse_scores_poly = []
r2_scores_poly = []

# Perform Stratified K-Fold Cross-Validation
for train_index, val_index in skf.split(x_poly_train, y_binned[x_train.index]):
    X_train_fold, X_val_fold = x_poly_train[train_index], x_poly_train[val_index]
    y_train_fold, y_val_fold = y_train.iloc[train_index], y_train.iloc[val_index]
    
    # Train the model
    poly_model.fit(X_train_fold, y_train_fold)
    
    # Predict and evaluate
    y_val_pred = poly_model.predict(X_val_fold)
    mse = mean_squared_error(y_val_fold, y_val_pred)
    r2 = r2_score(y_val_fold, y_val_pred)
    
    mse_scores_poly.append(mse)
    r2_scores_poly.append(r2)
    
    print(f"Fold Mean Squared Error: {mse}")
    print(f"Fold R^2 Score: {r2}")

# Print the average scores across all folds
avg_mse_poly = np.mean(mse_scores_poly)
avg_r2_poly = np.mean(r2_scores_poly)
print(f"Average Mean Squared Error: {avg_mse_poly}")
print(f"Average R^2 Score: {avg_r2_poly}")

# Evaluate on the test set
poly_model.fit(x_poly_train, y_train)
y_test_pred_poly = poly_model.predict(x_poly_test)
test_mse_poly = mean_squared_error(y_test, y_test_pred_poly)
test_r2_poly = r2_score(y_test, y_test_pred_poly)
print(f"Test Mean Squared Error: {test_mse_poly}")
print(f"Test R^2 Score: {test_r2_poly}")


Polynomial Regression (Degree 2):
Fold Mean Squared Error: 0.0029960415386608677
Fold R^2 Score: 0.8187907501474483
Fold Mean Squared Error: 0.002601988286397949
Fold R^2 Score: 0.8392711834562743
Fold Mean Squared Error: 0.0028194913551745455
Fold R^2 Score: 0.8261772251582024
Fold Mean Squared Error: 0.0028818920092211894
Fold R^2 Score: 0.823099905410243
Fold Mean Squared Error: 0.0027283010906958434
Fold R^2 Score: 0.833580441182894
Average Mean Squared Error: 0.0028055428560300796
Average R^2 Score: 0.8281839010710124
Test Mean Squared Error: 0.0028118621854597305
Test R^2 Score: 0.8252905844064692
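
An alternative way to organize the polynomial model is to wrap the feature expansion and the regressor in one pipeline, so that the transformation is applied inside each fold and the same object can be reused for prediction. The sketch below shows this on synthetic data with a purely quadratic signal; the data and the name `poly_pipeline` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data whose target depends on squared and interaction terms (assumption)
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1, 1, size=(400, 3))
y_demo = (X_demo[:, 0] ** 2 + X_demo[:, 1] * X_demo[:, 2]
          + rng.normal(scale=0.05, size=400))

# Degree-2 expansion followed by ordinary least squares, as a single estimator
poly_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # intercept comes from LinearRegression
    LinearRegression(),
)

# Compare plain linear regression with the polynomial pipeline using 5-fold CV
linear_scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=5, scoring="r2")
poly_scores = cross_val_score(poly_pipeline, X_demo, y_demo, cv=5, scoring="r2")
print(f"Linear R^2: {linear_scores.mean():.4f}")
print(f"Polynomial (degree 2) R^2: {poly_scores.mean():.4f}")
```

On this synthetic problem the gap between the two models is large by construction; on the robot data above, the degree-2 model improves R² by less than 0.01.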


In [68]:
#SUPPORT VECTOR MACHINE REGRESSOR
from sklearn.svm import SVR

print("Support Vector Machine Regression:")
svm_model = SVR(kernel='rbf')
mse_scores_svr = []
r2_scores_svr = []

# Perform Stratified K-Fold Cross-Validation
for train_index, val_index in skf.split(x_train, y_binned[x_train.index]):
    X_train_fold, X_val_fold = x_train.iloc[train_index], x_train.iloc[val_index]
    y_train_fold, y_val_fold = y_train.iloc[train_index], y_train.iloc[val_index]
    
    # Train the model
    svm_model.fit(X_train_fold, y_train_fold.values.ravel())
    
    # Predict and evaluate
    y_val_pred = svm_model.predict(X_val_fold)
    mse = mean_squared_error(y_val_fold, y_val_pred)
    r2 = r2_score(y_val_fold, y_val_pred)
    
    mse_scores_svr.append(mse)
    r2_scores_svr.append(r2)
    
    print(f"Fold Mean Squared Error: {mse}")
    print(f"Fold R^2 Score: {r2}")

# Print the average scores across all folds
avg_mse_svr = np.mean(mse_scores_svr)
avg_r2_svr = np.mean(r2_scores_svr)
print(f"Average Mean Squared Error: {avg_mse_svr}")
print(f"Average R^2 Score: {avg_r2_svr}")

# Evaluate on the test set
svm_model.fit(x_train, y_train.values.ravel())
y_test_pred_svr = svm_model.predict(x_test)
test_mse_svr = mean_squared_error(y_test, y_test_pred_svr)
test_r2_svr = r2_score(y_test, y_test_pred_svr)
print(f"Test Mean Squared Error: {test_mse_svr}")
print(f"Test R^2 Score: {test_r2_svr}")


Support Vector Machine Regression:
Fold Mean Squared Error: 0.017477403756003807
Fold R^2 Score: -0.05708388322653102
Fold Mean Squared Error: 0.016742587663498756
Fold R^2 Score: -0.03421537871680158
Fold Mean Squared Error: 0.016929201765116076
Fold R^2 Score: -0.04369209051430589
Fold Mean Squared Error: 0.016676926514049173
Fold R^2 Score: -0.023685088949249478
Fold Mean Squared Error: 0.017071245817778996
Fold R^2 Score: -0.04130339834617125
Average Mean Squared Error: 0.01697947310328936
Average R^2 Score: -0.039995967950611847
Test Mean Squared Error: 0.01689317856955581
Test R^2 Score: -0.04962375847083389
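
A negative R² means the SVR predicts worse than a constant equal to the mean of the target. Two likely contributors, stated here as assumptions rather than verified causes: the RBF kernel is sensitive to feature scale, and the features range from about 1e-21 to about -270; and the default `epsilon=0.1` is close to the spread of the target itself (an MSE of about 0.017 at R² near zero implies a target standard deviation of roughly 0.13), so most residuals fall inside the tube and are ignored. The sketch below reproduces this situation on synthetic data and compares it with a scaled pipeline using a smaller `epsilon`; the data and the value 0.01 are illustrative choices, not tuned settings for the robot dataset.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic features on very different scales and a small-amplitude target (assumption)
rng = np.random.default_rng(2)
n = 600
X_demo = np.column_stack([
    rng.normal(scale=1.0, size=n),               # unit-scale feature
    rng.normal(loc=-270.0, scale=2.0, size=n),   # large offset, like a strain gauge reading
    rng.normal(scale=0.3, size=n),               # uninformative feature
])
y_demo = (0.1 * np.sin(X_demo[:, 0]) + 0.03 * (X_demo[:, 1] + 270.0)
          + rng.normal(scale=0.01, size=n))

# Baseline: default SVR on raw features, as in the cell above
unscaled_svr = SVR(kernel="rbf")

# Variant: standardize features and shrink epsilon below the target's spread
scaled_svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", epsilon=0.01))

unscaled_r2 = cross_val_score(unscaled_svr, X_demo, y_demo, cv=5, scoring="r2").mean()
scaled_r2 = cross_val_score(scaled_svr, X_demo, y_demo, cv=5, scoring="r2").mean()
print(f"Unscaled SVR R^2: {unscaled_r2:.4f}")
print(f"Scaled SVR (epsilon=0.01) R^2: {scaled_r2:.4f}")
```

Whether the same change helps on the robot data would need to be checked by rerunning the cross-validation with the scaled pipeline.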


In [69]:
#RANDOM FOREST REGRESSOR
from sklearn.ensemble import RandomForestRegressor

print("Random Forest Regression:")
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
mse_scores_rf = []
r2_scores_rf = []

# Perform Stratified K-Fold Cross-Validation
for train_index, val_index in skf.split(x_train, y_binned[x_train.index]):
    X_train_fold, X_val_fold = x_train.iloc[train_index], x_train.iloc[val_index]
    y_train_fold, y_val_fold = y_train.iloc[train_index], y_train.iloc[val_index]
    
    # Train the model
    rf_model.fit(X_train_fold, y_train_fold.values.ravel())
    
    # Predict and evaluate
    y_val_pred = rf_model.predict(X_val_fold)
    mse = mean_squared_error(y_val_fold, y_val_pred)
    r2 = r2_score(y_val_fold, y_val_pred)
    
    mse_scores_rf.append(mse)
    r2_scores_rf.append(r2)
    
    print(f"Fold Mean Squared Error: {mse}")
    print(f"Fold R^2 Score: {r2}")

# Print the average scores across all folds
avg_mse_rf = np.mean(mse_scores_rf)
avg_r2_rf = np.mean(r2_scores_rf)
print(f"Average Mean Squared Error: {avg_mse_rf}")
print(f"Average R^2 Score: {avg_r2_rf}")

# Evaluate on the test set
rf_model.fit(x_train, y_train.values.ravel())
y_test_pred_rf = rf_model.predict(x_test)
test_mse_rf = mean_squared_error(y_test, y_test_pred_rf)
test_r2_rf = r2_score(y_test, y_test_pred_rf)
print(f"Test Mean Squared Error: {test_mse_rf}")
print(f"Test R^2 Score: {test_r2_rf}")


Random Forest Regression:
Fold Mean Squared Error: 0.002032717998509777
Fold R^2 Score: 0.8770553415503123
Fold Mean Squared Error: 0.0017254122745916523
Fold R^2 Score: 0.8934186312848296
Fold Mean Squared Error: 0.002023285249270569
Fold R^2 Score: 0.8752636514812381
Fold Mean Squared Error: 0.002132795751842803
Fold R^2 Score: 0.8690819194354253
Fold Mean Squared Error: 0.0017431846834765209
Fold R^2 Score: 0.8936700839397054
Average Mean Squared Error: 0.0019314791915382643
Average R^2 Score: 0.8816979255383022
Test Mean Squared Error: 0.0017560492903950305
Test R^2 Score: 0.8908914004161304


In [73]:
# Compare performance of all models
models = ["Linear Regression","Polynomial Regression (Degree 2)","Support Vector Machine Regression","Random Forest Regression"]
avg_r2_scores = [avg_r2_linear, avg_r2_poly, avg_r2_svr, avg_r2_rf]
test_r2_scores = [test_r2_linear, test_r2_poly, test_r2_svr, test_r2_rf]
print("\nModel Performance Comparison:")
for i in range(len(models)):
    print(f"{models[i]}: Avg R² (CV) = {avg_r2_scores[i]}, Test R² = {test_r2_scores[i]}")
# Best model: the one with the highest test R²
best_index = np.argmax(test_r2_scores)
print(f"\nBest Model: {models[best_index]} with Test R² = {test_r2_scores[best_index]}")


Model Performance Comparison:
Linear Regression: Avg R² (CV) = 0.8228996903041841, Test R² = 0.8213428492570343
Polynomial Regression (Degree 2): Avg R² (CV) = 0.8281839010710124, Test R² = 0.8252905844064692
Support Vector Machine Regression: Avg R² (CV) = -0.039995967950611847, Test R² = -0.04962375847083389
Random Forest Regression: Avg R² (CV) = 0.8816979255383022, Test R² = 0.8908914004161304

Best Model: Random Forest Regression with Test R² = 0.8908914004161304
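
Inference: Random Forest Regression gives the highest R² on both the cross-validation folds (about 0.88) and the test set (about 0.89). Linear and degree-2 polynomial regression both reach about 0.82, and the polynomial terms add less than 0.01, so quadratic interactions explain little beyond the linear fit. The SVR with default settings scores below zero, meaning it does worse than predicting the mean. For every model the cross-validation and test scores differ by less than 0.01, which suggests the estimates are stable. The sketch below collects the scores printed above (rounded to four decimals) into a sorted table; the column names are illustrative.

```python
import pandas as pd

# R² scores copied from the comparison output above, rounded to four decimals
results = pd.DataFrame({
    "model": [
        "Linear Regression",
        "Polynomial Regression (Degree 2)",
        "Support Vector Machine Regression",
        "Random Forest Regression",
    ],
    "cv_r2": [0.8229, 0.8282, -0.0400, 0.8817],
    "test_r2": [0.8213, 0.8253, -0.0496, 0.8909],
})

# Rank models by test R² and show how far the CV estimate is from the test score
results = results.sort_values("test_r2", ascending=False).reset_index(drop=True)
results["cv_test_gap"] = (results["cv_r2"] - results["test_r2"]).round(4)
print(results.to_string(index=False))

best_model = results.loc[0, "model"]
print(f"Best model by test R²: {best_model}")
```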
