# Project Description

### Comparison of Machine Learning Models for Predicting Energy Efficiency in Metallurgical Heat Treatment Furnaces

This project compares several machine learning approaches for predicting the energy efficiency of a metallurgical heat treatment furnace. Input features include soaking time, total cycle time, load weight in tons, target temperature, and the equipment used. The methods evaluated are Linear Support Vector Regression, Gradient Boosting Regression, Stochastic Gradient Descent Regression, Linear Regression, and Decision Tree Regression, all implemented with the scikit-learn library.  
  
The context involves the pursuit of accurate predictive models that can optimize the energy efficiency of the heat treatment process, reducing costs and environmental impacts. The problem lies in selecting the most suitable machine learning model for this task, considering the complexity of the process and the limited availability of data.    

The main objective is to identify the model that best fits the provided data and gives the most accurate predictions of energy efficiency. The methodology involves implementing the different models and evaluating them with performance metrics such as MAE, RMSE, and R².    

Expected results include insights into the relative effectiveness of different models in predicting the energy efficiency of metallurgical heat treatment furnaces, providing useful guidance for industrial process optimization.  
  
Two furnaces will be evaluated: one uses natural gas as its energy source and the other uses electricity. Both can perform the same jobs, and the choice between them is currently based on cost and availability.  
The report should therefore help the client find out which furnace has the better energy efficiency and use this KPI to drive the decision.

# Importing necessary libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import math

# Data Load

In this section, we will load two files, one for each equipment used for comparison.   

In [2]:
dfGas = pd.read_csv('data/GasFurnace.csv')
dfElectric = pd.read_csv('data/ElectricFurnace.csv')
dfFull = pd.concat([dfGas, dfElectric])
dfFull.head()

Unnamed: 0,id,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,EnergyConsumption_m3,EnergyConsumption_kWh
0,6329,,1111.111,3.0,930,3.1,
1,6328,7.1167,1.76,3.0,1080,645.6,
2,6327,1.7489,1.76,3.0,1080,93.86,
3,6326,5.8317,6.734,4.0,620,276.58,
4,6325,10.3722,6.734,4.0,930,579.58,


In [3]:
dfFull.describe()

Unnamed: 0,id,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,EnergyConsumption_m3,EnergyConsumption_kWh
count,1385.0,1354.0,1385.0,1385.0,1385.0,864.0,486.0
mean,3895.258484,7.269606,8.511727,3.583755,781.371841,324.590382,1604.041152
std,2667.066748,3.87329,51.784968,1.506121,161.302487,152.546839,1013.389585
min,101.0,0.0139,0.94,1.25,480.0,1.07,0.0
25%,447.0,4.8585,4.853,3.0,620.0,183.715,658.0
50%,5637.0,7.3039,5.696,3.0,900.0,334.77,1634.0
75%,5983.0,8.583375,6.675,4.0,930.0,448.91,2527.75
max,6329.0,99.7442,1111.111,17.0,1080.0,1108.06,3841.0


Both furnaces have a capacity of 12 t, so all rows with a higher weight are excluded. All rows with no value in `TotalCycle_h` are removed as well.  
Rows where `TotalCycle_h` is not greater than `SoakingTime_h` are also removed, because they are data-entry mistakes: a cycle cannot be shorter than its own soaking time.

In [4]:
dfFull = dfFull[dfFull['TotalCycle_h'].notna()]
dfFull = dfFull[dfFull['TotalCycle_h'] > dfFull['SoakingTime_h']]
dfFull = dfFull[dfFull['Weight_T'] < 12]
dfFull.describe()


Unnamed: 0,id,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,EnergyConsumption_m3,EnergyConsumption_kWh
count,1309.0,1309.0,1309.0,1309.0,1309.0,844.0,465.0
mean,3917.432391,7.440719,5.689717,3.555577,781.18411,328.639858,1656.836559
std,2658.075475,3.781877,1.56722,1.455067,161.168479,149.99287,982.441171
min,101.0,3.1122,0.94,2.0,480.0,1.07,0.0
25%,453.0,4.9133,4.853,3.0,620.0,185.005,716.0
50%,5640.0,7.3472,5.696,3.0,900.0,347.47,1760.0
75%,5979.0,8.6592,6.641,4.0,930.0,449.6225,2552.0
max,6328.0,99.7442,10.26,17.0,1080.0,1108.06,3576.0


`TotalCycle_h` still has the widest spread relative to its quartiles (a maximum of 99.74 h against a 75th percentile of 8.66 h). Therefore, I'll use the standard IQR rule to eliminate these values.  
Anything above Q3 + 1.5 x IQR or below Q1 - 1.5 x IQR is an outlier.  
IQR = Q3 - Q1  

  
The column `id` is also removed, since it is no longer needed.

In [5]:
iqr = dfFull['TotalCycle_h'].quantile(.75) - dfFull['TotalCycle_h'].quantile(.25)
upperLimit = dfFull['TotalCycle_h'].quantile(.75) + 1.5*iqr
lowerLimit = dfFull['TotalCycle_h'].quantile(.25) - 1.5*iqr

dfFull = dfFull[(dfFull['TotalCycle_h']>=lowerLimit) & (dfFull['TotalCycle_h']<=upperLimit)]

dfFull.drop(['id'], axis=1, inplace=True)
dfFull.reset_index(inplace=True, drop=True)

dfFull.describe()


Unnamed: 0,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,EnergyConsumption_m3,EnergyConsumption_kWh
count,1281.0,1281.0,1281.0,1281.0,840.0,441.0
mean,7.148611,5.692038,3.444379,779.601874,327.146286,1625.331066
std,2.312704,1.550441,1.033768,161.178998,146.181424,969.005047
min,3.1122,0.94,2.0,480.0,1.07,0.0
25%,4.8894,4.863,3.0,620.0,185.005,706.0
50%,7.3122,5.696,3.0,900.0,345.715,1704.0
75%,8.515,6.63,4.0,930.0,449.03,2511.0
max,13.7039,10.26,7.0,1080.0,731.92,3562.0
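
The IQR filter applied above can also be wrapped in a small reusable helper. The sketch below is illustrative only: `iqr_bounds` and the toy `cycle_h` series are hypothetical names and values, not part of the original notebook.

```python
import pandas as pd

def iqr_bounds(values: pd.Series, k: float = 1.5):
    """Return (lower, upper) limits of the Q1 - k*IQR / Q3 + k*IQR rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy data (hypothetical): four plausible cycle times and one obvious outlier.
cycle_h = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
lower, upper = iqr_bounds(cycle_h)

# Keep only the values inside the limits (both ends inclusive).
kept = cycle_h[cycle_h.between(lower, upper)]
print(lower, upper, kept.tolist())
```

The same helper could be called on `dfFull['TotalCycle_h']` to obtain `lowerLimit` and `upperLimit` in a single step.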


# Feature Engineering
  
In this section, we perform some feature engineering. The objective is to calculate the "extra time" each cycle needs beyond the soaking time, and to express the energy consumption of both furnaces on the same basis (kJ).  
  
After that, we calculate the energy absorbed by the metal, assuming that it starts at ambient temperature (25 °C) and ends at the temperature given in the column `TargetTemp_C`.  

Dividing the energy absorbed by the metal by the total energy consumed by the furnace gives the efficiency. 
  
Because the data may contain errors, invalid values are removed from the efficiency column: any row where the efficiency is NaN or is 1 or higher is treated as a recording mistake.
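
As a quick sanity check of the efficiency definition, the sketch below recomputes the value for a single load. The function name `furnace_efficiency` is hypothetical; the constants (25 °C ambient, 0.45 kJ/(kg·°C) for steel) and the input values (10.26 t, 900 °C, 21,122,322 kJ) are taken from this notebook's code and from one of its sampled rows.

```python
T_ENV_C = 25.0           # ambient temperature assumed by the notebook, in °C
CP_STEEL_KJ_KG_C = 0.45  # specific heat of steel used by the notebook, kJ/(kg·°C)

def furnace_efficiency(weight_t, target_temp_c, energy_used_kj):
    """Fraction of the supplied energy that ends up heating the load."""
    # Energy absorbed by the metal: mass [kg] * temperature rise [°C] * cp
    absorbed_kj = weight_t * 1000 * (target_temp_c - T_ENV_C) * CP_STEEL_KJ_KG_C
    return absorbed_kj / energy_used_kj

# Values from one gas-furnace row in the sampled data (Weight_T, TargetTemp_C, EnergyUsed_kJ)
eff = furnace_efficiency(weight_t=10.26, target_temp_c=900, energy_used_kj=21_122_322)
print(f"{eff:.6f}")
```

The result agrees with the `Efficiency_pct` value of about 0.19 reported for that row, which suggests the column is a fraction between 0 and 1 rather than a percentage.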

In [6]:
# dfNew = dfFull.copy()

dfFull['ExtraTime_h'] = dfFull['TotalCycle_h'] - dfFull['SoakingTime_h']

pciGas = 9400 # kcal/m³ (lower heating value) --> Source: https://www.cegas.com.br/gas-natural/equivalencia-energetica/
pciGas = pciGas * 4.1868 # Converting from kcal/m³ to kJ/m³ (1 kcal = 4.1868 kJ)
convkWhtoJ = 3600 # 1 kWh = 3600 kJ
tEnv = 25 # ambient temperature, °C
cpSteel = 450 / 1000 # specific heat of steel, kJ/(kg·°C)

dfFull['EnergyAbsorved_kJ'] = ( ( dfFull['Weight_T'] * 1000 ) * ( dfFull['TargetTemp_C'] - tEnv ) ) * cpSteel # kJ

# pd.options.display.float_format = '{:.0f}'.format
for idx, row in dfFull.iterrows():
    if not math.isnan(row['EnergyConsumption_m3']):
        dfFull.loc[idx, 'EnergyUsed_kJ'] = round(row['EnergyConsumption_m3'] * pciGas,0)
        dfFull.loc[idx, 'Equipment'] = 'A'
    if not math.isnan(row['EnergyConsumption_kWh']):
        dfFull.loc[idx, 'EnergyUsed_kJ'] = round(row['EnergyConsumption_kWh'] * convkWhtoJ,0)
        dfFull.loc[idx, 'Equipment'] = 'B'

dfFull.drop(['EnergyConsumption_kWh', 'EnergyConsumption_m3'], axis=1, inplace=True)

dfFull['EnergyUsed_kWh'] = dfFull['EnergyUsed_kJ'] / convkWhtoJ
dfFull['SpecificConsumption_kWh_t'] = ( dfFull['EnergyUsed_kWh'] ) / ( dfFull['Weight_T'] )

dfFull['Efficiency_pct'] = dfFull['EnergyAbsorved_kJ'] / dfFull['EnergyUsed_kJ']

dfFull = dfFull[dfFull['Efficiency_pct'].notna()]
dfFull = dfFull[dfFull['Efficiency_pct']<1]
# Evaluate the consumption in kWh/t and compare


dfFull.describe()

Unnamed: 0,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,ExtraTime_h,EnergyAbsorved_kJ,EnergyUsed_kJ,EnergyUsed_kWh,SpecificConsumption_kWh_t,Efficiency_pct
count,1268.0,1268.0,1268.0,1268.0,1268.0,1268.0,1268.0,1268.0,1268.0,1268.0
mean,7.137731,5.694912,3.443218,779.045741,3.694514,1949827.0,10562080.0,2933.90982,517.852706,0.235175
std,2.310244,1.548269,1.034132,161.104968,1.812653,712479.2,6026112.0,1673.920033,301.535165,0.127647
min,3.1122,0.94,2.0,480.0,0.1122,243225.0,543600.0,151.0,72.502425,0.032885
25%,4.8848,4.87,3.0,620.0,1.8562,1440853.0,6554434.0,1820.675972,310.663065,0.151317
50%,7.30875,5.696,3.0,900.0,4.08635,1872478.0,8593282.0,2387.022917,424.795699,0.202317
75%,8.502475,6.63275,4.0,930.0,5.0102,2445129.0,16206280.0,4501.743333,732.05589,0.264006
max,13.7039,10.26,7.0,1080.0,9.9789,4045473.0,28805380.0,8001.495833,4010.129735,0.85342
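
The row-by-row `iterrows` loop in cell 6 can also be written with column operations, which is usually faster and shorter in pandas. The following is a minimal sketch on a two-row toy frame (the name `toy` and the uppercase constants are hypothetical); it assumes, as the loop does, that an electricity reading takes precedence when both readings are present.

```python
import numpy as np
import pandas as pd

PCI_GAS_KJ_M3 = 9400 * 4.1868  # lower heating value of natural gas, kJ/m³
KJ_PER_KWH = 3600              # 1 kWh = 3600 kJ

# Toy frame: one gas row and one electric row, values taken from the notebook's data.
toy = pd.DataFrame({
    "EnergyConsumption_m3": [645.6, np.nan],
    "EnergyConsumption_kWh": [np.nan, 344.0],
})

gas_kj = toy["EnergyConsumption_m3"] * PCI_GAS_KJ_M3
elec_kj = toy["EnergyConsumption_kWh"] * KJ_PER_KWH

# Use the electric value where available, otherwise fall back to the gas value.
toy["EnergyUsed_kJ"] = elec_kj.fillna(gas_kj).round(0)

# Label the equipment in the same order as the loop: 'A' for gas, then 'B' for electric.
equipment = pd.Series(pd.NA, index=toy.index, dtype="object")
equipment.loc[toy["EnergyConsumption_m3"].notna()] = "A"
equipment.loc[toy["EnergyConsumption_kWh"].notna()] = "B"
toy["Equipment"] = equipment

print(toy[["EnergyUsed_kJ", "Equipment"]])
```

Rows with neither reading keep a missing `EnergyUsed_kJ`, so the later `notna()` filter on the efficiency column would still drop them.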


In [7]:
dfFull[(dfFull['Equipment'] == 'A') & (dfFull['Weight_T'] > 8)].describe()

Unnamed: 0,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,ExtraTime_h,EnergyAbsorved_kJ,EnergyUsed_kJ,EnergyUsed_kWh,SpecificConsumption_kWh_t,Efficiency_pct
count,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0
mean,8.260593,9.146698,4.627907,761.860465,3.632686,3033608.0,16024450.0,4451.235904,486.669743,0.199545
std,1.858867,0.647622,1.000554,134.982465,1.47149,595848.0,5271465.0,1464.295937,160.500303,0.03602
min,4.9061,8.039,3.0,620.0,1.9061,2152442.0,7508716.0,2085.754444,259.454465,0.133241
25%,7.2025,8.432,5.0,650.0,2.2796,2639109.0,12329620.0,3424.894306,352.391756,0.17462
50%,7.7511,9.475,5.0,650.0,2.7697,2753438.0,14095720.0,3915.476667,422.158333,0.192892
75%,9.74735,9.5425,5.0,900.0,5.03225,3695738.0,21044400.0,5845.665972,631.352196,0.219539
max,12.4625,10.26,7.0,930.0,6.4447,4039875.0,24579740.0,6827.705556,820.882323,0.294847


In [8]:
dfFull.sample(20)

Unnamed: 0,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,ExtraTime_h,EnergyAbsorved_kJ,EnergyUsed_kJ,Equipment,EnergyUsed_kWh,SpecificConsumption_kWh_t,Efficiency_pct
347,10.2778,10.26,5.0,900,5.2778,4039875.0,21122322.0,A,5867.311667,571.862736,0.191261
182,7.5831,3.659,3.0,930,4.5831,1490127.75,16250846.0,A,4514.123889,1233.70426,0.091695
485,7.7606,6.65,3.0,930,4.7606,2708212.5,16603082.0,A,4611.967222,693.528906,0.163115
836,6.8492,4.458,3.0,930,3.8492,1815520.5,14328310.0,A,3980.086111,892.796346,0.126709
1100,5.9719,3.258,4.5,560,1.4719,784363.5,1238400.0,B,344.0,105.586249,0.633368
1027,10.0792,3.57,5.0,930,5.0792,1453882.5,9061200.0,B,2517.0,705.042017,0.160451
148,7.6814,6.102,3.0,930,4.6814,2485039.5,16929736.0,A,4702.704444,770.682472,0.146785
205,7.9961,5.625,3.0,930,4.9961,2290781.25,16564907.0,A,4601.363056,818.020099,0.138291
469,8.3811,6.549,3.0,930,5.3811,2667080.25,18944366.0,A,5262.323889,803.530904,0.140785
668,7.6161,6.433,3.0,930,4.6161,2619839.25,17493706.0,A,4859.362778,755.380503,0.149759


Now the dataset looks clean and makes sense. Let's move on to the modeling!

In [9]:
px.scatter(dfFull, x='Efficiency_pct', y='SpecificConsumption_kWh_t', color='TargetTemp_C', color_continuous_scale=['yellow', 'red'])

In [10]:
dfCut = dfFull[['TotalCycle_h', 'Weight_T', 'SoakingTime_h', 'TargetTemp_C', 'Equipment', 'SpecificConsumption_kWh_t', 'Efficiency_pct']]
dfCut.head()

Unnamed: 0,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,Equipment,SpecificConsumption_kWh_t,Efficiency_pct
0,7.1167,1.76,3.0,1080,A,4010.129735,0.032885
1,5.8317,6.734,4.0,620,A,449.009174,0.165642
2,10.3722,6.734,4.0,930,A,940.909481,0.120229
3,7.5389,5.248,3.0,930,A,992.376302,0.113994
4,4.7797,6.612,3.0,620,A,363.183143,0.204786


# Modeling
  
Now we can perform the modeling on the dataset.   
  
Several regression methods, all implemented with scikit-learn, will be explored and evaluated with a common set of metrics. 

In [11]:
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

from sklearn.svm import LinearSVR
from sklearn import ensemble
from sklearn.linear_model import SGDRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

from sklearn.metrics import mean_absolute_error
from sklearn.metrics import root_mean_squared_error
from sklearn.metrics import r2_score


In [12]:
le = LabelEncoder()

# Build a new DataFrame instead of assigning into a slice of dfFull,
# which avoids pandas' SettingWithCopyWarning.
dfCut = dfCut.assign(Equipment=le.fit_transform(dfCut['Equipment']))



In [13]:
std=StandardScaler()
dfFullSc = dfCut.copy()

dfFullSc[['TotalCycle_h', 'Weight_T', 'SoakingTime_h', 'TargetTemp_C', 'SpecificConsumption_kWh_t', 'Efficiency_pct']] = std.fit_transform(
                dfFullSc[['TotalCycle_h', 'Weight_T', 'SoakingTime_h', 'TargetTemp_C', 'SpecificConsumption_kWh_t', 'Efficiency_pct']])
            

dfFullSc

Unnamed: 0,TotalCycle_h,Weight_T,SoakingTime_h,TargetTemp_C,Equipment,SpecificConsumption_kWh_t,Efficiency_pct
0,-0.009107,-2.542494,-0.428758,1.868800,0,11.586227,-1.585382
1,-0.565545,0.671393,0.538618,-0.987608,0,-0.228400,-0.544940
2,1.400607,0.671393,0.538618,0.937363,0,1.403563,-0.900851
3,0.173716,-0.288767,-0.428758,0.937363,0,1.574313,-0.949719
4,-1.021088,0.592564,-0.428758,-0.987608,0,-0.513143,-0.238161
...,...,...,...,...,...,...,...
1276,1.934095,-0.454178,0.538618,0.937363,1,0.174709,-0.289102
1277,-0.428189,0.674624,-0.428758,-0.863416,1,-1.281387,2.734273
1278,1.696407,0.000703,-0.428758,1.061555,1,-0.211251,0.152085
1279,-1.014592,-0.335935,-1.396134,-1.173895,1,-1.139155,1.328949
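
Cell 13 fits the scaler on every row before the train/test split, so statistics from the test rows influence the scaled training data. A common alternative, sketched below on synthetic data (all names and values here are illustrative assumptions, not the notebook's data), is to split first and fit the scaler on the training rows only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for two features (cycle time in h, target temperature in °C).
rng = np.random.default_rng(13)
X = np.column_stack([rng.uniform(3, 14, 200), rng.uniform(480, 1080, 200)])
y = rng.uniform(0.03, 0.85, 200)  # stand-in for the efficiency target

# Split first, so the test rows never influence the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=13
)

scaler = StandardScaler().fit(X_train)   # mean and std come from training rows only
X_train_sc = scaler.transform(X_train)
X_test_sc = scaler.transform(X_test)     # test rows reuse the training statistics

print(X_train_sc.mean(axis=0).round(6), X_train_sc.std(axis=0).round(6))
```

With this ordering the test metrics are computed on data the preprocessing step has not seen, which tends to give a more honest estimate of generalization.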


In [14]:
X_scaled = dfFullSc.drop(['Efficiency_pct', 'SpecificConsumption_kWh_t'], axis=1).values
y1_scaled = dfFullSc['Efficiency_pct'].values
y2_scaled = dfFullSc['SpecificConsumption_kWh_t'].values

# Note: the `_Std` suffix marks the original (unscaled) data, reported as "Regular Data" below
X_Std = dfCut.drop(['Efficiency_pct', 'SpecificConsumption_kWh_t'], axis=1).values
y1_Std = dfCut['Efficiency_pct'].values
y2_Std = dfCut['SpecificConsumption_kWh_t'].values

In [15]:
X_train_scaled, X_test_scaled, y1_train_scaled, y1_test_scaled = train_test_split(X_scaled,y1_scaled,train_size=0.7, test_size=0.3, random_state = 13)
X_train_scaled, X_test_scaled, y2_train_scaled, y2_test_scaled = train_test_split(X_scaled,y2_scaled,train_size=0.7, test_size=0.3, random_state = 13)

X_train_Std, X_test_Std, y1_train_Std, y1_test_Std = train_test_split(X_Std,y1_Std,train_size=0.7, test_size=0.3, random_state = 13)
X_train_Std, X_test_Std, y2_train_Std, y2_test_Std = train_test_split(X_Std,y2_Std,train_size=0.7, test_size=0.3, random_state = 13)

## Linear SVR

In [16]:
# lsvr1Sc = LinearSVR(dual=False, random_state=13, tol=1e-5, max_iter=100000, loss='squared_epsilon_insensitive')
lsvr1Sc = LinearSVR()
lsvr1Sc.fit(X_train_scaled, y1_train_scaled)

lsvr1Std = LinearSVR()
lsvr1Std.fit(X_train_Std, y1_train_Std)

lsvr2Sc = LinearSVR()
lsvr2Sc.fit(X_train_scaled, y2_train_scaled)

lsvr2Std = LinearSVR()
lsvr2Std.fit(X_train_Std, y2_train_Std)




Liblinear failed to converge, increase the number of iterations.




Liblinear failed to converge, increase the number of iterations.




Liblinear failed to converge, increase the number of iterations.




Liblinear failed to converge, increase the number of iterations.



In [17]:
print("PREDICTION OF EFFICIENCY")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y1_test_scaled, lsvr1Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_scaled, lsvr1Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y1_test_scaled, lsvr1Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y1_test_Std, lsvr1Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_Std, lsvr1Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y1_test_Std, lsvr1Std.predict(X_test_Std)):.3f}")

print("-*"*50)
print("PREDICTION OF SPECIFIC CONSUMPTION")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y2_test_scaled, lsvr2Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_scaled, lsvr2Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y2_test_scaled, lsvr2Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y2_test_Std, lsvr2Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_Std, lsvr2Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y2_test_Std, lsvr2Std.predict(X_test_Std)):.3f}")


PREDICTION OF EFFICIENCY
----------
Scaled Data
MAE  -> 0.288
RMSE -> 0.616
R²   -> 0.614
----------
Regular Data
MAE  -> 0.089
RMSE -> 0.119
R²   -> 0.108
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
PREDICTION OF SPECIFIC CONSUMPTION
----------
Scaled Data
MAE  -> 0.262
RMSE -> 0.606
R²   -> 0.685
----------
Regular Data
MAE  -> 340.965
RMSE -> 389.534
R²   -> -0.430
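
The same three metrics are printed for every model in this notebook, so the repeated print blocks could be condensed into one helper. The sketch below is a suggestion only: `regression_report` is a hypothetical name, and RMSE is computed as the square root of `mean_squared_error` so that it also runs on scikit-learn versions that predate `root_mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return MAE, RMSE and R² for one set of predictions."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }

# Tiny hand-checkable example: a single prediction is off by 1.
report = regression_report([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
for name, value in report.items():
    print(f"{name:4s} -> {value:.3f}")
```

In the notebook it could be called as `regression_report(y1_test_scaled, lsvr1Sc.predict(X_test_scaled))`, computing the predictions once instead of three times per block.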


## Gradient Boost Regression

In [18]:
# gbr1Sc = ensemble.GradientBoostingRegressor(n_estimators=500, max_depth=4, min_samples_split=5,learning_rate=0.01, loss="squared_error")
gbr1Sc = ensemble.GradientBoostingRegressor()
gbr1Sc.fit(X_train_scaled, y1_train_scaled)

gbr1Std = ensemble.GradientBoostingRegressor()
gbr1Std.fit(X_train_Std, y1_train_Std)

gbr2Sc = ensemble.GradientBoostingRegressor()
gbr2Sc.fit(X_train_scaled, y2_train_scaled)

gbr2Std = ensemble.GradientBoostingRegressor()
gbr2Std.fit(X_train_Std, y2_train_Std)

In [19]:
print("PREDICTION OF EFFICIENCY")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y1_test_scaled, gbr1Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_scaled, gbr1Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y1_test_scaled, gbr1Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y1_test_Std, gbr1Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_Std, gbr1Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y1_test_Std, gbr1Std.predict(X_test_Std)):.3f}")

print("-*"*50)
print("PREDICTION OF SPECIFIC CONSUMPTION")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y2_test_scaled, gbr2Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_scaled, gbr2Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y2_test_scaled, gbr2Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y2_test_Std, gbr2Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_Std, gbr2Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y2_test_Std, gbr2Std.predict(X_test_Std)):.3f}")

PREDICTION OF EFFICIENCY
----------
Scaled Data
MAE  -> 0.233
RMSE -> 0.486
R²   -> 0.760
----------
Regular Data
MAE  -> 0.030
RMSE -> 0.062
R²   -> 0.760
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
PREDICTION OF SPECIFIC CONSUMPTION
----------
Scaled Data
MAE  -> 0.167
RMSE -> 0.343
R²   -> 0.899
----------
Regular Data
MAE  -> 50.709
RMSE -> 103.686
R²   -> 0.899


## Stochastic Gradient Descent Regression

In [20]:
sgd1Sc = SGDRegressor()
sgd1Sc.fit(X_train_scaled, y1_train_scaled)

sgd1Std = SGDRegressor()
sgd1Std.fit(X_train_Std, y1_train_Std)

sgd2Sc = SGDRegressor()
sgd2Sc.fit(X_train_scaled, y2_train_scaled)

sgd2Std = SGDRegressor()
sgd2Std.fit(X_train_Std, y2_train_Std)

In [21]:
print("PREDICTION OF EFFICIENCY")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y1_test_scaled, sgd1Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_scaled, sgd1Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y1_test_scaled, sgd1Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y1_test_Std, sgd1Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_Std, sgd1Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y1_test_Std, sgd1Std.predict(X_test_Std)):.3f}")

print("-*"*50)
print("PREDICTION OF SPECIFIC CONSUMPTION")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y2_test_scaled, sgd2Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_scaled, sgd2Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y2_test_scaled, sgd2Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y2_test_Std, sgd2Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_Std, sgd2Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y2_test_Std, sgd2Std.predict(X_test_Std)):.3f}")

PREDICTION OF EFFICIENCY
----------
Scaled Data
MAE  -> 0.330
RMSE -> 0.586
R²   -> 0.651
----------
Regular Data
MAE  -> 171530890391518.406
RMSE -> 175065620579648.344
R²   -> -1913707503963129776472431001600.000
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
PREDICTION OF SPECIFIC CONSUMPTION
----------
Scaled Data
MAE  -> 0.270
RMSE -> 0.596
R²   -> 0.696
----------
Regular Data
MAE  -> 194688890354793.500
RMSE -> 198686181805388.469
R²   -> -372115608574979320315904.000
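
The huge errors on the regular data are consistent with SGD diverging when features have very different magnitudes (here, temperatures in the hundreds next to weights below 12). The scikit-learn documentation recommends scaling features for `SGDRegressor`; a minimal sketch using a pipeline on synthetic data follows. The data, coefficients, and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales, loosely mimicking the notebook's columns.
rng = np.random.default_rng(13)
n = 400
X = np.column_stack([
    rng.uniform(3, 14, n),      # cycle time, h
    rng.uniform(1, 10, n),      # weight, t
    rng.uniform(480, 1080, n),  # target temperature, °C
])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=13)

# The scaler is fitted inside the pipeline, on the training rows only;
# the target stays in its original units.
sgd = make_pipeline(StandardScaler(), SGDRegressor(random_state=13))
sgd.fit(X_train, y_train)
r2 = sgd.score(X_test, y_test)
print(f"R² with scaled features: {r2:.3f}")
```

Because only the features are scaled, MAE and RMSE from such a pipeline stay in the target's original units, which makes them directly comparable with the other "Regular Data" results.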


## Linear Regression

In [22]:
lr1Sc = LinearRegression()
lr1Sc.fit(X_train_scaled, y1_train_scaled)

lr1Std = LinearRegression()
lr1Std.fit(X_train_Std, y1_train_Std)

lr2Sc = LinearRegression()
lr2Sc.fit(X_train_scaled, y2_train_scaled)

lr2Std = LinearRegression()
lr2Std.fit(X_train_Std, y2_train_Std)

In [23]:
print("PREDICTION OF EFFICIENCY")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y1_test_scaled, lr1Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_scaled, lr1Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y1_test_scaled, lr1Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y1_test_Std, lr1Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_Std, lr1Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y1_test_Std, lr1Std.predict(X_test_Std)):.3f}")

print("-*"*50)
print("PREDICTION OF SPECIFIC CONSUMPTION")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y2_test_scaled, lr2Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_scaled, lr2Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y2_test_scaled, lr2Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y2_test_Std, lr2Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_Std, lr2Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y2_test_Std, lr2Std.predict(X_test_Std)):.3f}")

PREDICTION OF EFFICIENCY
----------
Scaled Data
MAE  -> 0.331
RMSE -> 0.589
R²   -> 0.648
----------
Regular Data
MAE  -> 0.042
RMSE -> 0.075
R²   -> 0.648
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
PREDICTION OF SPECIFIC CONSUMPTION
----------
Scaled Data
MAE  -> 0.272
RMSE -> 0.594
R²   -> 0.698
----------
Regular Data
MAE  -> 81.885
RMSE -> 179.042
R²   -> 0.698


## Decision Tree Regressor

In [24]:
dtr1Sc = DecisionTreeRegressor()
dtr1Sc.fit(X_train_scaled, y1_train_scaled)

dtr1Std = DecisionTreeRegressor()
dtr1Std.fit(X_train_Std, y1_train_Std)

dtr2Sc = DecisionTreeRegressor()
dtr2Sc.fit(X_train_scaled, y2_train_scaled)

dtr2Std = DecisionTreeRegressor()
dtr2Std.fit(X_train_Std, y2_train_Std)

In [25]:
print("PREDICTION OF EFFICIENCY")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y1_test_scaled, dtr1Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_scaled, dtr1Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y1_test_scaled, dtr1Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y1_test_Std, dtr1Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y1_test_Std, dtr1Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y1_test_Std, dtr1Std.predict(X_test_Std)):.3f}")

print("-*"*50)
print("PREDICTION OF SPECIFIC CONSUMPTION")
print("-"*10)
print("Scaled Data")
print(f"MAE  -> {mean_absolute_error(y2_test_scaled, dtr2Sc.predict(X_test_scaled)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_scaled, dtr2Sc.predict(X_test_scaled)):.3f}")
print(f"R²   -> {r2_score(y2_test_scaled, dtr2Sc.predict(X_test_scaled)):.3f}")
print("-"*10)
print("Regular Data")
print(f"MAE  -> {mean_absolute_error(y2_test_Std, dtr2Std.predict(X_test_Std)):.3f}")
print(f"RMSE -> {root_mean_squared_error(y2_test_Std, dtr2Std.predict(X_test_Std)):.3f}")
print(f"R²   -> {r2_score(y2_test_Std, dtr2Std.predict(X_test_Std)):.3f}")

PREDICTION OF EFFICIENCY
----------
Scaled Data
MAE  -> 0.269
RMSE -> 0.557
R²   -> 0.684
----------
Regular Data
MAE  -> 0.035
RMSE -> 0.072
R²   -> 0.672
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
PREDICTION OF SPECIFIC CONSUMPTION
----------
Scaled Data
MAE  -> 0.235
RMSE -> 0.423
R²   -> 0.847
----------
Regular Data
MAE  -> 73.700
RMSE -> 140.020
R²   -> 0.815


## Results
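
The summary cell below types the metrics in by hand, and some of its values differ from the cell outputs above (for example, Linear SVR on regular data), which can happen when unseeded models are re-run. One way to keep the table in sync is to build it from the fitted models. The sketch below uses synthetic data and two of the model types; every name and value in it is an illustrative assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic linear data standing in for the furnace features and target.
rng = np.random.default_rng(13)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.2, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=13)

# Fixing random_state makes the table reproducible between runs.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=13),
}

rows = []
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rows.append({
        "Model": name,
        "MAE": mean_absolute_error(y_te, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_te, pred))),
        "R2": r2_score(y_te, pred),
    })

# One row per model, best R² first.
results = pd.DataFrame(rows).sort_values("R2", ascending=False).reset_index(drop=True)
print(results.to_string(index=False, float_format="{:.3f}".format))
```

Adding `Target` and `Data Type` columns to each row would reproduce the layout of the hand-written table.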

In [26]:
print(f"{'Target':^30s} | {'Model':^38s} | {'Data Type':^10s} | {'MAE':10s} | {'RMSE':10s} | {'R²':10s}")
print('---'*42)
print(f"{'Efficiency [%]':30s} | {'Linear SVR':<38s} | {'Scaled':<10s} | {0.287:10.3f} | {0.616:10.3f} | {0.614:10.3f}")
print(f"{'Efficiency [%]':30s} | {'Gradient Boost Regression':<38s} | {'Scaled':<10s} | {0.233:10.3f} | {0.485:10.3f} | {0.760:10.3f}")
print(f"{'Efficiency [%]':30s} | {'Stochastic Gradient Descent Regression':<38s} | {'Scaled':<10s} | {0.333:10.3f} | {0.587:10.3f} | {0.650:10.3f}")
print(f"{'Efficiency [%]':30s} | {'Linear Regression':<38s} | {'Scaled':<10s} | {0.331:10.3f} | {0.589:10.3f} | {0.648:10.3f}")
print(f"{'Efficiency [%]':30s} | {'Decision Tree Regression':<38s} | {'Scaled':<10s} | {0.282:10.3f} | {0.578:10.3f} | {0.661:10.3f}")
print(' - '*42)
print(f"{'Specific Consuption [kWh/t]':30s} | {'Linear SVR':<38s} | {'Scaled':<10s} | {0.262:10.3f} | {0.606:10.3f} | {0.685:10.3f}")
print(f"{'Specific Consuption [kWh/t]':30s} | {'Gradient Boost Regression':<38s} | {'Scaled':<10s} | {0.166:10.3f} | {0.342:10.3f} | {0.900:10.3f}")
print(f"{'Specific Consuption [kWh/t]':30s} | {'Stochastic Gradient Descent Regression':<38s} | {'Scaled':<10s} | {0.271:10.3f} | {0.596:10.3f} | {0.696:10.3f}")
print(f"{'Specific Consuption [kWh/t]':30s} | {'Linear Regression':<38s} | {'Scaled':<10s} | {0.272:10.3f} | {0.594:10.3f} | {0.698:10.3f}")
print(f"{'Specific Consuption [kWh/t]':30s} | {'Decision Tree Regression':<38s} | {'Scaled':<10s} | {0.236:10.3f} | {0.423:10.3f} | {0.847:10.3f}")
print('---'*42)



print(f"{'Efficiency [%]':30s} | {'Linear SVR':<38s} | {'Regular':<10s} | {0.312:10.3f} | {0.331:10.3f} | {-5.822:10.3f}")
print(f"{'Efficiency [%]':30s} | {'Gradient Boost Regression':<38s} | {'Regular':<10s} | {0.030:10.3f} | {0.062:10.3f} | {0.760:10.3f}")
print(f"{'Efficiency [%]':30s} | {'Stochastic Gradient Descent Regression':<38s} | {'Regular':<10s} | {'Error':>10s} | {'Error':>10s} | {'Error':>10s}")
print(f"{'Efficiency [%]':30s} | {'Linear Regression':<38s} | {'Regular':<10s} | {0.042:10.3f} | {0.075:10.3f} | {0.648:10.3f}")
print(f"{'Efficiency [%]':30s} | {'Decision Tree Regression':<38s} | {'Regular':<10s} | {0.037:10.3f} | {0.074:10.3f} | {0.656:10.3f}")

print(' - '*42)

print(f"{'Specific Consuption [kWh/t]':30s} | {'Linear SVR':<38s} | {'Regular':<10s} | {217.343:10.3f} | {318.783:10.3f} | {0.042:10.3f}")
print(f"{'Specific Consuption [kWh/t]':30s} | {'Gradient Boost Regression':<38s} | {'Regular':<10s} | {50.668:10.3f} | {103.590:10.3f} | {0.899:10.3f}")
print(f"{'Specific Consuption [kWh/t]':30s} | {'Stochastic Gradient Descent Regression':<38s} | {'Regular':<10s} | {'Error':>10s} | {'Error':>10s} | {'Error':>10s}")
print(f"{'Specific Consuption [kWh/t]':30s} | {'Linear Regression':<38s} | {'Regular':<10s} | {81.885:10.3f} | {179.042:10.3f} | {0.698:10.3f}")
print(f"{'Specific Consuption [kWh/t]':30s} | {'Decision Tree Regression':<38s} | {'Regular':<10s} | {72.151:10.3f} | {136.478:10.3f} | {0.824:10.3f}")

            Target             |                 Model                  | Data Type  | MAE        | RMSE       | R²        
------------------------------------------------------------------------------------------------------------------------------
Efficiency [%]                 | Linear SVR                             | Scaled     |      0.287 |      0.616 |      0.614
Efficiency [%]                 | Gradient Boost Regression              | Scaled     |      0.233 |      0.485 |      0.760
Efficiency [%]                 | Stochastic Gradient Descent Regression | Scaled     |      0.333 |      0.587 |      0.650
Efficiency [%]                 | Linear Regression                      | Scaled     |      0.331 |      0.589 |      0.648
Efficiency [%]                 | Decision Tree Regression               | Scaled     |      0.282 |      0.578 |      0.661
 -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 
Sp