# Supervised Learning - Regression Models
### Evaluation Metrics in Regression
![Evaluation Metrics in Regression ](./images/metricsregression.png)

Here are some commonly used metrics in regression:


1. Mean Squared Error (MSE): It calculates the average squared difference between the predicted and actual values. It penalizes large errors more than smaller ones.

2. Root Mean Squared Error (RMSE): It is the square root of the MSE and provides an interpretable metric in the same units as the target variable.

3. Mean Absolute Error (MAE): It calculates the average absolute difference between the predicted and actual values. It is less sensitive to outliers compared to MSE.

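4. R-squared (R²): It measures the proportion of the variance in the dependent variable that is predictable from the independent variables. A value of 1 indicates a perfect fit and 0 corresponds to a model that always predicts the mean; on held-out data it can be negative when the model performs worse than that baseline.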

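5. Mean Squared Logarithmic Error (MSLE): It calculates the mean squared difference between the logarithms of the predicted and actual values (typically log(1 + value)), so it measures relative rather than absolute error. It is useful when the target variable spans a wide range of values, and it requires non-negative values.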

6. Median Absolute Error: It calculates the median absolute difference between the predicted and actual values, which is less sensitive to outliers compared to MAE.

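7. Mean Percentage Error (MPE): It calculates the average signed percentage difference between the predicted and actual values. Because positive and negative errors cancel out, it indicates the direction of the bias (systematic over- or under-prediction) rather than the magnitude of the error.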
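The metrics listed above can be computed as in the following minimal sketch. The arrays `y_true` and `y_hat` are hypothetical values chosen only for illustration. scikit-learn has no MPE helper, so it is computed by hand here with the sign convention (actual - predicted) / actual, which is an assumption; other sources use the opposite sign.

```python
import numpy as np
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    mean_squared_log_error,
    median_absolute_error,
)

# Hypothetical actual and predicted values, for illustration only
y_true = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_hat = np.array([2.5, 3.5, 6.0, 9.0, 9.0])

mse = mean_squared_error(y_true, y_hat)        # average squared error
rmse = np.sqrt(mse)                            # same units as the target
mae = mean_absolute_error(y_true, y_hat)       # average absolute error
r2 = r2_score(y_true, y_hat)                   # 1 - SS_res / SS_tot
msle = mean_squared_log_error(y_true, y_hat)   # needs non-negative values
medae = median_absolute_error(y_true, y_hat)   # median absolute error
# Signed percentage error; assumed convention: (actual - predicted) / actual
mpe = np.mean((y_true - y_hat) / y_true) * 100

print("MSE:", mse, "RMSE:", rmse, "MAE:", mae)
print("R^2:", r2, "MSLE:", msle, "MedAE:", medae, "MPE (%):", mpe)
```

In this example the MPE is small and negative even though individual errors are up to 25%, which shows why it is read as a bias indicator rather than as an error magnitude.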

# 1. Machine Learning

## Simple Linear Regression

It is the most basic model, involving a single independent variable and a single dependent variable. The general equation has the form Y = β₀ + β₁X + ɛ, where Y is the dependent variable, X is the independent variable, β₀ is the intercept, β₁ is the slope coefficient, and ɛ is the error term.
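As a minimal sketch of how β₀ and β₁ can be estimated, the ordinary least squares solution for one predictor has a closed form: the slope is the covariance of X and Y divided by the variance of X. The names `x_demo` and `y_demo` and their values are hypothetical and used only for illustration.

```python
import numpy as np

# Hypothetical data, roughly following y = 2x
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least squares estimates for a single predictor
x_centered = x_demo - x_demo.mean()
y_centered = y_demo - y_demo.mean()
beta1 = np.sum(x_centered * y_centered) / np.sum(x_centered ** 2)  # slope
beta0 = y_demo.mean() - beta1 * x_demo.mean()                      # intercept

# Residuals play the role of the error term for the fitted line
residuals = y_demo - (beta0 + beta1 * x_demo)
print("beta0:", beta0, "beta1:", beta1)
```

With an intercept in the model, the residuals of a least squares fit sum to zero, which is a quick sanity check on the estimates.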

In [15]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Example data: x must be 2-D with shape (n_samples, 1); a 1-D array can be converted with arr.reshape((-1, 1))
x = np.array([[1], [2], [3], [4], [5],[6],[7],[8],[9],[10],[14],[16],[18],[20]])
y = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30])
# Create an instance of the linear regression model
model = LinearRegression()
# Get out-of-fold predictions with 5-fold cross-validation
rls_cv = cross_val_predict(model, x, y, cv=5)
# Fit the model on all the data
model.fit(x, y)
# Predict a new value
x_test = [[22]]
y_pred = model.predict(x_test)

# we calculated metrics of performance
#MSE -Mean Squared Error(MSE)
mse = mean_squared_error(y, rls_cv)
# RMSE-"Root Mean Squared Error" (RMSE)- same scale of data
rmse = np.sqrt(mean_squared_error(y, rls_cv))
#R2- Coefficient of Determination (R2).: 
#It is a measure of how well the model fits the data. 
#A value of R2 closer to 1 indicates a better fit, 
#while a value close to 0 indicates that the model does not explain the variability well
r2 = r2_score(y, rls_cv)
print("Prediction for the new value:", y_pred)
# Print results
print("MSE:", mse)
print("RMSE:", rmse)
print("R^2:", r2)

Prediction for the new value: [35.12267658]
MSE: 6.230718103442017
RMSE: 2.4961406417591974
R^2: 0.9193381275908431


## Weighted Linear Regression Model

In certain cases, it can be useful to assign different weights to observations based on their relative importance. This is achieved by using a weighted linear regression model, where different weights are applied to each observation based on some criterion.
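A minimal sketch of what the weighting does: weighted least squares minimizes the weighted sum of squared residuals, which can be solved with the weighted normal equations. The data, the weights `w`, and all variable names below are hypothetical; the comparison with `LinearRegression(...).fit(..., sample_weight=w)` is only there to check that both routes give the same coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data and weights (larger weight = more important observation)
x_w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_w = np.array([2.2, 3.9, 6.1, 8.3, 9.6])
w = np.array([1.0, 1.0, 2.0, 2.0, 4.0])

# Design matrix with a column of ones for the intercept
A = np.column_stack([np.ones_like(x_w), x_w])
W = np.diag(w)

# Weighted normal equations: (A^T W A) beta = A^T W y
beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_w)

# Same fit through scikit-learn's sample_weight argument
skl = LinearRegression().fit(x_w.reshape(-1, 1), y_w, sample_weight=w)
print("normal equations:", beta)
print("scikit-learn:    ", skl.intercept_, skl.coef_[0])
```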

In [16]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Example data
x1 = np.array([[1], [2], [3], [4], [5],[6],[7],[8],[9],[10],[14],[16],[18],[20]])
y1 = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30])

# We calculate the weights based on the values of y.
#For this case, we used the logarithm function as the weighting factor
weights =  np.log(y1)

# We create an instance of the  linear regression model
model1 = LinearRegression()

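# Run weighted k-fold cross-validation; the weights are split per fold along with the data
# Note: scikit-learn 1.4+ deprecates fit_params in favor of params={'sample_weight': weights}
mrp_cv = cross_val_predict(model1, x1, y1, cv=5, fit_params={'sample_weight': weights})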

# We run the model
model1.fit(x1, y1, sample_weight=weights)
# We performed a prediction for a new value
X_test = [[22]]
y_pred1 = model1.predict(X_test)

# we calculated metrics of performance
#MSE -Mean Squared Error(MSE)
mse_mrp = mean_squared_error(y1, mrp_cv)
# RMSE-"Root Mean Squared Error" (RMSE)- same scale of data
rmse_mrp = np.sqrt(mean_squared_error(y1, mrp_cv))
#R2- Coefficient of Determination (R2).: 
#It is a measure of how well the model fits the data. 
#A value of R2 closer to 1 indicates a better fit, 
#while a value close to 0 indicates that the model does not explain the variability well
r2_mrp = r2_score(y1, mrp_cv)
print("Prediction for the new value:", y_pred1)
# Print results
print("MSE:", mse_mrp)
print("RMSE:", rmse_mrp)
print("R^2:", r2_mrp)


Prediction for the new value: [34.45724063]
MSE: 5.822059073994034
RMSE: 2.4128943354390873
R^2: 0.9246285615255726


## Polynomial Linear Regression


Instead of assuming a straight-line relationship between the variables, this model fits a higher-degree polynomial. It is still called linear regression because the model remains linear in the coefficients. For example, a polynomial regression model of degree 2 has an equation of the form: Y = β₀ + β₁X + β₂X² + ɛ.
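One way to sketch this model is to chain the polynomial transformation and the linear regression in a single scikit-learn pipeline, so that the transformation is re-applied inside each cross-validation fold. The data below is a hypothetical, noise-free quadratic (Y = 1 + 2X + 0.5X²), chosen so that a degree-2 model can recover it exactly; the names `x_quad`, `y_quad`, and `poly_model` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free quadratic data
x_quad = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_quad = 1.0 + 2.0 * x_quad.ravel() + 0.5 * x_quad.ravel() ** 2

# The pipeline expands x into [1, x, x^2] and then fits a linear model
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# Out-of-fold predictions: each point is predicted by a model that never saw it
cv_pred = cross_val_predict(poly_model, x_quad, y_quad, cv=5)

# Fit on all data and predict a new value
poly_model.fit(x_quad, y_quad)
new_pred = poly_model.predict(np.array([[12.0]]))
print("Out-of-fold predictions:", cv_pred)
print("Prediction for x = 12:", new_pred)
```

Because the pipeline is a single estimator, the new value does not need to be transformed by hand before calling `predict`.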

In [17]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Example data
x2 = np.array([[1], [2], [3], [4], [5],[6],[7],[8],[9],[10],[14],[16],[18],[20]])
y2 = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30])

# Second-degree Polynomial Transformation (degree 2)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(x2)
# We create an instance of the  linear regression model
model2 = LinearRegression()
# We performed k-fold cross-validation polynomial
rlp_cv = cross_val_predict(model2, X_poly, y2, cv=5)
# We run the model
model2.fit(X_poly, y2)
# We performed a prediction for a new value
X_new = np.array([[22]])
X_new_poly = poly.transform(X_new)
y_pred2 = model2.predict(X_new_poly)


# we calculated metrics of performance
#MSE -Mean Squared Error(MSE)
mse_rlp = mean_squared_error(y2, rlp_cv)
# RMSE-"Root Mean Squared Error" (RMSE)- same scale of data
rmse_rlp = np.sqrt(mean_squared_error(y2, rlp_cv))
#R2- Coefficient of Determination (R2).: 
#It is a measure of how well the model fits the data. 
#A value of R2 closer to 1 indicates a better fit, 
#while a value close to 0 indicates that the model does not explain the variability well
r2_rlp = r2_score(y2, rlp_cv)
print("Prediction for the new value:", y_pred2)
# Print results
print("MSE:", mse_rlp)
print("RMSE:", rmse_rlp)
print("R^2:", r2_rlp)


Prediction for the new value: [30.56049676]
MSE: 0.38279677082133096
RMSE: 0.6187057223117716
R^2: 0.9950443746974253


### Comparison of Model Results - Part One

In [18]:
import numpy as np
import plotly.graph_objects as go

# Create the figure and add the traces
xbase = x1.flatten() # flatten x to 1-D for the plot's x-axis
fig = go.Figure()
fig.add_trace(go.Scatter(x=xbase, y=y, name='Real'))
fig.add_trace(go.Scatter(x=xbase, y=rls_cv, name='rls_cv'))
fig.add_trace(go.Scatter(x=xbase, y=mrp_cv, name='mrp_cv'))
fig.add_trace(go.Scatter(x=xbase, y=rlp_cv, name='rlp_cv'))

# Configure the layout
fig.update_layout(
    title='Linear regression models',
    xaxis_title='x - Input value',
    yaxis_title='y - Output value',
    hovermode='x'
)
# Show the interactive figure
fig.show()


In [19]:
print("Independent variable  ", x.flatten())
print("Actual value          ", y)
print("Simple regression     ", rls_cv.astype(int))
print("Weighted regression   ", mrp_cv.astype(int))
print("Polynomial regression ", rlp_cv.astype(int))

Independent variable   [ 1  2  3  4  5  6  7  8  9 10 14 16 18 20]
Actual value           [ 2  4  6  8 10 12 14 16 18 20 24 26 28 30]
Simple regression      [ 5  7  8  8  9 11 12 13 15 17 22 25 31 34]
Weighted regression    [ 6  7  8  9 10 12 12 14 15 17 22 25 30 33]
Polynomial regression  [ 1  3  5  8 10 12 13 15 17 19 24 26 27 28]


## Multiple Linear Regression Model

The general form of a multiple linear regression model is expressed as: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε

Where:

Y is the dependent variable you want to predict.

β₀ is the intercept or bias term.

β₁, β₂, ..., βₙ are the regression coefficients that indicate the relationship between the independent variables (X₁, X₂, ..., Xₙ) and the dependent variable.

X₁, X₂, ..., Xₙ are the independent variables.

ε is the error term or residual.
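As a minimal sketch of how the coefficients can be estimated, the model can be written as a design matrix with a leading column of ones (for the intercept) and solved by least squares. The data below is hypothetical and generated without noise from Y = 1 + 2X₁ - 3X₂, so the recovered coefficients can be checked against known values; `X_multi`, `y_multi`, and `coef` are illustrative names.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (no noise)
X_multi = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
y_multi = 1.0 + 2.0 * X_multi[:, 0] - 3.0 * X_multi[:, 1]

# Design matrix: a column of ones for the intercept, then one column per variable
A = np.column_stack([np.ones(len(X_multi)), X_multi])

# Least squares solution: coef = [intercept, coefficient of x1, coefficient of x2]
coef, *_ = np.linalg.lstsq(A, y_multi, rcond=None)
print("Intercept and coefficients:", coef)
```

When the independent variables are strongly correlated with each other, the individual coefficients become unstable and hard to interpret, even if the predictions remain reasonable.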


In [20]:
from sklearn.linear_model import LinearRegression
import numpy as np

# Example data
# Independent variables
xm=np.array([[1,2], [2,3], [3,4], [4,5], [5,6],[6,7],[7,8],[8,9],[9,10],[10,11],[14,13],[16,15],[18,17],[20,19]])
ym = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30])  # Dependent variable(target)

# We create an instance of the  linear regression model
modelm = LinearRegression()

# We performed k-fold cross-validation multiple linear
rlm_cv = cross_val_predict(modelm, xm, ym, cv=5)
# we run model
modelm.fit(xm, ym)
# We performed a prediction for a new value
X_new = np.array([[22,21]])
y_pred3 = modelm.predict(X_new)
# Model coefficients
coeficientes = modelm.coef_
intercepto = modelm.intercept_

# Print model coefficients and intercept
print("Coefficients:", coeficientes)
print("Intercept:", intercepto)

# Compute performance metrics
# MSE - Mean Squared Error
mse_rlm = mean_squared_error(ym, rlm_cv)
# RMSE - Root Mean Squared Error, on the same scale as the data
rmse_rlm = np.sqrt(mean_squared_error(ym, rlm_cv))
# R2 - Coefficient of Determination:
# It is a measure of how well the model fits the data.
# A value of R2 closer to 1 indicates a better fit,
# while a value close to 0 indicates that the model does not explain the variability well
r2_rlm = r2_score(ym, rlm_cv)
print("Prediction for the new value:", y_pred3)
print("Test prediction:", rlm_cv)
# Print results
print("MSE:", mse_rlm)
print("RMSE:", rmse_rlm)
print("R^2:", r2_rlm)

Coefficients: [-0.57317073  2.37804878]
Intercept: -1.3048780487804894
Prediction for the new value: [36.02439024]
Test prediction: [ 4.5         6.08333333  7.66666667  8.34482759 10.14367816 11.94252874
 13.30232558 15.03100775 16.75968992 19.83870968 19.16129032 23.09677419
 30.92899408 34.8816568 ]
MSE: 5.768028437840462
RMSE: 2.4016720087973007
R^2: 0.9253280334335052


## K-Nearest Neighbors Regression (KNN)

K-Nearest Neighbors Regression (KNN) is a machine learning algorithm used for regression tasks. It is a non-parametric algorithm that predicts the value of a dependent variable based on the values of its k nearest neighbors in the feature space.

In KNN regression, the training data consists of feature vectors and their corresponding target values. When a new data point needs to be predicted, the algorithm identifies the k nearest neighbors in the feature space, based on a distance metric (such as Euclidean distance). The predicted value for the new data point is then determined by averaging the target values of its k nearest neighbors.

The choice of the value of k determines the level of smoothing in the regression model. Smaller values of k can lead to more flexible and detailed predictions, but they may also be more sensitive to noise and outliers. Larger values of k provide smoother predictions but may oversimplify the relationship between the features and the target variable.
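The prediction step described above can be sketched by hand with NumPy: compute the distance from the new point to every training point, pick the k closest, and average their targets. The sketch reuses this document's example data; the function `knn_predict` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
import numpy as np

# Example data used throughout this document
x_train = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 16, 18, 20], dtype=float)
y_train = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 26, 28, 30], dtype=float)

def knn_predict(x_new, k=2):
    """Average the targets of the k training points closest to x_new."""
    distances = np.abs(x_train - x_new)    # Euclidean distance in one dimension
    nearest = np.argsort(distances)[:k]    # indices of the k nearest neighbors
    return y_train[nearest].mean()

pred_22 = knn_predict(22.0, k=2)
print("Prediction for x = 22 with k = 2:", pred_22)
```

For x = 22 the two nearest neighbors are x = 20 and x = 18, so the prediction is the mean of 30 and 28. This also shows a limitation of KNN: it cannot extrapolate beyond the range of targets seen in training.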

In [21]:
from sklearn.neighbors import KNeighborsRegressor
import numpy as np

# Example data
x4 = np.array([[1], [2], [3], [4], [5],[6],[7],[8],[9],[10],[14],[16],[18],[20]]) 
y4 = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30]) 

# Create an instance of the K-Nearest Neighbors regressor
model4 = KNeighborsRegressor(n_neighbors=2)
# Get out-of-fold predictions with 5-fold cross-validation
knn_cv = cross_val_predict(model4, x4, y4, cv=5)
# Fit the model on this section's data (x4, not x from the first example)
model4.fit(x4, y4)

# We performed a prediction for a new value
X_pred = np.array([[22]])  
y_pred4 = model4.predict(X_pred)

# we calculated metrics of performance
#MSE -Mean Squared Error(MSE)
mse_knn = mean_squared_error(y4, knn_cv)
# RMSE-"Root Mean Squared Error" (RMSE)- same scale of data
rmse_knn = np.sqrt(mean_squared_error(y4, knn_cv))
#R2- Coefficient of Determination (R2).: 
#It is a measure of how well the model fits the data. 
#A value of R2 closer to 1 indicates a better fit, 
#while a value close to 0 indicates that the model does not explain the variability well
r2_knn = r2_score(y4, knn_cv)  # own name, so the multiple-regression r2_rlm is not overwritten
print("Prediction for the new value:", y_pred4)
print("Actual value:  ", y4)
print("Test prediction:", knn_cv.astype(int))
# Print results
print("MSE:", mse_knn)
print("RMSE:", rmse_knn)
print("R^2:", r2_knn)

Prediction for the new value: [29.]
Actual value:   [ 2  4  6  8 10 12 14 16 18 20 24 26 28 30]
Test prediction: [ 9  9  9  5 10 15 11 16 16 17 23 29 25 25]
MSE: 11.928571428571429
RMSE: 3.453776401067595
R^2: 0.8455746367239102


## Decision Tree Regression

Decision Tree Regression is a machine learning algorithm that uses a decision tree as a predictive model for regression tasks. In this method, the training data is partitioned into subsets based on different features, creating a tree-like structure. Each internal node of the tree represents a decision based on a specific feature, while each leaf node represents a predicted value. During the training process, the algorithm recursively splits the data based on the selected features, aiming to minimize the prediction error.

To make predictions with a decision tree regression model, a new data point is traversed down the tree from the root node to a leaf node, following the decision rules. The predicted value is then determined based on the value associated with the leaf node.
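To make the decision rules visible, the following sketch fits a shallow tree and prints it as text with scikit-learn's `export_text`. The data and the choice of `max_depth=2` are assumptions made to keep the printed tree small; the names `x_tree`, `y_tree`, and `tree` are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical, evenly spaced data following y = 2x
x_tree = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y_tree = np.array([2, 4, 6, 8, 10, 12, 14, 16], dtype=float)

# A depth-2 tree has at most 4 leaves, each predicting the mean of its subset
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(x_tree, y_tree)

# Print the learned rules: internal nodes are thresholds, leaves are values
print(export_text(tree, feature_names=["x"]))

# The distinct predictions on the training data are the leaf values
leaf_values = np.unique(tree.predict(x_tree))
print("Leaf values:", leaf_values)
```

Each leaf predicts a constant, so the fitted function is a step function. A value far outside the training range simply falls into the outermost leaf.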

In [22]:
from sklearn.tree import DecisionTreeRegressor
import numpy as np

# Example data
x5 = np.array([[1], [2], [3], [4], [5],[6],[7],[8],[9],[10],[14],[16],[18],[20]])#Independent variable
y5 = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30]) # Dependent variable(target)

# We create an instance of the  DecisionTreeRegressor
model5 = DecisionTreeRegressor()

# We performed k-fold cross-validation 
rad_cv = cross_val_predict(model5, x5, y5, cv=5)
# we run model
model5.fit(x5, y5)

#  We performed a prediction for a new value
X_pred5 = np.array([[22]]) 
y_pred5 = model5.predict(X_pred5)

# we calculated metrics of performance
#MSE -Mean Squared Error(MSE)
mse_rad = mean_squared_error(y5, rad_cv)
# RMSE-"Root Mean Squared Error" (RMSE)- same scale of data
rmse_rad = np.sqrt(mean_squared_error(y5, rad_cv))
#R2- Coefficient of Determination (R2).: 
#It is a measure of how well the model fits the data. 
#A value of R2 closer to 1 indicates a better fit, 
#while a value close to 0 indicates that the model does not explain the variability well
r2_rad = r2_score(y5, rad_cv)
print("Prediction for the new value:", y_pred5)
print("Actual value:  ", y5)
print("Test prediction:", rad_cv.astype(int))
# Print results
print("MSE:", mse_rad)
print("RMSE:", rmse_rad)
print("R^2:", r2_rad)

Prediction for the new value: [30.]
Actual value:   [ 2  4  6  8 10 12 14 16 18 20 24 26 28 30]
Test prediction: [ 8  8  8  6  6 14 12 12 20 18 28 28 26 26]
MSE: 10.571428571428571
RMSE: 3.251373336211726
R^2: 0.8631439894319684


## Random Forest Regression

Random Forest Regression is a machine learning algorithm used for regression tasks. It applies the Random Forest method, an ensemble learning technique that combines multiple decision trees, to predict continuous values.


In Random Forest Regression, each decision tree is trained on a bootstrap sample of the training data, that is, a random sample drawn with replacement. Each tree is built independently, and at each split it can be restricted to a random subset of the features. This randomness increases the diversity among the trees and helps to reduce overfitting.


During prediction, the value of the target variable is determined by aggregating the predictions of all the individual decision trees in the random forest. Typically, the predictions are averaged to obtain the final prediction, although other aggregation methods can be used.
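The averaging step can be checked directly: in scikit-learn, the fitted trees of a `RandomForestRegressor` are available in `estimators_`, and the mean of their individual predictions matches the forest's own prediction. The data below is synthetic and hypothetical, and `n_estimators=25` is an arbitrary choice for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical synthetic data with two features
rng = np.random.default_rng(0)
X_rf = rng.uniform(0, 10, size=(40, 2))
y_rf = 3.0 * X_rf[:, 0] + np.sin(X_rf[:, 1])

forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X_rf, y_rf)

# Predict two new points with every individual tree
X_query = np.array([[2.0, 5.0], [7.5, 1.0]])
per_tree = np.stack([t.predict(X_query) for t in forest.estimators_])

# Averaging over the trees reproduces the forest prediction
manual_mean = per_tree.mean(axis=0)
print("Mean of the trees:", manual_mean)
print("Forest prediction:", forest.predict(X_query))
print("Spread across trees (std):", per_tree.std(axis=0))
```

The spread of the per-tree predictions gives a rough, informal sense of how much the trees disagree on a given point.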


The Random Forest Regression algorithm is effective in handling non-linear relationships, capturing interactions between features, and handling high-dimensional datasets. It is known for its robustness, ability to handle noisy data, and resistance to overfitting. Additionally, Random Forest Regression provides feature importance measures, which can be used to understand the relative importance of different features in the prediction process.


Random Forest Regression is widely used in various domains such as finance, healthcare, marketing, and environmental sciences for tasks such as predicting stock prices, estimating disease progression, forecasting sales, and predicting environmental variables.

In [23]:
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Example data
x6 = np.array([[1], [2], [3], [4], [5],[6],[7],[8],[9],[10],[14],[16],[18],[20]])
y6 = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30]) 

# We create an instance of the Random Forest Regression
model6 = RandomForestRegressor()
#  We performed k-fold cross-validation 
raf_cv = cross_val_predict(model6, x6, y6, cv=5)
# we run model
model6.fit(x6, y6)

#  We performed a prediction for a new value
X_pred6 = np.array([[22]]) 
y_pred6 = model6.predict(X_pred6)

# we calculated metrics of performance
#MSE -Mean Squared Error(MSE)
mse_raf = mean_squared_error(y6, raf_cv)
# RMSE-"Root Mean Squared Error" (RMSE)- same scale of data
rmse_raf = np.sqrt(mean_squared_error(y6, raf_cv))
#R2- Coefficient of Determination (R2).: 
#It is a measure of how well the model fits the data. 
#A value of R2 closer to 1 indicates a better fit, 
#while a value close to 0 indicates that the model does not explain the variability well
r2_raf = r2_score(y6, raf_cv)
print("Prediction for the new value:", y_pred6)
print("Actual value:  ", y6)
print("Test prediction:", raf_cv.astype(int))
# Print results
print("MSE:", mse_raf)
print("RMSE:", rmse_raf)
print("R^2:", r2_raf)

Prediction for the new value: [28.6]
Actual value:   [ 2  4  6  8 10 12 14 16 18 20 24 26 28 30]
Test prediction: [ 8  8  8  5  8 13 11 13 17 17 24 27 24 24]
MSE: 10.517085714285715
RMSE: 3.2430056605386484
R^2: 0.8638475033025099


## Support Vector Machines (SVM)

The main idea behind SVM is to find an optimal hyperplane. In classification, the hyperplane separates the data points of different classes and is chosen to maximize the margin, which is the distance between the hyperplane and the closest data points of each class. In regression (SVR), the goal is instead a function that keeps the training points within a distance epsilon of the prediction wherever possible, while staying as flat as possible.

In SVM, the support vectors play a crucial role in defining the decision boundary or regression function. In classification they are the data points closest to the hyperplane; in regression they are the points that lie on or outside the epsilon band. Only these points determine the predictions for new data points.

Types of kernel:

1. Linear kernel (linear): Uses a dot product function to measure similarities between samples. It is the simplest kernel and suitable for linearly separable data.

2. Polynomial kernel (poly): Uses a polynomial function to map samples to a higher-dimensional space. You can specify the degree of the polynomial using the degree parameter.

3. Radial kernel (rbf, Gaussian): Uses a radial basis function (Gaussian) to map samples to an infinite-dimensional space. It is suitable for non-linearly separable data. The gamma parameter controls the width of the radial basis function.

4. Sigmoid kernel (sigmoid): Uses a sigmoid function to map samples to a higher-dimensional space. This kernel is less common and mainly used in binary classification problems.


Default parameters:
C: The default value for the regularization parameter C is 1.0. This implies a moderate balance between fitting the training data and model complexity.


kernel: The default kernel is 'rbf' (radial basis function). This means a radial kernel is used to model the relationship between samples.


degree: The default degree of the polynomial kernel is 3. It is only used when the 'poly' kernel is selected and is ignored by all other kernels.


gamma: The default value for the gamma kernel coefficient is 'scale'. This means the gamma value is automatically calculated as 1 / (n_features * X.var()), where n_features is the number of features and X.var() is the variance of X. The alternative 'auto' uses 1 / n_features instead, so it gives the same value as 'scale' only when X has unit variance.


epsilon: For SVM regression models (SVR), the default value for the epsilon tolerance margin is 0.1. This value defines the range within which the prediction is considered acceptable and not penalized.


class_weight: For SVM classification models (SVC), the default value for the class_weight parameter is None, which means all classes have the same weight.
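The defaults can be inspected directly on an estimator rather than taken on trust. The sketch below reads them from `SVR().get_params()` and computes the two automatic gamma settings by hand for a small hypothetical array `X_svr`; the values of `X_svr` are chosen only so that the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.svm import SVR

# Read the default hyperparameters from a freshly created estimator
defaults = SVR().get_params()
for name in ("C", "kernel", "degree", "gamma", "epsilon"):
    print(name, "=", defaults[name])

# Hypothetical one-feature data; its variance is 2.0
X_svr = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
n_features = X_svr.shape[1]

# gamma='scale' and gamma='auto' computed by hand from their definitions
gamma_scale = 1.0 / (n_features * X_svr.var())
gamma_auto = 1.0 / n_features
print("gamma for 'scale':", gamma_scale)
print("gamma for 'auto': ", gamma_auto)
```

Since `gamma='scale'` depends on the variance of the data, rescaling the features changes the effective kernel width unless gamma is set explicitly.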

In [24]:
from sklearn.svm import SVR
import numpy as np

# Example data
x7 = np.array([[1], [2], [3], [4], [5],[6],[7],[8],[9],[10],[14],[16],[18],[20]])
y7 = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30])

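# Create an instance of the SVR model
# Available kernels: 'linear', 'poly' (e.g. SVR(kernel='poly', degree=3)), 'rbf', 'sigmoid'
# Note: gamma is ignored by the linear kernel; it only affects 'rbf', 'poly' and 'sigmoid'
model7 = SVR(kernel='linear', epsilon=0.01, gamma=1, C=10)
# Get out-of-fold predictions with 5-fold cross-validation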
mvs_cv = cross_val_predict(model7, x7, y7, cv=5)
# we run the model
model7.fit(x7, y7)
#  We performed a prediction for a new value
X_pred7 = np.array([[22]])
y_pred7 = model7.predict(X_pred7)

# we calculated metrics of performance
#MSE -Mean Squared Error(MSE)
mse_mvs = mean_squared_error(y7, mvs_cv)
# RMSE-"Root Mean Squared Error" (RMSE)- same scale of data
rmse_mvs = np.sqrt(mean_squared_error(y7, mvs_cv))
#R2- Coefficient of Determination (R2).: 
#It is a measure of how well the model fits the data. 
#A value of R2 closer to 1 indicates a better fit, 
#while a value close to 0 indicates that the model does not explain the variability well
r2_mvs = r2_score(y7, mvs_cv)
print("Prediction for the new value:", y_pred7)
print("Actual value:  ", y7)
print("Test prediction:", mvs_cv.astype(int))
# Print results
print("MSE:", mse_mvs)
print("RMSE:", rmse_mvs)
print("R^2:", r2_mvs)


Prediction for the new value: [34.74818182]
Actual value:   [ 2  4  6  8 10 12 14 16 18 20 24 26 28 30]
Test prediction: [ 5  7  8  7  9 10 12 13 15 16 22 25 35 39]
MSE: 16.19868696829434
RMSE: 4.024759243519336
R^2: 0.7902944091290826


# Neural networks

* Prerequisite: install the TensorFlow package: pip install tensorflow

In a regression model, the neural network takes a set of input variables (features) and uses them to predict a continuous target variable. The network consists of multiple interconnected layers of artificial neurons, known as nodes or units. Each node applies a mathematical transformation to its input and produces an output, which is then passed to the nodes in the next layer.


The neural network learns to make predictions by adjusting the weights and biases associated with the connections between the nodes during the training process. This adjustment is achieved through an optimization algorithm, such as gradient descent, which minimizes a loss function that quantifies the difference between the predicted values and the actual target values.
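The training loop described above can be sketched without any deep learning library. The example below uses plain NumPy and a single linear unit (one weight `w` and one bias `b`) trained by gradient descent on the mean squared error. The data, the learning rate, and the number of steps are hypothetical choices for illustration; a real network repeats the same idea across many weights using backpropagation.

```python
import numpy as np

# Hypothetical noise-free data following y = 2x
x_gd = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_gd = 2.0 * x_gd

w, b = 0.0, 0.0          # initial weight and bias
learning_rate = 0.02     # step size (assumed value)
losses = []

for _ in range(2000):
    error = (w * x_gd + b) - y_gd          # prediction minus target
    losses.append(np.mean(error ** 2))     # MSE loss before the update
    # Gradients of the MSE loss with respect to w and b
    grad_w = 2.0 * np.mean(error * x_gd)
    grad_b = 2.0 * np.mean(error)
    # Move the parameters against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("w:", w, "b:", b)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The loss shrinks as the parameters approach w = 2 and b = 0. A learning rate that is too large would make the updates diverge instead.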

In [25]:
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
from sklearn.model_selection import KFold

# Example data
# One-dimensional independent variable (in practice, having more variables is recommended)
x9 = np.array([[1], [2], [3], [4], [5],[6],[7],[8],[9],[10],[14],[16],[18],[20]])
y9 = np.array([2, 4, 6, 8, 10,12,14,16,18,20,24,26,28,30])  
# Define the network model
# Note: with linear activations in every layer, the network can only represent a linear function of x
def create_model():
    model9 = keras.Sequential()
    model9.add(layers.Dense(6, input_dim=1, activation='linear'))  # First hidden layer with 6 neurons
    model9.add(layers.Dense(6, activation='linear'))  # Second hidden layer with 6 neurons
    model9.add(layers.Dense(1, activation='linear'))  # Output layer
    # Create the optimizer with the desired learning rate
    optimizer = keras.optimizers.Adam(learning_rate=0.001)
    model9.compile(loss='mean_squared_error', optimizer=optimizer)
    return model9

# Create the neural network model
model9 = create_model()
# Train the model
model9.fit(x9, y9, epochs=500, batch_size=1,verbose=0)

# Predict on the training data (in-sample, unlike the cross-validated predictions of the previous models)
y_test=model9.predict(x9)
# Predict a new value
X_pred10 = np.array([[22]])
y_pred10 = model9.predict(X_pred10)

# we calculated metrics of performance
#MSE -Mean Squared Error(MSE)
mse_nn = mean_squared_error(y9, y_test)
# RMSE-"Root Mean Squared Error" (RMSE)- same scale of data
rmse_nn = np.sqrt(mean_squared_error(y9, y_test))
#R2- Coefficient of Determination (R2).: 
#It is a measure of how well the model fits the data. 
#A value of R2 closer to 1 indicates a better fit, 
#while a value close to 0 indicates that the model does not explain the variability well
r2_nn = r2_score(y9, y_test)
print("Prediction for the new value:", y_pred10)
print("Actual value:  ", y9)
print("Test prediction:", y_test.flatten().astype(int))
# Print results
print("MSE:", mse_nn)
print("RMSE:", rmse_nn)
print("R^2:", r2_nn)

Prediction for the new value: [[35.03834]]
Actual value:   [ 2  4  6  8 10 12 14 16 18 20 24 26 28 30]
Test prediction: [ 4  5  7  8  9 11 12 14 15 17 23 26 29 32]
MSE: 2.136247078747374
RMSE: 1.4615905988844393
R^2: 0.9723444896014211


In [26]:
# Cross-validation for the neural network (disabled)
'''num_folds = 5
kfold = KFold(n_splits=num_folds, shuffle=True)
fold_scores = []
for fold, (train_indices, val_indices) in enumerate(kfold.split(x9)):
    print(f"Fold {fold + 1}")
    # Split the data into training and validation sets for the current fold
    X_train, X_val = x9[train_indices], x9[val_indices]
    y_train, y_val = y9[train_indices], y9[val_indices]
    # Create and fit the model for the current fold
    model9 = create_model()
    model9.fit(X_train, y_train, epochs=500, batch_size=1, verbose=0)
    # Evaluate the model on the validation set
    score = model9.evaluate(X_val, y_val, verbose=0)
    fold_scores.append(score)
    print("Fold Loss:", score)
    print("")

# Compute the mean of the per-fold loss scores
mean_loss = np.mean(fold_scores)
print("fold_scores:", fold_scores)
print("Mean Loss:", mean_loss)'''



### Comparison of Model Results - Part Two

In [27]:
import numpy as np
import plotly.graph_objects as go

xbase = x1.flatten() # flatten x to 1-D for the plot's x-axis
fig = go.Figure()
fig.add_trace(go.Scatter(x=xbase, y=y, name='Real'))
fig.add_trace(go.Scatter(x=xbase, y=knn_cv.astype(int), name='KNN'))
fig.add_trace(go.Scatter(x=xbase, y=rlm_cv.astype(int), name='MultipleLinear'))
fig.add_trace(go.Scatter(x=xbase, y=rad_cv.astype(int), name='DecisionTree'))
fig.add_trace(go.Scatter(x=xbase, y=raf_cv.astype(int), name='RandomForest'))
fig.add_trace(go.Scatter(x=xbase, y=mvs_cv.astype(int), name='SVM'))
fig.add_trace(go.Scatter(x=xbase, y=y_test.flatten().astype(int), name='NN'))
fig.update_layout(
    title='Linear Regression Models Plot',
    xaxis_title='x - Input value',
    yaxis_title='y - Prediction',
    hovermode='x'
)
fig.show()