## DecisionTreeRegressor from SAS® Viya® on Simulated Data

A 1D regression with a decision tree:

The decision tree is fit to a sine curve with additional noisy observations. As a result, it learns a piecewise-constant approximation of the sine curve: each leaf of the tree predicts a single value for its region of the input.
If the maximum depth of the tree (controlled by the max_depth parameter) is set too high, the tree learns overly fine details of the training data, including the noise, and overfits.
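To illustrate how a regression tree arrives at constant predictions per region, here is a minimal sketch of a single split (a depth-1 "stump") in plain Python. It assumes a generic greedy, squared-error split criterion and is not taken from the SAS Viya implementation; the names fit_stump and predict_stump are hypothetical.

```python
def fit_stump(xs, ys):
    """Find the single threshold that minimizes the total squared error.

    Returns (threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Squared error if each side predicts its own mean
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        # Place the threshold halfway between the two neighbouring points
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values to split")
    return best[1:]


def predict_stump(stump, x):
    """Predict the mean of whichever side of the threshold x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 1.0, 5.0, 5.0]
stump = fit_stump(xs, ys)
print(stump)
```

A deeper tree repeats this step inside each side of the split, which is why increasing the depth lets the prediction follow individual noisy points more closely.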

### Import the necessary modules and libraries

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sasviya.ml.tree import DecisionTreeRegressor

### Create a random dataset

In [None]:
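rng = np.random.RandomState(1)  # fixed seed for reproducibility
# 80 sample points drawn uniformly from [0, 5), sorted in ascending order
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
# Perturb every fifth target (16 of 80) with uniform noise in (-1.5, 1.5]
y[::5] += 3 * (0.5 - rng.rand(16))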

### Fit regression tree model

The code below creates two decision tree regressors. The first, regr_1, has a maximum depth of 2, so it can apply at most two successive splits along any path from the root to a leaf. The second, regr_2, has a maximum depth of 5, which allows a more complex tree with more splits.

Both models are fitted to the same inputs X and targets y. They differ only in max_depth, which determines how finely each tree can partition the input range and therefore how prone it is to fitting the noise.

In [None]:
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X, y)
regr_2.fit(X, y)
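To make the effect of max_depth concrete, the sketch below computes an upper bound on the number of leaves (distinct predicted values) a tree can have. It assumes every split is binary, which the source does not state for this DecisionTreeRegressor; the helper name max_leaves is hypothetical.

```python
def max_leaves(max_depth):
    """Upper bound on the leaves of a binary tree whose root is at depth 0."""
    return 2 ** max_depth


for depth in (2, 5):
    print(f"max_depth={depth}: at most {max_leaves(depth)} leaves")
# prints:
# max_depth=2: at most 4 leaves
# max_depth=5: at most 32 leaves
```

Under that assumption, the deeper model can place up to 32 constant segments across only 80 training points, which leaves room to isolate individual noisy observations.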

### Predict

The code below generates evenly spaced test inputs on [0, 5) with a step of 0.01 and predicts the target values with both models. The predictions are plotted against the training data in the next section to compare how the two models fit.

In [None]:
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
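Because the data is simulated, the noise-free target sin(x) is known, so the predictions can also be compared numerically. The sketch below is one possible way to do that in plain Python; the helper mse_against_sine is hypothetical and not part of the original notebook.

```python
import math


def mse_against_sine(x_values, predictions):
    """Mean squared error between predictions and the noise-free sin(x)."""
    x_values = list(x_values)
    predictions = list(predictions)
    if not x_values or len(x_values) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - math.sin(x)) ** 2 for x, p in zip(x_values, predictions)]
    return sum(squared_errors) / len(squared_errors)


# Stand-in data so the sketch runs on its own: a grid like X_test
grid = [i * 0.01 for i in range(500)]
perfect = [math.sin(x) for x in grid]   # predictions that match sin(x) exactly
flat = [0.0] * len(grid)                # a model that always predicts 0

print(mse_against_sine(grid, perfect))  # exactly 0.0
print(mse_against_sine(grid, flat) > 0)  # True
```

In the notebook, a call such as mse_against_sine(X_test.ravel(), y_1) would apply, assuming predict returns a one-dimensional sequence of floats. A larger value for the deeper tree would indicate that it has absorbed noise rather than the underlying curve.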

### Plot the results

In [None]:
plt.figure()
plt.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()