# Assignment 3: Exploring Tree-Based Regression Methods for 3D Sinusoidal Data
## DTSC 680: Applied Machine Learning

## Name: 

## Directions and Overview

The main purpose of this assignment is for you to gain experience using tree-based methods to solve simple regression problems. In this assignment, you will fit a `Gradient-Boosted Regression Tree`, a `Random Forest`, and a `Decision Tree` to a noisy 3D sinusoidal data set. Since these models can be trained very quickly on the supplied data, I want you to first manually adjust hyperparameter values and observe their influence on the model's predictions. That is, you should manually sweep the hyperparameter space and try to home in on the optimal hyperparameter values, again, _manually_. (Yep, that means guess-and-check: pick some values, train the model, observe the prediction curve, repeat.)

But wait, there's more! Merely attempting to identify the optimal hyperparameter values is not enough.  Be sure to really get a visceral understanding of how altering a hyperparameter in turn alters the model predictions (i.e. the prediction curve).  This is how you will build your machine learning intuition!

So, play around and build some models.  When you are done playing with hyperparameter values, you should try to set these values to the optimal values manually (you're likely going to be _way_ off).  Then, retrain the model.  Next in this assignment, we will perform several grid searches, so you'll be able to compare your "optimal" hyperparameter values with those computed from the grid search.
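The guess-and-check loop described above can be sketched as follows. This is only an illustration under stated assumptions: `make_toy_data` is a hypothetical helper that generates a noisy sine curve as a stand-in for the supplied data, and the depths tried are arbitrary.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def make_toy_data(n=200, seed=42):
    """Hypothetical stand-in for the assignment data: a noisy sine curve."""
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0, 10, n))
    z = np.sin(x) + rng.normal(scale=0.2, size=n)
    return x.reshape(-1, 1), z

X_toy, z_toy = make_toy_data()

# Try a few depths by hand and record how the training error changes.
train_mse = {}
for depth in (1, 3, 6, 12):
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_toy, z_toy)
    train_mse[depth] = mean_squared_error(z_toy, model.predict(X_toy))
    print(f"max_depth={depth:2d}  training MSE={train_mse[depth]:.4f}")
```

A training error that keeps shrinking as the tree gets deeper does not by itself indicate a better model; it is the shape of the prediction curve that reveals overfitting, which is why the assignment asks you to look at the plots.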

We will visualize model predictions for the optimal `Gradient-Boosted Regression Tree`, `Random Forest`, and `Decision Tree` models that were determined by the grid searches. Next, you will compute the generalization error on the test set for the three models.

## Preliminaries

Let's import some common packages:

In [1]:
# Common imports
from mpl_toolkits.mplot3d import Axes3D
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import cm
import numpy as np
import pandas as pd
%matplotlib inline
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
import os

# Where to save the figures
PROJECT_ROOT_DIR = "."
FOLDER = "figures"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, FOLDER)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
    
def plot3Ddata(data_df):   
    x = data_df['x'].values
    y = data_df['y'].values
    z = data_df['z'].values
    
    # Graph Size as a whole 
    fig = plt.figure(figsize=(20,20))
    
    
    # First subplot - Top Left
    ax=fig.add_subplot(2, 2, 1, projection = '3d')
    ax.set_xlabel('x', c ='r', size = 16)
    ax.set_ylabel('y', c ='r', size = 16)
    ax.set_zlabel('z', c ='r', size = 16)
    ax.scatter3D(x, y, z, color = "blue")
    plt.ylim(-6, 6)
    
    # Graph Angle
    ax.view_init(0, 90)

    # Second subplot - Top Right
    ax=fig.add_subplot(2, 2, 2, projection ='3d')

    ax.set_xlabel('x',c = 'r', size = 16)
    ax.set_ylabel('y', c = 'r', size = 16)
    ax.set_zlabel('z', c = 'r', size = 16)
    ax.scatter3D(x, y, z,  color = 'blue')
    plt.ylim(-6, 6)
    
    # Graph Angle
    ax.view_init(45, 0)

    # Third subplot - Bottom Left
    ax=fig.add_subplot(2, 2, 3, projection = '3d')

    ax.set_xlabel('x', c = 'r', size = 16)
    ax.set_ylabel('y', c = 'r', size = 16)
    ax.set_zlabel('z', c = 'r', size = 16)
    ax.scatter3D(x,y,z,  color ='blue')
    plt.ylim(-6,6)
    
    # Graph Angle
    ax.view_init(45, 45)

    # Fourth subplot - Bottom Right
    ax=fig.add_subplot(2, 2, 4, projection='3d')

    ax.set_xlabel('x', c = 'r', size = 16)
    ax.set_ylabel('y', c = 'r', size = 16)
    ax.set_zlabel('z', c = 'r', size = 16)
    ax.scatter3D(x, y, z,  color = 'blue')
    plt.ylim(-6,6)

    fig.show()

def plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z):

    pred =  np.argsort(fit_x)
    fit_x = fit_x[pred]
    fit_y = fit_y[pred]
    fit_z = fit_z[pred]

    scat_x = scat_x[pred]
    scat_y = scat_y[pred]
    scat_z = scat_z[pred]



    # Graph Size as a whole 
    fig = plt.figure(figsize=(20,20))
    
    
    # First subplot - Top Left
    ax=fig.add_subplot(2, 2, 1, projection = '3d')
    ax.set_xlabel('x', c ='r', size = 16)
    ax.set_ylabel('y', c ='r', size = 16)
    ax.set_zlabel('z', c ='r', size = 16)
    ax.scatter3D(scat_x, scat_y, scat_z, color = "blue")
    ax.plot(fit_x, fit_y, fit_z, color = "black")
    plt.ylim(-6, 6)
    
    # Graph Angle
    ax.view_init(0, 90)

    # Second subplot - Top Right
    ax=fig.add_subplot(2, 2, 2, projection ='3d')

    ax.set_xlabel('x',c = 'r', size = 16)
    ax.set_ylabel('y', c = 'r', size = 16)
    ax.set_zlabel('z', c = 'r', size = 16)
    ax.scatter3D(scat_x, scat_y, scat_z,  color = 'blue')
    ax.plot(fit_x, fit_y, fit_z, color = "black")
    plt.ylim(-6, 6)
    
    # Graph Angle
    ax.view_init(45, 0)

    # Third subplot - Bottom Left
    ax=fig.add_subplot(2, 2, 3, projection = '3d')

    ax.set_xlabel('x', c = 'r', size = 16)
    ax.set_ylabel('y', c = 'r', size = 16)
    ax.set_zlabel('z', c = 'r', size = 16)
    ax.scatter3D(scat_x, scat_y, scat_z,  color ='blue')
    ax.plot(fit_x, fit_y, fit_z, color = "black")
    plt.ylim(-6,6)
    
    # Graph Angle
    ax.view_init(45, 45)

    # Fourth subplot - Bottom Right
    ax =fig.add_subplot(2, 2, 4, projection='3d')

    ax.set_xlabel('x', c = 'r', size = 16)
    ax.set_ylabel('y', c = 'r', size = 16)
    ax.set_zlabel('z', c = 'r', size = 16)
    ax.scatter3D(scat_x, scat_y, scat_z,  color = 'blue')
    ax.plot(fit_x, fit_y, fit_z, color = "black")
    plt.ylim(-6,6)

    fig.show()   

# Import and Split Data

Complete the following:



1. Begin by importing the data from the file called `3DSinusoidal.csv`.  Name the returned DataFrame `data`. 

2. Call [train_test_split()](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) with a `test_size` of 20%. `x` and `y` will be your feature data and `z` will be your response data. Save the output into `X_train`, `X_test`, `z_train`, and `z_test`, respectively. Specify the `random_state` parameter to be `42` (do this throughout the entire notebook).

In [2]:
from sklearn.model_selection import train_test_split

#Importing data
data=pd.read_csv('3DSinusoidal.csv')

#Creating Feature and Response Data
X=data[['x','y']]
y=data['z']



#train_test_split() the X and y
X_train,X_test,z_train,z_test=train_test_split(X,y, test_size=0.2, random_state=42)

#Reshaping training and testing data into NumPy arrays
X_train = np.array(X_train).reshape(-1,2)
X_test = np.array(X_test).reshape(-1,2)
z_train = np.array(z_train)
z_test = np.array(z_test)

FileNotFoundError: [Errno 2] File b'3DSinusoidal.csv' does not exist: b'3DSinusoidal.csv'

# Plot Data

Simply plot your training data here, so that you know what you are working with.  You must define a function called `plot3Ddata`, which accepts a Pandas DataFrame (composed of 3 spatial coordinates) and uses `scatter3D()` to plot the data.  Use this function to plot only the training data (recall that you don't even want to look at the test set, until you are ready to calculate the generalization error).  You must place the definition of this function in the existing code cell of the above __Preliminaries__ section, and have nothing other than the function invocation in the below cell. 

You must emulate the graphs shown in the respective sections below. Each of the graphs will have four subplots. Note the various viewing angles that each subplot presents - you can achieve this with the `view_init()` method. Be sure to label your axes as shown.

In [None]:
# train_df = pd.DataFrame(X_train)
#
viz_train_data = np.hstack([X_train, z_train.reshape(-1,1)])
train_df = pd.DataFrame(viz_train_data, columns=['x', 'y', 'z'])

plot3Ddata(train_df)


## A Quick Note

In the following sections you will be asked to plot the training data along with the model's predictions for that data superimposed on it. You must write a function called `plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)` that will plot this figure. The function accepts six parameters as input, shown in the function signature. All six input parameters must be NumPy arrays. The NumPy arrays called `fit_x` and `fit_y` represent the x and y coordinates from the training data, and `fit_z` represents the model predictions from those coordinates (i.e. the prediction curve). The three NumPy arrays called `scat_x`, `scat_y`, and `scat_z` represent the x, y, and z coordinates of the training data.

You must place the definition of the `plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)` function in the existing code cell of the above __Preliminaries__ section. (The function header is already there - you must complete the function definition.)

You will use the `plotscatter3Ddata()` function in each of the __Plot Model Predictions for Training Set__ portions of the three __Explore 3D Data__ sections below, as well as in the __Visualize Optimal Model Predictions__ section.

___Important: Below, you will be asked to plot the model's prediction curve along with the training data.  Even if you correctly train the model, you may find that your trendline is very ugly when you first plot it.  If this happens to you, try plotting the model's predictions using a scatter plot rather than a connected line plot.  You should be able to infer the problem and solution with the trendline from examining this new scatter plot of the model's predictions.___
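One way to see why a connected line plot can look ugly is to measure the path a line takes through points in their stored order versus in sorted order. The sketch below uses made-up data and a hypothetical `path_length` helper; it only illustrates the effect of `np.argsort` and makes no plotting calls.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)          # unordered, like rows after a random split
z = np.sin(x)

def path_length(px, pz):
    """Total length of the polyline drawn through the points in the given order."""
    return float(np.sum(np.hypot(np.diff(px), np.diff(pz))))

order = np.argsort(x)               # indices that sort the points by x
unsorted_len = path_length(x, z)
sorted_len = path_length(x[order], z[order])

print(f"path length in stored order: {unsorted_len:.1f}")
print(f"path length in sorted order: {sorted_len:.1f}")
```

When the points are not ordered along the curve, the line jumps back and forth across the plot. Applying the same index array to every coordinate array keeps the points aligned with one another.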

# Explore 3D Data: GradientBoostingRegressor

Fit a `GradientBoostingRegressor` model to this data. You must manually assign values to the following hyperparameters. You should "play around" by using different combinations of hyperparameter values to really get a feel for how they affect the model's predictions. When you are done playing, set these to the best values you can for submission. (It is totally fine if you don't find the optimal values here; however, you will want to make sure your model is not excessively overfitting or underfitting the data. Do this by examining the prediction curve generated by your model. You will be graded more strictly on the values that you calculate later from performing several rounds of grid searches.)

 - `learning_rate = <value>`
 - `max_depth = <value>`
 - `n_estimators = <value>`
 - `random_state = 42`

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
# X_train=data[:100]
#Creating Gradient Boosting Regressor Model
gbrt = GradientBoostingRegressor(max_depth=3, n_estimators=42, random_state=42,learning_rate=0.1)
gbrt.fit(X_train, z_train)

### Plot Model Predictions for Training Set

Use the `plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)` function to plot the data and the prediction curve.

In [None]:
#Plot Model Predictions

fit_x=np.array(train_df['x'])
fit_y=np.array(train_df['y'])
fit_z=np.array(gbrt.predict(X_train))

scat_x=np.array(train_df['x'])
scat_y=np.array(train_df['y'])
scat_z=np.array(train_df['z'])


plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)

# Explore 3D Data: RandomForestRegressor

Fit a `RandomForestRegressor` model to this data. You must manually assign values to the following hyperparameters. You should "play around" by using different combinations of hyperparameter values to really get a feel for how they affect the model's predictions. When you are done playing, set these to the best values you can for submission. (It is totally fine if you don't find the optimal values here; however, you will want to make sure your model is not excessively overfitting or underfitting the data. Do this by examining the prediction curve generated by your model. You will be graded more strictly on the values that you calculate later from performing several rounds of grid searches.)

 - `min_samples_split = <value>`
 - `max_depth = <value>`
 - `n_estimators = <value>`
 - `random_state = 42`

In [None]:
from sklearn.ensemble import RandomForestRegressor
#Creating RandomForestRegressor Model
rnd_clf=RandomForestRegressor(n_estimators=50, min_samples_split=0.1,max_depth=8,random_state=42)
rnd_clf.fit(X_train,z_train)


### Plot Model Predictions for Training Set

Use the `plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)` function to plot the data and the prediction curve.

In [None]:
#Plotting Model Predictions

fit_x=np.array(train_df['x'])
fit_y=np.array(train_df['y'])
fit_z=np.array(rnd_clf.predict(X_train))

scat_x=np.array(train_df['x'])
scat_y=np.array(train_df['y'])
scat_z=np.array(train_df['z'])


plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)


# Explore 3D Data: DecisionTreeRegressor

Fit a `DecisionTreeRegressor` model to this data. You must manually assign values to the following hyperparameters. You should "play around" by using different combinations of hyperparameter values to really get a feel for how they affect the model's predictions. When you are done playing, set these to the best values you can for submission. (It is totally fine if you don't find the optimal values here; however, you will want to make sure your model is not excessively overfitting or underfitting the data. Do this by examining the prediction curve generated by your model. You will be graded more strictly on the values that you calculate later from performing several rounds of grid searches.)
 - `splitter = <value>`
 - `max_depth = <value>`
 - `min_samples_split = <value>`
 - `random_state = 42`

In [None]:
from sklearn.tree import DecisionTreeRegressor
#Creating DecisionTreeRegressor Model
tree_clf = DecisionTreeRegressor(splitter='best',min_samples_split=5, max_depth=6, random_state=42)
tree_clf.fit(X_train, z_train)

### Plot Model Predictions for Training Set

Use the `plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)` function to plot the data and the prediction curve.

In [None]:
#Plotting Model Predictions for Training Set

fit_x=np.array(train_df['x'])
fit_y=np.array(train_df['y'])
fit_z=np.array(tree_clf.predict(X_train))

scat_x=np.array(train_df['x'])
scat_y=np.array(train_df['y'])
scat_z=np.array(train_df['z'])


plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)

# Perform Grid Searches

You will perform a series of grid searches, which will yield the optimal hyperparameter values for each of the three model types. You can compare the values computed by the grid search with the values you manually found earlier. How do these compare?

You must first perform a coarse-grained grid search with a very broad range of values. Then, perform a second grid search using a tighter range of values centered on those identified in the first grid search. You may have to use another round of grid searching too (it took me at least three rounds of grid searches per model to ascertain the optimal hyperparameter values below).
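The coarse-then-fine procedure can be sketched as below. This is a minimal illustration, not the expected solution: the data is synthetic, only `max_depth` is searched, and the choice of a window of two on either side of the first-round winner is an assumption.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X_toy = rng.uniform(0, 10, size=(300, 1))
z_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.2, size=300)

# Round 1: coarse grid over a broad range of depths.
coarse = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    {"max_depth": [1, 2, 4, 8, 16, 32]},
    cv=3,
)
coarse.fit(X_toy, z_toy)
best_depth = coarse.best_params_["max_depth"]

# Round 2: finer grid centered on the round-1 winner.
# Candidates are clipped so that max_depth stays a valid value (>= 1).
fine_depths = [d for d in range(best_depth - 2, best_depth + 3) if d >= 1]
fine = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    {"max_depth": fine_depths},
    cv=3,
)
fine.fit(X_toy, z_toy)

print("coarse best:", coarse.best_params_)
print("fine best:  ", fine.best_params_)
```

Clipping the refined range matters: "centered on" the previous winner does not mean the grid may include zero or negative values, since scikit-learn rejects those for `max_depth`, `n_estimators`, and `min_samples_split`.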

Note the following:

1. Be sure to clearly report the optimal hyperparameters in the designated location after you calculate them!

2. You must use `random_state=42` everywhere that it is needed in this notebook.

3. You must use grid search to compute the following hyperparameters:

   GradientBoostingRegressor:
    
     - `max_depth = <value>`
     - `n_estimators = <value>`
     - `learning_rate = <value>`

   RandomForestRegressor:
    
     - `max_depth = <value>`
     - `n_estimators = <value>`
     - `min_samples_split = <value>`

   DecisionTreeRegressor:
    
     - `splitter = <value>`
     - `max_depth = <value>`
     - `min_samples_split = <value>`
     
     
4. `learning_rate` should be rounded to two decimals.
5. Use three cross-validation folds: specify `cv=3`.


## Perform Individual Model Grid Searches

In this section you will perform a series of grid searches to compute the optimal hyperparameter values for each of the three model types.

In [None]:
from sklearn.model_selection import GridSearchCV
#Beginning GradientBoostingRegressor Model
param_grid={'max_depth':[1,2,3,4,5,8,16,32],
            'n_estimators':[100,200,300,400,500,1000],  
            'learning_rate':[.01,.02,.03,.04,.05,.06,.07,1]
            }

grid_search_cv=GridSearchCV(GradientBoostingRegressor(random_state=42),param_grid,verbose=1,cv=3)
grid_search_cv.fit(X_train, z_train)

In [None]:
print("The best parameters are: ", grid_search_cv.best_params_)

In [None]:
from sklearn.model_selection import GridSearchCV
#Middle GradientBoostingRegressor Model
# Zero and negative candidates removed: max_depth and n_estimators must be >= 1
param_grid={'max_depth':[1,2,3,4],
            'n_estimators':[5,10,25,50,100,200],  
            'learning_rate':[.001,.002,.003,.01,.02,.03,.04]
            }

grid_search_cv=GridSearchCV(GradientBoostingRegressor(random_state=42),param_grid,verbose=1,cv=3)
grid_search_cv.fit(X_train, z_train)


In [None]:
print("The best parameters are: ", grid_search_cv.best_params_)

In [None]:
from sklearn.model_selection import GridSearchCV
#Final GradientBoostingRegressor Model
# Zero and negative candidates removed: max_depth and n_estimators must be >= 1
param_grid={'max_depth':[1,2,3,4],
            'n_estimators':[5,10,25,50,100,200],  
            'learning_rate':[.001,.002,.01,.02,.03,.04,.05,.06,.07]
            }

grid_search_cv=GridSearchCV(GradientBoostingRegressor(random_state=42),param_grid,verbose=1,cv=3)
grid_search_cv.fit(X_train, z_train)

In [None]:
print("The best parameters are: ", grid_search_cv.best_params_)

On this dataset, the optimal model parameters for the `GradientBoostingRegressor` class are:

- `learning_rate = <value>`
- `max_depth = <value>`
- `n_estimators = <value>`

In [None]:
from sklearn.model_selection import GridSearchCV
#Initial RandomForestRegressor Model
param_grid={'max_depth':[1,2,3,4,5,8,16,32],
            'n_estimators':[1,25,50,100,200,400,500,1000],  
            'min_samples_split':[2,3,4,5,8,10,15,20]  # integer values must be >= 2
            }

grid_search_rf=GridSearchCV(RandomForestRegressor(random_state=42),param_grid,verbose=1,cv=3)
grid_search_rf.fit(X_train, z_train)

In [None]:
print("The best parameters are: ", grid_search_rf.best_params_)

In [None]:
from sklearn.model_selection import GridSearchCV
#Middle RandomForestRegressor Model
# Invalid and duplicate candidates removed: depths and tree counts must be >= 1,
# and an integer min_samples_split must be >= 2
param_grid={'max_depth':[1,2,3,5,10],
            'n_estimators':[1,5,6,7,10],  
            'min_samples_split':[2,3,4]
            }

grid_search_rf=GridSearchCV(RandomForestRegressor(random_state=42),param_grid,verbose=1,cv=3)
grid_search_rf.fit(X_train, z_train)

In [None]:
print("The best parameters are: ", grid_search_rf.best_params_)

In [None]:
from sklearn.model_selection import GridSearchCV
#Final RandomForestRegressor Model
# Invalid and duplicate candidates removed: depths and tree counts must be >= 1,
# and an integer min_samples_split must be >= 2
param_grid={'max_depth':[1,2,3,5,10],
            'n_estimators':[1,5,6,7,10],  
            'min_samples_split':[2,3,4,5,6]
            }

grid_search_rf=GridSearchCV(RandomForestRegressor(random_state=42),param_grid,verbose=1,cv=3)
grid_search_rf.fit(X_train, z_train)

In [None]:
print("The best parameters are: ", grid_search_rf.best_params_)

On this dataset, the optimal model parameters for the `RandomForestRegressor` class are:

- `max_depth = <value>`
- `n_estimators = <value>`
- `min_samples_split = <value>`

In [None]:
from sklearn.model_selection import GridSearchCV
#Initial DecisionTreeRegressor Model
param_grid={'splitter':["random","best"],
            'max_depth':[1,2,3,10,15,20,32],  
            'min_samples_split':[2,3,4,5,8,10,15,20]  # integer values must be >= 2
            }

grid_search_dt=GridSearchCV(DecisionTreeRegressor(random_state=42),param_grid,verbose=1,cv=3)
grid_search_dt.fit(X_train, z_train)


In [None]:
print("The best parameters are: ", grid_search_dt.best_params_)

In [None]:
from sklearn.model_selection import GridSearchCV
#Final DecisionTree GridSearch
# Invalid candidates removed: max_depth must be >= 1 and min_samples_split >= 2
param_grid={'splitter':["random"],
            'max_depth':[1,2,3,10],  
            'min_samples_split':[2,3,4,5]
            }

grid_search_dt=GridSearchCV(DecisionTreeRegressor(random_state=42),param_grid,verbose=1,cv=3)
grid_search_dt.fit(X_train, z_train)

In [None]:
print("The best parameters are: ", grid_search_dt.best_params_)

On this dataset, the optimal model parameters for the `DecisionTreeRegressor` class are:

- `splitter = <value>`
- `max_depth = <value>`
- `min_samples_split = <value>`

# Visualize Optimal Model Predictions

In the previous section you performed a series of grid searches designed to identify the optimal hyperparameter values for all three models. Now, use the `best_params_` attribute of the grid search objects from above to create the three optimal models below. For each model, visualize the model's predictions on the training set - this is what we mean by the "prediction curve" of the model.
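One way to build a model from `best_params_` without retyping the values is to unpack the dictionary into the estimator's constructor. The sketch below uses synthetic data and a deliberately tiny grid; the names `search` and `optimal_rf` are placeholders, not names required by the assignment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X_toy = rng.uniform(0, 10, size=(200, 2))
z_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.2, size=200)

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    {"max_depth": [2, 4], "n_estimators": [10, 20]},
    cv=3,
)
search.fit(X_toy, z_toy)

# Unpack the winning combination straight into a fresh estimator,
# then train it on the full training set.
optimal_rf = RandomForestRegressor(random_state=42, **search.best_params_)
optimal_rf.fit(X_toy, z_toy)

print(search.best_params_)
```

With the default `refit=True`, `GridSearchCV` also exposes an already-trained `best_estimator_`, so constructing the model by hand mainly serves to make the chosen values explicit.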

### Create Optimal GradientBoostingRegressor Model

In [None]:
#Creating GradientBoostingRegressor Model
gbrtoptimal = GradientBoostingRegressor(max_depth=3, n_estimators=10, random_state=42,learning_rate=0.07)
gbrtoptimal.fit(X_train, z_train)

### Plot Model Predictions for Training Set

Use the `plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)` function to plot the data and the prediction curve.

In [None]:
#Creating Model Predictions
fit_x=np.array(train_df['x'])
fit_y=np.array(train_df['y'])
fit_z=np.array(gbrtoptimal.predict(X_train))

scat_x=np.array(train_df['x'])
scat_y=np.array(train_df['y'])
scat_z=np.array(train_df['z'])

#Plotting Model Predictions
plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)

### Create Optimal RandomForestRegressor Model

In [None]:

#Creating a Random Forest Regressor
rnd_clf=RandomForestRegressor(n_estimators=1, min_samples_split=2,max_depth=1,random_state=42)
rnd_clf.fit(X_train,z_train)


### Plot Model Predictions for Training Set

Use the `plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)` function to plot the data and the prediction curve.

In [None]:
#Creating Model Predictions for Training Set

fit_x=np.array(train_df['x'])
fit_y=np.array(train_df['y'])
fit_z=np.array(rnd_clf.predict(X_train))
scat_x=np.array(train_df['x'])
scat_y=np.array(train_df['y'])
scat_z=np.array(train_df['z'])

plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)

### Create Optimal DecisionTreeRegressor Model

In [None]:
#Creating DecisionTreeRegressor Model


tree_clf = DecisionTreeRegressor(splitter='random',min_samples_split=2,max_depth=1, random_state=42)
tree_clf.fit(X_train, z_train)

### Plot Model Predictions for Training Set

Use the `plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)` function to plot the data and the prediction curve.

In [None]:
#Creating Model Predictions

fit_x=np.array(train_df['x'])
fit_y=np.array(train_df['y'])
fit_z=np.array(tree_clf.predict(X_train))

scat_x=np.array(train_df['x'])
scat_y=np.array(train_df['y'])
scat_z=np.array(train_df['z'])


plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)

# Compute Generalization Error

Compute the generalization error for each of the optimal models computed above.  Use MSE as the generalization error metric.  Round your answers to four significant digits.  Print the generalization error for all three models.
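As a small worked example of the metric, the sketch below computes MSE on four made-up values, checks it against the definition (the mean of squared residuals), and contrasts four significant digits with four decimal places. The arrays and the `small` value are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

z_true = np.array([0.0, 1.0, 2.0, 3.0])
z_pred = np.array([0.1, 0.9, 2.2, 2.7])

mse = mean_squared_error(z_true, z_pred)
# MSE is the mean of the squared residuals; compute it by hand as a check.
manual = np.mean((z_true - z_pred) ** 2)

# The ".4g" format keeps four significant digits,
# whereas round(value, 4) keeps four decimal places.
small = 0.000123456
print("MSE:", f"{mse:.4g}")
print("four significant digits:", f"{small:.4g}")
print("four decimal places:    ", round(small, 4))
```

The two roundings agree for errors near 1 but diverge for small errors, so the choice of formatting affects what gets reported.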

In [None]:
X_test.shape

In [None]:

from sklearn.metrics import mean_squared_error

models = [gbrtoptimal, rnd_clf, tree_clf]

for model in models:
    pred = model.predict(X_test)
    mse = mean_squared_error(z_test, pred)
    # Report four significant digits, labeled by model type
    print(f"{type(model).__name__}: {mse:.4g}")
