# Übung 7: Regularisierung und  Decision Trees

#### Aufgabe 1

In der Vorlesung haben wir gelernt, dass die L1 Regulierung dazu neigt viele Koeffizieten auf 0 zu setzen. 

Benutzen Sie die Daten der männlichen Raucher und führen Sie darauf eine Regression mit unterschiedlich starker Regularisierung, sowohl L1 als auch L2, durch. Schauen Sie sich dabei verschiedene statistische Kennzahlen der Regressionskoeffizieten wie Mittelwert, Min/Max und Quantile an. Zählen Sie wie viele Koeffizieten dabei 0 sind. Was stellen Sie fest?

#### Aufgabe 2

Nehme Sie nun alle Daten aus der 'insurance.csv' und trainieren Sie eine Regression mit einem DecisionTree. Variieren Sie der Parameter 'max_depth' und 'min_sample_leaves' und suchen Sie die beste Kombination.

#### Aufgabe 3

Wie Sie in Aufgabe 2 gesehen haben ist es sehr aufwändig die richtigen Parameter zu finden. Zum Glück kann scikit-learn dies automatisieren. Lesen Sie sich dafür in 'GridSearchCV' ein und finden Sie die besten Parameter für 'max_depth', 'min_samples_split' und 'min_samples_leaf'.

In [1]:
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder, PolynomialFeatures

In [2]:
insurance = pd.read_csv("data/insurance.csv")
insurance

Unnamed: 0,age,sex,bmi,children,smoker,region,expenses
0,19,female,27.9,0,yes,southwest,16884.92
1,18,male,33.8,1,no,southeast,1725.55
2,28,male,33.0,3,no,southeast,4449.46
3,33,male,22.7,0,no,northwest,21984.47
4,32,male,28.9,0,no,northwest,3866.86
...,...,...,...,...,...,...,...
1333,50,male,31.0,3,no,northwest,10600.55
1334,18,female,31.9,0,no,northeast,2205.98
1335,18,female,36.9,0,no,southeast,1629.83
1336,21,female,25.8,0,no,southwest,2007.95


In [3]:
male_smokers = insurance[
    (insurance["sex"] == "male") & (insurance["smoker"] == "yes")
].sort_values(by=["bmi"])

male_smokers

Unnamed: 0,age,sex,bmi,children,smoker,region,expenses
250,18,male,17.3,2,yes,northeast,12829.46
99,38,male,19.3,0,yes,southwest,15820.70
266,40,male,19.8,1,yes,southeast,17179.52
98,56,male,20.0,0,yes,northeast,22412.65
1011,43,male,20.1,2,yes,southeast,18767.74
...,...,...,...,...,...,...,...
265,46,male,42.4,3,yes,southeast,46151.12
951,51,male,42.9,2,yes,southeast,47462.89
1156,19,male,44.9,0,yes,southeast,39722.75
292,25,male,45.5,2,yes,southeast,42112.24


In [4]:
X = male_smokers["bmi"].to_numpy().reshape(-1, 1)
y = male_smokers["expenses"].to_numpy().reshape(-1, 1)

## Aufgabe 1

In [13]:
eps = 1e-8

for lam in [0, 1, 5, 10]:
    print(f"### Lambda = {lam}")
    poly_features = PolynomialFeatures(degree=50, include_bias=False)
    std_scaler = MinMaxScaler()

    ridge_reg = Ridge(alpha=lam)
    pipe_ridge = Pipeline(
        [
            ("poly_features", poly_features),
            ("std_scaler", std_scaler),
            ("ridge", ridge_reg),
        ]
    )
    pipe_ridge.fit(X, y)

    ridge_coeffs = pipe_ridge.get_params()["ridge"].coef_
    num_non_zero_coeffs = len(ridge_coeffs[np.abs(ridge_coeffs) < eps])

    print("Ridge L2: mean/max/min etc")
    print(pd.Series(ridge_coeffs.flatten()).describe())
    print(f"Number of zero coefficients for Ridge (L2) is {num_non_zero_coeffs}")

    lasso_reg = Lasso(alpha=lam)
    pipe_lasso = Pipeline(
        [
            ("poly_features", poly_features),
            ("std_scaler", std_scaler),
            ("lasso", lasso_reg),
        ]
    )
    pipe_lasso.fit(X, y)

    lasso_coeffs = pipe_lasso.get_params()["lasso"].coef_
    num_non_zero_coeffs = len(lasso_coeffs[np.abs(lasso_coeffs) < eps])

    print("Lasso L1: mean/max/min etc")
    print(pd.Series(lasso_coeffs.flatten()).describe())
    print(f"Number of zero coefficients for Lasso (L1) is {num_non_zero_coeffs}")

### Lambda = 0


ValueError: could not convert string to float: 'female'

## Aufgabe 3

In [6]:
X = insurance.drop(columns=["smoker"], axis=1)
y = insurance["smoker"].to_numpy().reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y)

In [7]:
from sklearn.tree import DecisionTreeClassifier

numeric_features = ["age", "bmi", "children", "expenses"]
ordinal_features = ["sex"]

numeric_transformer = Pipeline(
    steps=[
        (
            "imputer",
            SimpleImputer(strategy="mean"),
        ),  # Decision Tree braucht kein Scaling
        ("poly_features", PolynomialFeatures(degree=8)),
    ]
)

ordinal_transfomer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("ordinal_encoding", OrdinalEncoder()),
    ]
)

y_preprocessor = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("ordinal_encoding", OrdinalEncoder()),
    ]
)

X_preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat_ordinal", ordinal_transfomer, ordinal_features),
    ]
)


X_train_prepared = X_preprocessor.fit_transform(X_train)
y_train_prepared = y_preprocessor.fit_transform(y_train)

clf = DecisionTreeClassifier()
clf.fit(X_train_prepared, y_train_prepared)

X_test_prepared = X_preprocessor.transform(X_test)
y_test_prepared = y_preprocessor.transform(y_test)

predicted = clf.predict(X_test_prepared)

accuracy = accuracy_score(y_pred=predicted, y_true=y_test_prepared)
precision = precision_score(y_pred=predicted, y_true=y_test_prepared)
recall = recall_score(y_pred=predicted, y_true=y_test_prepared)

print(f"train accuracy: {clf.score(X_train_prepared, y_train_prepared)}")
print(f"test accuracy: {clf.score(X_test_prepared, y_test_prepared)}")

print(f"precision: {precision}")
print(f"recall: {recall}")
print("\n")

train accuracy: 1.0
test accuracy: 0.9641791044776119
precision: 0.8548387096774194
recall: 0.9464285714285714




In [8]:
X = insurance.drop(columns=["expenses"], axis=1)
y = insurance["expenses"].to_numpy().reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y)

In [9]:
from sklearn.tree import DecisionTreeRegressor

numeric_features = ["age", "bmi", "children"]
ordinal_features = ["sex", "smoker"]

numeric_transformer = Pipeline(
    steps=[
        (
            "imputer",
            SimpleImputer(strategy="mean"),
        ),  # Decision Tree braucht kein Scaling
        ("poly_features", PolynomialFeatures(degree=8)),
    ]
)

ordinal_transfomer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("ordinal_encoding", OrdinalEncoder()),
    ]
)

y_preprocessor = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="mean")),
    ]
)

X_preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat_ordinal", ordinal_transfomer, ordinal_features),
    ]
)


X_train_prepared = X_preprocessor.fit_transform(X_train)
y_train_prepared = y_preprocessor.fit_transform(y_train)

squares = [x**2 for x in range(1, 5)]
for depth in squares:
    for min_sample_leaves in squares:
        print(f"max_depath = {depth}, min_sample_leavaes = {min_sample_leaves}")
        reg = DecisionTreeRegressor(max_depth=depth, min_samples_leaf=min_sample_leaves)
        reg.fit(X_train_prepared, y_train_prepared)

        X_test_prepared = X_preprocessor.transform(X_test)
        y_test_prepared = y_preprocessor.transform(y_test)

        predicted = reg.predict(X_test_prepared)

        print(f"train accuracy: {reg.score(X_train_prepared, y_train_prepared)}")
        print(f"test accuracy: {reg.score(X_test_prepared, y_test_prepared)}")
        print("\n")

max_depath = 1, min_sample_leavaes = 1
train accuracy: 0.6426739705074738
test accuracy: 0.5461501588798293


max_depath = 1, min_sample_leavaes = 4
train accuracy: 0.6426739705074738
test accuracy: 0.5461501588798293


max_depath = 1, min_sample_leavaes = 9
train accuracy: 0.6426739705074738
test accuracy: 0.5461501588798293


max_depath = 1, min_sample_leavaes = 16
train accuracy: 0.642673970507474
test accuracy: 0.5461501588798293


max_depath = 4, min_sample_leavaes = 1
train accuracy: 0.903573840072967
test accuracy: 0.790062995859356


max_depath = 4, min_sample_leavaes = 4
train accuracy: 0.8989470925857097
test accuracy: 0.789687470996246


max_depath = 4, min_sample_leavaes = 9
train accuracy: 0.8978808395997706
test accuracy: 0.7894167947382865


max_depath = 4, min_sample_leavaes = 16
train accuracy: 0.8975478150511199
test accuracy: 0.7920094383777518


max_depath = 9, min_sample_leavaes = 1
train accuracy: 0.977936144667711
test accuracy: 0.7019517811109771


max_depath = 

In [10]:
from sklearn.model_selection import GridSearchCV

square = lambda n: [x**2 for x in np.arange(1, n)]


def grid_search(X_input, y_input):
    param_grid = {
        "max_depth": square(10),
        "min_samples_split": square(10),
        "min_samples_leaf": square(10),
    }
    gs = GridSearchCV(DecisionTreeRegressor(), param_grid, cv=2, verbose=3)
    gs.fit(X_input, y_input)

    return gs.best_params_


best_params = grid_search(X_train_prepared, y_train_prepared)
best_params

Fitting 2 folds for each of 729 candidates, totalling 1458 fits
[CV 1/2] END max_depth=1, min_samples_leaf=1, min_samples_split=1;, score=nan total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=1, min_samples_split=1;, score=nan total time=   0.0s
[CV 1/2] END max_depth=1, min_samples_leaf=1, min_samples_split=4;, score=0.622 total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=1, min_samples_split=4;, score=0.654 total time=   0.0s
[CV 1/2] END max_depth=1, min_samples_leaf=1, min_samples_split=9;, score=0.622 total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=1, min_samples_split=9;, score=0.654 total time=   0.0s
[CV 1/2] END max_depth=1, min_samples_leaf=1, min_samples_split=16;, score=0.622 total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=1, min_samples_split=16;, score=0.654 total time=   0.0s
[CV 1/2] END max_depth=1, min_samples_leaf=1, min_samples_split=25;, score=0.622 total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=1, 

[CV 1/2] END max_depth=1, min_samples_leaf=25, min_samples_split=81;, score=0.622 total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=25, min_samples_split=81;, score=0.654 total time=   0.0s
[CV 1/2] END max_depth=1, min_samples_leaf=36, min_samples_split=1;, score=nan total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=36, min_samples_split=1;, score=nan total time=   0.0s
[CV 1/2] END max_depth=1, min_samples_leaf=36, min_samples_split=4;, score=0.622 total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=36, min_samples_split=4;, score=0.654 total time=   0.0s
[CV 1/2] END max_depth=1, min_samples_leaf=36, min_samples_split=9;, score=0.622 total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=36, min_samples_split=9;, score=0.654 total time=   0.0s
[CV 1/2] END max_depth=1, min_samples_leaf=36, min_samples_split=16;, score=0.622 total time=   0.0s
[CV 2/2] END max_depth=1, min_samples_leaf=36, min_samples_split=16;, score=0.654 total time=   0.0s


[CV 1/2] END max_depth=4, min_samples_leaf=1, min_samples_split=25;, score=0.865 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=1, min_samples_split=25;, score=0.833 total time=   0.0s
[CV 1/2] END max_depth=4, min_samples_leaf=1, min_samples_split=36;, score=0.875 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=1, min_samples_split=36;, score=0.829 total time=   0.0s
[CV 1/2] END max_depth=4, min_samples_leaf=1, min_samples_split=49;, score=0.864 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=1, min_samples_split=49;, score=0.830 total time=   0.0s
[CV 1/2] END max_depth=4, min_samples_leaf=1, min_samples_split=64;, score=0.862 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=1, min_samples_split=64;, score=0.827 total time=   0.0s
[CV 1/2] END max_depth=4, min_samples_leaf=1, min_samples_split=81;, score=0.862 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=1, min_samples_split=81;, score=0.826 total time=   0.0s


[CV 1/2] END max_depth=4, min_samples_leaf=36, min_samples_split=49;, score=0.868 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=36, min_samples_split=49;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=4, min_samples_leaf=36, min_samples_split=64;, score=0.868 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=36, min_samples_split=64;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=4, min_samples_leaf=36, min_samples_split=81;, score=0.868 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=36, min_samples_split=81;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=4, min_samples_leaf=49, min_samples_split=1;, score=nan total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=49, min_samples_split=1;, score=nan total time=   0.0s
[CV 1/2] END max_depth=4, min_samples_leaf=49, min_samples_split=4;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=4, min_samples_leaf=49, min_samples_split=4;, score=0.834 total time=   0.0

[CV 1/2] END max_depth=9, min_samples_leaf=4, min_samples_split=16;, score=0.853 total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=4, min_samples_split=16;, score=0.832 total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=4, min_samples_split=25;, score=0.861 total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=4, min_samples_split=25;, score=0.845 total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=4, min_samples_split=36;, score=0.875 total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=4, min_samples_split=36;, score=0.841 total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=4, min_samples_split=49;, score=0.872 total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=4, min_samples_split=49;, score=0.842 total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=4, min_samples_split=64;, score=0.865 total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=4, min_samples_split=64;, score=0.837 total time=   0.0s


[CV 2/2] END max_depth=9, min_samples_leaf=36, min_samples_split=81;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=49, min_samples_split=1;, score=nan total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=49, min_samples_split=1;, score=nan total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=49, min_samples_split=4;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=49, min_samples_split=4;, score=0.834 total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=49, min_samples_split=9;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=49, min_samples_split=9;, score=0.834 total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=49, min_samples_split=16;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=9, min_samples_leaf=49, min_samples_split=16;, score=0.834 total time=   0.0s
[CV 1/2] END max_depth=9, min_samples_leaf=49, min_samples_split=25;, score=0.851 total time=   0.0s


[CV 2/2] END max_depth=16, min_samples_leaf=9, min_samples_split=9;, score=0.843 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=9, min_samples_split=16;, score=0.870 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=9, min_samples_split=16;, score=0.846 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=9, min_samples_split=25;, score=0.871 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=9, min_samples_split=25;, score=0.846 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=9, min_samples_split=36;, score=0.875 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=9, min_samples_split=36;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=9, min_samples_split=49;, score=0.877 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=9, min_samples_split=49;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=9, min_samples_split=64;, score=0.864 total time

[CV 1/2] END max_depth=16, min_samples_leaf=64, min_samples_split=16;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=64, min_samples_split=16;, score=0.713 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=64, min_samples_split=25;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=64, min_samples_split=25;, score=0.713 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=64, min_samples_split=36;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=64, min_samples_split=36;, score=0.713 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=64, min_samples_split=49;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=64, min_samples_split=49;, score=0.713 total time=   0.0s
[CV 1/2] END max_depth=16, min_samples_leaf=64, min_samples_split=64;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=16, min_samples_leaf=64, min_samples_split=64;, score=0.713

[CV 2/2] END max_depth=25, min_samples_leaf=16, min_samples_split=49;, score=0.840 total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=16, min_samples_split=64;, score=0.871 total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=16, min_samples_split=64;, score=0.837 total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=16, min_samples_split=81;, score=0.871 total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=16, min_samples_split=81;, score=0.838 total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=25, min_samples_split=1;, score=nan total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=25, min_samples_split=1;, score=nan total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=25, min_samples_split=4;, score=0.878 total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=25, min_samples_split=4;, score=0.844 total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=25, min_samples_split=9;, score=0.878 total ti

[CV 1/2] END max_depth=25, min_samples_leaf=81, min_samples_split=16;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=81, min_samples_split=16;, score=0.707 total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=81, min_samples_split=25;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=81, min_samples_split=25;, score=0.707 total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=81, min_samples_split=36;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=81, min_samples_split=36;, score=0.707 total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=81, min_samples_split=49;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=81, min_samples_split=49;, score=0.707 total time=   0.0s
[CV 1/2] END max_depth=25, min_samples_leaf=81, min_samples_split=64;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=25, min_samples_leaf=81, min_samples_split=64;, score=0.707

[CV 2/2] END max_depth=36, min_samples_leaf=25, min_samples_split=25;, score=0.844 total time=   0.0s
[CV 1/2] END max_depth=36, min_samples_leaf=25, min_samples_split=36;, score=0.878 total time=   0.0s
[CV 2/2] END max_depth=36, min_samples_leaf=25, min_samples_split=36;, score=0.846 total time=   0.0s
[CV 1/2] END max_depth=36, min_samples_leaf=25, min_samples_split=49;, score=0.878 total time=   0.0s
[CV 2/2] END max_depth=36, min_samples_leaf=25, min_samples_split=49;, score=0.844 total time=   0.0s
[CV 1/2] END max_depth=36, min_samples_leaf=25, min_samples_split=64;, score=0.868 total time=   0.0s
[CV 2/2] END max_depth=36, min_samples_leaf=25, min_samples_split=64;, score=0.837 total time=   0.0s
[CV 1/2] END max_depth=36, min_samples_leaf=25, min_samples_split=81;, score=0.866 total time=   0.0s
[CV 2/2] END max_depth=36, min_samples_leaf=25, min_samples_split=81;, score=0.844 total time=   0.0s
[CV 1/2] END max_depth=36, min_samples_leaf=36, min_samples_split=1;, score=nan to

[CV 1/2] END max_depth=49, min_samples_leaf=1, min_samples_split=9;, score=0.815 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=1, min_samples_split=9;, score=0.774 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=1, min_samples_split=16;, score=0.819 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=1, min_samples_split=16;, score=0.775 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=1, min_samples_split=25;, score=0.816 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=1, min_samples_split=25;, score=0.798 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=1, min_samples_split=36;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=1, min_samples_split=36;, score=0.823 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=1, min_samples_split=49;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=1, min_samples_split=49;, score=0.823 total time=

[CV 2/2] END max_depth=49, min_samples_leaf=36, min_samples_split=9;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=36, min_samples_split=16;, score=0.866 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=36, min_samples_split=16;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=36, min_samples_split=25;, score=0.866 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=36, min_samples_split=25;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=36, min_samples_split=36;, score=0.866 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=36, min_samples_split=36;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=36, min_samples_split=49;, score=0.866 total time=   0.0s
[CV 2/2] END max_depth=49, min_samples_leaf=36, min_samples_split=49;, score=0.839 total time=   0.0s
[CV 1/2] END max_depth=49, min_samples_leaf=36, min_samples_split=64;, score=0.866 

[CV 1/2] END max_depth=64, min_samples_leaf=4, min_samples_split=25;, score=0.862 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=4, min_samples_split=25;, score=0.845 total time=   0.0s
[CV 1/2] END max_depth=64, min_samples_leaf=4, min_samples_split=36;, score=0.875 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=4, min_samples_split=36;, score=0.842 total time=   0.0s
[CV 1/2] END max_depth=64, min_samples_leaf=4, min_samples_split=49;, score=0.872 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=4, min_samples_split=49;, score=0.842 total time=   0.0s
[CV 1/2] END max_depth=64, min_samples_leaf=4, min_samples_split=64;, score=0.865 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=4, min_samples_split=64;, score=0.838 total time=   0.0s
[CV 1/2] END max_depth=64, min_samples_leaf=4, min_samples_split=81;, score=0.865 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=4, min_samples_split=81;, score=0.837 total tim

[CV 1/2] END max_depth=64, min_samples_leaf=49, min_samples_split=4;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=49, min_samples_split=4;, score=0.834 total time=   0.0s
[CV 1/2] END max_depth=64, min_samples_leaf=49, min_samples_split=9;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=49, min_samples_split=9;, score=0.834 total time=   0.0s
[CV 1/2] END max_depth=64, min_samples_leaf=49, min_samples_split=16;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=49, min_samples_split=16;, score=0.834 total time=   0.0s
[CV 1/2] END max_depth=64, min_samples_leaf=49, min_samples_split=25;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=49, min_samples_split=25;, score=0.834 total time=   0.0s
[CV 1/2] END max_depth=64, min_samples_leaf=49, min_samples_split=36;, score=0.851 total time=   0.0s
[CV 2/2] END max_depth=64, min_samples_leaf=49, min_samples_split=36;, score=0.834 tot

[CV 1/2] END max_depth=81, min_samples_leaf=9, min_samples_split=4;, score=0.872 total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=9, min_samples_split=4;, score=0.845 total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=9, min_samples_split=9;, score=0.870 total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=9, min_samples_split=9;, score=0.844 total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=9, min_samples_split=16;, score=0.869 total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=9, min_samples_split=16;, score=0.845 total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=9, min_samples_split=25;, score=0.869 total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=9, min_samples_split=25;, score=0.845 total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=9, min_samples_split=36;, score=0.875 total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=9, min_samples_split=36;, score=0.838 total time=  

[CV 2/2] END max_depth=81, min_samples_leaf=49, min_samples_split=81;, score=0.834 total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=64, min_samples_split=1;, score=nan total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=64, min_samples_split=1;, score=nan total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=64, min_samples_split=4;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=64, min_samples_split=4;, score=0.713 total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=64, min_samples_split=9;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=64, min_samples_split=9;, score=0.713 total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=64, min_samples_split=16;, score=0.707 total time=   0.0s
[CV 2/2] END max_depth=81, min_samples_leaf=64, min_samples_split=16;, score=0.713 total time=   0.0s
[CV 1/2] END max_depth=81, min_samples_leaf=64, min_samples_split=25;, score=0.707 total tim

162 fits failed out of a total of 1458.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
162 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/lib64/python3.8/site-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/lib64/python3.8/site-packages/sklearn/tree/_classes.py", line 1315, in fit
    super().fit(
  File "/usr/lib64/python3.8/site-packages/sklearn/tree/_classes.py", line 250, in fit
    raise ValueError(
ValueError: min_samples_split must be an integer greater than 1 or a float in (0.0, 1.0]; got the integer 1

 0.63792186 0.63792186 0.63792186        nan 0.63792186 0.63792186
 0.63792186 0.63792186 0.63792

{'max_depth': 4, 'min_samples_leaf': 16, 'min_samples_split': 9}

In [11]:
clf = DecisionTreeRegressor(**best_params)
clf.fit(X_train_prepared, y_train_prepared)

print(f"Train score: {clf.score(X_train_prepared, y_train_prepared)}")
print(f"Test score: {clf.score(X_test_prepared, y_test_prepared)}")

Train score: 0.8975478150511199
Test score: 0.7920094383777518
