# Cross-Validation with Decision Trees
This notebook demonstrates how to use cross-validation techniques with a Decision Tree Regressor on the happiness dataset. We use scikit-learn's cross-validation utilities to evaluate model performance.

In [3]:
import pandas as pd
import numpy as np

from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import (cross_val_score, KFold)

**Importing Required Libraries**
We import `pandas` and `numpy` for data manipulation, `DecisionTreeRegressor` for modeling, and cross-validation utilities from scikit-learn.

In [2]:
dataset = pd.read_csv('../data/felicidad.csv')

X = dataset.drop(['country', 'score'], axis=1)
y = dataset['score']

**Loading and Preparing the Dataset**
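We load the happiness dataset and split it into features (`X`) and target (`y`). The target is the `score` column. `country` is also dropped from the features because it labels each row rather than acting as a numeric predictor.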

In [6]:
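# Unfitted estimator; cross_val_score clones it and fits one copy per fold
model = DecisionTreeRegressor()
# scikit-learn negates the MSE so that a higher score is always better
score = cross_val_score(model, X, y, cv=3, scoring='neg_mean_squared_error')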
score

array([-0.84230062, -0.15444293, -0.72795257])

**Model Training and Cross-Validation Scoring**
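`cross_val_score` fits a fresh copy of the Decision Tree Regressor on each of 3 training splits and scores it on the corresponding held-out fold. The scoring metric is the negative mean squared error: scikit-learn negates the MSE so that higher scores are always better, which is why every value in the array is negative. The per-fold MSE ranges from about 0.15 to 0.84. With an integer `cv`, a regressor's folds are taken in row order without shuffling, so if the rows are sorted, individual folds may not be representative of the whole dataset.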
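To make the sign convention concrete, the following sketch reproduces by hand what `cross_val_score` computes. It uses synthetic data from `make_regression` rather than the happiness dataset (an assumption, so the example runs on its own), and it fixes `random_state` on the tree so both computations build identical trees. The names `X_demo`, `y_demo`, and `model_demo` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the happiness data (assumption: not the real dataset)
X_demo, y_demo = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)

# Scores as returned by scikit-learn: negated MSE, one value per fold
model_demo = DecisionTreeRegressor(random_state=0)
neg_mse = cross_val_score(model_demo, X_demo, y_demo, cv=3, scoring='neg_mean_squared_error')

# The same computation by hand; for a regressor, cv=3 means KFold(n_splits=3) without shuffling
manual_mse = []
for train_idx, test_idx in KFold(n_splits=3).split(X_demo):
    fold_model = DecisionTreeRegressor(random_state=0)
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    predictions = fold_model.predict(X_demo[test_idx])
    manual_mse.append(mean_squared_error(y_demo[test_idx], predictions))
manual_mse = np.array(manual_mse)

print(neg_mse)       # negative values
print(-manual_mse)   # the same values, obtained by negating the hand-computed MSE
```

The two printed arrays should agree, which shows that the scores are simply the per-fold MSE with the sign flipped.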

In [7]:
np.abs(np.mean(score))

np.float64(0.5748987084795831)

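**Calculating the Mean Squared Error**
We average the fold scores and take the absolute value to recover the mean squared error across folds, about 0.575. This is a squared error, not the mean absolute error (MAE); its square root, about 0.76, is expressed in the same units as `score`.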
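As a small worked example, the fold scores printed earlier in this notebook can be converted to per-fold MSE and RMSE with plain NumPy. The array below is copied from that output (rounded to eight decimals), and the variable names are illustrative.

```python
import numpy as np

# Fold scores copied from the cross_val_score output in this notebook
score_demo = np.array([-0.84230062, -0.15444293, -0.72795257])

mse_per_fold = -score_demo              # undo scikit-learn's negation
rmse_per_fold = np.sqrt(mse_per_fold)   # same units as the target
mean_mse = mse_per_fold.mean()

print(f"MSE per fold:  {mse_per_fold}")
print(f"RMSE per fold: {rmse_per_fold}")
print(f"Mean MSE:      {mean_mse:.4f}")
```

If an error in the target's units is what you want from the start, scikit-learn also accepts `scoring='neg_root_mean_squared_error'`, and `scoring='neg_mean_absolute_error'` yields a true mean absolute error.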

In [8]:
# shuffle=True randomizes the row order before splitting; random_state makes it repeatable
kf = KFold(n_splits=3, shuffle=True, random_state=42)
for train, test in kf.split(dataset):
    print(f"Train: {train}, Test: {test}")

Train: [  0   1   2   3   4   5   6   7   8  10  13  14  16  17  20  21  23  25
  28  32  33  34  35  37  38  39  40  41  43  44  46  47  48  49  50  52
  53  54  57  58  59  61  62  63  64  67  70  71  72  73  74  77  80  83
  87  88  89  91  92  94  97  98  99 100 101 102 103 104 105 106 107 108
 110 111 112 113 114 115 116 120 121 123 125 127 128 129 130 132 134 135
 136 139 140 143 144 145 146 148 149 150 151 152 154], Test: [  9  11  12  15  18  19  22  24  26  27  29  30  31  36  42  45  51  55
  56  60  65  66  68  69  75  76  78  79  81  82  84  85  86  90  93  95
  96 109 117 118 119 122 124 126 131 133 137 138 141 142 147 153]
Train: [  1   2   3   6   8   9  11  12  13  14  15  17  18  19  20  21  22  24
  26  27  29  30  31  36  37  38  42  45  48  50  51  52  54  55  56  57
  58  59  60  63  65  66  68  69  71  72  74  75  76  78  79  81  82  83
  84  85  86  87  88  89  90  91  92  93  95  96  99 100 102 103 106 107
 109 112 115 116 117 118 119 120 121 122 124 126 128 129

**Manual KFold Splitting and Index Output**
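`KFold` splits the row indices into 3 folds. With `shuffle=True` the rows are shuffled before splitting, and `random_state=42` makes the split reproducible. Each iteration yields the training and test indices for one fold, which are printed above. Note that `kf` only generates indices; no model is fit in this cell.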
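A natural next step is to hand the splitter to `cross_val_score` so that scoring uses the shuffled folds. The sketch below does this on synthetic data and also checks that every row appears in exactly one test fold. It assumes 155 rows, inferred from the index range 0 to 154 in the printed output; the data itself comes from `make_regression` and is not the happiness dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Assumption: 155 rows, matching the index range 0..154 seen in the printed output
X_demo, y_demo = make_regression(n_samples=155, n_features=5, noise=10.0, random_state=0)

kf_demo = KFold(n_splits=3, shuffle=True, random_state=42)

# Collect the test indices of every fold to confirm they partition the rows
test_folds = [test_idx for _, test_idx in kf_demo.split(X_demo)]
fold_sizes = [len(fold) for fold in test_folds]
all_test_rows = np.sort(np.concatenate(test_folds))
print(fold_sizes)

# Passing the splitter as cv makes cross_val_score use these shuffled folds
shuffled_scores = cross_val_score(
    DecisionTreeRegressor(random_state=0), X_demo, y_demo,
    cv=kf_demo, scoring='neg_mean_squared_error',
)
print(shuffled_scores)
```

Because 155 is not divisible by 3, the folds cannot all be the same size; `KFold` gives the extra rows to the first folds.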