## Training a BDT
Next we train a boosted decision tree (BDT) on the data, using the engineered variables from Charlie and UCSB. The model is built with scikit-learn's `GradientBoostingClassifier` so that it mimics Charlie's BDT as closely as possible.

```python
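from sklearn.ensemble import GradientBoostingClassifier

params = {}  # BDT hyperparameters (set in the training cells below)
bdt = GradientBoostingClassifier(**params)
bdt.fit(X, y)  # X: feature matrix, y: class labels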
```
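
The template above leaves `X`, `y` and `params` undefined. As a minimal runnable sketch of the same pattern, the block below swaps in a synthetic dataset from `make_classification` (not the ttH data) and small, hypothetical hyperparameters chosen only so that it finishes quickly. The variable names are illustrative assumptions, not the ones used later in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a signal/background table (assumption: NOT the ttH data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical, deliberately small hyperparameters so the sketch runs fast.
params = dict(n_estimators=50, max_depth=3, learning_rate=0.1)
bdt = GradientBoostingClassifier(random_state=0, **params)
bdt.fit(X_train, y_train)

# score() returns the mean accuracy as a fraction in [0, 1].
accuracy = bdt.score(X_test, y_test)
print("test accuracy: {:.1f}%".format(accuracy * 100))
```

Fixing `random_state` makes the split and the boosting reproducible, which helps when comparing runs with different feature sets.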

In [19]:
%matplotlib inline
from __future__ import print_function, division
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
import pickle

In [20]:
best_5 = ["Top Mass", "(b, j2) mass", "(b, j3) mass", "Top Pt", "W ptDR"]
# Placeholders for larger feature sets; left empty here.
best_10 = []
best_15 = []
best_20 = []

In [21]:
df_train = pd.read_csv("ttH_hadT_cut_train.csv", header=0, index_col=0)
df_test = pd.read_csv("ttH_hadT_cut_test.csv", header=0, index_col=0)
df_raw_train = pd.read_csv("ttH_hadT_cut_raw_train.csv", header=0, index_col=0)
df_raw_test = pd.read_csv("ttH_hadT_cut_raw_test.csv", header=0, index_col=0)

In [22]:
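# Column 0 holds the class label; all remaining columns are features.
train_X = df_train.iloc[:, 1:]
train_y = df_train.iloc[:, 0]

# Same labels, but only the five engineered features listed in best_5.
train5_X = df_train.loc[:, best_5]
train5_y = df_train.iloc[:, 0]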

raw_train_X = df_raw_train.iloc[:, 1:]
raw_train_y = df_raw_train.iloc[:, 0]

test_X = df_test.iloc[:, 1:]
test_y = df_test.iloc[:, 0]

test5_X = df_test.loc[:, best_5]
test5_y = df_test.iloc[:, 0]

raw_test_X = df_raw_test.iloc[:, 1:]
raw_test_y = df_raw_test.iloc[:, 0]
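
The same label/feature split is repeated for every DataFrame above. One way to factor it out is sketched below, assuming (as in the cells above) that the label sits in the first column. The helper name `split_label_features` and the toy values are hypothetical and only serve to make the example self-contained.

```python
import pandas as pd


def split_label_features(df, feature_names=None):
    """Return (X, y), taking y from column 0 and X from the other columns.

    If feature_names is given, X is restricted to those columns.
    """
    y = df.iloc[:, 0]
    if feature_names is None:
        X = df.iloc[:, 1:]
    else:
        X = df.loc[:, feature_names]
    return X, y


# Toy table with made-up numbers, standing in for one of the CSV files.
toy = pd.DataFrame(
    {
        "label": [1, 0, 1],
        "Top Mass": [172.1, 95.3, 168.4],
        "Top Pt": [210.0, 80.5, 150.2],
    }
)

X_all, y_all = split_label_features(toy)
X_sub, y_sub = split_label_features(toy, ["Top Mass"])
print(list(X_all.columns), X_sub.shape)
```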

## Train a BDT on all features

In [10]:
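# min_samples_leaf is a float, so scikit-learn treats it as a fraction of the training samples (4.5%).
params = dict(max_depth=8, learning_rate=0.1, n_estimators=1000, min_samples_leaf=0.045, subsample=0.5, min_samples_split=20)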
bdt = GradientBoostingClassifier(**params).fit(train_X, train_y)

In [11]:
bdt.score(test_X, test_y)*100

81.64574813143422
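
The number above is a classification accuracy in percent: for a scikit-learn classifier, `score` is the fraction of test events whose predicted label matches the true label. The sketch below checks that equivalence on synthetic data (an assumption for illustration, not the ttH samples) with a small, hypothetical model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic data, split by hand into train and test parts.
X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

clf = GradientBoostingClassifier(n_estimators=30, random_state=1)
clf.fit(X_train, y_train)

# Three ways to get the same number.
via_score = clf.score(X_test, y_test)
via_metric = accuracy_score(y_test, clf.predict(X_test))
manual = np.mean(clf.predict(X_test) == y_test)

print(via_score, via_metric, manual)
```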

In [12]:
with open("bdt.pkl", 'wb') as f:
    pickle.dump(bdt, f)
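
A pickled model can be read back with `pickle.load` and used for prediction without retraining. The round trip is sketched below on a small synthetic model written to a temporary file; the file name `bdt_demo.pkl` is hypothetical. Pickles should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic model, used only to demonstrate the save/load round trip.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# Write to a temporary directory so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "bdt_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Read the model back and confirm it predicts the same labels.
with open(path, "rb") as f:
    restored = pickle.load(f)

same = bool((model.predict(X) == restored.predict(X)).all())
print("predictions identical after reload:", same)
```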

## Train a BDT on the 5 best engineered features

In [13]:
params = dict(max_depth=8, learning_rate=0.1, n_estimators=1000, min_samples_leaf=0.045, subsample=0.5, min_samples_split=20)
bdt5 = GradientBoostingClassifier(**params).fit(train5_X, train5_y)

In [14]:
bdt5.score(test5_X, test5_y)*100

79.73781319042918

In [23]:
with open("bdt_eng5.pkl", 'wb') as f:
    pickle.dump(bdt5, f)

## Train a BDT on the basic (raw) features

In [16]:
params = dict(max_depth=8, learning_rate=0.1, n_estimators=1000, min_samples_leaf=0.045, subsample=0.5, min_samples_split=20)
basic_bdt = GradientBoostingClassifier(**params).fit(raw_train_X, raw_train_y)

In [17]:
basic_bdt.score(raw_test_X, raw_test_y)*100

70.84097212428901

In [24]:
with open("basic_bdt.pkl", 'wb') as f:
    pickle.dump(basic_bdt, f)
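
The three training sections reuse the same hyperparameters and differ only in the input features. A possible way to express that comparison as a loop is sketched below. The helper `train_and_score`, the synthetic data, the column slices and the small hyperparameters are all assumptions for illustration, not the settings or feature sets used above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def train_and_score(params, X_train, y_train, X_test, y_test):
    """Fit a gradient-boosted classifier and return (model, accuracy in percent)."""
    model = GradientBoostingClassifier(**params).fit(X_train, y_train)
    return model, model.score(X_test, y_test) * 100


# Synthetic data standing in for the real train/test tables.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, y_train = X[:450], y[:450]
X_test, y_test = X[450:], y[450:]

# One shared set of (hypothetical, small) hyperparameters for every feature set.
params = dict(n_estimators=30, max_depth=3, random_state=0)

# Each entry selects a subset of columns to train on.
feature_sets = {"all": slice(None), "first 5": slice(0, 5)}

results = {}
for name, cols in feature_sets.items():
    _, acc = train_and_score(
        params, X_train[:, cols], y_train, X_test[:, cols], y_test
    )
    results[name] = acc

for name, acc in results.items():
    print("{:>8}: {:.2f}%".format(name, acc))
```

Keeping `params` in one place makes it harder for the hyperparameters to drift apart between the models being compared.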