# How does training with weights affect the results?

To understand this we need to look at a few things:

- Training with weights turned on and off
- Evaluating with weights turned on and off

And there are two different weights we have to look at:

- Multijet cross-section weight, per sample
- Flatten-by-pT weight
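
This notebook does not show how the flatten-by-pT weight is computed, so the following is only a minimal sketch of one common approach: bin the events in pT and weight each event by the inverse population of its bin. The function name `flatten_pt_weights`, the equal-width binning, and the toy exponential spectrum are all assumptions made for illustration; they are not taken from `bdt_training_scikit_tools`.

```python
import numpy as np


def flatten_pt_weights(pt, n_bins=20):
    """Hypothetical helper: per-event weights that flatten the pT histogram."""
    pt = np.asarray(pt, dtype=float)
    # Equal-width bins spanning the observed pT range (an assumed choice).
    edges = np.linspace(pt.min(), pt.max(), n_bins + 1)
    counts, _ = np.histogram(pt, bins=edges)
    # Map each event to its bin; using only the interior edges yields
    # indices 0 .. n_bins - 1, matching the histogram's bin convention.
    bin_index = np.digitize(pt, edges[1:-1])
    # Weight each event by the inverse population of its own bin.
    weights = 1.0 / counts[bin_index]
    # Normalize so the weights sum to the number of events.
    return weights * len(pt) / weights.sum()


rng = np.random.default_rng(0)
pt = rng.exponential(scale=50.0, size=10_000)  # toy falling pT spectrum
w = flatten_pt_weights(pt)

# Re-histogram with the weights: every populated bin now has the same sum.
edges = np.linspace(pt.min(), pt.max(), 21)
flat_hist, _ = np.histogram(pt, bins=edges, weights=w)
```

Bins that contain no events stay empty after weighting, so the result is flat only across the populated bins.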

To do this properly, we need to explore every combination of these options.
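
The four train/evaluate combinations can be scanned in a single loop. The sketch below uses a synthetic data set with random per-event weights in place of the real BIB, multijet, and signal samples, so the numbers it produces say nothing about the physics. The classifier settings are copied from the training cell in this notebook; everything else (the toy data, the `results` dictionary, the use of accuracy as the metric) is an assumption for illustration.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the real samples: 3 features, 2 classes, random weights.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
w = rng.uniform(0.1, 2.0, size=n)

X_tr, X_te, y_tr, y_te, w_tr, w_te = train_test_split(
    X, y, w, test_size=0.5, random_state=0)

results = {}
for train_weighted, eval_weighted in product([False, True], repeat=2):
    bdt = AdaBoostClassifier(
        DecisionTreeClassifier(min_samples_leaf=0.01),
        n_estimators=10,
        learning_rate=1.0,
        random_state=0)
    # Weighted training passes the per-event weights to fit().
    bdt.fit(X_tr, y_tr, sample_weight=w_tr if train_weighted else None)
    # Weighted evaluation passes the per-event weights to the metric.
    results[(train_weighted, eval_weighted)] = accuracy_score(
        y_te, bdt.predict(X_te),
        sample_weight=w_te if eval_weighted else None)

for (train_weighted, eval_weighted), acc in results.items():
    print(f"train weighted={train_weighted!s:5} "
          f"eval weighted={eval_weighted!s:5} accuracy={acc:.3f}")
```

Keeping the training and evaluation switches independent makes it possible to tell whether a change in performance comes from what the BDT learned or only from how the test events are counted.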

## Initialization

In [6]:
import bdt_training_scikit_tools

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

## Load data samples

Load up the following:

- Job 8: Standard weights, with flat in pT

In [2]:
flat_samples = bdt_training_scikit_tools.load_default_samples("8")

BIB length: 156359
Multijet length: 1499995
Signal length: 1322937


## Training with the Flattened Weighted Sample

In [3]:
flat_training, flat_testing = bdt_training_scikit_tools.test_train_samples(flat_samples)

In [9]:
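def train_samples(samples, use_weight=False):
    '''Train a BDT on the given samples.

    Args:
        samples - the list of bib, mj, and signal samples for training
        use_weight - if true, then do a weighted training

    Returns:
        The trained AdaBoostClassifier.
    '''
    all_events, all_events_class = bdt_training_scikit_tools.prep_samples(samples[0], samples[1], samples[2])

    bdt = AdaBoostClassifier(
        DecisionTreeClassifier(min_samples_leaf=0.01),
        n_estimators=10,
        learning_rate=1)

    # ASSUMPTION: the per-event weights live in a `Weight` column next to
    # `Class`; adjust the name to match what prep_samples actually returns.
    weights = all_events_class.Weight if use_weight else None
    bdt.fit(all_events, all_events_class.Class, sample_weight=weights)

    # The BDT is sent back for use
    return bdt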

In [None]:
%%time
b = train_samples(flat_training)