# Train the intent classifier using a pretrained BERT model as a featurizer
This notebook creates the BERT language classifier model. See the [README.md](../README.md) for instructions on how to run this sample.
The resulting model is placed in the `<home dir>/models/bert` directory, which is packaged with the bot.

## `model_corebot101` package
This sample creates a separate Python package (`model_corebot101`) that contains all the code to train and evaluate the intent classifiers for this sample and to run inference with them.

## See also:
- [The BERT runtime model](bert_model_runtime.ipynb) to test the resulting intent classifier model.
- [The BiDAF runtime model](bidaf_model_runtime.ipynb) to test the associated BiDAF entity classifier model.
- [The model runtime](model_runtime.ipynb) to test the BERT and BiDAF models together.



In [1]:
from model_corebot101.bert.train import BertTrainEval

## `BertTrainEval.train_eval` method
This method trains the model and then evaluates it. The evaluation results are listed at the bottom of the output. Training may take several minutes to complete.

The evaluation output should look something like the following:
```bash
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -   ***** Eval results *****
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     acc = 1.0
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     acc_and_f1 = 1.0
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     eval_loss = 0.06498947739601135
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     f1 = 1.0
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     global_step = 12
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     loss = 0.02480666587750117
```
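To check these metrics programmatically rather than by eye, the `name = value` pairs can be pulled out of the log lines. The following is a minimal sketch that assumes the log format shown above; `parse_eval_results` and `METRIC_PATTERN` are hypothetical names and are not part of `model_corebot101`.

```python
import re

# Hypothetical helper (not part of model_corebot101): matches a trailing
# "name = number" pair on a log line shaped like the sample output above.
METRIC_PATTERN = re.compile(
    r"(\w+)\s*=\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_eval_results(log_text):
    """Return a dict mapping each metric name to its value as a float."""
    results = {}
    for line in log_text.splitlines():
        match = METRIC_PATTERN.search(line)
        if match:
            results[match.group(1)] = float(match.group(2))
    return results


# The sample lines are copied from the evaluation output shown above.
sample_log = """\
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -   ***** Eval results *****
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     acc = 1.0
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     acc_and_f1 = 1.0
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     eval_loss = 0.06498947739601135
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     f1 = 1.0
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     global_step = 12
06/02/2019 19:46:52 - INFO - model_corebot101.bert.train.bert_train_eval -     loss = 0.02480666587750117
"""

metrics = parse_eval_results(sample_log)
print(metrics["acc"], metrics["f1"], metrics["eval_loss"])
```

Values such as `global_step` come back as floats here; convert them with `int(...)` if an integer is needed.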

In [2]:
BertTrainEval.train_eval(cleanup_output_dir=True)

Bert Model training_data_dir is set to d:\python\daveta-docker-wizard\apub\samples\flask\101.corebot-bert-bidaf\model\model_corebot101\bert\training_data
Bert Model model_dir is set to C:\Users\daveta\models\bert
07/02/2019 07:16:09 - INFO - model_corebot101.bert.train.bert_train_eval -   device: cpu n_gpu: 0, distributed training: False, 16-bits training: None
07/02/2019 07:16:09 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at C:\Users\daveta\.pytorch_pretrained_bert\26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
07/02/2019 07:16:10 - INFO - pytorch_pretrained_bert.modeling -   loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at C:\Users\daveta\.pytorch_pretrained_bert\distributed_-1\9c41111e2de84547a463fd39217199738d1e3deb72d

## Verify the output directory

In [4]:
import os
from pathlib import Path
from tqdm import tqdm_notebook

# The trained model is written to <home dir>/models/bert.
home_dir = str(Path.home())
path = os.path.abspath(os.path.join(home_dir, "models/bert"))
# Map each file in the output directory to its size in bytes.
files_with_size = {file: os.path.getsize(os.path.join(path, file)) for file in os.listdir(path)}
# Expected size of each output file, in bytes.
expected = {'config.json': 326, 'eval_results.txt': 119, 'pytorch_model.bin': 437982182, 'vocab.txt': 262030}
for f in tqdm_notebook(expected.keys(), desc='Verify Output'):
    if f in files_with_size:
        # Relative deviation from the expected size; anything above 30% fails.
        delta = abs(expected[f] - files_with_size[f]) / expected[f]
        if delta > 0.30:
            raise Exception(f'Size of output file {f} is out of range of expected.')
    else:
        raise Exception(f'Expected file {f} missing from output.')

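The verification cell stops at the first problem it finds. As a variation, the same relative-size test can be wrapped in a function that reports every missing or out-of-range file at once. This is a sketch under assumptions: `find_output_problems` and its `tolerance` parameter are hypothetical names, and the default of 30% simply mirrors the threshold used in the cell.

```python
import os
import tempfile


def find_output_problems(directory, expected_sizes, tolerance=0.30):
    """Return a list of messages for missing or badly sized files.

    Hypothetical helper: expected_sizes maps file names to sizes in bytes,
    and tolerance is the largest accepted relative deviation.
    """
    problems = []
    for name, expected_size in expected_sizes.items():
        file_path = os.path.join(directory, name)
        if not os.path.isfile(file_path):
            problems.append(f"Expected file {name} missing from output.")
            continue
        actual_size = os.path.getsize(file_path)
        delta = abs(expected_size - actual_size) / expected_size
        if delta > tolerance:
            problems.append(f"Size of output file {name} is out of range of expected.")
    return problems


# Self-contained demonstration on a throwaway directory with made-up sizes.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "vocab.txt"), "wb") as handle:
        handle.write(b"x" * 100)
    demo_problems = find_output_problems(
        demo_dir, {"vocab.txt": 110, "config.json": 326}
    )
print(demo_problems)
```

To run it against the real training output, pass the `path` and `expected` values from the verification cell. An empty list means every expected file is present and within tolerance.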


