# 03-build-model-basic

## Main objectives:

- Use the default training config recommended by spaCy 3.0 for a text classification model
    - [https://nightly.spacy.io/usage/training#quickstart](https://nightly.spacy.io/usage/training#quickstart)
- Train the model on the spaCy-generated docs (`train.spacy` and `dev.spacy`)
- Run basic validation and evaluation

In [55]:
import spacy
# Load the large English pipeline (en_core_web_lg)
nlp = spacy.load("en_core_web_lg")

In [56]:
!ls data

dev.spacy  train.spacy
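
Before training, it can help to confirm what the two files contain. The sketch below assumes the docs were serialized with spaCy's `DocBin` and that each doc stores its labels in `doc.cats`; `count_labels` and `label_counts_from_docbin` are hypothetical helper names, not part of spaCy, and the 0.5 threshold is an assumption.

```python
from collections import Counter
from pathlib import Path


def count_labels(cats_per_doc, threshold=0.5):
    """Count how many docs carry each label with a score >= threshold."""
    counts = Counter()
    for cats in cats_per_doc:
        for label, score in cats.items():
            if score >= threshold:
                counts[label] += 1
    return counts


def label_counts_from_docbin(path, lang="en"):
    """Load a .spacy file and count the positive labels in doc.cats."""
    # Imported lazily so the pure-Python helper above works without spaCy.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # only the vocab is needed to rebuild the docs
    doc_bin = DocBin().from_disk(path)
    return count_labels(doc.cats for doc in doc_bin.get_docs(nlp.vocab))


if __name__ == "__main__":
    for name in ("train.spacy", "dev.spacy"):
        path = Path("data") / name
        if path.exists():
            print(name, dict(label_counts_from_docbin(path)))
```

A heavily skewed count here would be worth knowing about before reading the evaluation scores.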


## Fill in the config from the base config generated on spacy.io

In [58]:
!python -m spacy init fill-config ./config/base_config-basic.cfg ./config/config-basic.cfg

2021-03-16 22:52:07.002680: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
✔ Auto-filled config with all values
✔ Saved config
config/config-basic.cfg
You can now add your data and train your pipeline:
python -m spacy train config-basic.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
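
The hint printed above passes the data locations as `--paths.train` and `--paths.dev` overrides. As a minimal sketch, the same command can be assembled from Python so the paths live in one place; `build_train_command` is a hypothetical helper, and the paths are the ones used elsewhere in this notebook.

```python
import sys


def build_train_command(config, output_dir, train_path, dev_path):
    """Return the argument list for `spacy train` with explicit data paths."""
    return [
        sys.executable, "-m", "spacy", "train", str(config),
        "--output", str(output_dir),
        "--paths.train", str(train_path),
        "--paths.dev", str(dev_path),
    ]


cmd = build_train_command(
    "./config/config-basic.cfg",
    "./models/basic",
    "./data/train.spacy",
    "./data/dev.spacy",
)
print(" ".join(cmd))
# To launch training from Python: subprocess.run(cmd, check=True)
```

Overrides given on the command line take precedence over the values stored in the config file, so this is only needed when the config does not already point at the right files.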


## Validate the configuration file

In [59]:
# validate configuration
!python -m spacy debug config ./config/config-basic.cfg

2021-03-16 22:52:12.642228: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
✔ Config is valid


## Train the model

In [60]:
!python -m spacy train ./config/config-basic.cfg --output ./models/basic

2021-03-16 22:52:26.284382: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
ℹ Using CPU
[2021-03-16 22:52:27,188] [INFO] Set up nlp object from config
[2021-03-16 22:52:27,195] [INFO] Pipeline: ['tok2vec', 'textcat']
[2021-03-16 22:52:27,199] [INFO] Created vocabulary
[2021-03-16 22:52:27,199] [INFO] Finished initializing nlp object
[2021-03-16 22:52:33,932] [INFO] Initialized pipeline components: ['tok2vec', 'textcat']
✔ Initialized pipeline
ℹ Pipeline: ['tok2vec', 'textcat']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS TEXTCAT  CATS_SCORE  SCORE 
---  ------  ------------  ------------  ----------  ------
  0       0          0.00          0.25        0.00    0.00
  0     200          0.00         22.34       55.26    0.55
  0     400          0.00         17.06       66.35    0.66
  0     600          0.00          7.66       69.77    0.7
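
spaCy saves the checkpoint with the highest score as `model-best` in the output directory. As a rough illustration, the rows logged above can be parsed to see which step that corresponds to so far; `parse_rows` is a hypothetical helper and the column names are adapted from the table header.

```python
# Rows copied from the training log above (columns: E, #, LOSS TOK2VEC,
# LOSS TEXTCAT, CATS_SCORE, SCORE).
LOG_ROWS = """\
  0       0          0.00          0.25        0.00    0.00
  0     200          0.00         22.34       55.26    0.55
  0     400          0.00         17.06       66.35    0.66
  0     600          0.00          7.66       69.77    0.7
"""

COLUMNS = ("epoch", "step", "loss_tok2vec", "loss_textcat", "cats_score", "score")


def parse_rows(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(COLUMNS):
            continue  # skip blank or malformed lines
        epoch, step, *metrics = parts
        values = [int(epoch), int(step)] + [float(m) for m in metrics]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


rows = parse_rows(LOG_ROWS)
best = max(rows, key=lambda row: row["cats_score"])
print(f"best step so far: {best['step']} (CATS_SCORE {best['cats_score']})")
```

The textcat loss falls while `CATS_SCORE` rises across these four rows, which is the pattern to expect early in training.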

## Evaluate the best model and save metrics to disk

In [61]:
!python -m spacy evaluate ./models/basic/model-best ./data/dev.spacy --output ./evaluate/basic-metrics.json

2021-03-16 23:03:38.076305: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
ℹ Using CPU

TOK                 100.00
TEXTCAT (macro F)   69.77 
SPEED               48068

                P       R       F
positive   100.00   53.58   69.77

           ROC AUC
positive      0.92

✔ Saved results to evaluate/basic-metrics.json

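With a single label, the macro F-score equals that label's F-score, so the reported figure can be checked by hand from the precision and recall in the table. The sketch below uses the standard harmonic-mean formula F = 2PR / (P + R); `f_score` is a hypothetical helper name.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the "positive" label, as percentages from the table.
f = f_score(100.00, 53.58)
print(f"{f:.2f}")  # 69.77
```

Precision of 100 with recall near 54 means every predicted positive was correct, but roughly half of the true positives were missed.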