#### Training and updating models
You can usually make the model more accurate by showing it examples from your domain.
You often also want to predict categories specific to your problem, so the model needs to learn about them.
Training on your own data is essential for text classification, very useful for named entity recognition, and somewhat less critical for tagging and parsing.

##### Creating Training Data
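spaCy's rule-based `Matcher` is a quick way to bootstrap training data for named entity models: write token patterns, run them over raw text, and store the matches as entity annotations.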

```
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span, DocBin

TEXTS = [
    'How to preorder the iPhone X',
    'iPhone X is coming',
    'Should I pay $1,000 for the iPhone X?',
    'The iPhone 8 reviews are here',
    "iPhone 11 vs iPhone 8: What's the difference?",
    'I need a new phone! Any tips?'
]

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Two tokens whose lowercase forms match "iphone" and "x"
pattern1 = [{"LOWER": "iphone"}, {"LOWER": "x"}]
# Token whose lowercase form matches "iphone" and a digit
pattern2 = [{"LOWER": "iphone"}, {"IS_DIGIT": True}]
# Add patterns to the matcher and create docs with matched entities
matcher.add("GADGET", [pattern1, pattern2])

docs = []
for doc in nlp.pipe(TEXTS):
    matches = matcher(doc)
    spans = [Span(doc, start, end, label=match_id) for match_id, start, end in matches]
    print(spans)
    doc.ents = spans
    docs.append(doc)

doc_bin = DocBin(docs=docs)
# doc_bin.to_disk('./train.spacy')
```

Output:

```
[iPhone X]
[iPhone X]
[iPhone X]
[iPhone 8]
[iPhone 11, iPhone 8]
[]
```
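
The training command used later in this section expects both a training file and a development file. The following is a minimal sketch of one way to produce that split, assuming a simple shuffled 80/20 partition. The helper `split_train_dev`, the seed, and the ratio are illustrative choices, not part of spaCy. The helper works on any list, so it is demonstrated on the raw texts; the trailing comments show how the same call would apply to the annotated `docs` list.

```python
import random


def split_train_dev(items, dev_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and split it into (train, dev) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


TEXTS = [
    "How to preorder the iPhone X",
    "iPhone X is coming",
    "Should I pay $1,000 for the iPhone X?",
    "The iPhone 8 reviews are here",
    "iPhone 11 vs iPhone 8: What's the difference?",
    "I need a new phone! Any tips?",
]

train_texts, dev_texts = split_train_dev(TEXTS)
print(len(train_texts), len(dev_texts))  # 5 1

# With the annotated `docs` list from the Matcher example, the same idea would be:
#     train_docs, dev_docs = split_train_dev(docs)
#     DocBin(docs=train_docs).to_disk("./train.spacy")
#     DocBin(docs=dev_docs).to_disk("./dev.spacy")
```

Six examples are far too few for a meaningful evaluation; the split is only meant to show the mechanics.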


##### Configuring Training
spaCy uses a config file, usually called `config.cfg`, as the "single source of truth" for all settings. The config file defines how to initialize the `nlp` object, which pipeline components to add and how their internal model implementations should be configured. It also includes all settings for the training process and how to load the data, including hyperparameters.
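
To give a feel for how such a file is organized, here is a rough sketch that reads a small, hypothetical excerpt with Python's standard `configparser` and decodes each value as JSON. The excerpt and its values are assumptions for illustration; a real `config.cfg` has many more sections, and spaCy's own loader does considerably more (for example, resolving variables and nested sections), so this is not how spaCy itself reads the file.

```python
import configparser
import json

# Hypothetical excerpt of a config.cfg, for illustration only.
CONFIG_TEXT = """
[paths]
train = null
dev = null

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]

[training]
dropout = 0.1
max_steps = 20000
"""

# Disable interpolation so values are read exactly as written.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)

# Values in this excerpt are JSON-like, so decode each one with json.loads.
config = {
    section: {key: json.loads(value) for key, value in parser[section].items()}
    for section in parser.sections()
}

print(config["nlp"]["pipeline"])  # ['tok2vec', 'ner']
print(config["paths"]["train"])   # None
```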

The quickstart widget in the documentation lets you generate a config interactively by selecting the language and pipeline components you need, as well as optional hardware and optimization settings.

To train a pipeline, all you need is the config file and the training and development data.
The first argument of `spacy train` is the path to the config file. The `--output` argument specifies the directory in which to save the trained pipeline.
You can also override individual config settings on the command line. In this case, we override `paths.train` with the path to the `train.spacy` file and `paths.dev` with the path to the `dev.spacy` file.

`$ python -m spacy train ./config.cfg --output ./output --paths.train train.spacy --paths.dev dev.spacy`

- `train`: the command to run
- `config.cfg`: the path to the config file
- `--output`: the path to the output directory to save the trained pipeline
- `--paths.train`: override with path to the training data
- `--paths.dev`: override with path to the evaluation data
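
The dotted names in `--paths.train` and `--paths.dev` address a setting by section and key. The sketch below illustrates that idea on a plain nested dictionary; `apply_overrides` is a hypothetical helper written for this example and is not spaCy's implementation.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted-key overrides applied."""
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        target = result
        for name in parents:
            # Walk down to the section, creating it if it does not exist yet.
            target = target.setdefault(name, {})
        target[leaf] = value
    return result


# Hypothetical config contents, mirroring the settings named in the command above.
config = {"paths": {"train": None, "dev": None}, "training": {"max_steps": 20000}}
overrides = {"paths.train": "train.spacy", "paths.dev": "dev.spacy"}

merged = apply_overrides(config, overrides)
print(merged["paths"])  # {'train': 'train.spacy', 'dev': 'dev.spacy'}
```

Keeping the original dictionary untouched mirrors the fact that command-line overrides apply to a single run and do not rewrite the config file.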

##### Example output

```
============================ Training pipeline ============================
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001

E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     26.50    0.73    0.39    5.43    0.01
  0     200         33.58    847.68   10.88   44.44    6.20    0.11
  1     400         70.88    267.65   33.50   45.95   26.36    0.33
  2     600         67.56    156.63   45.32   62.16   35.66    0.45
  3     800        138.28    134.12   48.17   74.19   35.66    0.48
  4    1000        177.95    109.77   51.43   66.67   41.86    0.51
  6    1200         94.95     52.13   54.63   67.82   45.74    0.55
  8    1400        126.85     66.19   56.00   65.62   48.84    0.56
 10    1600         38.34     24.16   51.96   70.67   41.09    0.52
 13    1800        105.14     23.23   56.88   69.66   48.06    0.57

✔ Saved pipeline to output directory
/path/to/output/model-last
```
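
The `ENTS_F` column is the harmonic mean of `ENTS_P` (precision) and `ENTS_R` (recall). As a quick check, the sketch below recomputes it for two rows copied from the table above. The `SCORE` column appears to be `ENTS_F` divided by 100 in this run, but that reading is an assumption based on the numbers shown.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (ENTS_P, ENTS_R, reported ENTS_F) from the first and last rows of the table.
rows = [(0.39, 5.43, 0.73), (69.66, 48.06, 56.88)]

for precision, recall, reported in rows:
    computed = f_score(precision, recall)
    print(f"P={precision} R={recall} -> F={computed:.2f} (reported {reported})")
```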

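The output directory contains two pipelines: `model-last`, the most recently saved state, and `model-best`, the state with the highest score on the development data. You can load either one by passing its path to `spacy.load`.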

```
import spacy

nlp = spacy.load("/path/to/output/model-best")
doc = nlp("iPhone 11 vs iPhone 8: What's the difference?")
print(doc.ents)
```

To make it easy to deploy your pipelines, spaCy provides a command to package them as Python packages. The `spacy package` command takes the path to your exported pipeline and an output directory. It then generates a Python package containing your pipeline. The package is a `.tar.gz` file that can be installed into your environment.
You can also provide an optional name and version on the command line. This lets you manage multiple versions of a pipeline, for example, if you decide to customize your pipeline later or train it with more data.
The package behaves just like any other Python package. After installation, you can load your pipeline using its name. Note that spaCy automatically adds the language code to the name, so your pipeline `my_pipeline` becomes `en_my_pipeline`.


`python -m spacy package /path/to/output/model-best ./packages --name my_pipeline --version 1.0.0`

`cd ./packages/en_my_pipeline-1.0.0`

`pip install dist/en_my_pipeline-1.0.0.tar.gz`
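
The directory and archive names in the commands above follow from the language code, name, and version. A small sketch of that naming pattern, inferred from the paths shown here; `package_paths` is a hypothetical helper, not a spaCy function:

```python
from pathlib import PurePosixPath


def package_paths(output_dir, lang, name, version):
    """Build the package name, package directory, and archive path."""
    full_name = f"{lang}_{name}"  # the language code is prefixed to the name
    package_dir = PurePosixPath(output_dir) / f"{full_name}-{version}"
    archive = package_dir / "dist" / f"{full_name}-{version}.tar.gz"
    return full_name, package_dir, archive


full_name, package_dir, archive = package_paths("./packages", "en", "my_pipeline", "1.0.0")
print(full_name)    # en_my_pipeline
print(package_dir)  # packages/en_my_pipeline-1.0.0
print(archive)      # packages/en_my_pipeline-1.0.0/dist/en_my_pipeline-1.0.0.tar.gz
```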

Load and use the pipeline after installation:

`nlp = spacy.load("en_my_pipeline")`