# Training loop in spaCy

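1. Loop over the training data for a number of iterations.
2. Shuffle the training data before each iteration, as in stochastic gradient descent (SGD). This helps keep the model from getting stuck in a suboptimal solution.
3. Divide the data into batches ("minibatching"). Averaging over several examples gives a more accurate estimate of the gradient than updating on one example at a time.
4. Update the model for each batch.
5. Save the updated model.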

Example code using 10 iterations (spaCy v2 API):

```
import random
import spacy

# Assumes `nlp`, `TRAINING_DATA` and `path_to_model` are already defined
# Loop for 10 iterations
for i in range(10):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    # Create batches and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA):
        # Split the batch into texts and annotations
        texts = [text for text, annotation in batch]
        annotations = [annotation for text, annotation in batch]
        # Update the model
        nlp.update(texts, annotations)

# Save the model
nlp.to_disk(path_to_model)
```
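
The shuffling and batching steps can be pictured without spaCy at all. The following is a minimal sketch, assuming a fixed integer batch size; `simple_minibatch` is a hypothetical helper written for illustration, not spaCy's implementation of `spacy.util.minibatch`, and the toy `data` list only stands in for `TRAINING_DATA`.

```python
import random


def simple_minibatch(items, size):
    """Yield consecutive lists of up to `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Toy stand-in for TRAINING_DATA: (text, annotations) pairs
data = [("text %d" % i, {"entities": []}) for i in range(5)]

random.seed(0)  # fixed seed so the shuffle is reproducible
random.shuffle(data)

batches = list(simple_minibatch(data, size=2))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [2, 2, 1]

# Split the first batch into texts and annotations, as in the loop above
texts = [text for text, annotation in batches[0]]
annotations = [annotation for text, annotation in batches[0]]
```

The last batch is smaller whenever the number of examples is not a multiple of the batch size, which is why the update step should not assume a fixed batch length.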

## Example: Setting up a new pipeline from scratch

```
import random
import spacy

# Start with a blank English model
nlp = spacy.blank('en')
# Create blank entity recognizer and add it to the pipeline
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)
# Add a new label
ner.add_label('GADGET')

# Start the training
nlp.begin_training()
# Train for 10 iterations
# (`examples` is a list of (text, annotations) pairs, like TRAINING_DATA above)
for itn in range(10):
    random.shuffle(examples)
    # Divide examples into batches
    for batch in spacy.util.minibatch(examples, size=2):
        texts = [text for text, annotation in batch]
        annotations = [annotation for text, annotation in batch]
        # Update the model
        nlp.update(texts, annotations)
```

```
import requests

req = requests.get("https://raw.githubusercontent.com/ines/spacy-course/master/exercises/gadgets.json")
TRAINING_DATA = req.json()
print(TRAINING_DATA)
```

```
[['How to preorder the iPhone X', {'entities': [[20, 28, 'GADGET']]}], ['iPhone X is coming', {'entities': [[0, 8, 'GADGET']]}], ['Should I pay $1,000 for the iPhone X?', {'entities': [[28, 36, 'GADGET']]}], ['The iPhone 8 reviews are here', {'entities': [[4, 12, 'GADGET']]}], ['Your iPhone goes up to 11 today', {'entities': [[5, 11, 'GADGET']]}], ['I need a new phone! Any tips?', {'entities': []}]]
```
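
Each entity annotation is a `[start, end, label]` triple of character offsets into the text, so slicing the text with those offsets should give back the entity string. A small sanity check along these lines can catch off-by-one errors before training. This is a sketch using three of the examples printed above; `entity_spans` is a hypothetical helper name.

```python
# Three of the training examples shown above, copied verbatim
sample = [
    ["How to preorder the iPhone X", {"entities": [[20, 28, "GADGET"]]}],
    ["Your iPhone goes up to 11 today", {"entities": [[5, 11, "GADGET"]]}],
    ["I need a new phone! Any tips?", {"entities": []}],
]


def entity_spans(text, annotations):
    """Return (substring, label) pairs for each annotated entity (hypothetical helper)."""
    return [(text[start:end], label) for start, end, label in annotations["entities"]]


for text, annotations in sample:
    print(entity_spans(text, annotations))
# [('iPhone X', 'GADGET')]
# [('iPhone', 'GADGET')]
# []
```

The last example has no entities on purpose: it shows the model a sentence where nothing should be labelled as `GADGET`.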


```
import spacy
import random

# Create a blank 'en' model
nlp = spacy.blank("en")

# Create a new entity recognizer and add it to the pipeline
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)

# Add the label 'GADGET' to the entity recognizer
ner.add_label("GADGET")

# Start the training
nlp.begin_training()

# Loop for 10 iterations
for itn in range(10):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    # Reset the loss record; nlp.update adds to it on every call
    losses = {}

    # Batch the examples and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA, size=2):
        texts = [text for text, entities in batch]
        annotations = [entities for text, entities in batch]

        # Update the model
        nlp.update(texts, annotations, losses=losses)
        print(losses)
```

```
{'ner': 12.799999833106995}
{'ner': 21.858524560928345}
{'ner': 31.59492552280426}
{'ner': 5.870618999004364}
{'ner': 12.188948512077332}
{'ner': 17.54018685221672}
{'ner': 2.850046828389168}
{'ner': 5.846153929363936}
{'ner': 7.571244296152145}
{'ner': 3.6504366922890767}
{'ner': 4.718048918351997}
{'ner': 6.092694667197065}
{'ner': 3.459057238243986}
{'ner': 6.7113398791407235}
{'ner': 8.444608137884643}
{'ner': 1.0007012260030024}
{'ner': 3.914769775874447}
{'ner': 5.604732101724949}
{'ner': 0.7986384892937366}
{'ner': 3.5540181753348605}
{'ner': 3.916439884911526}
{'ner': 0.12369860630360563}
{'ner': 0.23911744905819887}
{'ner': 2.3206883766008506}
{'ner': 0.001701773064610279}
{'ner': 1.4195155765795562}
{'ner': 1.4199137895919876}
{'ner': 0.0015995474786905106}
{'ner': 0.8970869711274645}
{'ner': 0.8970974739667594}
```
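
Because `losses` is reset once per iteration and `nlp.update` adds to it on every call, each printed value is a running total within its iteration. With 6 examples and a batch size of 2 there are 3 batches per iteration, so every third value is that iteration's total. Here is a minimal sketch of recovering per-iteration totals and per-batch losses from the first two iterations of the log above. The variable names are illustrative only.

```python
# First six values printed above (iterations 1 and 2), copied from the log
logged = [
    12.799999833106995, 21.858524560928345, 31.59492552280426,
    5.870618999004364, 12.188948512077332, 17.54018685221672,
]
batches_per_iteration = 3  # 6 examples / batch size 2

# The last running total of each iteration is that iteration's total loss
iteration_totals = logged[batches_per_iteration - 1::batches_per_iteration]

# Per-batch loss: difference between consecutive running totals,
# restarting from 0 at the beginning of each iteration
per_batch = []
for i, value in enumerate(logged):
    previous = 0.0 if i % batches_per_iteration == 0 else logged[i - 1]
    per_batch.append(value - previous)

print(iteration_totals)
print([round(x, 2) for x in per_batch])
```

Read this way, the total loss falls from about 31.6 in the first iteration to about 17.5 in the second, and the full log ends below 1.0, which suggests the model is fitting the six training examples. It says nothing yet about how well the model generalizes to unseen text.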
