# <center> Training and updating models

Training and updating spaCy's neural network models, focusing on named entity recognition (NER).
    

#### Why update the model?
    
- Better results on your specific domain
- Learn classification schemes specifically for your problem
- Essential for text classification
- Very useful for named entity recognition
- Less critical for part-of-speech tagging and dependency parsing
    
#### How training works
1. Initialize the model weights randomly with `nlp.begin_training()`
2. Predict a few examples with the current weights by calling `nlp.update()`
3. Compare the predictions with the true labels
4. Calculate how to change the weights to improve the predictions
5. Update the weights slightly
6. Go back to step 2.
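The six steps above are the generic gradient-descent loop. The sketch below runs that loop on a hypothetical one-feature logistic-regression model in plain Python, not on spaCy's actual network; the toy data, the `train` helper, the learning rate and the epoch count are all assumptions chosen for illustration. In spaCy v2, a call to `nlp.update()` takes care of steps 2 to 5 for a batch of examples.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy data: one numeric feature x and a true label y (0 or 1)
TRAIN: List[Tuple[float, int]] = [
    (0.5, 0), (1.0, 0), (1.5, 0),
    (3.5, 1), (4.0, 1), (4.5, 1),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs: int = 500, learn_rate: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    examples = list(data)  # copy so shuffling does not mutate the caller's list
    # 1. Initialize the weights randomly
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):  # 6. Go back to step 2, once per epoch
        rng.shuffle(examples)
        for x, y in examples:
            prediction = sigmoid(w * x + b)    # 2. Predict with the current weights
            error = prediction - y             # 3. Compare with the true label
            grad_w, grad_b = error * x, error  # 4. Gradient of the log loss
            w -= learn_rate * grad_w           # 5. Update the weights slightly
            b -= learn_rate * grad_b
    return w, b


w, b = train(TRAIN)
for x, y in TRAIN:
    print(x, y, round(sigmoid(w * x + b), 3))
```

The small learning rate is what "update weights slightly" means in practice: each example nudges the weights a little, and repeated passes over shuffled data move them toward values that fit all examples.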
    
    
<img src="https://d33wubrfki0l68.cloudfront.net/a634ac2555f216f30e47a08312745a85e552f4f1/b1d15/training-73950e71e6b59678754a87d6cf1481f9.svg" width="800" height="800">   
    
- Training data: Examples and their annotations.
- Text: The input text the model should predict a label for.
- Label: The label the model should predict.
- Gradient: How to change the weights.
    
    
#### Example: Training the entity recognizer
- The entity recognizer tags words and phrases in context
- Each token can only be part of one entity
- Examples need to come with context
    
` ("iPhone X is coming" , {'entities': [(0, 8, 'GADGET')]}) `
- Texts with no entities are also important
    
` ("I need a new phone! Any tips?", {'entities': []})`
- Goal: teach the model to generalize
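Because each token can be part of only one entity, a quick sanity check on the annotations catches many mistakes before training. The helper below, `check_example`, is a hypothetical name and not part of spaCy: it only verifies that the character offsets fall inside the text and that no two entities overlap. It does not check that offsets line up with spaCy's token boundaries, which would require running the tokenizer.

```python
from typing import Dict, List, Tuple

# (start_char, end_char, label), the same layout as the examples above
Entity = Tuple[int, int, str]


def check_example(text: str, annotations: Dict[str, List[Entity]]) -> List[str]:
    """Return a list of problems found in one (text, annotations) example."""
    problems: List[str] = []
    # Sort by start offset (then end) so any overlap shows up between neighbours
    entities = sorted(annotations.get('entities', []))
    for start, end, label in entities:
        if not 0 <= start < end <= len(text):
            problems.append(f'{label} ({start}, {end}) is outside the text')
    # Each token can be part of only one entity, so spans must not overlap
    for (s1, e1, l1), (s2, e2, l2) in zip(entities, entities[1:]):
        if s2 < e1:
            problems.append(f'{l1} ({s1}, {e1}) overlaps {l2} ({s2}, {e2})')
    return problems


print(check_example('iPhone X is coming', {'entities': [(0, 8, 'GADGET')]}))
# []
print(check_example('iPhone X is coming',
                    {'entities': [(0, 8, 'GADGET'), (0, 6, 'GADGET')]}))
# ['GADGET (0, 6) overlaps GADGET (0, 8)']
```

Checking only neighbouring pairs is enough to detect an overlap: after sorting by start, if any two spans overlap, some adjacent pair does too.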
       
#### The training data
- Examples of what we want the model to predict in context
- Update an existing model: a few hundred to a few thousand examples
- Train a new category: a few thousand to a million examples
    - spaCy's English models: 2 million words
- Usually created manually by human annotators
- Can be semi-automated, for example using spaCy's `Matcher`!

#### Example: Creating training data
spaCy's rule-based `Matcher` is a great way to quickly create training data for named entity models.

```python
TEXT = ['How to preorder the iPhone X',
        'iPhone X is coming',
        'Should I pay $1,000 for the iPhone X?',
        'The iPhone 8 reviews are here',
        'Your iPhone goes up to 11 today',
        'I need a new phone! Any tips?']
```

```python
from spacy.lang.en import English
from spacy.matcher import Matcher

nlp = English()

# Initialize the Matcher with the shared vocab
matcher = Matcher(nlp.vocab)

# Two tokens whose lowercase forms match 'iphone' and 'x'
pattern1 = [{'LOWER': 'iphone'}, {'LOWER': 'x'}]

# Token whose lowercase form matches 'iphone', followed by an optional digit
pattern2 = [{'LOWER': 'iphone'}, {'IS_DIGIT': True, 'OP': '?'}]

# Add both patterns under the label 'GADGET'.
# spaCy v2.2.2+ and v3 take a list of patterns; older v2 releases used
# matcher.add('GADGET', None, pattern1, pattern2) instead.
matcher.add('GADGET', [pattern1, pattern2])
```

```python
# Create a Doc object for each text in TEXT
for doc in nlp.pipe(TEXT):
    # Find the matches in the doc
    matches = matcher(doc)

    # Get a list of (start token, end token, label) tuples of matches in the text
    entities = [(start, end, 'GADGET') for match_id, start, end in matches]
    print(doc.text, entities)
```

```
How to preorder the iPhone X [(4, 6, 'GADGET'), (4, 5, 'GADGET')]
iPhone X is coming [(0, 2, 'GADGET'), (0, 1, 'GADGET')]
Should I pay $1,000 for the iPhone X? [(7, 9, 'GADGET'), (7, 8, 'GADGET')]
The iPhone 8 reviews are here [(1, 2, 'GADGET'), (1, 3, 'GADGET')]
Your iPhone goes up to 11 today [(1, 2, 'GADGET')]
I need a new phone! Any tips? []
```
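Why do most texts produce two overlapping matches? The matcher returns every span that matches any pattern: `pattern1` matches "iPhone X", while `pattern2` also matches the shorter "iPhone" because its digit token is optional, and for "iPhone 8" it yields both "iPhone" and "iPhone 8". The sketch below reproduces this behaviour with a simplified, hypothetical matcher over whitespace-split tokens; it is not spaCy's implementation, and `lower_is`, `ends_for` and `find_matches` are illustrative names. Whitespace splitting only agrees with spaCy's tokenizer on texts without punctuation, so the examples avoid it.

```python
from typing import Callable, List, Set, Tuple

# Each pattern entry is (predicate on the token text, is_optional): a simplified
# stand-in for spaCy's {'LOWER': ...} and {'IS_DIGIT': True, 'OP': '?'} dicts
Entry = Tuple[Callable[[str], bool], bool]


def lower_is(value: str) -> Callable[[str], bool]:
    return lambda token: token.lower() == value


def ends_for(tokens: List[str], pattern: List[Entry], start: int) -> Set[int]:
    """All end positions where `pattern` matches a span beginning at `start`."""
    positions = {start}
    for predicate, optional in pattern:
        next_positions = set()
        for pos in positions:
            if pos < len(tokens) and predicate(tokens[pos]):
                next_positions.add(pos + 1)  # consume this token
            if optional:
                next_positions.add(pos)      # or skip the optional entry
        positions = next_positions
    return positions


def find_matches(tokens: List[str], patterns: List[List[Entry]]) -> List[Tuple[int, int]]:
    """Every non-empty (start, end) token span matched by any pattern."""
    matches = set()
    for pattern in patterns:
        for start in range(len(tokens)):
            matches.update((start, end) for end in ends_for(tokens, pattern, start)
                           if end > start)
    return sorted(matches)


pattern1 = [(lower_is('iphone'), False), (lower_is('x'), False)]
pattern2 = [(lower_is('iphone'), False), (str.isdigit, True)]

print(find_matches('How to preorder the iPhone X'.split(), [pattern1, pattern2]))
# [(4, 5), (4, 6)]
print(find_matches('The iPhone 8 reviews are here'.split(), [pattern1, pattern2]))
# [(1, 2), (1, 3)]
```

The spans agree with spaCy's output above, only in sorted order.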


```python
TRAINING_DATA = []

# Create a Doc object for each text in TEXT
for doc in nlp.pipe(TEXT):
    # Match on the doc and create a list of matched spans
    spans = [doc[start:end] for match_id, start, end in matcher(doc)]
    # Get (start character, end character, label) tuples of matches
    entities = [(span.start_char, span.end_char, 'GADGET') for span in spans]

    # Format the matches as a (doc.text, entities) tuple
    training_example = (doc.text, {'entities': entities})
    # Append the example to the training data
    TRAINING_DATA.append(training_example)

print(*TRAINING_DATA, sep='\n')
```

```
('How to preorder the iPhone X', {'entities': [(20, 28, 'GADGET'), (20, 26, 'GADGET')]})
('iPhone X is coming', {'entities': [(0, 8, 'GADGET'), (0, 6, 'GADGET')]})
('Should I pay $1,000 for the iPhone X?', {'entities': [(28, 36, 'GADGET'), (28, 34, 'GADGET')]})
('The iPhone 8 reviews are here', {'entities': [(4, 10, 'GADGET'), (4, 12, 'GADGET')]})
('Your iPhone goes up to 11 today', {'entities': [(5, 11, 'GADGET')]})
('I need a new phone! Any tips?', {'entities': []})
```
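These annotations still contain overlapping spans such as (20, 28) and (20, 26), which conflicts with the rule that each token can be part of only one entity and is likely to cause an error during training. One common fix is to keep only the longest span, which is the policy spaCy's `spacy.util.filter_spans` applies to `Span` objects. The sketch below applies the same longest-first policy in plain Python to the character-offset tuples; `filter_longest` and `CLEAN_DATA` are hypothetical names, and the input is the `TRAINING_DATA` printed above.

```python
from typing import List, Tuple

# (start_char, end_char, label)
Entity = Tuple[int, int, str]


def filter_longest(entities: List[Entity]) -> List[Entity]:
    """Drop overlapping entities, preferring the longest and then the earliest span."""
    # Longest spans first; ties broken by the smaller start offset
    by_preference = sorted(entities, key=lambda e: (-(e[1] - e[0]), e[0]))
    kept: List[Entity] = []
    for start, end, label in by_preference:
        # Keep the span only if it does not overlap any span already kept
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)  # restore left-to-right order


# The matcher-generated annotations printed above
TRAINING_DATA = [
    ('How to preorder the iPhone X', {'entities': [(20, 28, 'GADGET'), (20, 26, 'GADGET')]}),
    ('iPhone X is coming', {'entities': [(0, 8, 'GADGET'), (0, 6, 'GADGET')]}),
    ('Should I pay $1,000 for the iPhone X?', {'entities': [(28, 36, 'GADGET'), (28, 34, 'GADGET')]}),
    ('The iPhone 8 reviews are here', {'entities': [(4, 10, 'GADGET'), (4, 12, 'GADGET')]}),
    ('Your iPhone goes up to 11 today', {'entities': [(5, 11, 'GADGET')]}),
    ('I need a new phone! Any tips?', {'entities': []}),
]

CLEAN_DATA = [(text, {'entities': filter_longest(ann['entities'])})
              for text, ann in TRAINING_DATA]
print(*CLEAN_DATA, sep='\n')
```

With this policy "iPhone X" wins over "iPhone", and "iPhone 8" wins over "iPhone", so each example keeps exactly one `GADGET` entity per mention.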
