# Efficient Phrase Matching

If you need to match large terminology lists, you can use the PhraseMatcher and supply Doc objects as patterns instead of token patterns, which is much more efficient. A Doc pattern can contain a single token or multiple tokens.
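To make the difference between token patterns and Doc patterns concrete, here is a minimal sketch. It assumes spaCy v3 and uses a blank English pipeline (`spacy.blank('en')`) so that no model download is needed; the sample sentence and the variable names are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

# Assumption: a blank English pipeline (tokenizer only) is enough for this demo.
nlp = spacy.blank('en')
sample = nlp('Chancellor ANGELA MERKEL spoke in Berlin.')

# Token pattern: every term is spelled out as a list of per-token dicts.
token_matcher = Matcher(nlp.vocab)
token_matcher.add('term', [[{'ORTH': 'ANGELA'}, {'ORTH': 'MERKEL'}]])

# Doc pattern: the term is simply tokenized into a Doc.
phrase_matcher = PhraseMatcher(nlp.vocab)
phrase_matcher.add('term', [nlp.make_doc('ANGELA MERKEL')])

# Both matchers return (match_id, start, end) tuples of token offsets.
token_spans = [sample[start:end].text for _, start, end in token_matcher(sample)]
phrase_spans = [sample[start:end].text for _, start, end in phrase_matcher(sample)]
print(token_spans, phrase_spans)
```

Both matchers find the same span here. With a long terminology list, the Doc patterns are easier to build because each term only needs to be tokenized.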

In [3]:
import spacy
from spacy.matcher import PhraseMatcher

In [4]:
nlp = spacy.load('en_core_web_sm')


In [15]:
with open('mytext.txt') as f:  # close the file once it has been read
    doc = nlp(f.read())

In [16]:
matcher = PhraseMatcher(nlp.vocab)

In [18]:
terms = ['BARAC OBAMA', 'ANGELA MERKEL', 'WASHINGTON D. C.']

In [19]:
# make_doc only runs the tokenizer, which is all the PhraseMatcher needs here
pattern = [nlp.make_doc(text) for text in terms]
pattern

[BARAC OBAMA, ANGELA MERKEL, WASHINGTON D. C.]

In [20]:
# spaCy v3 signature; older v2 releases used matcher.add('term', None, *pattern)
matcher.add('term', pattern)

In [21]:
doc

BERLIN — After Donald Trump was elected president on Nov. 8, 2016, his future German counterpart, Chancellor ANGELA MERKEL, offered him her “close cooperation,” at least if Trump respected common values such as “democracy, freedom, as well as respect for the rule of law and the dignity of each and every person, regardless of their origin, skin color, creed, gender, sexual orientation, or political views.”

It appeared to be pure coincidence that less than two weeks later, Merkel also announced that she would run for a fourth term, after thinking about it “for an eternity.”

In interviews at the time, her reasoning behind another run appeared to be mainly associated with the rise of populism in Germany. “Can I do something to facilitate cohesion in our polarized society? I think I can help to tone down the rhetoric: Instead of hating each other, we should debate like democrats.”

While Merkel was referring to populists in Germany, the underlying message may very well also have been dire

In [24]:
matches = matcher(doc)

In [28]:
for match_id, start, end in matches:
    span = doc[start:end]
    print('{:<20},{:<10},{:<10}'.format(span.text, start, end))

ANGELA MERKEL       ,20        ,22        
BARAC OBAMA         ,283       ,285       
WASHINGTON D. C.    ,287       ,290       
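The terms above are written in upper case, and by default the PhraseMatcher compares the exact token text, so differently cased mentions such as "Merkel" would not match. A hedged sketch of one way to relax this is the `attr='LOWER'` option of the PhraseMatcher. It assumes spaCy v3, a blank English pipeline, and a made-up sample sentence rather than the contents of `mytext.txt`.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: tokenizer-only pipeline; the sentence below is illustrative.
nlp = spacy.blank('en')

# attr='LOWER' compares the lowercase form of each token instead of its exact text.
matcher = PhraseMatcher(nlp.vocab, attr='LOWER')
terms = ['BARAC OBAMA', 'ANGELA MERKEL', 'WASHINGTON D. C.']
matcher.add('term', [nlp.make_doc(text) for text in terms])

sample = nlp('Chancellor Angela Merkel met reporters in Washington D. C. on Monday.')
found = sorted(sample[start:end].text for _, start, end in matcher(sample))
print(found)
```

The matched spans keep the casing of the text being searched, not the casing of the terms.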
