- The EntityRuler is a spaCy pipeline component that adds named entities based on pattern dictionaries. It makes it easy to combine rule-based and statistical named entity recognition in the same pipeline.
#### Entity Patterns
- Entity patterns are dictionaries with two keys: "label", the label to assign to the entity when the pattern matches, and "pattern", the match pattern itself. The EntityRuler accepts two types of patterns:
- Phrase pattern, matching the exact text: `{"label":"ORG", "pattern":"Apple"}`
- Token pattern, one dictionary per token (here matching the lowercase form): `{"label":"GPE", "pattern":[{"LOWER":"san"},{"LOWER":"francisco"}]}`
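To make the difference between the two pattern types concrete, here is a simplified pure-Python sketch of how each could be matched against a tokenized text. It is not spaCy's implementation: `str.split()` stands in for spaCy's tokenizer, token patterns only support the `LOWER` attribute, and the helpers `token_matches` and `find_matches` are hypothetical names used for illustration.

```python
# Simplified sketch, NOT spaCy's implementation: a "doc" is a list of token
# strings, str.split() stands in for spaCy's tokenizer, and token patterns
# only support the "LOWER" attribute. Helper names are hypothetical.
from typing import Dict, List, Tuple, Union

TokenSpec = Dict[str, str]
Pattern = Dict[str, Union[str, List[TokenSpec]]]
Match = Tuple[int, int, str]  # (start token, end token exclusive, label)


def token_matches(token: str, spec: TokenSpec) -> bool:
    """Check one token against one token-pattern dict."""
    for attr, value in spec.items():
        if attr != "LOWER":
            raise ValueError(f"attribute {attr!r} is not supported in this sketch")
        if token.lower() != value:
            return False
    return True


def find_matches(tokens: List[str], patterns: List[Pattern]) -> List[Match]:
    """Return every (start, end, label) where a pattern matches the tokens."""
    matches: List[Match] = []
    for entry in patterns:
        label, pattern = entry["label"], entry["pattern"]
        for start in range(len(tokens)):
            if isinstance(pattern, str):
                # Phrase pattern: exact, case-sensitive match on the token text.
                phrase = pattern.split()
                end = start + len(phrase)
                found = tokens[start:end] == phrase
            else:
                # Token pattern: one dict per token, checked position by position.
                end = start + len(pattern)
                window = tokens[start:end]
                found = len(window) == len(pattern) and all(
                    token_matches(tok, spec) for tok, spec in zip(window, pattern)
                )
            if found:
                matches.append((start, end, str(label)))
    return matches


patterns: List[Pattern] = [
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
]
tokens = "Apple is opening a new store in San Francisco".split()
print(find_matches(tokens, patterns))  # [(0, 1, 'ORG'), (7, 9, 'GPE')]
```

Because the phrase pattern compares exact text, a lowercase "apple" would not match it, while the token pattern matches "SAN FRANCISCO" just as well as "San Francisco".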
#### Using the Entity Ruler
- The EntityRuler is a pipeline component that is typically added via `nlp.add_pipe`. When the `nlp` object is called on a text, the ruler finds matches in the doc and adds them to `doc.ents`, using each pattern's label as the entity label.
- Entity label scheme: https://spacy.io/api/annotation#named-entities
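When several patterns match overlapping tokens, only one of them can become an entity, since `doc.ents` cannot contain overlapping spans. The sketch below assumes the rule spaCy documents for the EntityRuler (the match covering the most tokens wins; among equally long overlapping matches, the one that occurs first wins) and turns raw `(start, end, label)` matches into a non-overlapping entity list. `matches_to_ents` is a hypothetical helper, and it ignores entities already set by other components.

```python
# Hypothetical helper: resolve overlapping matches into non-overlapping entities,
# assuming the documented EntityRuler rule (longest match first, then earliest).
from typing import List, Set, Tuple

Match = Tuple[int, int, str]  # (start token, end token exclusive, label)


def matches_to_ents(matches: List[Match]) -> List[Match]:
    """Keep a non-overlapping subset of matches, returned in document order."""
    # Longer spans first; for equal lengths, the one starting earlier first.
    ordered = sorted(matches, key=lambda m: (-(m[1] - m[0]), m[0]))
    taken: Set[int] = set()  # token indices already covered by an accepted entity
    ents: List[Match] = []
    for start, end, label in ordered:
        span = set(range(start, end))
        if span & taken:
            continue  # overlaps an entity accepted earlier; drop this match
        ents.append((start, end, label))
        taken |= span
    return sorted(ents)  # document order, like doc.ents


tokens = "I moved from New York City to San Francisco".split()
# Illustrative raw matches: "New York", "New York City", "San Francisco".
matches = [(3, 5, "GPE"), (3, 6, "GPE"), (7, 9, "GPE")]
ents = matches_to_ents(matches)
print([(" ".join(tokens[s:e]), label) for s, e, label in ents])
# [('New York City', 'GPE'), ('San Francisco', 'GPE')]
```

Sorting by length before accepting spans is what lets "New York City" win over the shorter "New York" regardless of the order in which the patterns were added.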


In [1]:
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span
from spacy import displacy

In [2]:
from spacy.pipeline import EntityRuler

In [3]:
nlp = spacy.load("en_core_web_sm")

In [4]:
# Placed before "ner", the statistical recognizer respects the ruler's spans and predicts around them.
ruler = nlp.add_pipe("entity_ruler", before="ner")

In [5]:
patterns = [{"label":"ORG", "pattern":"Apple"},
          {"label":"GPE", "pattern":[{"LOWER":"san"}, {"LOWER":"francisco"}]}]
patterns

[{'label': 'ORG', 'pattern': 'Apple'},
 {'label': 'GPE', 'pattern': [{'LOWER': 'san'}, {'LOWER': 'francisco'}]}]

In [6]:
ruler.add_patterns(patterns)

In [7]:
doc = nlp("Apple is opening a new store in San Francisco")

In [8]:
print([(ent.text, ent.label_) for ent in doc.ents])

[('Apple', 'ORG'), ('San Francisco', 'GPE')]
