Displacy serve entity linking support without manual=True support. #9748

Merged
Changes from 6 commits
12 changes: 10 additions & 2 deletions spacy/displacy/__init__.py
@@ -181,11 +181,19 @@ def parse_deps(orig_doc: Doc, options: Dict[str, Any] = {}) -> Dict[str, Any]:
 def parse_ents(doc: Doc, options: Dict[str, Any] = {}) -> Dict[str, Any]:
     """Generate named entities in [{start: i, end: i, label: 'label'}] format.
+
-    doc (Doc): Document do parse.
+    doc (Doc): Document to parse.
     options (Dict[str, Any]): NER-specific visualisation options.
     RETURNS (dict): Generated entities keyed by text (original text) and ents.
     """
+    kb_url_template = options.get("kb_url_template", None)
     ents = [
-        {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
+        {
+            "start": ent.start_char,
+            "end": ent.end_char,
+            "label": ent.label_,
+            "kb_id": ent.kb_id_ if ent.kb_id_ else "",
+            "kb_url": kb_url_template.format(ent.kb_id_) if kb_url_template else "#",
+        }
         for ent in doc.ents
     ]
     if not ents:
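The dictionary construction in the new `parse_ents` body can be tried out without spaCy. The following is a minimal stand-in sketch, not the library code: `SpanStub` and `ents_to_dicts` are hypothetical names introduced here, and only the `kb_id` / `kb_url` expressions are taken from the diff above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SpanStub:
    """Hypothetical stand-in for the span attributes that parse_ents reads."""

    start_char: int
    end_char: int
    label_: str
    kb_id_: str = ""


def ents_to_dicts(
    ents: List[SpanStub], kb_url_template: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Hypothetical helper mirroring the dict construction from the diff."""
    return [
        {
            "start": ent.start_char,
            "end": ent.end_char,
            "label": ent.label_,
            "kb_id": ent.kb_id_ if ent.kb_id_ else "",
            # Without a template the URL is "#", even when a kb_id is present.
            "kb_url": kb_url_template.format(ent.kb_id_) if kb_url_template else "#",
        }
        for ent in ents
    ]


linked = SpanStub(4, 10, "ORG", kb_id_="Q95")
unlinked = SpanStub(4, 10, "ORG")
template = "https://www.wikidata.org/wiki/{}"

no_template = ents_to_dicts([linked])
with_template = ents_to_dicts([linked, unlinked], template)
```

Under this logic, an entity without a `kb_id` still receives a URL when a template is set (the template with an empty identifier filled in). Whether that matters depends on how the renderer uses `kb_url`, which this diff does not show.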
36 changes: 34 additions & 2 deletions spacy/tests/test_displacy.py
@@ -1,8 +1,9 @@
 import pytest
+
 from spacy import displacy
 from spacy.displacy.render import DependencyRenderer, EntityRenderer
-from spacy.tokens import Span, Doc
 from spacy.lang.fa import Persian
+from spacy.tokens import Span, Doc
 
 
 def test_displacy_parse_ents(en_vocab):
@@ -12,7 +13,38 @@ def test_displacy_parse_ents(en_vocab):
     ents = displacy.parse_ents(doc)
     assert isinstance(ents, dict)
     assert ents["text"] == "But Google is starting from behind "
-    assert ents["ents"] == [{"start": 4, "end": 10, "label": "ORG"}]
+    assert ents["ents"] == [
+        {"start": 4, "end": 10, "label": "ORG", "kb_id": "", "kb_url": "#"}
+    ]
+
+    doc.ents = [Span(doc, 1, 2, label=doc.vocab.strings["ORG"], kb_id="Q95")]
+    ents = displacy.parse_ents(doc)
+    assert isinstance(ents, dict)
+    assert ents["text"] == "But Google is starting from behind "
+    assert ents["ents"] == [
+        {"start": 4, "end": 10, "label": "ORG", "kb_id": "Q95", "kb_url": "#"}
+    ]
+
+
+def test_displacy_parse_ents_with_kb_id_options(en_vocab):
+    """Test that named entities with kb_id on a Doc are converted into displaCy's format."""
+    doc = Doc(en_vocab, words=["But", "Google", "is", "starting", "from", "behind"])
+    doc.ents = [Span(doc, 1, 2, label=doc.vocab.strings["ORG"], kb_id="Q95")]
+
+    ents = displacy.parse_ents(
+        doc, {"kb_url_template": "https://www.wikidata.org/wiki/{}"}
+    )
+    assert isinstance(ents, dict)
+    assert ents["text"] == "But Google is starting from behind "
+    assert ents["ents"] == [
+        {
+            "start": 4,
+            "end": 10,
+            "label": "ORG",
+            "kb_id": "Q95",
+            "kb_url": "https://www.wikidata.org/wiki/Q95",
+        }
+    ]
 
 
 def test_displacy_parse_deps(en_vocab):
8 changes: 8 additions & 0 deletions website/docs/api/top-level.md
@@ -318,6 +318,7 @@ If a setting is not present in the options, the default value will be used.
 | `ents` | Entity types to highlight or `None` for all types (default). ~~Optional[List[str]]~~ |
 | `colors` | Color overrides. Entity types should be mapped to color names or values. ~~Dict[str, str]~~ |
 | `template` <Tag variant="new">2.2</Tag> | Optional template to overwrite the HTML used to render entity spans. Should be a format string and can use `{bg}`, `{text}` and `{label}`. See [`templates.py`](%%GITHUB_SPACY/spacy/displacy/templates.py) for examples. ~~Optional[str]~~ |
+| `kb_url_template` | Optional template to construct the KB URL for the entity to link to. Expects a Python format string (as accepted by `str.format`) with a single field to fill in. ~~Optional[str]~~ |
 
 By default, displaCy comes with colors for all entity types used by
 [spaCy's trained pipelines](/models). If you're using custom entity types, you
@@ -326,6 +327,13 @@ or pipeline package can also expose a
 [`spacy_displacy_colors` entry point](/usage/saving-loading#entry-points-displacy)
 to add custom labels and their colors automatically.
 
+By default, displaCy links every entity to `#`, even if a `kb_id` is set on its span.
+To link an entity to its knowledge base page, use the `kb_url_template` option
+described above. For example, if the `kb_id` on a span is `Q95` and this is a Wikidata
+identifier, set the option to `https://www.wikidata.org/wiki/{}`.
+Clicking the entity in the rendered HTML should then take you to its Wikidata page,
+in this case `https://www.wikidata.org/wiki/Q95`.
+
 ## registry {#registry source="spacy/util.py" new="3"}
 
 spaCy's function registry extends
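Because `parse_ents` in this PR calls `kb_url_template.format(ent.kb_id_)`, the template behaves as an ordinary `str.format` string with one positional field. The plain-Python sketch below (no spaCy involved) shows what that implies for anyone writing a template. The variable names and the `example.org` URL are illustrative assumptions, not part of the PR.

```python
# The Wikidata template and the Q95 identifier come from the docs change above.
template = "https://www.wikidata.org/wiki/{}"
url = template.format("Q95")

# A named field cannot be filled by the single positional argument,
# so a template such as ".../{kb_id}" would raise a KeyError.
try:
    "https://www.wikidata.org/wiki/{kb_id}".format("Q95")
    named_field_ok = True
except KeyError:
    named_field_ok = False

# Literal braces in a URL have to be doubled to survive formatting
# (example.org is a placeholder host).
escaped = "https://example.org/{{kb}}/{}".format("Q95")
```

A template with a single empty `{}` field, as in the Wikidata example, is the form the added test exercises.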