de_core_news_sm-2.2.0

@explosion-bot explosion-bot released this 27 Sep 13:31
· 1415 commits to master since this release
c5623d0

Downloads

Details: https://spacy.io/models/de#de_core_news_sm

File checksum: 8b79574382b1e06b24f67e76d652d60d00750405c23e7a078dc4cc53aad5e219
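A downloaded archive can be checked against the checksum above before installing it. The following is a minimal sketch, assuming the value is a SHA-256 digest (its 64 hexadecimal characters are consistent with that, but the release notes do not name the algorithm). The helper names `sha256_of_file` and `matches_release_checksum` are hypothetical, and the archive path is whatever file you downloaded.

```python
import hashlib

# Checksum copied from the release notes; assumed here to be SHA-256.
EXPECTED_SHA256 = "8b79574382b1e06b24f67e76d652d60d00750405c23e7a078dc4cc53aad5e219"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release_checksum(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected value, ignoring case."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_release_checksum` with the path to the downloaded archive returns `True` only if the file's digest equals the published value.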

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.

| Feature | Description |
| --- | --- |
| Name | de_core_news_sm |
| Version | 2.2.0 |
| spaCy | >=2.2.0 |
| Model size | 14 MB |
| Pipeline | tagger, parser, ner |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | TIGER Corpus, WikiNER |
| License | MIT |
| Author | Explosion |

Label Scheme

| Component | Labels |
| --- | --- |
| tagger | `$(`, `$,`, `$.`, `ADJA`, `ADJD`, `ADV`, `APPO`, `APPR`, `APPRART`, `APZR`, `ART`, `CARD`, `FM`, `ITJ`, `KOKOM`, `KON`, `KOUI`, `KOUS`, `NE`, `NN`, `NNE`, `PDAT`, `PDS`, `PIAT`, `PIS`, `PPER`, `PPOSAT`, `PPOSS`, `PRELAT`, `PRELS`, `PRF`, `PROAV`, `PTKA`, `PTKANT`, `PTKNEG`, `PTKVZ`, `PTKZU`, `PWAT`, `PWAV`, `PWS`, `TRUNC`, `VAFIN`, `VAIMP`, `VAINF`, `VAPP`, `VMFIN`, `VMINF`, `VMPP`, `VVFIN`, `VVIMP`, `VVINF`, `VVIZU`, `VVPP`, `XY`, `_SP` |
| parser | `ROOT`, `ac`, `adc`, `ag`, `ams`, `app`, `avc`, `cc`, `cd`, `cj`, `cm`, `cp`, `cvc`, `da`, `dep`, `dm`, `ep`, `ju`, `mnr`, `mo`, `ng`, `nk`, `nmc`, `oa`, `oc`, `og`, `op`, `par`, `pd`, `pg`, `ph`, `pm`, `pnc`, `punct`, `rc`, `re`, `rs`, `sb`, `sbp`, `svp`, `uc`, `vo` |
| ner | `LOC`, `MISC`, `ORG`, `PER` |

Accuracy

| Type | Score |
| --- | --- |
| LAS | 88.63 |
| UAS | 90.75 |
| TOKEN_ACC | 95.88 |
| TAGS_ACC | 96.29 |
| ENTS_F | 83.11 |
| ENTS_P | 83.57 |
| ENTS_R | 82.66 |
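The three NER scores are related: ENTS_F is consistent with the harmonic mean (F1) of the precision ENTS_P and the recall ENTS_R. The sketch below recomputes it from the two reported values; the function name `f1_score` is hypothetical and not part of spaCy.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both on the same scale."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the accuracy table (percentages).
ents_p = 83.57
ents_r = 82.66
print(round(f1_score(ents_p, ents_r), 2))  # 83.11, matching ENTS_F
```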

Because the model is trained partly on Wikipedia text, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Scores on these annotations tend to be higher than scores measured against correct human annotations.

Installation

pip install spacy
python -m spacy download de_core_news_sm
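
Once both commands have completed, the model can be loaded by name and applied to German text. The following is a rough sketch, assuming a spaCy 2.2 environment with the model installed. The helper names `token_rows`, `entity_rows` and `analyze` are hypothetical, and the sample sentence is only an illustration. spaCy is imported inside `analyze` so that the two formatting helpers can be used without it.

```python
def token_rows(doc):
    """Return (text, fine-grained tag, dependency label, head text) per token."""
    return [(tok.text, tok.tag_, tok.dep_, tok.head.text) for tok in doc]


def entity_rows(doc):
    """Return (text, label) for each named entity found in the document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def analyze(text, model_name="de_core_news_sm"):
    """Load the model and run the tagger, parser and ner components on text."""
    import spacy  # imported lazily; requires spaCy and the model to be installed

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return token_rows(doc), entity_rows(doc)
```

For example, `analyze("Angela Merkel besuchte gestern Berlin.")` returns the token rows and the entity rows for that sentence. Tags come from the tagger label set, dependency labels from the parser label set, and entity labels are one of LOC, MISC, ORG or PER.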