de_core_news_sm-2.2.0
Released by explosion-bot on 27 Sep, 13:31.
Details: https://spacy.io/models/de#de_core_news_sm
File checksum:
8b79574382b1e06b24f67e76d652d60d00750405c23e7a078dc4cc53aad5e219
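A downloaded archive can be checked against the checksum above before installing it. The following is a minimal sketch, assuming the value is a SHA-256 digest (its 64 hexadecimal characters are consistent with that, but the release page does not name the algorithm). The helper names `sha256_of_file` and `matches_checksum` are illustrative and not part of spaCy.

```python
import hashlib

# Checksum copied from the release notes; assumed to be a SHA-256 digest.
EXPECTED_SHA256 = "8b79574382b1e06b24f67e76d652d60d00750405c23e7a078dc4cc53aad5e219"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_checksum` with the local path of the downloaded archive returns `True` only if the file is byte-for-byte identical to the one the digest was computed from.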
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | de_core_news_sm |
Version | 2.2.0 |
spaCy | >=2.2.0 |
Model size | 14 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | TIGER Corpus, WikiNER |
License | MIT |
Author | Explosion |
Label Scheme
Component | Labels |
---|---|
tagger | `$(`, `$,`, `$.`, `ADJA`, `ADJD`, `ADV`, `APPO`, `APPR`, `APPRART`, `APZR`, `ART`, `CARD`, `FM`, `ITJ`, `KOKOM`, `KON`, `KOUI`, `KOUS`, `NE`, `NN`, `NNE`, `PDAT`, `PDS`, `PIAT`, `PIS`, `PPER`, `PPOSAT`, `PPOSS`, `PRELAT`, `PRELS`, `PRF`, `PROAV`, `PTKA`, `PTKANT`, `PTKNEG`, `PTKVZ`, `PTKZU`, `PWAT`, `PWAV`, `PWS`, `TRUNC`, `VAFIN`, `VAIMP`, `VAINF`, `VAPP`, `VMFIN`, `VMINF`, `VMPP`, `VVFIN`, `VVIMP`, `VVINF`, `VVIZU`, `VVPP`, `XY`, `_SP` |
parser | `ROOT`, `ac`, `adc`, `ag`, `ams`, `app`, `avc`, `cc`, `cd`, `cj`, `cm`, `cp`, `cvc`, `da`, `dep`, `dm`, `ep`, `ju`, `mnr`, `mo`, `ng`, `nk`, `nmc`, `oa`, `oc`, `og`, `op`, `par`, `pd`, `pg`, `ph`, `pm`, `pnc`, `punct`, `rc`, `re`, `rs`, `sb`, `sbp`, `svp`, `uc`, `vo` |
ner | `LOC`, `MISC`, `ORG`, `PER` |
Accuracy
Type | Score |
---|---|
LAS | 88.63 |
UAS | 90.75 |
TOKEN_ACC | 95.88 |
TAGS_ACC | 96.29 |
ENTS_F | 83.11 |
ENTS_P | 83.57 |
ENTS_R | 82.66 |
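The three NER scores are related: assuming ENTS_F is the usual F1 score (the harmonic mean of precision ENTS_P and recall ENTS_R), it can be recomputed from the other two values in the table. The function name `f_score` below is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the accuracy table (percentages).
ents_p = 83.57
ents_r = 82.66

print(round(f_score(ents_p, ents_r), 2))  # prints 83.11
```

The result agrees with the reported ENTS_F of 83.11, which is consistent with the F1 assumption.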
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured on these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy
python -m spacy download de_core_news_sm
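Once the model is installed, it can be loaded by name and applied to German text. The sketch below assumes spaCy >=2.2.0 and the model package are installed as described above. The helper names `summarize` and `analyze` are hypothetical, and spaCy is imported lazily so that `summarize` can be used on its own.

```python
def summarize(doc):
    """Collect per-token tags and dependencies, plus named entities.

    Expects a spaCy Doc (or any iterable of token-like objects with an
    `ents` attribute).
    """
    tokens = [(t.text, t.tag_, t.dep_, t.head.text) for t in doc]
    entities = [(e.text, e.label_) for e in doc.ents]
    return tokens, entities


def analyze(text, model_name="de_core_news_sm"):
    """Load the model by name and run the tagger, parser and NER on text."""
    import spacy  # lazy import: only needed when a model is actually loaded

    nlp = spacy.load(model_name)
    return summarize(nlp(text))
```

For example, `analyze("Angela Merkel besuchte Berlin.")` returns one tuple per token, with tag and dependency labels drawn from the label scheme above, and a list of entities labeled PER, LOC, ORG or MISC. The exact predictions depend on the installed model version.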