Releases · explosion/spacy-models

17 Nov 08:13

en_core_web_trf-3.7.3

f15206b

Checksum .tar.gz: dae355f7f419bee53f2804a8e62a6473425e8680ac8ff8e8a7b30b7e2b8b0c4f
Checksum .whl: f72abb34bdf174876bd4267b29b2501677e605e0a251fdc56c163003182ed68b
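
The checksums above are 64 hexadecimal characters long, which is consistent with SHA-256 digests. A minimal sketch of checking a downloaded archive against the published value, assuming SHA-256 is indeed the algorithm and using a hypothetical local file name:

```python
import hashlib
from pathlib import Path

# Published .tar.gz checksum for en_core_web_trf-3.7.3, copied from the release notes.
EXPECTED_TAR_GZ = "dae355f7f419bee53f2804a8e62a6473425e8680ac8ff8e8a7b30b7e2b8b0c4f"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare case-insensitively, since hex digests may be written in either case."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path (an assumption); adjust to wherever the archive was saved.
    archive = Path("en_core_web_trf-3.7.3.tar.gz")
    if archive.exists():
        ok = matches_checksum(archive, EXPECTED_TAR_GZ)
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same helper works for the `.whl` file if the expected value is swapped for the second checksum.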

Details: https://spacy.io/models/en#en_core_web_trf

English transformer pipeline (Transformer(name='roberta-base', piece_encoder='byte-bpe', stride=104, type='roberta', width=768, window=144, vocab_size=50265)). Components: transformer, tagger, parser, ner, attribute_ruler, lemmatizer.
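
The `window=144` and `stride=104` settings suggest that long inputs are processed as overlapping windows rather than in a single pass. A minimal sketch of that idea, assuming `window` is a window length and `stride` is the step between window starts; the function name `strided_windows` is hypothetical and this is not the library's own implementation:

```python
def strided_windows(n_items, window=144, stride=104):
    """Return (start, end) index pairs covering range(n_items) with overlap.

    Consecutive windows overlap by window - stride items (40 with the defaults),
    so items near a window edge also appear with more context in a neighbour.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans = []
    start = 0
    while start < n_items:
        end = min(start + window, n_items)
        spans.append((start, end))
        if end == n_items:
            break  # the sequence is fully covered
        start += stride
    return spans


if __name__ == "__main__":
    for start, end in strided_windows(300):
        print(start, end)
```

With the defaults, an input of 300 items would be covered by three windows, each sharing 40 items with the next.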

Feature	Description
Name	`en_core_web_trf`
Version	`3.7.3`
spaCy	`>=3.7.2,<3.8.0`
Default Pipeline	`transformer`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner`
Components	`transformer`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); ClearNLP Constituent-to-Dependency Conversion (Emory University); WordNet 3.0 (Princeton University); roberta-base (Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov)
License	`MIT`
Author	Explosion
Model size	436 MB
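
The `spaCy` row is a version constraint: the pipeline expects a spaCy release at or above 3.7.2 and below 3.8.0. A minimal sketch of evaluating such a constraint, assuming plain dotted integer versions only (no pre-release or dev suffixes); the helper names are hypothetical:

```python
import operator

# Longer operators come first so ">=" is not mistaken for ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn '3.7.2' into (3, 7, 2); assumes dotted integers only."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                if not compare(current, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    spec = ">=3.7.2,<3.8.0"  # copied from the table above
    for candidate in ("3.7.1", "3.7.2", "3.7.5", "3.8.0"):
        print(candidate, satisfies(candidate, spec))
```

For real dependency checks, the `packaging` library's `SpecifierSet` covers the full version grammar; the sketch only illustrates how the two bounds combine.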

Label Scheme

View label scheme (112 labels for 3 components)

Component	Labels
`tagger`	`$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, ````
`parser`	`ROOT`, `acl`, `acomp`, `advcl`, `advmod`, `agent`, `amod`, `appos`, `attr`, `aux`, `auxpass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `csubj`, `csubjpass`, `dative`, `dep`, `det`, `dobj`, `expl`, `intj`, `mark`, `meta`, `neg`, `nmod`, `npadvmod`, `nsubj`, `nsubjpass`, `nummod`, `oprd`, `parataxis`, `pcomp`, `pobj`, `poss`, `preconj`, `predet`, `prep`, `prt`, `punct`, `quantmod`, `relcl`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_ACC`	99.86
`TOKEN_P`	99.57
`TOKEN_R`	99.58
`TOKEN_F`	99.57
`TAG_ACC`	98.13
`SENTS_P`	94.89
`SENTS_R`	85.79
`SENTS_F`	90.11
`DEP_UAS`	95.26
`DEP_LAS`	93.91
`ENTS_P`	90.08
`ENTS_R`	90.30
`ENTS_F`	90.19
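
The `_P`, `_R` and `_F` rows are precision, recall and F-score. A minimal sketch, assuming the F-score is the usual harmonic mean of precision and recall (F1), which reproduces the reported values from the rounded inputs in the table:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values copied from the accuracy table above.
    print(round(f_score(90.08, 90.30), 2))  # ENTS_P, ENTS_R
    print(round(f_score(94.89, 85.79), 2))  # SENTS_P, SENTS_R
```

Because the table shows rounded precision and recall, recomputing an F-score from them can differ from the published figure in the last digit.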

Installation

pip install spacy
python -m spacy download en_core_web_trf
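
spaCy pipelines are installed as ordinary Python packages, so one way to confirm that the download succeeded is to check that the package is importable. A minimal sketch using only the standard library; the helper name `is_pipeline_installed` is hypothetical:

```python
import importlib.util


def is_pipeline_installed(package_name):
    """Return True if a top-level package with this name can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    name = "en_core_web_trf"
    if is_pipeline_installed(name):
        print(f"{name} is installed; it can be loaded with spacy.load({name!r})")
    else:
        print(f"{name} is missing; run: python -m spacy download {name}")
```

This check does not load any model weights, so it stays cheap even for the larger pipelines.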

17 Nov 08:13

explosion-bot

en_core_web_sm-3.7.1

f15206b

Checksum .tar.gz: 1075c2aa2bc2fee105ab6e90a01a5d1a428c9f5b20a1fa003dc2cb6a438d295e
Checksum .whl: 86cc141f63942d4b2c5fcee06630fd6f904788d2f0ab005cce45aadb8fb73889

Details: https://spacy.io/models/en#en_core_web_sm

English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.

Feature	Description
Name	`en_core_web_sm`
Version	`3.7.1`
spaCy	`>=3.7.2,<3.8.0`
Default Pipeline	`tok2vec`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner`
Components	`tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); ClearNLP Constituent-to-Dependency Conversion (Emory University); WordNet 3.0 (Princeton University)
License	`MIT`
Author	Explosion
Model size	12 MB

Label Scheme

View label scheme (113 labels for 3 components)

Component	Labels
`tagger`	`$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, `_SP`, ````
`parser`	`ROOT`, `acl`, `acomp`, `advcl`, `advmod`, `agent`, `amod`, `appos`, `attr`, `aux`, `auxpass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `csubj`, `csubjpass`, `dative`, `dep`, `det`, `dobj`, `expl`, `intj`, `mark`, `meta`, `neg`, `nmod`, `npadvmod`, `nsubj`, `nsubjpass`, `nummod`, `oprd`, `parataxis`, `pcomp`, `pobj`, `poss`, `preconj`, `predet`, `prep`, `prt`, `punct`, `quantmod`, `relcl`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_ACC`	99.86
`TOKEN_P`	99.57
`TOKEN_R`	99.58
`TOKEN_F`	99.57
`TAG_ACC`	97.25
`SENTS_P`	92.02
`SENTS_R`	89.21
`SENTS_F`	90.59
`DEP_UAS`	91.75
`DEP_LAS`	89.87
`ENTS_P`	84.55
`ENTS_R`	84.57
`ENTS_F`	84.56

Installation

pip install spacy
python -m spacy download en_core_web_sm

17 Nov 08:13

explosion-bot

en_core_web_md-3.7.1

f15206b

Checksum .tar.gz: 3273a1335fcb688be09949c5cdb73e85eb584ec3dfc50d4338c17daf6ccd4628
Checksum .whl: 6a0f857a2b4d219c6fa17d455f82430b365bf53171a2d919b9376e5dc9be032e

Details: https://spacy.io/models/en#en_core_web_md

English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.

Feature	Description
Name	`en_core_web_md`
Version	`3.7.1`
spaCy	`>=3.7.2,<3.8.0`
Default Pipeline	`tok2vec`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner`
Components	`tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner`
Vectors	514157 keys, 20000 unique vectors (300 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); ClearNLP Constituent-to-Dependency Conversion (Emory University); WordNet 3.0 (Princeton University); Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl) (Explosion)
License	`MIT`
Author	Explosion
Model size	40 MB
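
The `Vectors` row reports far more keys than unique vectors (514157 keys for 20000 vectors), which means many keys point at the same stored row. A toy sketch of such a key-to-row table, with made-up words and values; it illustrates the bookkeeping only and is not spaCy's internal data structure:

```python
# Three stored rows (the "unique vectors"), two dimensions each. Values are invented.
ROWS = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]

# Five keys share those three rows, the way several words can share one vector.
KEY_TO_ROW = {"cat": 0, "kitten": 0, "dog": 1, "puppy": 1, "pet": 2}


def vector_for(key):
    """Look up the stored row that a key points at."""
    return ROWS[KEY_TO_ROW[key]]


def summary():
    """Describe the table in the same shape as the `Vectors` row."""
    n_keys = len(KEY_TO_ROW)
    n_unique = len(set(KEY_TO_ROW.values()))
    dims = len(ROWS[0])
    return f"{n_keys} keys, {n_unique} unique vectors ({dims} dimensions)"


if __name__ == "__main__":
    print(summary())
    print(vector_for("cat") == vector_for("kitten"))
```

Sharing rows keeps the package small: the `md` pipeline lists the same number of keys as `lg` but stores far fewer rows.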

Label Scheme

View label scheme (113 labels for 3 components)

Component	Labels
`tagger`	`$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, `_SP`, ````
`parser`	`ROOT`, `acl`, `acomp`, `advcl`, `advmod`, `agent`, `amod`, `appos`, `attr`, `aux`, `auxpass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `csubj`, `csubjpass`, `dative`, `dep`, `det`, `dobj`, `expl`, `intj`, `mark`, `meta`, `neg`, `nmod`, `npadvmod`, `nsubj`, `nsubjpass`, `nummod`, `oprd`, `parataxis`, `pcomp`, `pobj`, `poss`, `preconj`, `predet`, `prep`, `prt`, `punct`, `quantmod`, `relcl`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_ACC`	99.86
`TOKEN_P`	99.57
`TOKEN_R`	99.58
`TOKEN_F`	99.57
`TAG_ACC`	97.33
`SENTS_P`	92.21
`SENTS_R`	89.37
`SENTS_F`	90.77
`DEP_UAS`	92.05
`DEP_LAS`	90.23
`ENTS_P`	84.94
`ENTS_R`	85.49
`ENTS_F`	85.22

Installation

pip install spacy
python -m spacy download en_core_web_md

17 Nov 08:13

explosion-bot

en_core_web_lg-3.7.1

f15206b

Checksum .tar.gz: 4c8b2fd2572a5fb232c7b38345d301e7e092d1242b7184e14a86eff8ef6eb6d7
Checksum .whl: ab70aeb6172cde82508f7739f35ebc9918a3d07debeed637403c8f794ba3d3dc

Details: https://spacy.io/models/en#en_core_web_lg

English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.

Feature	Description
Name	`en_core_web_lg`
Version	`3.7.1`
spaCy	`>=3.7.2,<3.8.0`
Default Pipeline	`tok2vec`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner`
Components	`tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner`
Vectors	514157 keys, 514157 unique vectors (300 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); ClearNLP Constituent-to-Dependency Conversion (Emory University); WordNet 3.0 (Princeton University); Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl) (Explosion)
License	`MIT`
Author	Explosion
Model size	560 MB

Label Scheme

View label scheme (113 labels for 3 components)

Component	Labels
`tagger`	`$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, `_SP`, ````
`parser`	`ROOT`, `acl`, `acomp`, `advcl`, `advmod`, `agent`, `amod`, `appos`, `attr`, `aux`, `auxpass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `csubj`, `csubjpass`, `dative`, `dep`, `det`, `dobj`, `expl`, `intj`, `mark`, `meta`, `neg`, `nmod`, `npadvmod`, `nsubj`, `nsubjpass`, `nummod`, `oprd`, `parataxis`, `pcomp`, `pobj`, `poss`, `preconj`, `predet`, `prep`, `prt`, `punct`, `quantmod`, `relcl`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_ACC`	99.86
`TOKEN_P`	99.57
`TOKEN_R`	99.58
`TOKEN_F`	99.57
`TAG_ACC`	97.35
`SENTS_P`	92.19
`SENTS_R`	89.27
`SENTS_F`	90.71
`DEP_UAS`	92.08
`DEP_LAS`	90.27
`ENTS_P`	85.16
`ENTS_R`	85.70
`ENTS_F`	85.43

Installation

pip install spacy
python -m spacy download en_core_web_lg

01 Oct 09:30

explosion-bot

zh_core_web_trf-3.7.2

dbe9b97

Checksum .tar.gz: 38857a79f6754b9427619362843c84c18e6410e7ba1f05a1d7aa1c91f7b08904
Checksum .whl: 16b8d4bf23d20a04cfcbe676ae1be2be4437b40cf8101c9f3e7f6db4674ec91d

Details: https://spacy.io/models/zh#zh_core_web_trf

Chinese transformer pipeline (Transformer(name='bert-base-chinese', piece_encoder='bert-wordpiece', stride=152, type='bert', width=768, window=208, vocab_size=21128)). Components: transformer, tagger, parser, ner, attribute_ruler.

Feature	Description
Name	`zh_core_web_trf`
Version	`3.7.2`
spaCy	`>=3.7.0,<3.8.0`
Default Pipeline	`transformer`, `tagger`, `parser`, `attribute_ruler`, `ner`
Components	`transformer`, `tagger`, `parser`, `attribute_ruler`, `ner`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); CoreNLP Universal Dependencies Converter (Stanford NLP Group); bert-base-chinese (Hugging Face)
License	`MIT`
Author	Explosion
Model size	396 MB

Label Scheme

View label scheme (99 labels for 3 components)

Component	Labels
`tagger`	`AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X`
`parser`	`ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_ACC`	95.85
`TOKEN_P`	94.58
`TOKEN_R`	91.36
`TOKEN_F`	92.94
`TAG_ACC`	91.75
`SENTS_P`	70.92
`SENTS_R`	67.57
`SENTS_F`	69.21
`DEP_UAS`	75.72
`DEP_LAS`	71.45
`ENTS_P`	76.09
`ENTS_R`	72.18
`ENTS_F`	74.08

Installation

pip install spacy
python -m spacy download zh_core_web_trf

01 Oct 08:58

explosion-bot

zh_core_web_sm-3.7.0

dbe9b97

Checksum .tar.gz: c22fe1cb9a0479a297d24d33641592436d1b68385c9bbd750ea20e84c4273ef5
Checksum .whl: f51075665749e07406d629d1055ce5a68635fae6ab3c34257ee798c62b4fc431

Details: https://spacy.io/models/zh#zh_core_web_sm

Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.

Feature	Description
Name	`zh_core_web_sm`
Version	`3.7.0`
spaCy	`>=3.7.0,<3.8.0`
Default Pipeline	`tok2vec`, `tagger`, `parser`, `attribute_ruler`, `ner`
Components	`tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `ner`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); CoreNLP Universal Dependencies Converter (Stanford NLP Group)
License	`MIT`
Author	Explosion
Model size	46 MB

Label Scheme

View label scheme (100 labels for 3 components)

Component	Labels
`tagger`	`AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X`, `_SP`
`parser`	`ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_ACC`	95.85
`TOKEN_P`	94.58
`TOKEN_R`	91.36
`TOKEN_F`	92.94
`TAG_ACC`	89.33
`SENTS_P`	77.85
`SENTS_R`	72.62
`SENTS_F`	75.14
`DEP_UAS`	69.60
`DEP_LAS`	64.08
`ENTS_P`	72.03
`ENTS_R`	64.93
`ENTS_F`	68.30

Installation

pip install spacy
python -m spacy download zh_core_web_sm

01 Oct 08:58

explosion-bot

zh_core_web_md-3.7.0

dbe9b97

Checksum .tar.gz: 920cf2f7e8db666f22d52b763ff76cf9eeac2c7e6dbc00f5e99ed543ba7da50e
Checksum .whl: a528dbbcf7f323718be4b523559840dc850303046e25a62f9a1049b7ab9f9e68

Details: https://spacy.io/models/zh#zh_core_web_md

Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.

Feature	Description
Name	`zh_core_web_md`
Version	`3.7.0`
spaCy	`>=3.7.0,<3.8.0`
Default Pipeline	`tok2vec`, `tagger`, `parser`, `attribute_ruler`, `ner`
Components	`tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `ner`
Vectors	500000 keys, 20000 unique vectors (300 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); CoreNLP Universal Dependencies Converter (Stanford NLP Group); Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia) (Explosion)
License	`MIT`
Author	Explosion
Model size	74 MB

Label Scheme

View label scheme (100 labels for 3 components)

Component	Labels
`tagger`	`AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X`, `_SP`
`parser`	`ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_ACC`	95.85
`TOKEN_P`	94.58
`TOKEN_R`	91.36
`TOKEN_F`	92.94
`TAG_ACC`	90.04
`SENTS_P`	78.89
`SENTS_R`	72.80
`SENTS_F`	75.72
`DEP_UAS`	70.50
`DEP_LAS`	65.22
`ENTS_P`	71.88
`ENTS_R`	67.90
`ENTS_F`	69.83

Installation

pip install spacy
python -m spacy download zh_core_web_md

01 Oct 08:58

explosion-bot

zh_core_web_lg-3.7.0

dbe9b97

Checksum .tar.gz: 0a07048baf3e73f22b16a7edac47f97632772c7a05ebf1bcc51ab458f0670dcf
Checksum .whl: 6bfd1796788dc27c0f5e0cc43374eb96abe0b4f0ec1b29f19f5782051216c556

Details: https://spacy.io/models/zh#zh_core_web_lg

Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.

Feature	Description
Name	`zh_core_web_lg`
Version	`3.7.0`
spaCy	`>=3.7.0,<3.8.0`
Default Pipeline	`tok2vec`, `tagger`, `parser`, `attribute_ruler`, `ner`
Components	`tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `ner`
Vectors	500000 keys, 500000 unique vectors (300 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); CoreNLP Universal Dependencies Converter (Stanford NLP Group); Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia) (Explosion)
License	`MIT`
Author	Explosion
Model size	575 MB

Label Scheme

View label scheme (100 labels for 3 components)

Component	Labels
`tagger`	`AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X`, `_SP`
`parser`	`ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_ACC`	95.85
`TOKEN_P`	94.58
`TOKEN_R`	91.36
`TOKEN_F`	92.94
`TAG_ACC`	90.33
`SENTS_P`	78.05
`SENTS_R`	72.63
`SENTS_F`	75.24
`DEP_UAS`	70.86
`DEP_LAS`	65.71
`ENTS_P`	73.55
`ENTS_R`	69.25
`ENTS_F`	71.34

Installation

pip install spacy
python -m spacy download zh_core_web_lg

01 Oct 08:58

explosion-bot

xx_sent_ud_sm-3.7.0

dbe9b97

Checksum .tar.gz: fc769f274ad087e1ee3042d671a5487714a885d2a0fba5baea56cd5a6b23cc8d
Checksum .whl: aafb609d5a895a62ed9672fbef2aa8061106a4b164a700999a376f8529acc3ad

Details: https://spacy.io/models/xx#xx_sent_ud_sm

Multi-language pipeline optimized for CPU. Components: senter.

Feature	Description
Name	`xx_sent_ud_sm`
Version	`3.7.0`
spaCy	`>=3.7.0,<3.8.0`
Default Pipeline	`senter`
Components	`senter`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	Universal Dependencies v2.8 (UD_Afrikaans-AfriBooms, UD_Croatian-SET, UD_Czech-CAC, UD_Czech-CLTT, UD_Danish-DDT, UD_Dutch-Alpino, UD_Dutch-LassySmall, UD_English-EWT, UD_Finnish-FTB, UD_Finnish-TDT, UD_French-GSD, UD_French-Spoken, UD_German-GSD, UD_Indonesian-GSD, UD_Irish-IDT, UD_Italian-TWITTIRO, UD_Korean-GSD, UD_Korean-Kaist, UD_Latvian-LVTB, UD_Lithuanian-ALKSNIS, UD_Lithuanian-HSE, UD_Marathi-UFAL, UD_Norwegian-Bokmaal, UD_Norwegian-Nynorsk, UD_Norwegian-NynorskLIA, UD_Persian-Seraji, UD_Portuguese-Bosque, UD_Portuguese-GSD, UD_Romanian-Nonstandard, UD_Romanian-RRT, UD_Russian-GSD, UD_Russian-Taiga, UD_Serbian-SET, UD_Slovak-SNK, UD_Spanish-GSD, UD_Swedish-Talbanken, UD_Telugu-MTG, UD_Vietnamese-VTB) (Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell; et al.)
License	`CC BY-SA 3.0`
Author	Explosion
Model size	4 MB

Label Scheme

Accuracy

Type	Score
`TOKEN_ACC`	98.59
`TOKEN_P`	95.31
`TOKEN_R`	95.72
`TOKEN_F`	95.52
`SENTS_P`	90.66
`SENTS_R`	81.58
`SENTS_F`	85.88

Installation

pip install spacy
python -m spacy download xx_sent_ud_sm

01 Oct 08:58

explosion-bot

xx_ent_wiki_sm-3.7.0

dbe9b97

Checksum .tar.gz: 96e9c622429d34c08127aca1689fb5c5c557bbd3027c4a5a655874dd915206cc
Checksum .whl: 66c227a793f8a79814d6ca1da7c0ae633172e2fb0a94737bc8bd2e517479e73c

Details: https://spacy.io/models/xx#xx_ent_wiki_sm

Multi-language pipeline optimized for CPU. Components: ner.

Feature	Description
Name	`xx_ent_wiki_sm`
Version	`3.7.0`
spaCy	`>=3.7.0,<3.8.0`
Default Pipeline	`ner`
Components	`ner`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	WikiNER (Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, James R Curran)
License	`MIT`
Author	Explosion
Model size	10 MB

Label Scheme

View label scheme (4 labels for 1 component)

Component	Labels
`ner`	`LOC`, `MISC`, `ORG`, `PER`

Accuracy

Type	Score
`ENTS_P`	83.53
`ENTS_R`	82.65
`ENTS_F`	83.08

Installation

pip install spacy
python -m spacy download xx_ent_wiki_sm
