I've been working with [`spacy`](https://spacy.io/) more and more over the years, and I thought it'd be a good idea to write about the configuration system. There are mentions of it throughout the [docs](https://spacy.io/usage/training#config) and in some of the `spacy` 3.0 [videos](https://youtu.be/BWhh3r6W-qE), but I have yet to find a super detailed breakdown of what's going on (except maybe this [blog](https://explosion.ai/blog/spacy-v3-project-config-systems#spacy-config-system)). Hopefully this post will shed some light.

Let's start with a brief demo of `spacy`.

> Install `spacy` and the `en_core_web_sm` model if you want to follow along:
> ```shell
> $ pip install spacy
> $ python -m spacy download en_core_web_sm
> ```

In [3]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Hi, my name is Ian and this is my blog.")
print(doc)

Hi, my name is Ian and this is my blog.


Nothing fancy on the surface, but this [`doc`](https://spacy.io/api/doc) object that we've created is the product of sending our string of characters through a [pipeline of models](https://spacy.io/usage/processing-pipelines), or as `spacy` likes to call them, [components](https://spacy.io/usage/processing-pipelines#pipelines). We can view the pipeline components via the [`nlp.pipeline` property](https://spacy.io/api/language#attributes).

In [4]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x1cc3df79970>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x1cc3df7aed0>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x1cc3d103bc0>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x1cc3d417450>),
 ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x1cc3df4bf50>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x1cc3d103b50>)]
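
To make the "pipeline of components" idea concrete, here is a toy sketch in plain Python. It is not how `spacy` is implemented: `ToyDoc`, `tokenize`, `toy_tagger`, `toy_ner`, and `run` are all hypothetical names invented for illustration. The only thing it borrows from the output above is the shape of the pipeline, an ordered list of `(name, component)` pairs, where each component receives the document, annotates it, and hands it to the next one.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for a spaCy Doc: a plain dict of annotations.
ToyDoc = Dict[str, Any]
Component = Callable[[ToyDoc], ToyDoc]


def tokenize(text: str) -> ToyDoc:
    """Split on whitespace; a real tokenizer is far more careful."""
    return {"text": text, "tokens": text.split()}


def toy_tagger(doc: ToyDoc) -> ToyDoc:
    """Pretend tagger: mark capitalized tokens as proper nouns."""
    doc["tags"] = ["PROPN" if t[:1].isupper() else "X" for t in doc["tokens"]]
    return doc


def toy_ner(doc: ToyDoc) -> ToyDoc:
    """Pretend entity recognizer that reads the tags set by toy_tagger."""
    doc["ents"] = [t for t, tag in zip(doc["tokens"], doc["tags"]) if tag == "PROPN"]
    return doc


# Same shape as `nlp.pipeline`: an ordered list of (name, component) pairs.
pipeline: List[Tuple[str, Component]] = [("tagger", toy_tagger), ("ner", toy_ner)]


def run(text: str) -> ToyDoc:
    doc = tokenize(text)
    for _name, component in pipeline:
        doc = component(doc)  # each component annotates the doc and passes it on
    return doc


result = run("my name is Ian")
print(result["ents"])  # ['Ian']
```

Order matters in this sketch: `toy_ner` would fail if it ran before `toy_tagger`, because it depends on the `tags` annotation.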

We can get more information about each component with [`nlp.analyze_pipes`](https://spacy.io/api/language#analyze_pipes), such as what it assigns, what it requires, its scoring metrics, and whether it retokenizes.

In [8]:
nlp.analyze_pipes(pretty=True);  # the trailing semicolon (;) stops the notebook from echoing the returned dict after the table.

#   Component         Assigns               Requires   Scores             Retokenizes
-   ---------------   -------------------   --------   ----------------   -----------
0   tok2vec           doc.tensor                                          False      
                                                                                     
1   tagger            token.tag                        tag_acc            False      
                                                                                     
2   parser            token.dep                        dep_uas            False      
                      token.head                       dep_las                       
                      token.is_sent_start              dep_las_per_type              
                      doc.sents                        sents_p                       
                                                       sents_r                       
                                                

I use the [`ner` component](https://spacy.io/api/entityrecognizer) quite a bit—let's take a closer look at what it [assigned to the `doc`](https://spacy.io/api/entityrecognizer#assigned-attributes).

In [12]:
doc.ents

(Ian,)

It identified "Ian"—that's me!—as an entity. We can find out more by looking at the individual [`token`](https://spacy.io/api/token) values and what the `ner` component assigned to those.

In [14]:
# `Doc` objects are made of tokens; we can iterate over them and access their attributes.
[(token, token.ent_iob, token.ent_type) for token in doc]

[(Hi, 2, 0),
 (,, 2, 0),
 (my, 2, 0),
 (name, 2, 0),
 (is, 2, 0),
 (Ian, 3, 380),
 (and, 2, 0),
 (this, 2, 0),
 (is, 2, 0),
 (my, 2, 0),
 (blog, 2, 0),
 (., 2, 0)]

Per the [docs](https://spacy.io/api/token#attributes):

| Name | Description |
|:-|:-|
|`ent_iob`|IOB code of named entity tag. `3` means the token begins an entity, `2` means it is outside an entity, `1` means it is inside an entity, and `0` means no entity tag is set.<br><br>Type: `int`|
|`ent_type`|Named entity type.<br><br>Type: `int`|

All of the tokens have an `ent_iob` of `2` (outside an entity) except "Ian", which has a `3` (begins an entity). That's somewhat helpful, but what does an `ent_type` of `380` mean? If we add an underscore (`_`) to the end of the attribute name, we get the human-readable string instead of the integer ID.

In [22]:
# "Ian" is at index 5 (zero-indexed), i.e. the 6th token.
doc[5].ent_type_

'PERSON'

That's right! I am a `PERSON`.
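
The IOB codes from the table above can also be decoded by hand. The following is a minimal sketch in plain Python, not `spacy`'s own logic: `IOB_LABELS` and `group_entities` are hypothetical names, the rows are hand-copied from the token output earlier (with the string `ent_type_` in place of the integer `ent_type`), and joining multi-token entities with a single space is a simplifying assumption.

```python
from typing import List, Tuple

# IOB integer codes as described in the table above.
IOB_LABELS = {0: "", 1: "I", 2: "O", 3: "B"}


def group_entities(rows: List[Tuple[str, int, str]]) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from (token text, ent_iob, ent_type_) rows."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""
    for text, iob, ent_type in rows:
        if iob == 3:  # B: a new entity starts here
            if current:
                entities.append((" ".join(current), label))
            current, label = [text], ent_type
        elif iob == 1 and current:  # I: continue the open entity
            current.append(text)
        else:  # O or unset: close any open entity
            if current:
                entities.append((" ".join(current), label))
            current, label = [], ""
    if current:
        entities.append((" ".join(current), label))
    return entities


# A slice of the token output above.
rows = [("is", 2, ""), ("Ian", 3, "PERSON"), ("and", 2, "")]
print([IOB_LABELS[iob] for _, iob, _ in rows])  # ['O', 'B', 'O']
print(group_entities(rows))  # [('Ian', 'PERSON')]
```

In practice `doc.ents` already does this grouping for us; the sketch only shows how the per-token codes relate to the entity spans.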

In [23]:
ner = nlp.get_pipe("ner")

In [25]:
tok2vec = nlp.get_pipe("tok2vec")

In [27]:
ner.tok2vec

<thinc.model.Model at 0x1cc42f6e740>

In [29]:
tok2vec.listening_components

['tagger', 'parser']

In [26]:
tok2vec.listener_map

{'tagger': [<spacy.pipeline.tok2vec.Tok2VecListener at 0x1cc3d3c4a50>],
 'parser': [<spacy.pipeline.tok2vec.Tok2VecListener at 0x1cc3ddf39d0>]}