Custom attributes let you add any metadata to Doc, Token and Span objects. The data can be set once and stored, or it can be computed dynamically whenever the attribute is accessed.

Custom attributes are available via the dot-underscore property, ._ (for example doc._.title). This makes it clear that they were added by the user and are not built into spaCy, like token.text.

doc._.title = 'My document'

Attributes need to be registered with the set_extension method on the global Doc, Token or Span class, all of which you can import from spacy.tokens.

Doc.set_extension('title', default=None)
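The sketch below puts the registration and the assignment together. It assumes spaCy v3 and a blank English pipeline, and the sample texts are made up. The force=True flag is added only so the cell can be re-run without an error.

```python
from spacy.lang.en import English
from spacy.tokens import Doc

nlp = English()

# Register the attribute once on the global Doc class.
# force=True overwrites an earlier registration instead of raising E090.
Doc.set_extension("title", default=None, force=True)

doc = nlp("This is a short document.")
print(doc._.title)  # None: nothing set yet, so the default is returned

doc._.title = "My document"  # attribute extensions can be assigned
print(doc._.title)  # My document

# The value is stored per Doc, so a new Doc still sees the default
other = nlp("Another text.")
print(other._.title)  # None
```

Registration happens once per class, but the assigned value belongs to the individual Doc it was set on.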

# Setting Extension Attributes

1. Use Token.set_extension to register is_country (default False).
2. Set it to True for "Spain" and print it for all tokens.

In [6]:
import spacy
from spacy.lang.en import English
from spacy.tokens import Token

In [7]:
nlp = English()

# Register the Token extension attribute 'is_country' with the default value False
Token.set_extension('is_country', default=False)

In [3]:
# Process the text and set the is_country attribute to True for the token "Spain"
doc = nlp("I live in Spain.")
doc[3]._.is_country = True

In [4]:

# Print the token text and the is_country attribute for all tokens
print([(token.text, token._.is_country) for token in doc])

[('I', False), ('live', False), ('in', False), ('Spain', True), ('.', False)]
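The same attribute can be set for several tokens in a loop. This sketch uses a small COUNTRIES lookup set, which is hypothetical and not part of spaCy. It also uses token._.set() and token._.get(), which should behave the same as assigning and reading token._.is_country directly.

```python
from spacy.lang.en import English
from spacy.tokens import Token

nlp = English()
Token.set_extension("is_country", default=False, force=True)

# Hypothetical lookup table, for illustration only
COUNTRIES = {"Spain", "France", "Japan"}

doc = nlp("I moved from Spain to France last year.")
for token in doc:
    if token.text in COUNTRIES:
        # Same effect as: token._.is_country = True
        token._.set("is_country", True)

# token._.get("is_country") reads the value, falling back to the default False
countries = [token.text for token in doc if token._.get("is_country")]
print(countries)  # ['Spain', 'France']
```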


Use Token.set_extension to register 'reversed' (getter function get_reversed).

In [9]:
def get_reversed(token):
    return token.text[::-1]

# Register the Token property extension 'reversed' with the getter get_reversed
Token.set_extension("reversed", getter=get_reversed)


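Note: running this cell a second time in the same session raises ValueError: [E090] Extension 'reversed' already exists on Token. Extensions are registered globally on the class, so registering the same name again fails unless you pass force=True to Token.set_extension, which overwrites the existing extension.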

In [11]:
# Process the text and print the reversed attribute for each token
doc = nlp("All generalizations are false, including this one.")
for token in doc:
    print("reversed:", token._.reversed)

reversed: llA
reversed: snoitazilareneg
reversed: era
reversed: eslaf
reversed: ,
reversed: gnidulcni
reversed: siht
reversed: eno
reversed: .
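Because reversed is a property extension, the getter computes its value on every access and nothing is stored. The same idea could replace the manual assignment from the is_country exercise with a getter that checks a lookup set. In this sketch the extension name is_country_getter and the COUNTRIES set are hypothetical, chosen so they don't clash with the attribute extension registered earlier.

```python
from spacy.lang.en import English
from spacy.tokens import Token

nlp = English()

# Hypothetical lookup table, for illustration only
COUNTRIES = {"Spain", "France", "Japan"}

def get_is_country(token):
    # Computed on each access; nothing is stored on the token
    return token.text in COUNTRIES

# Property extension: a getter instead of a default value
Token.set_extension("is_country_getter", getter=get_is_country, force=True)

doc = nlp("I live in Spain.")
print([(token.text, token._.is_country_getter) for token in doc])
# [('I', False), ('live', False), ('in', False), ('Spain', True), ('.', False)]
```

The trade-off is that a getter cannot be overridden for a single token by assignment in the way an attribute extension can. In return, it never goes stale.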


# Complex Attributes Using Getters and Method Extensions

Part-I
1. Complete the getter function get_has_number.
2. Use Doc.set_extension to register has_number (getter get_has_number) and print its value.

In [1]:
from spacy.lang.en import English
from spacy.tokens import Doc
nlp = English()

In [2]:

# Define the getter function
def get_has_number(doc):
    # Return True if any token in the doc has token.like_num set to True
    return any(token.like_num for token in doc)

# Register the Doc property extension 'has_number' with the getter get_has_number
Doc.set_extension('has_number', getter=get_has_number)


In [3]:
# Process the text and check the custom has_number attribute
doc = nlp("The museum closed for five years in 2012.")
print("has_number:", doc._.has_number)

has_number: True
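The getter only iterates over tokens. Span objects yield tokens in the same way, so the same function should also work as a Span property. The sketch below registers it on both classes, with force=True added only to allow re-running. The span boundaries are chosen for illustration.

```python
from spacy.lang.en import English
from spacy.tokens import Doc, Span

nlp = English()

def get_has_number(obj):
    # Works for both Doc and Span: both yield Token objects when iterated
    return any(token.like_num for token in obj)

Doc.set_extension("has_number", getter=get_has_number, force=True)
Span.set_extension("has_number", getter=get_has_number, force=True)

doc = nlp("The museum closed for five years in 2012.")
print(doc._.has_number)       # True
print(doc[0:4]._.has_number)  # False: "The museum closed for"
print(doc[4:6]._.has_number)  # True: "five years"
```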


Part-II
1. Use Span.set_extension to register 'to_html' (method to_html).
2. Call it on doc[0:2] with the tag 'strong'.

In [4]:
from spacy.tokens import Span

In [5]:
# Define the method
def to_html(span, tag):
    # Wrap the span text in an HTML tag and return it
    return "<{tag}>{text}</{tag}>".format(tag=tag, text=span.text)


# Register the Span method extension 'to_html' with the method to_html
Span.set_extension('to_html', method=to_html)


In [9]:

# Process the text and call the to_html method on the span with the tag name 'strong'
doc = nlp("Hello world, this is a sentence.")
span = doc[0:2]
print(span._.to_html('strong'))

<strong>Hello world</strong>
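As written, to_html inserts the span text verbatim, so characters such as & or < end up unescaped in the markup. One possible refinement adds an escape keyword argument that uses Python's standard html.escape. The escape parameter and the sample sentence are assumptions made for this example. A method extension binds the span as the first argument and passes any further positional and keyword arguments through to the function.

```python
import html

from spacy.lang.en import English
from spacy.tokens import Span

nlp = English()

def to_html(span, tag, escape=True):
    # Escape &, <, > and quotes in the span text unless escape=False
    text = html.escape(span.text) if escape else span.text
    return "<{tag}>{text}</{tag}>".format(tag=tag, text=text)

# Method extension: the span is passed as the first argument automatically
Span.set_extension("to_html", method=to_html, force=True)

doc = nlp("Fish & chips are great.")
span = doc[0:3]
print(span._.to_html("em"))                # <em>Fish &amp; chips</em>
print(span._.to_html("em", escape=False))  # <em>Fish & chips</em>
```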
