
Plug sense2vec into your spaCy pipeline #141

Open
myeghaneh opened this issue Jul 9, 2021 · 2 comments

@myeghaneh

I want to add my own sense2vec vectors to my own spaCy model. As described in the documentation, I added this to my current pipeline config:

```ini
[initialize.components]

[initialize.components.sense2vec]
data_path = "/path/to/s2v"
```

Then I ran:

```python
import spacy

nlp = spacy.load("../data/ModelV05b/model-best")
s2v = nlp.add_pipe("sense2vec")  # keep a reference to the added component
s2v.from_disk("../data/S2VFasttextV04")
```

It does not work; it raises:

```
[E090] Extension '_s2v' already exists on Doc. To overwrite the existing extension, set `force=True` on `Doc.set_extension`.
```

because `sense2vec` is already in `nlp.component_names`:

```
['tok2vec',
 'tagger',
 'parser',
 'ner',
 'attribute_ruler',
 'lemmatizer',
 'sense2vec']
```

Then I changed it to load only my model:

```python
nlp = spacy.load("../data/ModelV05b/model-best")
```

It still does not work. Running:

```python
doc = nlp2("The testimony of the ages confirms that the motions of the planets are orbicular.")
assert doc[1:2].text == "testimony"
freq = doc[1:2]._.s2v_freq
vector = doc[1:2]._.s2v_vec
most_similar = doc[1:2]._.s2v_most_similar(3)
```

fails with:

```
AttributeError: 'NoneType' object has no attribute 'get_freq'
```

@Hendler

Hendler commented Dec 25, 2021

Similar issue here.

@marknsikora

I've located the source of the issue. Here is the smallest case I can make that demonstrates it.

```python
import spacy

s2v_path = "../s2v_old"

nlp1 = spacy.load("en_core_web_sm")
s2v = nlp1.add_pipe("sense2vec")
s2v.from_disk(s2v_path)

nlp2 = spacy.load("en_core_web_sm")
s2v = nlp2.add_pipe("sense2vec")
s2v.from_disk(s2v_path)

# Uncomment to make it pass (workaround)
# s2v.first_run = False

nlp1("hello world")
nlp2("hello world")
```

The error is raised when `nlp2` is called, inside the component's `init_component` call. This call tries to add all the extensions for the convenience s2v functions to the `Doc` object. It succeeds if only a single pipeline is created, but the second pipeline tries to add the same extensions again and fails. This can be worked around by setting the internal `first_run` variable to `False` on the second instance of the sense2vec component, but this is extremely hacky.

The "correct" solution here is probably to stop trying to be smart about adding the extension functions and just always add them when the sense2vec library is available. If sense2vec is not part of the current pipeline, the `._s2v` attribute will be `None` and all calls to the extension functions will fail.
