Working with the new ._.trf_data object (3.7+)
#13137
Comments
spaCy 3.7 switched to the Curated Transformers library. The https://spacy.io/api/curatedtransformer#doctransformeroutput The |
Ah, this probably should have been documented better as part of the release. At first glance, the |
Ah, the |
Yeah, they do. |
So to get back to the original question, The data for each token is also a
import spacy
nlp = spacy.load("en_core_web_trf")
doc = nlp("DocTransformerOutput.last_hidden_layer_state is a Ragged object")
# for the tensors corresponding to "DocTransformerOutput.last_hidden_layer_state"
# (token index 0), you can access doc._.trf_data.last_hidden_layer_state[0].data
assert doc._.trf_data.last_hidden_layer_state[0].data.shape == (12, 768)
|
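For readers less familiar with ragged arrays, the per-token indexing in the comment above can be pictured with a small pure-Python sketch. It assumes the usual ragged layout: one flat block of wordpiece rows plus a list saying how many rows belong to each token. `RaggedSketch`, `rows_for`, and `mean_pool` are hypothetical names made up for illustration; they are not part of spaCy, thinc, or spacy-curated-transformers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RaggedSketch:
    """Toy stand-in for a ragged array (hypothetical, illustration only)."""

    data: List[List[float]]  # one row per wordpiece, all tokens concatenated
    lengths: List[int]       # number of wordpiece rows owned by each token

    def rows_for(self, token_index: int) -> List[List[float]]:
        # The rows of token i start after the rows of all earlier tokens.
        start = sum(self.lengths[:token_index])
        return self.data[start:start + self.lengths[token_index]]


def mean_pool(rows: List[List[float]]) -> List[float]:
    """Average wordpiece rows column by column into one vector per token."""
    if not rows:
        return []
    width = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(width)]


# Two tokens: the first split into two wordpieces, the second into one.
ragged = RaggedSketch(
    data=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    lengths=[2, 1],
)
first_token_rows = ragged.rows_for(0)
first_token_vector = mean_pool(first_token_rows)
```

With a real pipeline, the rows for token `i` would come from `doc._.trf_data.last_hidden_layer_state[i].data`, as in the comment above. Averaging them is only one possible way to get a single vector per token for a classifier or similarity model.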
Great! That answers my question, and that's a very intuitive way to access the tensor by token index. |
tl;dr: how do I access the transformer tensors in 3.7+?
I have a Python package that used the spaCy transformer encodings for each token in a classifier and similarity models.
In the pre-3.7 spaCy models, I could access the tensors using doc._.trf_data.tensors (full example). However, after the 3.7 update, this attribute doesn't exist. I'm not sure how to access this attribute now.
How to reproduce the behaviour
As expected, there's a doc._.trf_data object.
The documentation for the transformer assigned attributes says that doc._.trf_data is type TransformerData. The docs for TransformerData say that that class has the following attributes:
However, dir(doc._.trf_data) shows the following attributes (methods omitted for space):
which doesn't include the expected tokens, model_output, etc.
When I call type(doc._.trf_data), I get <class 'spacy_curated_transformers.models.output.DocTransformerOutput'>, which doesn't seem to match the TransformerData type I expected from the docs.
Any help you have would be very appreciated, and apologies if I'm just not looking in the right place for the tensors.
Your Environment