What does the output of the feature-extraction pipeline represent? #4613
I am using the feature-extraction pipeline. As output I get a list with one element, which is a list with 9 elements, each of which is a list of 768 features (floats).

What does the output represent? What is each element of the lists, and what is the meaning of the 768 float values?

Thanks

Comments
They are embeddings generated by the model (BERT-base, I would guess, since it has a hidden representation of 768 dimensions). You get 9 elements: one contextual embedding for each token in your sequence. These embedding values represent hidden features that are not easy to interpret.
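For concreteness, here is a minimal sketch (assuming a BERT-base checkpoint; the input sentence is hypothetical) of how the nested output lines up with that description:

```python
from transformers import pipeline

extractor = pipeline("feature-extraction", model="bert-base-uncased")

# Hypothetical input; a 9-element result would come from a sentence that
# tokenizes to 7 wordpieces plus the [CLS] and [SEP] special tokens.
features = extractor("Hello, how are you?")

print(len(features))        # 1   - one entry per input text
print(len(features[0]))     # number of tokens, incl. [CLS] and [SEP]
print(len(features[0][0]))  # 768 - BERT-base hidden size
```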
So the pipeline will just return the last-layer encoding of BERT?
Also, does the BERT encoding have similar traits to word2vec? E.g., similar words will be closer together, France - Paris = England - London, etc.?
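As a rough probe of that analogy question, here is a hypothetical sketch that mean-pools BERT's token vectors for each word fed in isolation and then runs the classic analogy arithmetic. Note that BERT's embeddings are contextual, so this is only an approximation of a word2vec-style comparison, not an established use of the model; the checkpoint and the pooling choice are both assumptions:

```python
import torch
from transformers import pipeline

extractor = pipeline("feature-extraction", model="bert-base-uncased")

def word_vector(word):
    # Mean-pool over all tokens (including [CLS]/[SEP]) of the word
    # fed to the model on its own - a crude stand-in for a word vector.
    return torch.tensor(extractor(word)[0]).mean(dim=0)

analogy = word_vector("France") - word_vector("Paris") + word_vector("London")
target = word_vector("England")

cos = torch.nn.functional.cosine_similarity(analogy, target, dim=0)
print(cos.item())  # similarity score; interpret with caution
```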
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Hi @orko19,

@merleyc I do not! Please share if you do :)
The outputs of "last_hidden_state" and the "feature-extraction" pipeline are the same; you can verify this yourself. The pipeline just handles the steps from tokenizing the words to producing the embeddings.
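A short sketch of that check, under the same BERT-base assumption: run the model directly and take last_hidden_state, then compare it against the pipeline output.

```python
import torch
from transformers import AutoModel, AutoTokenizer, pipeline

model_name = "bert-base-uncased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "Hello, how are you?"  # hypothetical input

# Manual path: tokenize, run the model, take the last hidden layer.
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)

# Pipeline path: same numbers, tokenization handled for us.
extractor = pipeline("feature-extraction", model=model_name)
pipeline_out = torch.tensor(extractor(text))          # (1, seq_len, 768)

# Should print True (up to numerical tolerance).
print(torch.allclose(last_hidden, pipeline_out, atol=1e-4))
```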