word or sentence embedding from BERT model #1950
You can use
Found it, thanks @bkkaggle. Just for others who are looking for the same information. Using PyTorch:
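A minimal sketch of the idea in PyTorch, assuming a recent transformers version (older versions return plain tuples instead of named outputs):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

last_hidden_state = outputs.last_hidden_state  # (1, seq_len, 768)
all_hidden_states = outputs.hidden_states      # tuple: embedding layer + one tensor per layer
```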
Using TensorFlow:
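And a corresponding TensorFlow sketch under the same assumptions, using TFBertModel:

```python
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
outputs = model(inputs)

last_hidden_state = outputs.last_hidden_state  # (1, seq_len, 768)
all_hidden_states = outputs.hidden_states      # tuple of per-layer tensors
```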
This is a bit different for the classification models. Example:

```python
inputs = {
    "input_ids": batch[0],
    "attention_mask": batch[1]
}
output = bertmodel(**inputs)
logits = output[0]          # classification logits
hidden_states = output[1]   # hidden states (if the model was loaded with output_hidden_states=True)
```
By using this code, you can obtain a PyTorch tensor of shape (1, N, 768), where N is the number of tokens the tokenizer extracted from the input.
I am interested in the last hidden states, which are seen as a kind of embedding. I think you are referring to all hidden states, including the output of the embedding layer.
You can take an average of them. However, I think the embeddings at the first position ([CLS]) are considered a kind of sentence vector, because only those are fed to a further classifier (if any) for downstream tasks. Disclaimer: I am not sure about it.
Should be as simple as grabbing the last element in the list: last_layer = hidden_states[-1]
@maxzzze According to the documentation, one can get the last hidden states directly without setting this flag to True. See below.
BTW, for me, the shape of hidden_states in the below code is
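As a hedged illustration of that point (this is not the snippet referenced above): for the plain BertModel, the first element of the output is already the last hidden state, so no extra flag is needed.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")  # no output_hidden_states flag

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

last_hidden_state = outputs[0]  # (1, seq_len, 768): per-token vectors from the last layer
pooled_output = outputs[1]      # (1, 768): pooler output derived from the [CLS] token
```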
I believe your comment is in reference to the standard models, but it's hard to tell without a link. Can you link to where in the documentation the pasted doc string is from? I don't know if you saw my original comment, but I was providing an example of how to get the hidden states.
Sorry, I missed that part :) I am referring to the standard BertModel. Doc link:
@TheEdoardo93 Is this example taking the first element in each of the hidden states?
@engrsfi You can process the hidden states of BERT (all layers or only the last layer) in whatever way you want. Most people usually only take the hidden states of the [CLS] token of the last layer - using the hidden states for all tokens or from multiple layers doesn't usually help you that much. If you want to get the embeddings for classification, just do something like:
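A hedged sketch of that "something like", assuming a plain BertModel and taking the last layer's [CLS] vector (the names here are illustrative, not from the thread):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("This movie was great", return_tensors="pt")
with torch.no_grad():
    last_hidden_state = model(**inputs)[0]  # (1, seq_len, 768)

cls_embedding = last_hidden_state[:, 0, :]  # (1, 768): [CLS] token of the last layer
# cls_embedding can then be fed to whatever downstream classifier you use.
```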
Do you have any reference for the claim that "people usually only take the hidden states of the [CLS] token of the last layer"?
There is some clarification about the use of the last hidden states in the BERT paper. From the paper:
Reference:
What about ALBERT? The output of the last hidden state isn't the same as the embedding, because the docs say the embeddings have a size of 128 for every model (https://arxiv.org/pdf/1909.11942.pdf, page 6).
If the batch size is N, how do I convert this?
128 is used internally by ALBERT. The output of the model (last hidden state) is your actual word embeddings. In order to understand this better, you should read the following blog post from Google. Quote: "This is achieved by factorization of the embedding parametrization — the embedding matrix is split between input-level embeddings with a relatively-low dimension (e.g., 128), while the hidden-layer embeddings use higher dimensionalities (768 as in the BERT case, or more). With this step alone, ALBERT achieves an 80% reduction in the parameters of the projection block, at the expense of only a minor drop in performance — 80.3 SQuAD2.0 score, down from 80.4; or 67.9 on RACE, down from 68.2 — with all other conditions the same as for BERT."
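To make the 128-vs-768 distinction concrete, a small sketch (assuming the albert-base-v2 checkpoint; the config attributes below come from the transformers library):

```python
from transformers import AlbertModel, AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertModel.from_pretrained("albert-base-v2")

print(model.config.embedding_size)  # 128: factorized input-embedding size
print(model.config.hidden_size)     # 768: size of the hidden states the model outputs

inputs = tokenizer("ALBERT factorizes its embedding matrix", return_tensors="pt")
last_hidden_state = model(**inputs)[0]
print(last_hidden_state.shape)      # (1, seq_len, 768), not 128
```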
If I understand you correctly, you are asking how to get the last hidden states for all entries in a batch of size N. If that's the case, here is the explanation. Your model expects input of the following shape: (batch_size, sequence_length),
and returns last hidden states of the following shape: (batch_size, sequence_length, hidden_size).
You can just go through the last hidden states to get the individual last hidden state for each input in the batch of size N. Reference:
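A sketch of those shapes for a batch of N inputs (again assuming the plain BertModel; the loop is only there to show the per-example slice):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentences = ["first example", "second example", "third example"]  # N = 3
inputs = tokenizer(sentences, padding=True, return_tensors="pt")  # input_ids: (N, seq_len)

with torch.no_grad():
    last_hidden_states = model(**inputs)[0]  # (N, seq_len, 768)

for i in range(len(sentences)):
    per_input_states = last_hidden_states[i]  # (seq_len, 768) for the i-th input
```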
@engrsfi: What if I want to use the BERT embedding vector of each token as input to an LSTM network? Can I get the embedding of each token of the sentence from the last hidden layer of the BERT model? In this case I think I can't just use the embedding of the [CLS] token, as I need the word embedding of each token.

```python
import tensorflow as tf
from transformers import TFBertModel  # import added; needed for the line below

bert_model = TFBertModel.from_pretrained("bert-base-uncased")
```
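One possible way to wire that up, as a sketch only (it assumes TensorFlow/Keras, feeds the per-token last hidden states into an LSTM, and the LSTM size is an arbitrary choice):

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert_model = TFBertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(["a sample sentence"], padding=True, return_tensors="tf")
token_embeddings = bert_model(inputs)[0]  # (batch, seq_len, 768): one vector per token

# Per-token vectors go straight into an LSTM; 256 units is just an example.
lstm = tf.keras.layers.LSTM(256)
sentence_repr = lstm(token_embeddings)    # (batch, 256)
```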
Hi, could I ask how you would use spaCy to do this? Is there a link? Thanks a lot.
Here is the link:
Thank you for sharing the code. It really helped in understanding tokenization in BERT. I ran this and had a minor problem. Shouldn't it be:
@sahand91 I have used the sequence output for classification tasks like sentiment analysis. As the paper mentions, the pooled output is not a good representation of the whole sentence, so we use the sequence output and feed it into a CNN or LSTM. So I don't see any problem in using the sequence output for classification tasks, since we get the actual vector representation of a word, say "bank", in both contexts: "commercial" and "location" (bank of a river).
Yes, I think the same. @sumitsidana Can anyone confirm that embeddings_of_last_layer[0][0] is the embedding related to the [CLS] token for each sequence?
Yes, it is, but only for the first sequence in the batch. You will have to loop through all the sequences and take the first element ([CLS]) for each one.
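In indexing terms, a tiny illustrative snippet (the random tensor just stands in for the model output):

```python
import torch

# Stand-in for the model output, shape (batch_size, seq_len, hidden_size)
embeddings_of_last_layer = torch.randn(4, 16, 768)

cls_of_first_sequence = embeddings_of_last_layer[0][0]    # [CLS] of the first sequence only
cls_of_all_sequences = embeddings_of_last_layer[:, 0, :]  # [CLS] for every sequence, no loop needed
```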
Yes, gotcha. Thanks
logits = output[0] means the word embedding. So, does hidden_states = output[1] mean the sentence-level embedding?
outputs[0] is the sentence embedding for "Hello, my dog is cute", right?
If I want to encode a list of strings, how should I do it?
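A hedged sketch for that case, assuming the tokenizer's batched call with padding, and mean pooling over real tokens via the attention mask (the pooling choice is just one common option, not something prescribed in this thread):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

texts = ["This is an apple", "I walked along the bank of the river"]
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    last_hidden = model(**enc)[0]                    # (batch, seq_len, 768)

# Mean-pool over real tokens only, so padding positions don't dilute the average.
mask = enc["attention_mask"].unsqueeze(-1).float()   # (batch, seq_len, 1)
sentence_vectors = (last_hidden * mask).sum(1) / mask.sum(1)  # (batch, 768)
```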
This is great. I am interested in how to get word vectors for out-of-vocabulary (OOV) tokens. Any references would help, thanks.
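One point that may help (stated as general background, not something claimed earlier in this thread): BERT's WordPiece tokenizer rarely produces a true OOV for ordinary text, because unknown words are split into sub-word pieces; a common heuristic is to average the sub-word vectors. A sketch under that assumption:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

word = "embeddings"                 # not a single WordPiece in the vocab
print(tokenizer.tokenize(word))     # split into sub-word pieces, e.g. ['em', '##bed', ...]

enc = tokenizer(word, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc)[0][0]     # (seq_len, 768), includes [CLS] and [SEP]

word_vector = hidden[1:-1].mean(dim=0)  # (768,): average over the sub-word positions
```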
@engrsfi
It stops with errors on this line:
model = TFBertModel.from_pretrained('bert-base-uncased')
How can I extract embeddings for a sentence or a set of words directly from pre-trained models (standard BERT)? For example, I am using spaCy for this purpose at the moment, where I can do it as follows:
sentence vector:

```python
sentence_vector = bert_model("This is an apple").vector
```

word_vectors:
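A hypothetical sketch of the per-token spaCy idiom, assuming bert_model is a loaded spaCy pipeline as above:

```python
# Hypothetical: one vector per token from a spaCy Doc
doc = bert_model("This is an apple")
word_vectors = [token.vector for token in doc]
```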
I am wondering if this is possible directly with huggingface pre-trained models (especially BERT).