how to get word embedding vector in GPT-2 #1458

Closed

weiguowilliam opened this issue Oct 8, 2019 · 15 comments

@weiguowilliam

❓ Questions & Help

How can we get the word embedding vector in GPT-2? I followed the guidance for BERT (model.embeddings.word_embeddings.weight), but it raises "'GPT2LMHeadModel' object has no attribute 'embeddings'".

Please help me with that. Thank you in advance.

@LysandreJik
Member

LysandreJik commented Oct 8, 2019

Hi, indeed GPT-2 has a slightly different implementation than BERT. In order to have access to the embeddings, you would have to do the following:

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('gpt2')  # or any other checkpoint
word_embeddings = model.transformer.wte.weight  # Word Token Embeddings 
position_embeddings = model.transformer.wpe.weight  # Word Position Embeddings 
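
For reference, a quick shape check (a minimal sketch; the numbers are for the 'gpt2' checkpoint, which has a 50257-token vocabulary, 1024 positions, and hidden size 768):

from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('gpt2')
print(model.transformer.wte.weight.shape)  # torch.Size([50257, 768]) - one row per vocabulary token
print(model.transformer.wpe.weight.shape)  # torch.Size([1024, 768])  - one row per position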

@weiguowilliam
Author

Hi,

Thank you for your reply! So if I want to get the vector for 'man', it would be like this:

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
text_index = tokenizer.encode('man',add_prefix_space=True)
vector = model.transformer.wte.weight[text_index,:]

Is it correct?
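
For anyone landing here later, a minimal way to double-check that slicing the weight matrix matches the embedding layer's own lookup (a sketch based on the snippet above):

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

text_index = tokenizer.encode('man', add_prefix_space=True)
vector = model.transformer.wte.weight[text_index, :]

# wte is an nn.Embedding, so calling it with the same ids returns the same rows
assert torch.allclose(vector, model.transformer.wte(torch.tensor(text_index)))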

@fqassemi

Just wondering, how do I transform a word vector back into a word? Say I take a word vector and change a few elements; how can I find the closest word in the GPT-2 model?

@weiguowilliam
Author

So for each token in the vocabulary there is a static embedding (at layer 0). You can use cosine similarity to find the closest static embedding to the transformed vector; that should help you find the word.

@fqassemi
fqassemi commented Mar 2, 2020

Thanks. That means for every word vector I would have to compute ~50K (the vocab size) cosine similarities. Is that right?

@weiguowilliam
Author

I guess so, unless you can use some property to narrow the range first.
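
In practice the ~50K comparisons collapse into a single vectorized operation against the static embedding table. A minimal sketch (the query vector and the top-5 cutoff are just illustrative choices):

import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

wte = model.transformer.wte.weight                              # (vocab_size, hidden_size) static token embeddings
query = wte[tokenizer.encode('man', add_prefix_space=True)[0]]  # or any modified vector of size hidden_size

# cosine similarity of the query against every row of the table at once
sims = F.cosine_similarity(query.unsqueeze(0), wte, dim=-1)     # (vocab_size,)
top = sims.topk(5)
print([(tokenizer.decode([int(i)]), round(float(s), 3)) for i, s in zip(top.indices, top.values)])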

@fqassemi
fqassemi commented Mar 2, 2020

OK, three more questions: 1) Is there any resource on how to generate a fixed-length sentence (a sentence with N words that ends with "." or "!")? 2) What is the most effective parameter for hyper-parameter tuning (e.g. temperature)? 3) Is there any Slack channel to discuss these kinds of questions?

@AminTaheri23

About 1) I don't think there is any. You can use web scraping for such specific sentences, or download a corpus and use a regex to extract the sentences you want (see the sketch below).

About 2) I don't really know.

About 3) If you find any, please share it with me too. Thanks! 😄
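
A minimal sketch of the regex route (the word count N and the corpus path are placeholders):

import re

corpus = open('corpus.txt').read()   # placeholder path; any plain-text corpus works
N = 10                               # desired number of words per sentence

# grab spans that end with '.' or '!', then keep only those with exactly N words
sentences = re.findall(r'[^.!?]+[.!]', corpus)
fixed_length = [s.strip() for s in sentences if len(s.split()) == N]
print(fixed_length[:5])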

@AminTaheri23

Did you succeed? I'm pursuing the same goal and I don't know how to validate my findings. I have tested some king - man + woman stuff, but it didn't work.

@realsama
realsama commented Sep 5, 2020

How did it go? I am stuck here too.

@benam2
benam2 commented Mar 23, 2021

How did it go?

@fqassemi

Well, it is working. However, these weights/embeddings are "context-dependent", so one should not expect "king - man + woman" to lead to anything.

@geajack

geajack commented May 26, 2023

The code already posted here is correct:

model.transformer.wte.weight[input_ids,:]

where input_ids is a tensor of shape (batch_size, sequence_length). This will give you a tensor of shape (batch_size, sequence_length, embedding_dimension). For example, you can do this with the output of the tokenizer:

inputs = tokenizer(["Hello, my name"], return_tensors="pt")
embeds = model.transformer.wte.weight[inputs.input_ids, :]

You can validate that this is correct by passing the embeds into the model and checking that you get the same thing as when passing in the inputs:

import torch

# output_hidden_states=True is needed for the per-layer check below
outputs1 = model(input_ids=inputs.input_ids, output_hidden_states=True)
outputs2 = model(inputs_embeds=embeds, output_hidden_states=True)
assert torch.allclose(outputs1.logits, outputs2.logits)

or even

for layer1, layer2 in zip(outputs1.hidden_states, outputs2.hidden_states):
    assert torch.allclose(layer1, layer2)

@ish3lan

ish3lan commented Dec 19, 2023

It's a bit late, but this might help someone: despite not being static, contextual embeddings still gave me reasonable results here.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"  # the original run used cuda:3

model_id = "gpt2-large"
model = GPT2LMHeadModel.from_pretrained(model_id, output_attentions=True).to(device)
model.eval()
tokenizer = GPT2TokenizerFast.from_pretrained(model_id)
def get_word_embedding(word, model, tokenizer):
    # Encode the word to get token IDs
    token_ids = tokenizer.encode(word, add_special_tokens=False)
    
    # Convert token IDs to tensor and move it to the model's device
    tokens_tensor = torch.tensor([token_ids], device=model.device)
    
    with torch.no_grad():
        # Forward pass through the model
        outputs = model(tokens_tensor)
        # Retrieve the hidden states from the model output
        hidden_states = outputs[0]  # note: for GPT2LMHeadModel this first element is the LM logits, not the layer hidden states

    # Averaging over the sequence length
    return hidden_states[0].mean(dim=0)

king_emb = get_word_embedding('King', model, tokenizer)
man_emb = get_word_embedding('Man', model, tokenizer)
woman_emb = get_word_embedding('Woman', model, tokenizer)
queen_emb = get_word_embedding('Queen', model, tokenizer)

# print all the embeddings
print("king embedding: ", king_emb)
print("man embedding:", man_emb)
print("woman embedding: ", woman_emb)
print("queen embedding:", queen_emb)
from torch.nn.functional import cosine_similarity
analogy_emb = king_emb - man_emb + woman_emb
similarity = cosine_similarity(analogy_emb.unsqueeze(0), queen_emb.unsqueeze(0))
print("Cosine similarity: ", similarity.item())

gave me:

king embedding:  tensor([ 2.3706,  4.7613, -0.7195,  ..., -8.0351, -3.0770,  2.2482],
       device='cuda:3')
man embedding: tensor([ 2.8015,  3.5800, -0.1190,  ..., -6.7876, -3.8558,  1.8777],
       device='cuda:3')
woman embedding:  tensor([ 3.0411,  5.3653,  0.3071,  ..., -6.2418, -3.3228,  2.6389],
       device='cuda:3')
queen embedding: tensor([ 2.5185,  5.2505, -0.6024,  ..., -7.1251, -2.5000,  1.6070],
       device='cuda:3')
Cosine similarity:  0.9761547446250916

And regarding @fqassemi's question:

from torch.nn.functional import cosine_similarity
import torch

from tqdm import tqdm  # Import tqdm

# Cache embeddings so repeated lookups are not recomputed
embeddings_dict = {}

# Iterate over the entire vocabulary
vocab = tokenizer.get_vocab()
top_matches = []
top_similarities = []
def get_word_embedding(word, model, tokenizer):
    if word in embeddings_dict:
        # Return the embedding if already in the dictionary
        return embeddings_dict[word]
    
    # Encode the word to get token IDs
    token_ids = tokenizer.encode(word, add_special_tokens=False)
    
    # Convert token IDs to tensor and move it to the model's device
    tokens_tensor = torch.tensor([token_ids], device=model.device)
    
    with torch.no_grad():
        # Forward pass through the model
        outputs = model(tokens_tensor)
        # Retrieve the hidden states from the model output
        hidden_states = outputs[0]  # note: for GPT2LMHeadModel this first element is the LM logits, not the layer hidden states
    word_emb = hidden_states[0].mean(dim=0)
    
    # Store the new embedding in the dictionary
    embeddings_dict[word] = word_emb
    return word_emb
    

for word, token_id in tqdm(vocab.items(), desc="Processing vocabulary"):
    word_emb = get_word_embedding(word, model, tokenizer)
    sim = cosine_similarity(analogy_emb.unsqueeze(0), word_emb.unsqueeze(0)).item()
    
    # Keep track of top matches
    if len(top_matches) < 5 or sim > min(top_similarities):
        if len(top_matches) >= 5:
            # Remove the current lowest similarity
            min_index = top_similarities.index(min(top_similarities))
            top_matches.pop(min_index)
            top_similarities.pop(min_index)
        
        top_matches.append(word)
        top_similarities.append(sim)

# Sort the top matches by similarity
sorted_top_matches = sorted(zip(top_matches, top_similarities), key=lambda x: x[1], reverse=True)

print(sorted_top_matches)

gave me reasonable results for the nearest vectors:

Processing vocabulary: 100%|██████████| 50257/50257 [22:23<00:00, 37.41it/s]
[('Woman', 0.9765560626983643), ('Queen', 0.9761547446250916), ('Lady', 0.9727475643157959), ('ishop', 0.9681873917579651), ('!"', 0.9671139717102051)]

@kurt-abela

kurt-abela commented Dec 22, 2023

Thanks for your code @ish3lan. I assume this can be extended to sentences, not just single words, correct?

On another note, I've been using that first code snippet and trying a bunch of different words to get different cosine similarities, but all of the similarities ended up being very high (>0.95). Is this normal/expected?
