Hi Lucid,
I am working on quantizing CLIP image embeddings with your RQ-VAE. It works pretty well.
Next I want to take all learned codebook vectors and add them to the vocab of a GPT (as frozen token embeddings).
The idea is to train a GPT with CLIP image embeddings in between texts, e.g. IMAGE-CAPTION or TEXT-IMAGE-TEXT-IMAGE-... (Flamingo-style).
If this works, then the GPT could maybe also learn to generate quantized CLIP image embeddings token by token, and then e.g. show images through a.) retrieval or b.) a DALLE 2 decoder :)
So my question is: once the RQ-VAE is trained and I can get the quantized reconstructions and indices, how can I get a list or tensor of the actual codebook (all possible vectors from the RQ vocab)? :)
+1. I can reverse-engineer the forward function, but it'd be nice if there were an easy function call that I'm missing.
Edit: ended up reverse-engineering it anyway :-) You can get the codes from the indices with quantizer.layers[i]._codebook.embed[0, tokens[:, i]] for each layer i in the residual vector quantizer. As a bonus, you can reconstruct the input (image / audio / etc.) by doing:
decoded_vector = 0.0
for i, layer in enumerate(quantizer.layers):
    # add the code vector that layer i selected for each item in the batch
    decoded_vector = decoded_vector + layer._codebook.embed[0, tokens[:, i]]
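For the original question (one tensor holding every code vector, e.g. to append to a GPT vocab), here is a rough sketch of how the per-layer codebooks could be stacked and indexed. It uses NumPy with random stand-in codebooks instead of a trained quantizer, so num_layers, codebook_size, dim, flat_vocab, to_flat_ids and decode are all made-up names for illustration. With a real model you would presumably fill codebooks from layer._codebook.embed[0] as in the loop above.

```python
import numpy as np

# Hypothetical stand-in for the learned codebooks. In the thread these would be
# read from each residual layer as layer._codebook.embed[0], which is assumed
# here to have shape (codebook_size, dim).
num_layers, codebook_size, dim = 4, 8, 16
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(num_layers, codebook_size, dim))

# One flat table with every code vector: rows 0..7 belong to layer 0,
# rows 8..15 to layer 1, and so on.
flat_vocab = codebooks.reshape(num_layers * codebook_size, dim)


def to_flat_ids(indices):
    """Map per-layer indices of shape (batch, num_layers) to rows of flat_vocab."""
    offsets = np.arange(num_layers) * codebook_size
    return indices + offsets


def decode(indices):
    """Sum the selected code vector from every layer, like the loop above."""
    return flat_vocab[to_flat_ids(indices)].sum(axis=1)


# Fake indices, as the quantizer would return them for a batch of 2 inputs.
indices = rng.integers(0, codebook_size, size=(2, num_layers))
decoded = decode(indices)  # shape (batch, dim)
```

The offset per layer matters because index 3 in layer 0 and index 3 in layer 1 are different vectors, so each layer needs its own range of token ids if all of them go into one vocab.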