Regarding learned image embedding and text embedding in Unet #38
@xiankgx 🙏 🙏 🙏 thank you for the Q&A, i missed this detail! fixed in the latest version :)
How about the time dependence of null_text_embed? https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1104
Suppose we have a sequence of text embeddings: v1, v2, v3, v4, v5, and we need to replace some of these vectors with null vectors, say v2 and v4. Then we have v1, v_null, v3, v_null, v5. But can the v_null at positions 2 and 4 have different values?
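A minimal sketch of what a per-position learned null embedding could look like (the shapes and the standalone `null_text_embed` parameter here are assumptions for illustration, not the repo's exact code). Because the null parameter carries one vector per sequence position, the nulls substituted at positions 2 and 4 can indeed differ:

```python
import torch
import torch.nn as nn

max_text_len, dim = 5, 16

# one learned null vector per sequence position (hypothetical parameterization)
null_text_embed = nn.Parameter(torch.randn(max_text_len, dim))

text_embeds = torch.randn(1, max_text_len, dim)              # v1 .. v5
drop_mask = torch.tensor([False, True, False, True, False])  # drop v2 and v4

# broadcast the per-position nulls into the dropped slots;
# positions 2 and 4 receive different null vectors
mixed = torch.where(drop_mask[None, :, None], null_text_embed[None], text_embeds)
```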
@xiankgx so that is actually some improvisation on my part, an observation from DALL-E v1 that the text conditioning works better if one does not mask, but just provides padding tokens in the places without text https://github.com/lucidrains/DALLE-pytorch/blob/main/dalle_pytorch/dalle_pytorch.py#L372 it has been proven out in many follow-up works, in my mind
@xiankgx but yes, there is one problem, where i believe the text encoding should always be padded to the maximum text length, so the conditioning is consistent across batches with variable-length text encodings. let me fix that now 🙏
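A rough sketch of the fix described above, assuming a hypothetical helper that right-pads every batch of text encodings to a fixed `max_text_len` so the conditioning shape is consistent across batches:

```python
import torch.nn.functional as F

def pad_to_max_text_len(text_encodings, max_text_len):
    # text_encodings: (batch, seq, dim) with seq <= max_text_len
    b, n, d = text_encodings.shape
    assert n <= max_text_len
    # right-pad the sequence dimension with zeros (padding positions)
    return F.pad(text_encodings, (0, 0, 0, max_text_len - n))
```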
According to the paper, Section 2.1 (Decoder):

> We enable classifier-free guidance by randomly setting CLIP embeddings to zero (or a learned embedding) 10% of the time, and randomly dropping the text caption 50% of the time during training.
It seems that we are replacing the embeddings after turning them into condition sequences.
https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1216-L1222
https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1229-L1234
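To illustrate the ordering in question, here is a simplified sketch (the module names, shapes, and the 10% probability are assumptions for illustration): the image embedding is first projected into a sequence of conditioning tokens, and the null replacement is applied to that sequence rather than to the raw CLIP embedding:

```python
import torch
import torch.nn as nn

batch, dim, num_tokens = 2, 16, 4

# hypothetical projection from a CLIP image embedding to conditioning tokens
to_image_tokens = nn.Linear(dim, dim * num_tokens)
null_image_embed = nn.Parameter(torch.randn(num_tokens, dim))

image_embed = torch.randn(batch, dim)                                     # raw CLIP embedding
image_tokens = to_image_tokens(image_embed).view(batch, num_tokens, dim)  # condition sequence

# drop conditioning ~10% of the time, replacing the whole token sequence
keep = torch.rand(batch) >= 0.1
image_tokens = torch.where(keep[:, None, None], image_tokens, null_image_embed[None])
```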
And from the following it seems that the null text embeddings can vary according to their sequence position. For image embeddings, I feel this is fine, but what about the text encodings?
https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1104
Also, it seems it may be necessary to have two separate cond_drop_prob values: one for the image embedding and one for the text encodings.
If we do that, how do we modify forward_with_cond_scale()?
https://github.com/lucidrains/DALLE2-pytorch/blob/main/dalle2_pytorch/dalle2_pytorch.py#L1166-L1178
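One possible adaptation, sketched under the assumption that forward() were changed to accept hypothetical image_cond_drop_prob and text_cond_drop_prob kwargs in place of the single cond_drop_prob; the unconditional pass for guidance would then drop both:

```python
def forward_with_cond_scale(self, *args, cond_scale=1., **kwargs):
    # conditional pass: keep both image and text conditioning
    logits = self.forward(*args, image_cond_drop_prob=0., text_cond_drop_prob=0., **kwargs)

    if cond_scale == 1:
        return logits

    # unconditional pass: drop both conditionings entirely
    null_logits = self.forward(*args, image_cond_drop_prob=1., text_cond_drop_prob=1., **kwargs)

    # standard classifier-free guidance combination
    return null_logits + (logits - null_logits) * cond_scale
```

A further refinement could use a separate guidance scale per modality, at the cost of additional forward passes with only one conditioning dropped at a time.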