
Conversation

@sayakpaul
Member

@sayakpaul sayakpaul commented Jun 27, 2024

What does this PR do?

Extension of #7995.

Some comments are inline.

Todos

  • Add licensing

@@ -0,0 +1,34 @@
from .combined import (
Member Author

This way nothing should break in terms of imports.
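The backward-compatible re-export pattern mentioned here can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual contents: the package root re-binds names from its submodules so that existing `from ...embeddings import X` call sites keep working after the split. The module and class names below are hypothetical.

```python
import sys
import types


class CombinedTimestepTextEmbedding:
    """Illustrative stand-in for a class moved into a `combined` submodule."""


# Build a stand-in `embeddings.combined` submodule holding the class.
combined = types.ModuleType("embeddings.combined")
combined.CombinedTimestepTextEmbedding = CombinedTimestepTextEmbedding

# The package __init__ would do `from .combined import ...`;
# here we emulate that by re-binding the name on the package root.
embeddings = types.ModuleType("embeddings")
embeddings.combined = combined
embeddings.CombinedTimestepTextEmbedding = combined.CombinedTimestepTextEmbedding

sys.modules["embeddings"] = embeddings
sys.modules["embeddings.combined"] = combined

# The old import path still resolves to the same class:
from embeddings import CombinedTimestepTextEmbedding as old_path

assert old_path is combined.CombinedTimestepTextEmbedding
```

Because the root re-exports every moved name, callers never need to know which submodule a class landed in.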

return (latent + pos_embed).to(latent.dtype)


class ImagePositionalEmbeddings(nn.Module):
Member Author

Even though it's named ImagePositionalEmbeddings, it really isn't about positions. See VQDiffusion for details.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu June 27, 2024 11:43
@sayakpaul sayakpaul marked this pull request as ready for review June 27, 2024 11:43
return embeddings


class HunyuanDiTAttentionPool(nn.Module):
Collaborator

This and the other attention pool are both used by some other embedding (I assume it's one of the combined ones), so we can move them to where they are used.

Collaborator

Also, I think they are both "text", so it's also OK to move them to text_image.

Member Author

Thanks!

The other AttentionPool class is used for TextTimeEmbedding (which, in theory, is also a kind of combined embedding class, IMO).

However, HunyuanDiTAttentionPool is used in HunyuanCombinedTimestepTextSizeStyleEmbedding, which combines timesteps, text embeddings, and additional things. So it's clearly not just text, and IMO it's better to keep it in combined.py.

WDYT?

Collaborator

> However HunyuanDiTAttentionPool is used in HunyuanCombinedTimestepTextSizeStyleEmbedding that combines timesteps, text embeddings, and additional things. So, it's clearly not just text. So, IMO, it's better to keep it in combined.py

HunyuanCombinedTimestepTextSizeStyleEmbedding takes a combination of inputs, but HunyuanDiTAttentionPool is only used to project the text inputs. Still, I'm fine putting it in combined.py since it probably won't be used on its own. Same with the other AttentionPool for TextTimeEmbedding.
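For context, the general mechanism behind these attention pool layers can be sketched as follows. This is not the diffusers implementation, just an illustrative NumPy sketch of single-query attention pooling: one learnable query attends over the token sequence, and the attention weights mix the values into a single pooled vector. All names and shapes here are assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention_pool(tokens, query, w_k, w_v):
    """Pool a (seq, dim) token sequence into one (dim,) vector.

    A single learnable query attends over the projected keys; the
    resulting attention weights mix the values into one embedding.
    """
    keys = tokens @ w_k                                        # (seq, dim)
    values = tokens @ w_v                                      # (seq, dim)
    scores = softmax(keys @ query / np.sqrt(query.shape[0]))   # (seq,)
    return scores @ values                                     # (dim,)


rng = np.random.default_rng(0)
seq_len, dim = 5, 8
tokens = rng.normal(size=(seq_len, dim))
pooled = attention_pool(
    tokens,
    query=rng.normal(size=dim),
    w_k=rng.normal(size=(dim, dim)),
    w_v=rng.normal(size=(dim, dim)),
)
assert pooled.shape == (dim,)
```

This is why the layer is "only used to project the text inputs": pooling collapses the text sequence to a fixed-size vector that a combined embedding can then concatenate with timestep and size conditioning.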

Member Author

Sorry about the iteration here. TextTimeEmbedding is really just about projecting hidden_states, so there is no combination. And also, after taking into consideration what was said above, I thought it would make sense to keep HunyuanCombinedTimestepTextSizeStyleEmbedding in image_text, actually. So, your original suggestion.

I hope it makes sense now.

Collaborator

@yiyixuxu yiyixuxu Jul 3, 2024

> So, no combination. And also, after taking into consideration what was said above, I thought it would make sense to keep HunyuanCombinedTimestepTextSizeStyleEmbedding in image_text, actually. So, your original suggestion.

Sorry, when did I suggest putting it in image_text? It is clearly combined, no?

Member Author

> HunyuanCombinedTimestepTextSizeStyleEmbedding takes a combination of inputs, but HunyuanDiTAttentionPool is only used to project the text inputs. Still, I'm fine putting it in combined.py since it probably won't be used on its own. Same with the other AttentionPool for TextTimeEmbedding.

I thought this meant putting it in "image_text", but that you were okay with combined as well. Sorry if I misunderstood.

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu July 3, 2024 02:54
import torch.nn.functional as F


class LabelEmbedding(nn.Module):
Collaborator

why is this one here now?

Member Author

Because it's only used by a single class below. I followed the philosophy behind placing the attention pooling layers.

Collaborator

Got it! Let's put it into others and put the attention pooling layers into text_image.

I thought these attention pooling layers had been here for a long time without anyone else using them, so it was OK to just put them next to the class that uses them (same for LabelEmbedding). But if we want one rule that applies to every such situation, I think it is better to always put them in their respective files so that they are more likely to be reused.

return emb


class TextImageProjection(nn.Module):
Collaborator

This feels like it should be in combined, no?

return x.squeeze(0)


class HunyuanCombinedTimestepTextSizeStyleEmbedding(nn.Module):
Collaborator

Perhaps this is better suited to combined?

return self.norm(x)


class PixArtAlphaTextProjection(nn.Module):
Collaborator

Didn't we want to rename this to a generic name because it's used in 3 models?

Member Author

Wanted us to agree on the separation of classes first. Renaming, etc. can be dealt with later.

@sayakpaul
Member Author

@DN6 this is a point to be worked out: what criterion to follow for placing a class in combined, and what to follow for image_text. At the moment, embedding classes dealing with images and texts are placed in image_text; combined is usually for classes that take either timesteps or other forms of conditioning into account, along with {image,text}.

LabelEmbedding,
PixArtAlphaCombinedTimestepSizeEmbeddings,
)
from .image_text import (
Collaborator

Can't we split these into image and text?

@sayakpaul
Member Author

sayakpaul commented Jul 3, 2024

> Can't we split these into image and text?

I was trying to follow this:

#7995 (comment)

IMO, it's perhaps better to have all the embedding classes in combined.py that deal with timestep or mix modalities (image and text, for example), regardless of whether they have Combined in their class names or not. And then split image_text into image and text separately w.r.t. their types. So, if a class is ONLY dealing with text, we put it in embeddings/text.py.

WDYT @DN6 @yiyixuxu ?

Sorry about the back and forth here.

@DN6
Collaborator

DN6 commented Jul 3, 2024

> IMO, it's perhaps better to have all the embedding classes in combined.py that deal with timestep or mix modalities (image and text, for example), regardless of whether they have Combined in their class names or not. And then split image_text into image and text separately w.r.t. their types. So, if a class is ONLY dealing with text, we put it in embeddings/text.py.

This makes sense to me.

@sayakpaul
Member Author

Alright. Will wait for @yiyixuxu to comment as well before making changes.

@yiyixuxu
Collaborator

yiyixuxu commented Jul 3, 2024

Combined does not need "combined" in the names, of course.

> regardless of whether they have Combined in their class names or not.

For this, I am not sure what you mean. Do you want to put all the timestep embeddings into combined.py? Timestep embeddings should have their own file; but a "combined" embedding that takes various inputs, including timestep, should of course go into combined.

> it's perhaps better to have all the embedding classes in combined.py that deal with timestep or mix modalities

OK with this.

> And then split image_text into image and text separately w.r.t. their types. So, if a class is ONLY dealing with text, we put it in embeddings/text.py.

And in addition, I made a comment here: #8722 (comment). We can put these attention pool layers into text.
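Taken together, the thread above points at a layout roughly like the following. This is only an illustrative sketch of where the discussion landed; the file names (and the existence of an `others.py`) are assumptions, not the PR's final contents.

```
models/embeddings/
    __init__.py   # re-exports everything, so existing imports keep working
    combined.py   # embeddings mixing timestep and/or multiple modalities
    image.py      # image-only embeddings
    text.py       # text-only embeddings, incl. the attention pool layers
    others.py     # e.g. LabelEmbedding
```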

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu July 4, 2024 03:34
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Sep 14, 2024
@yiyixuxu yiyixuxu closed this Nov 20, 2024
@yiyixuxu yiyixuxu deleted the embeddings-refactor branch November 20, 2024 22:29