
Add ability to train decoder using embedding-image pairs #63

Closed
Veldrovive opened this issue May 5, 2022 · 4 comments

Comments

@Veldrovive
Collaborator

I am implementing a single-node training script for the decoder, and it seems @lucidrains has already implemented a wrapper for this purpose that is nearly feature-complete. Currently, the forward pass is implemented as follows:

```python
def forward(
    self,
    x,
    *,
    unet_number,
    divisor = 1,
    **kwargs
):
    # run the wrapped decoder under autocast when mixed precision is enabled
    with autocast(enabled = self.amp):
        loss = self.decoder(x, unet_number = unet_number, **kwargs)
        # divide by the accumulation divisor, then apply AMP loss scaling
        return self.scale(loss / divisor, unet_number = unet_number)
```

This lacks the ability to substitute our own image embeddings when we have precomputed embedding-image pairs. The functionality is already mostly supported by the Decoder network, where image_embed can be passed to the forward method, so this could be implemented simply by adding an image_embed parameter as a pass-through to decoder.forward. However, it would also be convenient to make the clip model optional in the Decoder constructor. I already started on this a week ago in this branch by adding the ability to set clip_image_size and channels separately from a clip model.
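For concreteness, a minimal sketch of that pass-through against the forward shown above (the explicit image_embed parameter here is illustrative, not an actual patch; the constructor change for an optional clip model is separate):

```python
def forward(
    self,
    x,
    *,
    unet_number,
    divisor = 1,
    image_embed = None,   # hypothetical: precomputed CLIP image embedding
    **kwargs
):
    with autocast(enabled = self.amp):
        # when image_embed is supplied, Decoder.forward can use it directly
        # instead of encoding x with its clip model
        loss = self.decoder(x, unet_number = unet_number, image_embed = image_embed, **kwargs)
        return self.scale(loss / divisor, unet_number = unet_number)
```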

Only a few small changes would be necessary to implement this feature, so I could put together a pull request for it.

@lucidrains
Owner

@Veldrovive Hi Aidan! Indeed that is the case, and I can get this finished in the next half hour, been meaning to get around to it!

@lucidrains
Owner

@Veldrovive
Collaborator Author

Great! For the DecoderTrainer, are you thinking of just using kwargs for image_embed rather than adding a specific named parameter for it?

@lucidrains
Owner

@Veldrovive yup, for wrappers I usually just forward kwargs to whatever is being wrapped (instead of using some fancy forwarding module)
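For illustration: since the trainer's forward already passes **kwargs through to the wrapped decoder, precomputed pairs could be trained on with no signature change at all. A hedged usage sketch (variable names, shapes, and the embedding dimension are assumptions, and decoder_trainer stands in for an already-constructed DecoderTrainer):

```python
import torch

# assumptions: decoder_trainer wraps a Decoder built without a clip model,
# and the embedding dimension (512 here) matches the Decoder's configuration
images = torch.randn(4, 3, 256, 256)   # batch of training images
image_embed = torch.randn(4, 512)      # precomputed CLIP image embeddings

loss = decoder_trainer(
    images,
    unet_number = 1,
    image_embed = image_embed   # rides along in **kwargs to Decoder.forward
)
loss.backward()   # standard backward on the scaled loss returned by the wrapper
```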
