
Models for different embedding dimensions #3

Open
ethanlshen opened this issue Aug 23, 2023 · 3 comments

Comments

@ethanlshen

Hi,
I was wondering if you could upload trained models for different embedding sizes?
Thanks!

@kdexd

kdexd commented Aug 23, 2023

Hi Ethan 👋

Unfortunately, we do not have these model weights anymore; I apologize for the inconvenience! However, training models with different embedding sizes (as mentioned in the paper) is much faster and more computationally efficient than training models from scratch.

Quoting Section 4.4:

We initialize the encoders from ViT-L/16 models to reduce compute requirements, keep them frozen, and re-initialize projection layers and learnable scalars. We train for 30K iterations ...

We have released the ViT-L/16 weights (links in the README), and the encoders remain unchanged in the models you want. Reproducing these experiments will require you to re-initialize the image/text projection layers (two 1024 x 512 weight matrices for ViT-L/16) and four learnable scalars: the softmax temperature, the curvature, and the alpha scaling applied after each of the two projection layers.

In my experience, these models train very quickly: you will get reasonable performance well before 30K iterations, since the number of trainable parameters is small. Moreover, you can afford a larger batch size with fewer GPUs because of the reduced memory footprint of training.

Let me know if you have further questions!

@kdexd

kdexd commented Aug 23, 2023

(two 1024 x 512 weight matrices for ViT-L/16)

Correction: these will be 1024 x W matrices, where W is your desired output embedding dimension!
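In case a concrete starting point helps, here is a rough sketch of that re-initialization (assumptions: the projection attributes are named visual_proj / textual_proj as in the released code and are bias-free linear layers; the function name and the scalar-reset note are placeholders, not the exact recipe):

from torch import nn

def reinit_for_new_dim(model, w: int):
    # Replace the image/text projection layers so the output embedding
    # dimension becomes w (ViT-L/16 features are 1024-dimensional).
    device = next(model.parameters()).device
    model.visual_proj = nn.Linear(1024, w, bias=False).to(device)
    model.textual_proj = nn.Linear(1024, w, bias=False).to(device)
    # The four learnable scalars (logit_scale, curv, visual_alpha,
    # textual_alpha) should also be reset to the initial values used by
    # the model constructor in the released code.
    return model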

@ez2rok

ez2rok commented Sep 12, 2023

I am experimenting with models of different dimensions and used the following code. It assumes you have already downloaded one of the pretrained models (giving you a checkpoint_path) and have a path to a training config file (e.g. train_config = 'configs/train_meru_vit_s.py').

First, let's load the model

import torch
from meru.config import LazyConfig, LazyFactory
from meru.utils.checkpointing import CheckpointManager

# Get the device: current GPU if available, otherwise CPU.
device = (
    torch.cuda.current_device()
    if torch.cuda.is_available()
    else torch.device("cpu")
)

# Create the model from the training config and load pre-trained weights.
_C_TRAIN = LazyConfig.load(train_config)
model = LazyFactory.build_model(_C_TRAIN, device).eval()
CheckpointManager(model=model).load(checkpoint_path)

Now freeze all parameters except those listed in learnable_params.

learnable_params = [
    'logit_scale', 'curv', 'visual_alpha', 'textual_alpha',
    'visual_proj.weight', 'textual_proj.weight',
]
for name, p in model.named_parameters():
    if name not in learnable_params:
        p.requires_grad = False
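In case it is useful, here is one way to hand only the unfrozen parameters to the optimizer (the optimizer choice and hyperparameters below are placeholders, not the values from the paper):

# Collect only the parameters left trainable by the loop above.
trainable = [p for p in model.parameters() if p.requires_grad]
print(f"Training {sum(p.numel() for p in trainable):,} parameters")

# Placeholder optimizer and hyperparameters; substitute your own settings.
optimizer = torch.optim.AdamW(trainable, lr=5e-4, weight_decay=0.2)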

After this, start your training! Hope this helps!
