Problems encountered when calling clip_model.train() #354

@JWargrave

Description

Hi there.

I am new to CLIP and I've found that it really improves my development productivity. Thanks for your great work.

But I have run into a problem when using CLIP for an image retrieval task.

For every epoch, I first train and then validate the model. The pseudocode of the training process is as follows:

clip_model, clip_preprocess = clip.load('RN50x4', device=device, jit=False)
...
for epoch in range(num_epoch):
    clip_model.train()  # with or without clip_model.train() makes a noticeable difference in accuracy on the validation set
    for image, text, label in train_dataloader:
        optimizer.zero_grad()  # clear gradients from the previous step
        with torch.cuda.amp.autocast():
            img_feat = clip_model.encode_image(image)
            text_feat = clip_model.encode_text(text)
            loss = loss_function(img_feat, text_feat, label)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    clip_model.eval()
    for image, text, label in val_dataloader:
        ...
    

At first, I forgot to call clip_model.train() before the training loop.

Then I added clip_model.train() before the training loop, but accuracy on the validation set dropped noticeably (I have tried many times and the gap always exists).

In other words, simply adding or removing clip_model.train() changes performance noticeably.

This is very strange, and I would like to know the reason behind it and how to fix it.
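A likely explanation (not confirmed by the maintainers here, but consistent with how PyTorch works): the RN50x4 image encoder is a modified ResNet and therefore contains BatchNorm layers, and BatchNorm is one of the few modules whose forward pass changes between train() and eval(). In train mode it normalizes with the current batch's statistics and overwrites the pretrained running statistics as a side effect; in eval mode it uses the stored running statistics. The pure-Python sketch below (single feature, illustrative numbers only, no CLIP or torch required) shows the two modes producing different outputs for the same input:

```python
def batch_norm(batch, running_mean, running_var, momentum=0.1,
               training=False, eps=1e-5):
    """Normalize a 1-D batch the way BatchNorm does for a single feature."""
    if training:
        # train mode: use the current batch's own statistics
        n = len(batch)
        mean = sum(batch) / n
        var = sum((x - mean) ** 2 for x in batch) / n  # biased variance
        # the running (pretrained) statistics are updated as a side effect
        running_mean = (1 - momentum) * running_mean + momentum * mean
        running_var = (1 - momentum) * running_var + momentum * var
    else:
        # eval mode: use the stored running statistics, no update
        mean, var = running_mean, running_var
    out = [(x - mean) / (var + eps) ** 0.5 for x in batch]
    return out, running_mean, running_var

batch = [1.0, 2.0, 3.0, 4.0]
# pretend running_mean=0, running_var=1 were accumulated during pretraining
train_out, rm, rv = batch_norm(batch, 0.0, 1.0, training=True)
eval_out, _, _ = batch_norm(batch, 0.0, 1.0, training=False)
print(train_out[0], eval_out[0])  # different outputs for the same input
```

So calling clip_model.train() both changes what the forward pass computes and gradually drifts the running statistics away from the pretrained ones, which can plausibly account for the validation gap, especially with small batch sizes.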

Thanks a lot.
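For reference, a commonly suggested workaround for this situation (not an official CLIP API; the helper name below is my own) is to fine-tune in train mode while forcing the BatchNorm layers to stay in eval mode, so the pretrained running statistics are preserved:

```python
import torch.nn as nn

def train_with_frozen_bn(model):
    """Put the model in train mode, but keep all BatchNorm layers in eval
    mode so fine-tuning does not overwrite their pretrained running stats."""
    model.train()
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()
            # optionally also freeze the affine parameters:
            # for p in m.parameters():
            #     p.requires_grad = False
    return model
```

With this, the training loop would call train_with_frozen_bn(clip_model) instead of clip_model.train(). Whether this is the right choice depends on the batch size and how different the fine-tuning data is from the pretraining data.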
