Description
Hi, there.
I am new to CLIP and I've found that it really improves my development productivity. Thanks for your great work.
But I have run into a problem when using CLIP for an image retrieval task.
In every epoch, I train and then validate the model. The pseudocode of the training process is as follows:
```python
clip_model, clip_preprocess = clip.load('RN50x4', device=device, jit=False)
...
for epoch in range(num_epoch):
    clip_model.train()  # with or without clip_model.train() makes a noticeable difference in validation accuracy
    for image, text, label in train_dataloader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            img_feat = clip_model.encode_image(image)
            text_feat = clip_model.encode_text(text)
            loss = loss_function(img_feat, text_feat, label)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

    clip_model.eval()
    for image, text, label in val_dataloader:
        ...
```
At first, I forgot to call clip_model.train() before the training loop.
Then I added clip_model.train() before the training loop, but the accuracy on the validation set drops noticeably (I have tried many times and the performance gap always exists).
In other words, merely adding or removing model.train() changes the performance noticeably.
This phenomenon is very strange, and I would like to understand the reason behind it and how to solve the problem.
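For reference, here is a minimal standalone sketch of one thing that model.train() toggles and that might be relevant here: the RN50x4 image encoder contains BatchNorm layers, and BatchNorm normalizes with batch statistics (while updating its running statistics) in train mode but with the stored running statistics in eval mode. This is only a guess about the cause, not a confirmed diagnosis, and the module below is a toy example, not the CLIP code itself:

```python
import torch
import torch.nn as nn

# Toy example: a single BatchNorm layer on data whose statistics
# differ from BN's initial running stats (mean 0, var 1).
torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 3 + 5

bn.train()
out_train = bn(x)  # normalizes with batch stats AND updates running stats

bn.eval()
out_eval = bn(x)   # normalizes with the (partially updated) running stats

# The two outputs differ, and every train-mode forward pass keeps moving
# the running stats (momentum 0.1 by default), so whether train() was
# called during training changes what eval() later normalizes with.
print(torch.allclose(out_train, out_eval))          # False: modes disagree
print(bool(torch.all(bn.running_mean == 0)))        # False: stats were updated
```

If this is the mechanism, the gap would come from the running statistics drifting during fine-tuning rather than from the loss itself.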
Thanks a lot.