
Different zero-shot results, grad strides do not match bucket view strides #29

Closed
ngfuong opened this issue Oct 6, 2022 · 2 comments

ngfuong commented Oct 6, 2022

Thank you for your great paper.

I tried to train a zero-shot model (vitl16_384), tested it on PASCAL fold 0, and ran into the following problems:

  1. After testing, it returned mIoU = 32.9, versus the 61.3 reported in the paper.

This is my training script:

python train_lseg_zs.py \
    --exp_name train_vitl16_pascal_fold0 --project_name lightseg \
    --backbone clip_vitl16_384 \
    --dataset pascal --data_path data/Dataset_HSN \
    --fold 0 --nshot 0 \
    --batch_size 4 --base_lr 0.0001 --max_epochs 200 \
    --weight_decay 1e-5 --no-scaleinv --widehead

How does the max_epochs argument affect the training process, given that only 4 epochs are logged?
Apart from changing the model from vitl16_384 to vitb32_384, is there anything wrong with my training script?

  2. While training, the following warning is logged:
[W reducer.cpp:283] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [512, 256, 1, 1], strides() = [256, 1, 256, 256]
bucket_view.sizes() = [512, 256, 1, 1], strides() = [256, 1, 1, 1] (function operator())

During training, DDP is enabled even though I only used 1 GPU with batch_size = 4. I am not sure whether this hurts training. Could the accumulate_grad_batches argument be causing this?
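For reference, here is a minimal sketch (assuming a plain PyTorch nn.Module before it is wrapped in DistributedDataParallel; the helper names are hypothetical) of how to check which parameters have a non-default memory layout, which is one common trigger of this warning, along with the commonly suggested workaround of forcing a contiguous layout before DDP is constructed:

import torch

def report_noncontiguous_params(model: torch.nn.Module) -> None:
    # Print every parameter whose strides differ from the default contiguous
    # layout; these are the ones DDP must copy into contiguous bucket views,
    # which is what the warning is complaining about.
    for name, p in model.named_parameters():
        if not p.is_contiguous():
            print(f"{name}: size={tuple(p.size())}, stride={p.stride()}")

def make_params_contiguous(model: torch.nn.Module) -> None:
    # Commonly suggested workaround: force a contiguous layout on all
    # parameters before the model is handed to DistributedDataParallel.
    with torch.no_grad():
        for p in model.parameters():
            p.data = p.data.contiguous()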

Boyiliee (Collaborator) commented Oct 6, 2022

Hi @ngfuong,

Thanks for your interest in LSeg!

As reported in the paper, we only train LSeg for a few epochs in the zero-shot PASCAL and COCO experiments. I haven't encountered this problem; you may need to check your setup for details. Also, we have released our checkpoints, so please feel free to use them.

Hope this helps.

Best,
Boyi

Boyiliee closed this as completed Oct 6, 2022

ngfuong commented Oct 7, 2022

Hi @Boyiliee
Thank you for replying!
I'm trying to reproduce your results, so it is important to me that my numbers match the ones you reported.
Could you verify whether my arguments and hyperparameters are reasonable? I'm training with a single Tesla A100 80GB GPU.
