Questions about testing my own videos of different resolutions #9

Closed
yavon818 opened this issue Aug 22, 2023 · 1 comment


I wonder whether the input image size is fixed, because I run into problems when I use images of a different resolution (e.g., 688×384). The command I ran:

CUDA_VISIBLE_DEVICES=0 python infer_NVDS_dpt_bi.py --base_dir ./demo_outputs/dpt_init/kid_running/ --vnum kid_running --infer_w 688 --infer_h 384

Output:

let us begin test NVDS(DPT) demo
Load checkpoint: ./gmflow/checkpoints/gmflow_sintel-0c07dcb3.pth
******self.shift_size: 0
here mask none
******self.shift_size: 0
here mask none
******self.shift_size: 0
here mask none
******self.shift_size: 0
here mask none
/opt/conda/envs/NVDS/lib/python3.8/site-packages/torch/nn/functional.py:3609: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
warnings.warn(
Traceback (most recent call last):
  File "infer_NVDS_dpt_bi.py", line 396, in <module>
    outputs = dpt.forward(rgb)
  File "/data_ssd/home/z00647125/NVDS/dpt/models.py", line 115, in forward
    inv_depth = super().forward(x).squeeze(dim=1)
  File "/data_ssd/home/z00647125/NVDS/dpt/models.py", line 80, in forward
    path_3 = self.scratch.refinenet3(path_4, layer_3_rn)
  File "/opt/conda/envs/NVDS/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data_ssd/home/z00647125/NVDS/dpt/blocks.py", line 372, in forward
    output = self.skip_add.add(output, res)
  File "/opt/conda/envs/NVDS/lib/python3.8/site-packages/torch/nn/quantized/modules/functional_modules.py", line 43, in add
    r = torch.add(x, y)
RuntimeError: The size of tensor a (44) must match the size of tensor b (43) at non-singleton dimension 3

RaymondWang987 (Owner) commented Aug 22, 2023

The input size is not fixed. However, --infer_w and --infer_h must both be integer multiples of 32. In your case 384 is already a multiple of 32, so only the width needs to change: for example, use --infer_w 672 or --infer_w 704.

For the initial depth predictor (DPT in your case) and for our NVDS, the smallest feature maps produced by the backbone are 1/32 of the input width and height. Since 688/32 = 21.5 is not an integer, the feature-map resolutions become misaligned during down-sampling and up-sampling (the 44 and 43 in your error message). This applies to DPT, MiDaS, and our NVDS alike.
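
As a rough illustration of the rule above, here is a minimal Python sketch that finds the nearest valid sizes for a given width or height. The helper name `nearest_multiples` is hypothetical and is not part of the NVDS code. The last few lines also show one plausible way the 44 vs. 43 mismatch could arise, assuming the 1/32 stage rounds up and is then upsampled by 2 to meet the 1/16 skip feature; the exact rounding inside the network may differ.

```python
import math
from typing import Tuple


def nearest_multiples(size: int, base: int = 32) -> Tuple[int, int]:
    """Return the largest multiple of `base` <= size and the smallest one >= size.

    Hypothetical helper for illustration only; not part of the NVDS repository.
    """
    lower = (size // base) * base
    upper = math.ceil(size / base) * base
    return lower, upper


# Check the two values used in the failing command.
for flag, size in (("--infer_w", 688), ("--infer_h", 384)):
    lower, upper = nearest_multiples(size)
    if lower == upper:
        print(f"{flag} {size}: already a multiple of 32")
    else:
        print(f"{flag} {size}: use {lower} or {upper} instead")

# Assumed mechanism behind the error (a guess, not taken from the source code):
coarse_width = math.ceil(688 / 32)   # 1/32 feature map, rounded up
upsampled_width = coarse_width * 2   # upsampled x2 inside the fusion block
skip_width = 688 // 16               # 1/16 skip feature from the backbone
print(upsampled_width, skip_width)   # the two widths that fail to add
```

Running the script with the corrected width (672 or 704) in place of 688 should report that both values are already multiples of 32.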
