
questions on double_size and some blurred results #17

Closed
zengyh1900 opened this issue Sep 3, 2019 · 3 comments
@zengyh1900

Dear authors,

I'm studying your paper and code; thanks for sharing!
I have a question: as I understand from your code and paper, 0 indicates non-hole pixels and 1 indicates hole pixels.
Why, then, do you multiply the masks by 0.5 when the input size is 512? (See demo_vi.py.)

Also, I have reproduced results similar to some of the cases shown in your paper; however, some cases come out very blurred (especially slow-moving ones).
Here is an example from DAVIS (DAVIS/bear):
[screenshot of blurry inpainting result omitted]
Is this reasonable, or did I miss something that would improve the results?

Looking forward to your reply.

@ytongW

ytongW commented Jan 2, 2020

Could you tell me how to change the size of the output image?
When I changed the size of the input image directly, I got this error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 32 and 64 in dimension 3 at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCTensorMath.cu:111
If I resize the output image instead, it becomes very blurred.
Thanks for your time!
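(Editor's note: this error typically means the new height or width is not divisible by the network's total downsampling factor, so encoder and decoder feature maps no longer line up at the concatenation/skip points. A common workaround is to pad the input up to the next multiple before inference and crop afterwards. The sketch below assumes a factor of 32; the helper names are illustrative, not from this repository.)

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(frame, multiple=32):
    """Pad H and W up to the next multiple of `multiple` so that
    downsampled feature maps match at concat/skip points.
    `multiple=32` is an assumption (2^5 for five downsampling stages)."""
    _, _, h, w = frame.shape
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # Pad on the right and bottom only, so cropping back is trivial.
    padded = F.pad(frame, (0, pad_w, 0, pad_h), mode="reflect")
    return padded, (h, w)

def crop_back(output, size):
    """Undo the padding after inference."""
    h, w = size
    return output[..., :h, :w]

x = torch.randn(1, 3, 270, 480)        # an arbitrary non-multiple height
xp, orig = pad_to_multiple(x)
print(xp.shape)                        # torch.Size([1, 3, 288, 480])
print(crop_back(xp, orig).shape)       # torch.Size([1, 3, 270, 480])
```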

@AjithPanja

Yeah, I also noticed the blurry parts while running the code on the bear video.
I would be really grateful if you could clarify a doubt of mine 😅. From my understanding, known pixels from previous and future frames are filled in, but how are blind-spot pixels filled? (E.g., a trash can that sits in the same place throughout the video: if the trash can has to be removed, how will its pixels be filled?)

@mcahny
Owner

mcahny commented Dec 20, 2020

Hi all, thanks for your interest. To answer your questions:

For the double_size case, the model was trained with masks in which the hole region is filled with the value 0.5 and non-hole regions with 1.0. There is no special reason behind this choice.
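(Editor's note: a minimal sketch of converting the paper's binary mask convention (1 = hole, 0 = known) into the encoding the maintainer describes for the double_size checkpoint. The helper name and exact mapping formula are assumptions for illustration, not the repository's actual code.)

```python
import torch

def prepare_mask(mask, double_size):
    """Map a binary mask (1 = hole, 0 = known, as in the paper) to the
    values the double_size model was reportedly trained with:
    hole pixels -> 0.5, known pixels -> 1.0."""
    mask = mask.float()
    if double_size:
        return 1.0 - 0.5 * mask   # hole: 1 - 0.5 = 0.5; known: 1.0
    return mask                   # single-size model uses the raw 0/1 mask

m = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
print(prepare_mask(m, double_size=True))
# tensor([[0.5000, 1.0000],
#         [1.0000, 0.5000]])
```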

About the fixed-size hole: your results look reasonable, and I can reproduce them on the bear video.
My understanding of this result is based on these points:

  • VINet can be divided into (1) an image-level encoder-decoder network, and (2) additional reference encoders that support the target-frame inpainting.
  • (1) performs standard image inpainting and is supposed to be able to "hallucinate" content in never-visible regions.
  • (2) performs "copy-and-paste" from neighboring frames onto the hole region of the target frame.
  • While VINet is supposed to be good at both, the empirical results imply that training did not balance the two well and mainly focused on "copy-and-paste" learning. This would lead to poor "hallucination" performance and thus blurry results with fixed holes.
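(Editor's note: the two roles above can be sketched as a toy two-branch module. This is NOT the actual VINet architecture — the real model uses recurrence and flow-based alignment of the reference features — but it shows why a hole that is never visible in any reference frame forces the target branch alone to hallucinate its content. All layer names and sizes here are illustrative.)

```python
import torch
import torch.nn as nn

class ToyVINet(nn.Module):
    """Toy two-branch sketch: a target encoder that must hallucinate
    inside the hole, plus a reference encoder whose features let the
    decoder copy-and-paste content visible in neighboring frames."""
    def __init__(self, ch=16):
        super().__init__()
        self.target_enc = nn.Conv2d(4, ch, 3, padding=1)  # RGB + mask channel
        self.ref_enc = nn.Conv2d(3, ch, 3, padding=1)     # neighbor frame
        self.decoder = nn.Conv2d(2 * ch, 3, 3, padding=1) # fuse both branches

    def forward(self, target, mask, reference):
        # Zero out the hole in the target and append the mask as a channel.
        t = self.target_enc(torch.cat([target * (1 - mask), mask], dim=1))
        r = self.ref_enc(reference)  # real VINet also aligns these via flow
        return self.decoder(torch.cat([t, r], dim=1))

net = ToyVINet()
tgt = torch.randn(1, 3, 64, 64)
msk = torch.zeros(1, 1, 64, 64)
msk[..., 16:32, 16:32] = 1.0     # a fixed square hole, as in object removal
ref = torch.randn(1, 3, 64, 64)
out = net(tgt, msk, ref)
print(out.shape)                 # torch.Size([1, 3, 64, 64])
```

If the same region is a hole in every reference frame (the "trash can" case above), the reference branch carries no usable content for it, and output quality there depends entirely on how well the hallucination pathway was trained.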

@mcahny mcahny closed this as completed Aug 19, 2021