
questions on double_size and some blurred results #17

Closed
zengyh1900 opened this issue Sep 3, 2019 · 3 comments
@zengyh1900

Dear authors,

I'm studying your paper and code; thanks for sharing!
I have a question: as I understand from your code and paper, 0 indicates non-hole pixels and 1 indicates hole pixels.
Why, then, do you multiply the masks by 0.5 when the input size is 512? (See demo_vi.py.)

Also, I have reproduced results similar to some of the cases shown in your paper; however, some cases come out very blurred (especially slow-moving ones).
Here is an example from DAVIS (DAVIS/bear):
[screenshot of blurry inpainting result omitted]
Is this reasonable, or did I miss something that would improve the results?

Looking forward to your reply.

@ytongW

ytongW commented Jan 2, 2020

Could you tell me how to change the size of the output image?
When I changed the size of the input image directly, I got this error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 32 and 64 in dimension 3 at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCTensorMath.cu:111
If I resize the output image instead, it becomes very blurred.
Thanks for your time!
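(Editor's note: this error typically means the new height or width is not divisible by the network's total downsampling factor, so encoder and decoder feature maps no longer line up at the concatenation/skip points. A common workaround is to pad the input up to the next multiple before inference and crop afterwards. The sketch below assumes a factor of 32; the helper names are illustrative, not from this repository.)

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(frame, multiple=32):
    """Pad H and W up to the next multiple of `multiple` so that
    downsampled feature maps match at concat/skip points.
    `multiple=32` is an assumption (2^5 for five downsampling stages)."""
    _, _, h, w = frame.shape
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # Pad on the right and bottom only, so cropping back is trivial.
    padded = F.pad(frame, (0, pad_w, 0, pad_h), mode="reflect")
    return padded, (h, w)

def crop_back(output, size):
    """Undo the padding after inference."""
    h, w = size
    return output[..., :h, :w]

x = torch.randn(1, 3, 270, 480)        # an arbitrary non-multiple height
xp, orig = pad_to_multiple(x)
print(xp.shape)                        # torch.Size([1, 3, 288, 480])
print(crop_back(xp, orig).shape)       # torch.Size([1, 3, 270, 480])
```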

@AjithPanja

Yeah, I also noticed the blurry parts while running the code on the bear video.
I would be really grateful if you could clarify a doubt of mine 😅. From my understanding, known pixels from previous and future frames are filled in, but how are blind-spot pixels filled? (E.g., a trash can that sits in the same place throughout the video: if the trash can has to be removed, how will its pixels be filled?)

@mcahny
Owner

mcahny commented Dec 20, 2020

Hi all, thanks for your interest. To answer your questions:

For the double_size case, the model was trained with masks in which the hole region is filled with the value 0.5 and non-hole regions with 1.0. There is no special reason behind this choice.
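(Editor's note: a minimal sketch of converting the paper's binary mask convention (1 = hole, 0 = known) into the encoding the maintainer describes for the double_size checkpoint. The helper name and exact mapping formula are assumptions for illustration, not the repository's actual code.)

```python
import torch

def prepare_mask(mask, double_size):
    """Map a binary mask (1 = hole, 0 = known, as in the paper) to the
    values the double_size model was reportedly trained with:
    hole pixels -> 0.5, known pixels -> 1.0."""
    mask = mask.float()
    if double_size:
        return 1.0 - 0.5 * mask   # hole: 1 - 0.5 = 0.5; known: 1.0
    return mask                   # single-size model uses the raw 0/1 mask

m = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
print(prepare_mask(m, double_size=True))
# tensor([[0.5000, 1.0000],
#         [1.0000, 0.5000]])
```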

About the fixed-size hole: your results look reasonable, and I can reproduce them on the bear video.
My understanding of this result is based on these points:

  • VINet can be divided into (1) an image-level encoder-decoder network, and (2) additional reference encoders that support the target-frame inpainting.
  • (1) performs standard image inpainting and is supposed to be able to "hallucinate" content in never-visible regions.
  • (2) performs "copy-and-paste" from neighboring frames onto the hole region of the target frame.
  • While VINet is supposed to be good at both, the empirical results imply that training did not balance the two well and mainly focused on "copy-and-paste" learning. This would lead to poor "hallucination" performance and thus blurry results with fixed holes.
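(Editor's note: the two roles above can be sketched as a toy two-branch module. This is NOT the actual VINet architecture — the real model uses recurrence and flow-based alignment of the reference features — but it shows why a hole that is never visible in any reference frame forces the target branch alone to hallucinate its content. All layer names and sizes here are illustrative.)

```python
import torch
import torch.nn as nn

class ToyVINet(nn.Module):
    """Toy two-branch sketch: a target encoder that must hallucinate
    inside the hole, plus a reference encoder whose features let the
    decoder copy-and-paste content visible in neighboring frames."""
    def __init__(self, ch=16):
        super().__init__()
        self.target_enc = nn.Conv2d(4, ch, 3, padding=1)  # RGB + mask channel
        self.ref_enc = nn.Conv2d(3, ch, 3, padding=1)     # neighbor frame
        self.decoder = nn.Conv2d(2 * ch, 3, 3, padding=1) # fuse both branches

    def forward(self, target, mask, reference):
        # Zero out the hole in the target and append the mask as a channel.
        t = self.target_enc(torch.cat([target * (1 - mask), mask], dim=1))
        r = self.ref_enc(reference)  # real VINet also aligns these via flow
        return self.decoder(torch.cat([t, r], dim=1))

net = ToyVINet()
tgt = torch.randn(1, 3, 64, 64)
msk = torch.zeros(1, 1, 64, 64)
msk[..., 16:32, 16:32] = 1.0     # a fixed square hole, as in object removal
ref = torch.randn(1, 3, 64, 64)
out = net(tgt, msk, ref)
print(out.shape)                 # torch.Size([1, 3, 64, 64])
```

If the same region is a hole in every reference frame (the "trash can" case above), the reference branch carries no usable content for it, and output quality there depends entirely on how well the hallucination pathway was trained.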

@mcahny mcahny closed this as completed Aug 19, 2021