
[Documentation] Video Prediction Labeled as a V2V process, despite taking only 1 frame #1

Closed
Sazoji opened this issue Nov 25, 2021 · 2 comments


Sazoji commented Nov 25, 2021

Judging by the results, the transformer takes in a single frame, so this would be considered an image-to-video process.
Something like video inpainting or camera FOV extrapolation (as in FGVC) would be input video -> output video.
Am I missing something in the documentation that shows this as some sort of sparse video interpolation, where it can take more than a (D1, D2, single frame) input? Or was it called V2V to match the I2I label on the inpainting/image-completion counterparts?

Additionally, there isn't a direct link to the paper, which documents that the V2V model only takes in a single image.
https://arxiv.org/abs/2111.12417

chenfei-wu (Contributor) commented

We view an image as a special video with one frame. As a result, image-to-video generation can be viewed as a special case of video-to-video generation.
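
For illustration, here is a minimal sketch of that framing, assuming a (T, C, H, W) tensor layout; the shape convention and the NumPy code are assumptions for illustration, not taken from the NUWA codebase.

```python
# Sketch: treat a single image as a one-frame video so a video-to-video
# pipeline can consume it. The (T, C, H, W) layout is an assumed convention.
import numpy as np

def image_as_video(image: np.ndarray) -> np.ndarray:
    """Wrap a (C, H, W) image as a (1, C, H, W) single-frame video clip."""
    return image[np.newaxis, ...]

frame = np.random.rand(3, 256, 256).astype(np.float32)  # one RGB frame
clip = image_as_video(frame)
print(clip.shape)  # (1, 3, 256, 256)
```

Under this view, image-to-video prediction is just video-to-video generation where the input clip happens to have T = 1.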


Sazoji commented Nov 29, 2021

OK, I'll agree that frame-to-video can be seen as a special case of V2V generation. I was going to close this yesterday, but GitHub was down during my break.
I'd just like to mention that this method is not the type of V2V usage one would be looking for when trying to do video completion or inpainting, which seemed to be implied by placing it below image completion.

An actual example of V2V synthesis would be a domain change or style transfer, like a video label-map encoder feeding a photorealistic video decoder. NUWA-Infinity seems to have the capacity to change style via a conditioned decoder, and it properly labels the synthesis models as video prediction and generation based on what is encoded (images and text, i.e. not video). I would still like to see how video encoders could be implemented.

Sazoji closed this as completed Nov 29, 2021