Questions about function forward() in NUWA please. #9

Fitzwong · 2022-03-03T08:04:36Z

I'm confused that, in forward() of class NUWA, the ground-truth video is fed to the transformer to compute the output video, which differs from what generate() does.

frame_embeddings = self.video_transformer(
    frame_embeddings,  # calculated from ground-truth video
    context = text_embeds,
    context_mask = text_mask
)

So when training NUWA, the loss is computed from the logits. But those logits depend not only on the text but also on the ground-truth video (through a single transformer call, unlike the autoregressive loop in generate()). Is that some kind of cheating during training? Or should I generate the logits the same way as in generate(), and then compute the loss on those to train?

lucidrains · 2022-03-12T22:13:13Z

so the reason is that we compress the video into a sequence of tokens, and then we have each token predict the next token, autoregressively
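To make the "each token predicts the next token" point concrete, here is a minimal pure-Python sketch of teacher forcing with a causal mask. It is not the NUWA code: `causal_attention`, `toy_logits`, the `BOS` id, the token ids, and the random embeddings are all hypothetical, and the text conditioning (cross-attention on `text_embeds`) is left out. It assumes the usual setup where the input is the ground-truth sequence shifted right by one, so position i only ever sees tokens before the one it has to predict.

```python
import math
import random

def causal_attention(x):
    """Toy single-head self-attention; position i attends only to 0..i."""
    out = []
    for i, q in enumerate(x):
        scores = [sum(a * b for a, b in zip(q, x[j])) for j in range(i + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * x[j][k] for j, w in enumerate(weights)) / total
            for k in range(len(q))
        ])
    return out

def toy_logits(tokens, embed):
    """Embed tokens, apply causal attention, project back onto the vocabulary."""
    hidden = causal_attention([embed[t] for t in tokens])
    return [[sum(a * b for a, b in zip(h, e)) for e in embed] for h in hidden]

def cross_entropy(logits, target):
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum_exp - logits[target]

rng = random.Random(0)
vocab_size, dim = 8, 4
BOS = 0  # hypothetical start-of-sequence id
embed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab_size)]

# Hypothetical discrete token ids for a ground-truth video.
video_tokens = [3, 5, 1, 7, 2]

# Training (teacher forcing): shift right, then one parallel pass.
inputs = [BOS] + video_tokens[:-1]
logits = toy_logits(inputs, embed)
loss = sum(cross_entropy(l, t) for l, t in zip(logits, video_tokens)) / len(video_tokens)

# Check 1: changing a later input token does not change earlier predictions,
# so no position can peek at the token it is asked to predict.
tampered = inputs[:-1] + [6]
logits_tampered = toy_logits(tampered, embed)
prefix_unchanged = logits[:-1] == logits_tampered[:-1]

# Check 2: the parallel pass gives the same logits as feeding the
# ground-truth prefix one step at a time, as a generate-style loop would.
stepwise = [toy_logits(inputs[: i + 1], embed)[-1] for i in range(len(inputs))]
matches_stepwise = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for row_a, row_b in zip(logits, stepwise)
    for a, b in zip(row_a, row_b)
)

print("loss:", loss)
print("earlier logits unaffected by later tokens:", prefix_unchanged)
print("parallel pass matches step-by-step pass:", matches_stepwise)
```

Under these assumptions, the only difference from generation is where the prefix comes from: during training it is the ground-truth tokens, while at sampling time it is the model's own previous outputs. The causal mask is what keeps the single parallel pass from leaking the answer.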
