The VidTwin training shows a NaN issue #24

@fx-hit

Description

When training from scratch on my own dataset, the AE loss decreases gradually from 1.02×10⁵ to −3.41×10⁵ over roughly 100k steps and then starts producing NaN values.
However, when I load the authors' pretrained model, the AE loss starts at −3.59×10⁵ and becomes NaN after only a few hundred steps.
I have two main questions:

  • Why is the loss scale so large? Is this normal?
  • What could be the possible causes of the NaN issue?
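On the loss scale: a sum-reduced per-pixel loss over a video clip has millions of terms, so a total magnitude around 10⁵ is not unusual on its own. On the NaN issue, a common first diagnostic is to check the loss and gradients for non-finite values before each optimizer step and to clip gradients, since exploding gradients late in training are a frequent NaN trigger. Below is a minimal sketch in plain PyTorch; the helper names (`check_finite`, `safe_step`) are illustrative and not from the VidTwin codebase:

```python
import torch

def check_finite(name, tensor):
    # Fail fast with context as soon as a non-finite value appears,
    # instead of letting NaNs silently propagate through the optimizer.
    if not torch.isfinite(tensor).all():
        raise FloatingPointError(f"non-finite values in {name}")

def safe_step(model, loss, optimizer, max_norm=1.0):
    # Check the loss before backprop so the step that first produced
    # the NaN can be identified in the logs.
    check_finite("loss", loss)
    optimizer.zero_grad()
    loss.backward()
    # Clip gradients -- exploding gradients are a common NaN trigger.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    check_finite("grad norm", total_norm)
    optimizer.step()
    return total_norm
```

Logging the returned gradient norm each step also helps: if it grows steadily before the first NaN, lowering the learning rate or tightening `max_norm` is usually the next thing to try. `torch.autograd.set_detect_anomaly(True)` can further pinpoint the operation that first produced the NaN, at the cost of slower training.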
