In the second stage of FGT network training, I noticed that batch_size is set to 1 and only 5 frames are sampled from each video, so the input tensor has shape (b, t, c, h, w) = (1, 5, c, h, w). Why is the batch size set so small?
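For reference, here is a minimal sketch of the input shape I am describing; the values for c, h, and w are illustrative assumptions on my part, not taken from the repo:

```python
import numpy as np

# Illustrative values: c, h, w are assumptions (the question only fixes b=1, t=5).
b, t, c, h, w = 1, 5, 3, 240, 432

# A dummy video clip tensor with the (batch, time, channel, height, width) layout.
clip = np.zeros((b, t, c, h, w), dtype=np.float32)
print(clip.shape)
```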