Hi @EasternJournalist, could you confirm or deny some of the hyperparameters that were used for training? This is what I gathered from various sources. In particular, I suspect that 1 million training steps for MoGe (v1) is not actually correct? And what was the batch size for MoGe-2?
Thanks in advance!
| | MoGe | MoGe-2 |
|---|---|---|
| batch size | 256 (paper) | 128 (comment) |
| steps | ~160k (comment) | 120k (paper) |
| base LR | 1e-4/1e-5 (configs/train/v1.json) | 1e-4/1e-5 (paper) |
| LR schedule | "more conservative than 2" | half every 25k steps (paper) |
| GPUs | 64 V100 (#9) | 32 A100 (paper) |
| time | one week (#9) | 5 days (paper) |
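For concreteness, my reading of the MoGe-2 schedule ("halve the LR every 25k steps") would look like the step-decay sketch below. This is purely illustrative, not the authors' code; the function name, the base LR of 1e-4, and the 120k total steps are my assumptions from the table above.

```python
# Illustrative sketch only (not the repository's training code):
# a step-decay schedule that halves the learning rate every 25k steps,
# as I understand the MoGe-2 paper's description.
def lr_at_step(step, base_lr=1e-4, decay_every=25_000, factor=0.5):
    """Return the learning rate in effect at a given training step."""
    return base_lr * factor ** (step // decay_every)

# Under this reading, over 120k total steps the LR halves four times:
# step 0 -> 1e-4, step 25k -> 5e-5, step 50k -> 2.5e-5, step 100k -> 6.25e-6
```

Is this the schedule that was actually used, or was there warmup / a different decay curve on top?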