question about ema alpha setting #168
Hi @FateScript, yeah good point. We never published this. It's some math I sketched out and implemented, and it works empirically, so I'll share it below with the caveat that I haven't been very careful to verify its full correctness, and it may lack context (e.g., variable meanings). If you find an embarrassing mistake in it, please let me know! I probably won't have time to go into more detail, but I figured I'd share it in the hope that you can decode it and find it somewhat useful :)

Momentum formulation [$\alpha = .999$]:

$$x \leftarrow \alpha\, x + (1 - \alpha)\, u$$

Update formulation [$\alpha = .001$]:

$$x \leftarrow x + \alpha\, (u - x)$$

Two-step update rolled into one, assuming $\alpha^2 \approx 0$ and setting $u = (u_0 + u_1)/2$:

$$x \leftarrow (1-\alpha)^2\, x + \alpha(1-\alpha)\, u_0 + \alpha\, u_1 \approx x + 2\alpha\, (u - x)$$

The same holds for $n \gg 1$ updates, not just 2, since for small $\alpha$ and $\alpha n \ll 1$ the following holds:

$$(1 - \alpha)^n \approx 1 - n\alpha$$

Thus:

$$x \leftarrow x + n\alpha\, (u - x)$$

To make the update independent of the batch size $n$, we will specify $\alpha^{\*}$ (independent of batch size) and we will use $\alpha$ in the update step, where:

$$\alpha = n\, \alpha^{\*}$$

Finally, to normalize by schedule length, we set:

$$\alpha = \frac{n\, \alpha^{\*}}{\text{max\\_epoch}}$$
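The key approximation here, that $n$ small-$\alpha$ EMA steps collapse into one step with rate $n\alpha$, is easy to check numerically. A minimal sketch (function names are mine, not from pycls):

```python
def ema_n_steps(x, u, alpha, n):
    # Apply n EMA update steps toward a fixed target u: x <- x + alpha * (u - x)
    for _ in range(n):
        x = x + alpha * (u - x)
    return x

def ema_rolled(x, u, alpha, n):
    # Single rolled-up update with effective rate n * alpha,
    # valid when alpha is small and n * alpha << 1
    return x + n * alpha * (u - x)

# Example: 8 steps at alpha = .001 vs. one step at rate 8 * .001
exact = ema_n_steps(0.0, 1.0, 0.001, 8)   # uses (1 - alpha)^n
approx = ema_rolled(0.0, 1.0, 0.001, 8)   # uses 1 - n * alpha
print(abs(exact - approx))                # difference is on the order of alpha^2
```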
Thanks @pdollar, I understand how the magic code works now. It's soooooo kind of you : ) BTW, I want to discuss this issue a bit more.
Hey, thanks for digging in deeper! I don't think I have time to adjust this or think more deeply, and we're already using this way of defining EMA for many models we have trained. I find it works really well, but more importantly, I wouldn't want to break backward compatibility at this stage even if the result was more intuitive! Thanks for the discussion/suggestions tho.
Hi, thanks for your wonderful repo.

In your code for `update_model_ema` (`pycls/pycls/core/net.py`, lines 101 to 114 at `ee770af`), I notice that you are using a magic line

```python
adjust = cfg.TRAIN.BATCH_SIZE / cfg.OPTIM.MAX_EPOCH * update_period
```

to modify the alpha value. Is there any insight behind doing this? If there is a paper about it, could you please point me to it?
Thanks : )