Add compilation defaults for the Fnet MaskedLM task model #834
Conversation
Code looks good. Have we tried this out with a few different presets to make sure we are converging generally?
Thanks for the PR! 🥂 @mattdangerw Matt, I feel that for future PRs adding compilation defaults, we may want to require fitting results to justify the value we choose. It will add a bit of overhead to such PRs, but I think it's important for us to ensure correctness.
Agreed! I think this can be kind of ad hoc for now, as we didn't really have an established process for classification defaults either. But long term, we should definitely invest in infra to experiment with multiple learning rates for multiple checkpoints (and potentially even multiple datasets!), to make sure we have good defaults. Obviously we are limited by compute and will have to make some tradeoffs. For now, let's just sanity check that our constants are somewhat reasonable.
@mattdangerw yeah, the compute budget is a pain for us right now. I think for now we can ask users for a sanity check on the small/tiny presets, since those fit in a Colab free-tier GPU. But since we don't have a guideline right now, we can merge this PR without asking for that.
@mattdangerw I opened an issue to track. |
Hi @chenmoneygithub, I think we can consider using Kaggle Notebooks as an alternative to the Colab free tier (Colab GPUs are pretty slow compared to Kaggle's, see this). For #833, following your example script, I am experimenting with 1e-5, 2e-5, 5e-5, and 1e-4 as learning rates for the three models. The script is still running, but it demonstrates how to brute-force check which learning rate performs best. Do you think it would be a meaningful contribution toward #850? On a side note, I don't remember if Colab still has that 1 hr limit or so, but Kaggle doesn't. I think we can use Kaggle (a private notebook if needed) to do TPU testing for new backbones we add. Thanks. PS. The Kaggle notebook I linked is still running (another cool feature of Kaggle, which lets you save your work once your script completes); once it's done you'll be able to see it. I'll ping here.
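The brute-force sweep described above can be sketched as a small harness. This is a hypothetical illustration, not the actual script from #833: `fit_and_evaluate` stands in for fine-tuning a preset at a given learning rate and returning validation accuracy, and the scores it returns are made up for the example.

```python
# Hypothetical brute-force learning-rate sweep. fit_and_evaluate() is a
# stand-in for fine-tuning a model preset at `lr` and returning its
# validation accuracy; the scores below are fabricated for illustration.

CANDIDATE_LRS = [1e-5, 2e-5, 5e-5, 1e-4]


def fit_and_evaluate(lr):
    # Placeholder: a real sweep would call model.fit(...) here and
    # evaluate on held-out data. Fake accuracies for demonstration only.
    fake_scores = {1e-5: 0.81, 2e-5: 0.84, 5e-5: 0.86, 1e-4: 0.85}
    return fake_scores[lr]


def best_learning_rate(lrs):
    # Train once per candidate and keep the rate with the best score.
    results = {lr: fit_and_evaluate(lr) for lr in lrs}
    best = max(results, key=results.get)
    return best, results


best, results = best_learning_rate(CANDIDATE_LRS)
print(best)
```

In a real run, each `fit_and_evaluate` call would fine-tune from scratch, so ideally the sweep would also average over a few seeds per learning rate before picking a winner.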
@shivance It's good to know! thanks! |
@mattdangerw @chenmoneygithub I have tested that this performs best at learning rate 1e-4. I updated the value accordingly.
@soma2000-lang thanks! Do you have a way to share the experiments you did? Going off of @shivance's testing on #833, it seems like 5e-5 might be a good general-purpose learning rate here, and a lower number will be a little more conservative. Let's just stick with that for now for simplicity, but we can always update.
…#834)

* Adding
* Update f_net_masked_lm.py
* Update f_net_masked_lm.py
* Update f_net_masked_lm.py
* Use 5e-5

---------

Co-authored-by: Matt Watson <1389937+mattdangerw@users.noreply.github.com>
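The "compilation defaults" pattern this PR adds can be sketched without any framework dependency. This is a simplified, hypothetical mock of the task class, not KerasNLP's actual implementation: `compile()` accepts `"auto"` sentinels and resolves them to task-appropriate defaults, including the Adam optimizer at the 5e-5 learning rate the thread settled on.

```python
# Minimal sketch of compile() defaults, assuming the "auto" sentinel
# pattern. FNetMaskedLMSketch is a hypothetical stand-in for the real
# task class; the optimizer tuple stands in for keras.optimizers.Adam.

class FNetMaskedLMSketch:
    def compile(self, optimizer="auto", loss="auto", **kwargs):
        if optimizer == "auto":
            # Default chosen in this PR: Adam with learning rate 5e-5.
            optimizer = ("adam", 5e-5)
        if loss == "auto":
            # Masked LM tasks predict token ids, so a sparse
            # cross-entropy loss is the natural default.
            loss = "sparse_categorical_crossentropy"
        self.optimizer = optimizer
        self.loss = loss


task = FNetMaskedLMSketch()
task.compile()  # users get sensible defaults with no arguments
```

The upside of the sentinel approach is that `model.compile()` works out of the box for newcomers, while anyone tuning can still pass an explicit optimizer or loss to override the defaults.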
Partially fixes #830