@soma2000-lang (Contributor) commented Mar 11, 2023

Partially fixes #830
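With compilation defaults in place, the task can be fit right after loading, without an explicit compile() call. A rough sketch of the resulting workflow (the preset name and toy input below are illustrative placeholders, not taken from this PR):

```python
import keras_nlp

# Load the FNet masked-LM task; with compilation defaults it already has an
# optimizer, loss, and metrics attached.
masked_lm = keras_nlp.models.FNetMaskedLM.from_preset("f_net_base_en")

# Fine-tune directly on raw strings; the bundled preprocessor handles masking.
masked_lm.fit(x=["The quick brown fox jumped over the lazy dog."], batch_size=1)
```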

@soma2000-lang marked this pull request as ready for review on Mar 11, 2023.
@soma2000-lang changed the title from "Adding" to "Add compilation defaults for the FNet MaskedLM task model" on Mar 11, 2023.
@mattdangerw (Member)

Code looks good. Have we tried this out with a few different presets to make sure we are converging generally?

@chenmoneygithub (Contributor)

Thanks for the PR! 🥂

@mattdangerw Matt, I feel that for future PRs adding compilation defaults, we may want to require fitting results to justify the values we choose. It will add a bit of overhead to such PRs, but I think it's important for ensuring correctness.

@mattdangerw (Member)

> @mattdangerw Matt, I feel that for future PRs adding compilation defaults, we may want to require fitting results to justify the values we choose.

Agreed! I think this can be kind of ad hoc for now, as we didn't really have an established process for the classification defaults either. But long term, we should for sure invest in infra to experiment with multiple learning rates for multiple checkpoints (and potentially even multiple datasets!) to make sure we have good defaults. Obviously we are limited by compute and will have to make some tradeoffs.

For now, let's just sanity check that our constants are somewhat reasonable.

@chenmoneygithub (Contributor)

@mattdangerw Yeah, the compute budget is a pain for us right now. For now, I think we can ask users for a sanity check on the small/tiny presets, since those fit on a Colab free-tier GPU. But since we don't have a guideline in place yet, we can merge this PR without asking for that.
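A rough sketch of the kind of sanity check meant here, assuming a tiny preset such as "bert_tiny_en_uncased" and a toy text list (both are placeholders, not an actual script from this thread):

```python
import keras_nlp

# Toy corpus, just large enough to see the masked-LM loss move.
texts = [
    "The quick brown fox jumped over the lazy dog.",
    "Keras is a deep learning API designed for human beings.",
] * 64

# A tiny preset fits comfortably on a Colab free-tier GPU.
masked_lm = keras_nlp.models.BertMaskedLM.from_preset("bert_tiny_en_uncased")

# Rely on the compilation defaults under test; no explicit compile() call.
history = masked_lm.fit(x=texts, batch_size=16, epochs=2)

# With a reasonable default learning rate, the loss should drop between epochs.
print(history.history["loss"])
```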

@chenmoneygithub (Contributor)

@mattdangerw I opened an issue to track.

@kanpuriyanawab (Collaborator) commented Mar 15, 2023

> @mattdangerw Yeah, the compute budget is a pain for us right now. For now, I think we can ask users for a sanity check on the small/tiny presets, since those fit on a Colab free-tier GPU. But since we don't have a guideline in place yet, we can merge this PR without asking for that.

Hi @chenmoneygithub, I think we can consider using Kaggle Notebooks as an alternative to the Colab free tier (Colab GPUs are pretty slow compared to Kaggle's, see this). For #833, following your example script, I am experimenting with 1e-5, 2e-5, 5e-5, and 1e-4 as learning rates for the three models.

The script is still running, but it demonstrates how to brute-force check which learning rate the model performs best with. Do you think it'd be a meaningful contribution toward #850?
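In essence, the script just loops over candidate learning rates, roughly along these lines (the preset, toy dataset, and epoch count here are placeholders, not the actual script):

```python
import keras_nlp
from tensorflow import keras

# Stand-in corpus; the real script fine-tunes on an actual text dataset.
texts = ["The quick brown fox jumped over the lazy dog."] * 256

results = {}
for lr in [1e-5, 2e-5, 5e-5, 1e-4]:
    masked_lm = keras_nlp.models.FNetMaskedLM.from_preset("f_net_base_en")
    # Recompile to override the default optimizer with the learning rate under test.
    masked_lm.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
    )
    history = masked_lm.fit(x=texts, batch_size=32, epochs=3)
    results[lr] = history.history["loss"][-1]

# The learning rate with the lowest final loss is the candidate default.
print(results, min(results, key=results.get))
```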

On a side note, I don't remember if Colab still has that 1 hr limit or so, but Kaggle doesn't.

[screenshot omitted]

I think we can use Kaggle (a private notebook if needed) to do TPU testing for new backbones we add.

Thanks.

PS: The Kaggle notebook I linked is still running (another cool feature of Kaggle is that it saves your work once your script finishes running); once it's done you'll be able to see it. I'll ping here.

@chenmoneygithub (Contributor)

@shivance That's good to know, thanks!

@soma2000-lang (Contributor, Author)

@mattdangerw @chenmoneygithub I have tested that this performs best at a learning rate of 1e-4, and I have updated the value accordingly.

@mattdangerw (Member)

@soma2000-lang thanks! Do you have a way to share the experiments you did?

Going off of @shivance's testing on #833, it seems like 5e-5 might be a good general-purpose learning rate here, and a lower number will be a little more conservative. Let's just stick with that for now for simplicity, but we can always update.
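Since the merged 5e-5 is only a default, anyone who wants something else can still recompile; a minimal sketch of that override path (preset name assumed, as above):

```python
import keras_nlp
from tensorflow import keras

masked_lm = keras_nlp.models.FNetMaskedLM.from_preset("f_net_base_en")

# Calling compile() again replaces the default optimizer (and its 5e-5 rate)
# with whatever the user prefers.
masked_lm.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
```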

@mattdangerw mattdangerw merged commit ca1a62e into keras-team:master Mar 17, 2023
kanpuriyanawab pushed a commit to kanpuriyanawab/keras-nlp that referenced this pull request Mar 26, 2023
…#834)

* Adding

* Update f_net_masked_lm.py

* Update f_net_masked_lm.py

* Update f_net_masked_lm.py

* Use 5e-5

---------

Co-authored-by: Matt Watson <1389937+mattdangerw@users.noreply.github.com>