
[BUG]: (Update) Together AI fine-tuning always stops at step 10 ❌ Need further explanations on how steps work in Together AI ✅ #18

Closed
ruiyiw opened this issue Oct 1, 2023 · 5 comments
Labels
bug Something isn't working help wanted Extra attention is needed together-ai Issues related to together ai

Comments

@ruiyiw
Collaborator

ruiyiw commented Oct 1, 2023

Description of the bug

All fine-tuning runs still stop at step 10. According to @lwaekfjlk, there is no early-stopping code in the source. The curves in the image below also stop at different loss values, so further experiments are needed.
[Image: WechatIMG2322 (training loss curves)]

Steps To Reproduce

No response

Additional Information

Here is a comparison of two fine-tuning settings:
epochs = 2, batch size = 4 -> epoch 1 stopped at step 5, epoch 2 stopped at step 10
epochs = 4, batch size = 32 -> epoch 1 stopped at step 3, epoch 2 at step 6, epoch 3 at step 9, epoch 4 at step 10
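One way to sanity-check these numbers is to assume the usual convention that one step processes one batch, so an epoch over N examples takes ceil(N / batch_size) steps. This is an assumption, not confirmed Together AI behavior. Under it, each observed per-epoch step count constrains the dataset size to a range. The minimal sketch below computes that range; `steps_per_epoch` and `dataset_size_range` are hypothetical helpers.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Steps in one epoch, assuming one step per batch and that the final partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def dataset_size_range(batch_size: int, observed_steps: int) -> range:
    """All dataset sizes N for which ceil(N / batch_size) == observed_steps."""
    low = (observed_steps - 1) * batch_size + 1
    high = observed_steps * batch_size
    return range(low, high + 1)


if __name__ == "__main__":
    # Per-epoch step counts taken from the epoch-1 boundaries reported in the issue.
    for batch, steps in [(4, 5), (32, 3)]:
        sizes = dataset_size_range(batch, steps)
        # Verify that every candidate size really yields the observed step count.
        assert all(steps_per_epoch(n, batch) == steps for n in sizes)
        print(f"batch={batch}, {steps} steps/epoch -> N in [{sizes.start}, {sizes.stop - 1}]")
```

If both runs used the same data, the two ranges ([17, 20] and [65, 96]) do not overlap, so the step counts Together AI reports may follow a different convention. This is only a consistency check, not an explanation.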

ruiyiw added the "bug" label (Something isn't working) Oct 1, 2023
@ruiyiw
Collaborator Author

ruiyiw commented Oct 2, 2023

The total number of steps changes under a new fine-tuning setting:
llama-2-13b-chat with 587 data points ->
epochs = 2, batch size = 8, lr = 5e-5
[Image: Screenshot 2023-10-01 at 11 23 38 PM]
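For comparison with the screenshot, this sketch shows the step count that the common one-step-per-batch convention would predict for this setting. The thread does not say whether Together AI keeps or drops the final partial batch, or whether it uses gradient accumulation, so both batch-handling variants are shown and accumulation is assumed absent. `total_steps` is a hypothetical helper.

```python
import math


def total_steps(num_examples: int, batch_size: int, epochs: int, drop_last: bool = False) -> int:
    """Total steps, assuming one step per batch and no gradient accumulation."""
    if drop_last:
        per_epoch = num_examples // batch_size  # incomplete final batch is discarded
    else:
        per_epoch = math.ceil(num_examples / batch_size)  # incomplete final batch is kept
    return per_epoch * epochs


# Setting from the comment: 587 examples, batch size 8, 2 epochs.
print(total_steps(587, 8, 2))                  # 148 (74 steps per epoch)
print(total_steps(587, 8, 2, drop_last=True))  # 146 (73 steps per epoch)
```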

@ruiyiw
Collaborator Author

ruiyiw commented Oct 2, 2023

We need a further explanation of how training steps are counted in Together AI.

ruiyiw changed the title from "[BUG]: Together AI fine-tuning always stops at step 10" to "[BUG]: (Update) Together AI fine-tuning always stops at step 10 ❌ Need further explanations on how steps work in Together AI ✅" Oct 2, 2023
ruiyiw self-assigned this Oct 2, 2023
@lwaekfjlk
Member

@ruiyiw Even though we set the learning rate to 5e-5, the learning rate actually peaks at 1e-5. Is that expected?

@lwaekfjlk lwaekfjlk added the together-ai Issues related to together ai label Oct 2, 2023
@ruiyiw ruiyiw added the help wanted Extra attention is needed label Oct 2, 2023
@ruiyiw
Collaborator Author

ruiyiw commented Oct 2, 2023

Yes. But why does the learning rate start at 0 and ramp up gradually to 1e-5?
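A learning rate that starts at 0 and ramps up is the typical signature of a warmup phase in a learning-rate scheduler. The thread does not say which schedule Together AI uses, so the following is only a sketch of one common choice (linear warmup followed by linear decay). `lr_at_step`, the warmup length, and the decay shape are all assumptions. This sketch alone does not explain why the observed peak was 1e-5 rather than the configured 5e-5.

```python
def lr_at_step(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0 at total_steps.

    This is an assumed, commonly used schedule, not Together AI's documented one.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 toward peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr to 0 over the remaining steps.
    remaining = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / remaining)


# Example: configured peak 5e-5, 3 warmup steps, 10 total steps.
for s in range(11):
    print(s, f"{lr_at_step(s, 5e-5, 3, 10):.2e}")
```

With a schedule like this, a run that ends before warmup finishes never reaches the configured peak. That is one hypothesis to check against the logged steps, not a confirmed cause.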

@lwaekfjlk
Member

This should be covered in #14.

lwaekfjlk pushed a commit that referenced this issue Nov 17, 2023