Refactor rlhf #328
# Conflicts: # llm_studio/python_configs/text_causal_language_modeling_config.py
# Conflicts: # app_utils/config.py # llm_studio/src/datasets/text_causal_language_modeling_ds.py # llm_studio/src/models/text_causal_language_modeling_model.py
Should be good for a first review. Some points I noticed:
Thanks a lot, @maxjeblick.
I really like your refactors, and it makes great sense to put RLHF in its own problem type.
I still need to go through the model and train changes and do some local testing, but here are some initial thoughts.
llm_studio/python_configs/text_rlhf_language_modeling_config.py (five review comments, all outdated and resolved)
# Conflicts: # llm_studio/src/models/text_base_model.py
# Conflicts: # tests/models/test_text_causal_language_modeling_model.py
and let's also remove
Thanks for the review, it was very helpful. I addressed the issues above. In addition, I made some smaller code changes:
Yes, good idea, I changed that.
Thanks a lot, @maxjeblick, very nice refactor and small fixes along the way!
It all looks good to me now.
I will likely work on RLHF again in a follow-up PR/issue and allow for larger batches. That should be much easier now with the individual train loops.
This PR adds a separate problem type for RLHF.
Some discussion items:
- train.py contains two training functions (run_train_rlhf and run_train) with partially duplicated code. IMO, it is ok to keep it as is rather than having one function with multiple if-else statements.
- Target Text is not used, thus always an empty string. Will implement it here after Max/insights table view #301 has been merged.

Fixes #317
We can first merge #308 and I will fix potential merge conflicts subsequently.
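
For readers weighing the train.py trade-off mentioned in the discussion items, here is a rough sketch of what "two functions with some duplication" might look like. Only the names run_train and run_train_rlhf come from this PR. The function bodies, the `_shared_setup` helper, the `TRAIN_FUNCTIONS` table, the `dispatch` function, and the problem-type keys are hypothetical placeholders, not the actual LLM Studio code.

```python
from typing import Callable, Dict, List


def _shared_setup(cfg: Dict) -> Dict:
    """Hypothetical helper holding the steps both loops have in common."""
    return {"epochs": cfg.get("epochs", 1), "log": []}


def run_train(cfg: Dict) -> List[str]:
    """Placeholder for the standard training loop (one step per epoch)."""
    state = _shared_setup(cfg)
    for epoch in range(state["epochs"]):
        state["log"].append(f"epoch {epoch}: supervised step")
    return state["log"]


def run_train_rlhf(cfg: Dict) -> List[str]:
    """Placeholder for the RLHF loop; its per-epoch steps differ,
    which is why it lives in its own function (assumed step names)."""
    state = _shared_setup(cfg)
    for epoch in range(state["epochs"]):
        state["log"].append(f"epoch {epoch}: generate rollouts")
        state["log"].append(f"epoch {epoch}: score rollouts")
        state["log"].append(f"epoch {epoch}: policy update")
    return state["log"]


# Hypothetical dispatch table: one entry per problem type instead of
# if-else branches scattered through a single training function.
TRAIN_FUNCTIONS: Dict[str, Callable[[Dict], List[str]]] = {
    "causal_lm": run_train,
    "rlhf": run_train_rlhf,
}


def dispatch(problem_type: str, cfg: Dict) -> List[str]:
    """Select the training function for the given problem type."""
    return TRAIN_FUNCTIONS[problem_type](cfg)
```

The duplication is limited to the loop skeleton, while each loop can change independently. Adding another problem type would then mean adding one function and one table entry.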