
Refactor rlhf #328

Merged
maxjeblick merged 59 commits into main from max/refactor_rlhf on Aug 15, 2023
Conversation

maxjeblick
Contributor

This PR adds a separate problem type for RLHF.

Some discussion items:

  • How to structure the training settings configuration. The current order may not be optimal.
  • train.py contains two training functions (run_train_rlhf and run_train) with partially duplicated code. IMO, it is fine to keep them separate rather than having one function with multiple if-else branches (see the sketch at the end of this comment).
  • Train data insights should be redone (Target Text is not used and is thus always an empty string). I will implement this here after Max/insights table view #301 has been merged.

Fixes #317
We can first merge #308 and I will fix potential merge conflicts subsequently.
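For context, a minimal sketch of the structure mentioned in the second bullet above: two dedicated train loops with a single dispatch on the problem type, instead of one loop with RLHF-specific if-else branches throughout. All names here (problem type string, model and reward-model methods) are illustrative placeholders, not the actual LLM Studio API.

```python
# Illustrative sketch only; method, config, and problem-type names are hypothetical.

def run_train(cfg, model, train_dataloader):
    """Standard supervised fine-tuning loop."""
    for batch in train_dataloader:
        loss = model.compute_loss(batch)
        loss.backward()
        # optimizer step, logging, checkpointing, ...


def run_train_rlhf(cfg, model, reward_model, train_dataloader):
    """RLHF loop: generate responses, score them with the reward model, run a PPO update."""
    for batch in train_dataloader:
        responses = model.generate(batch["prompt_input_ids"])
        rewards = reward_model.score(batch["prompt_input_ids"], responses)
        model.ppo_step(batch["prompt_input_ids"], responses, rewards)


def train(cfg, model, train_dataloader, reward_model=None):
    # Single dispatch point; shared pieces (checkpointing, logging, ...)
    # can live in small helpers used by both loops.
    if cfg.problem_type == "text_rlhf_language_modeling":
        run_train_rlhf(cfg, model, reward_model, train_dataloader)
    else:
        run_train(cfg, model, train_dataloader)
```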

maxjeblick marked this pull request as draft on August 7, 2023 10:10
maxjeblick marked this pull request as ready for review on August 7, 2023 19:46
@maxjeblick
Contributor Author

Should be good for a first review.

Some points I noticed:

  • Train data insights do not display the answer text. We could either leave it as is (highlighting that training does not use the ground-truth answer) or remove the field by subclassing the plot class (a minimal sketch of that approach follows below).
  • Some inference parameters for RLHF are hidden in the chat window. Should we explicitly change their visibility in the chat tab?
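A minimal sketch of the subclassing idea from the first bullet, assuming a plot class that declares which columns appear in the train data insights; class and attribute names are hypothetical, not the repository's actual plotting code.

```python
# Hypothetical sketch: the RLHF plot class inherits the generic text plots
# but drops the unused answer/target column from the insights table.

class TextCausalLanguageModelingPlots:  # stand-in for the existing plot class
    columns = ["Prompt Text", "Target Text"]

    def plot_data(self, df):
        return df[self.columns]


class TextRLHFPlots(TextCausalLanguageModelingPlots):
    # RLHF training does not use the ground-truth answer, so omit the field.
    columns = ["Prompt Text"]
```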

@pascal-pfeiffer
Collaborator

Thanks a lot @maxjeblick.
I really like your refactors, and it makes great sense to put RLHF in its own problem type.

I still need to go through the model and train changes and do some local testing, but here are some initial thoughts.

@pascal-pfeiffer
Collaborator

[screenshot attached]

  • LLM Backbone is not a dropdown for me

  • hide output dir

We could probably move the reward model to the top, too, wdyt?

@pascal-pfeiffer
Collaborator

and let's also remove

  • FSDP
  • skip parent proba
  • random parent proba

@maxjeblick
Contributor Author

Thanks for the review, it was very helpful. I addressed the issues above; in addition, I made some smaller code changes:

  • I fixed a small bug in plot_batch.
  • I refactored the .from_dict class method into a common base method (a minimal sketch is at the end of this comment).

We could probably move the reward model to the top, too, wdyt?

Yes, good idea; I changed that.
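For illustration, a minimal sketch of the .from_dict refactor, assuming dataclass-style config objects; the class, field, and default names here are hypothetical, not the actual config classes.

```python
from dataclasses import dataclass, fields


@dataclass
class DefaultConfig:
    """Common base class providing a single shared from_dict."""

    @classmethod
    def from_dict(cls, d: dict):
        # Keep only the keys that correspond to fields of the concrete config.
        valid = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in d.items() if k in valid})


@dataclass
class ConfigRLHFTraining(DefaultConfig):
    learning_rate: float = 1e-5
    reward_model: str = "some/reward-model"


cfg = ConfigRLHFTraining.from_dict({"learning_rate": 3e-5, "ignored_key": 0})
```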

@pascal-pfeiffer
Collaborator

Thanks a lot @maxjeblick, very nice refactor and small fixes along the way!
Looks all good to me now.

I will likely work on RLHF again in a follow-up PR/issue to allow for larger batches. It should be much easier now with the individual train loops.

maxjeblick merged commit be29c33 into main on Aug 15, 2023
5 checks passed
maxjeblick deleted the max/refactor_rlhf branch on August 15, 2023 07:56
Successfully merging this pull request may close these issues.

[CODE IMPROVEMENT] Promote RLHF as a separate problem type