I am attempting to train an SFT model with the PPOTrainer, but I receive the following error when initializing PPOTrainer. The error doesn't appear to be caused by anything I'm passing to PPOTrainer, but I could be mistaken. Note: both the SFT model and the reward model are quantized, and the LoRA adapters have been merged using merge_and_unload().
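For reference, the merge step looks roughly like the sketch below; the base model name and adapter path are placeholders, not the ones I actually used:

```python
# Merge the LoRA adapter into the base model before handing it to PPOTrainer.
# Placeholder names: substitute your own base checkpoint and adapter path.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-name", torch_dtype="auto")
peft_model = PeftModel.from_pretrained(base, "path/to/sft-lora-adapter")
sft_model = peft_model.merge_and_unload()  # fold the LoRA weights back into the base weights
```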
The initialization is straightforward (I followed these instructions):
```python
ppo_trainer = PPOTrainer(
    config=ppo_config,
    model=sft_model,
    ref_model=None,          # re-uses `model` when set to None
    tokenizer=tokenizer,
    dataset=dataset,
    optimizer=None,          # None defaults to Adam with the linear learning rate specified in PPOConfig
    num_shared_layers=None,  # None means all layers are shared
)
```
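In case it matters, the TRL instructions I followed build the policy model by wrapping the merged SFT checkpoint with a value head before passing it in, roughly like this (the checkpoint path is a placeholder):

```python
from trl import AutoModelForCausalLMWithValueHead

# Wrap the merged SFT checkpoint so PPOTrainer receives a model with a value head.
sft_model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "path/to/merged-sft-checkpoint"  # placeholder path
)
```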
--> Please see this notebook (last cell) for a reproducible example of the error.