
nggsam/preference_model


Trains and compares a variety of preference models (reward models) with different losses and datasets.
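The usual objective for this kind of preference-model training is a pairwise (Bradley-Terry) loss: the reward model should score the chosen response above the rejected one. Below is a minimal sketch of that loss and of the matching preference accuracy metric; the function and argument names are illustrative, not this repository's actual API.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(chosen_rewards: torch.Tensor,
                             rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Both tensors hold scalar rewards for a batch of preference pairs.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def pairwise_accuracy(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> float:
    """Fraction of pairs where the chosen response gets the higher reward."""
    return (chosen_rewards > rejected_rewards).float().mean().item()
```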

TODOs

  • Add model.
  • Add training code.
  • Add evaluation code.
  • Test the complete workflow with 10% of the train and eval data for one epoch.
  • Add requirements.txt.
  • Run a short training job to confirm that the loss decreases.
  • Add metrics to measure accuracy while training.
  • Try different configs:
    • Freeze some of the layers to avoid overfitting (see the freezing sketch after this list).
    • Train the first layer for 0.1 epochs, then train the remaining layers.
  • Add a DeepSpeed config and try DeepSpeed training (see the DeepSpeed sketch after this list).
  • Try torch.compile (PyTorch 2.x).
  • Compare different losses.
  • Compare different datasets.
  • Add synthetic datasets.
  • Integrate Weights & Biases (wandb) to keep track of experiments (see the logging sketch after this list).
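A sketch of the layer-freezing idea from the list above. It assumes a GPT-2-style Hugging Face reward model, where the transformer blocks live at `model.transformer.h` and the scalar head is `model.score`; adjust the attribute names for whichever backbone is actually used.

```python
from transformers import AutoModelForSequenceClassification

def freeze_lower_layers(model, num_trainable_blocks: int = 2) -> None:
    """Freeze everything except the top transformer blocks and the reward head."""
    for param in model.parameters():
        param.requires_grad = False
    # Unfreeze the last few transformer blocks (assumed layout: model.transformer.h).
    for block in model.transformer.h[-num_trainable_blocks:]:
        for param in block.parameters():
            param.requires_grad = True
    # Unfreeze the scalar reward head (assumed attribute name: model.score).
    for param in model.score.parameters():
        param.requires_grad = True

# Example: a GPT-2 backbone with a single scalar output acting as the reward head.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
freeze_lower_layers(model, num_trainable_blocks=2)
```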
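For the DeepSpeed item, a sketch of initialization with the config passed inline as a dict (it can equally live in a JSON file given to the launcher). The batch size, ZeRO stage, and learning rate are placeholder values, not settings this repository has committed to.

```python
import deepspeed
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # placeholder values throughout
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

# deepspeed.initialize wraps the model and builds the optimizer from the config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)
```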
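For the torch.compile and experiment-tracking items, the sketch below shows where the two calls typically sit in a training loop. The model, data, project name, and logged keys are all placeholders standing in for the real reward model and dataloader.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import wandb

# Placeholder model and synthetic preference pairs; swap in the real components.
model = nn.Linear(16, 1)
batches = [(torch.randn(8, 16), torch.randn(8, 16)) for _ in range(10)]

model = torch.compile(model)                          # PyTorch 2.x
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

wandb.init(project="preference_model")                # placeholder project name
for step, (chosen, rejected) in enumerate(batches):
    margin = model(chosen) - model(rejected)
    loss = -F.logsigmoid(margin).mean()               # pairwise preference loss
    accuracy = (margin > 0).float().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    wandb.log({"train/loss": loss.item(), "train/accuracy": accuracy.item()}, step=step)
```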
