-
Notifications
You must be signed in to change notification settings - Fork 3
Open
Labels
experimentDescription of an experiment to be performedDescription of an experiment to be performed
Description
Ref: https://arxiv.org/abs/2503.18866
Why
We lack data, but have plenty of compute for the current scale of models we're interested in. Being able to generate more data in a smart way might be key to actually improving models. BoLT seems like a promising way to do that.
Approach
Implement the method and run experiments on e.g. 7B CPT w/ Dynaword or similar scale data. This method can soak up a lot of compute, so it'll be important to cap compute usage for experiments, maybe 20k MI250X hours to show positive results.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
experimentDescription of an experiment to be performedDescription of an experiment to be performed