Dev finetune update #1698
Conversation
…o dev_finetune_update
Note to self: we may need to think about the role of adapter retries at (i) bootstrapping time and (ii) inference time with the fine-tuned model. My basic intuition is that (i) is fine, but that (ii) needs the adapter with which finetuning was conducted to be permanently attached to the program and saved with it.
```python
if not data_format:
    adapter = self.infer_adapter()
    data_format = infer_data_format(adapter)
validate_data_format(data=train_data, data_format=data_format)
```
Data validation applicable to all finetunes
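A minimal sketch of what validation shared across all finetunes could look like. The `infer_data_format`/`validate_data_format` names come from the diff above, but the chat-format shape checked here (OpenAI-style `messages` records) is an assumption, not the actual DSPy implementation:

```python
def validate_chat_data(data):
    """Check that every record is an OpenAI-style chat example.

    Assumed record shape: {"messages": [{"role": ..., "content": ...}, ...]}
    """
    valid_roles = {"system", "user", "assistant"}
    for ind, record in enumerate(data):
        messages = record.get("messages")
        if not isinstance(messages, list) or not messages:
            raise ValueError(f"Record {ind} has no 'messages' list")
        for message in messages:
            if message.get("role") not in valid_roles:
                raise ValueError(f"Record {ind} has invalid role {message.get('role')!r}")
            if not isinstance(message.get("content"), str):
                raise ValueError(f"Record {ind} has non-string content")

good = [{"messages": [{"role": "user", "content": "hi"},
                      {"role": "assistant", "content": "hello"}]}]
validate_chat_data(good)  # passes silently
```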
chenmoneygithub left a comment:
Awesome work! Can we create a notebook (Colab or other format) to demonstrate our use case from end to end? We may want to ship a code example together with this PR.
```python
return format_turn(signature, values, role, incomplete)


# TODO(PR): Looks ok?
def format_finetune_data(self, signature, demos, inputs, outputs):
```
I wonder if this is generally true for all finetuning services?
This function looks good to me. The provider class for an unusual service can just reformat the data.
In our case, DataFormat is the object that represents the format for fine-tuning. The purpose of this adapter method is to re-assemble the actual call that's made, with its completion and prompt. We could rename it to make this clear.
```python
from dspy.clients.finetune import (
    FinetuneJob,
    TrainingMethod,
    # TrainingMethod,
```
Is this expected?
Yes, this is a ruff fix change. The anyscale.py file needs to be separately updated to match our new interface.
hmm should we remove this TrainingMethod instead of commenting it out?
Oh, yes!
dspy/clients/lm.py (outdated)
```python
# Remember to update LM.copy() if you modify the constructor!
if launch_kwargs is not None or provider is not None:
    import dspy
    assert dspy.settings.experimental, "The `launch_kwargs` and `provider` arguments are experimental. Set `dspy.settings.experimental` to `True` to use them."
```
In general, assert statements are discouraged: https://google.github.io/styleguide/pyguide.html#244-decision. For this case, we can throw a ValueError.
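A hedged sketch of the suggested change. The check and error message come from the diff, but the standalone function and its `experimental` parameter (standing in for `dspy.settings.experimental`) are illustrative:

```python
def check_experimental(launch_kwargs=None, provider=None, experimental=False):
    # Illustrative stand-in for the constructor check; `experimental`
    # plays the role of dspy.settings.experimental.
    if (launch_kwargs is not None or provider is not None) and not experimental:
        raise ValueError(
            "The `launch_kwargs` and `provider` arguments are experimental. "
            "Set `dspy.settings.experimental` to `True` to use them."
        )
```

Unlike `assert`, a `ValueError` survives running Python with `-O`, which strips assertions.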
it's fine for now
```python
return job


def _run_finetune_job(self, job: TrainingJob):
    # TODO(enhance): We should listen for keyboard interrupts somewhere.
```
Do we need it? I was assuming finetune will return immediately.
That's a good point -- finetune returns immediately after starting a thread, which runs this blocking function.
It would be nicer to switch all of this to async; perhaps we should discuss for the next iteration.
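That flow -- `finetune` returning immediately while a worker thread runs the blocking job -- can be sketched with `concurrent.futures` (all names here are hypothetical stand-ins, not the PR's actual API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)

def run_finetune_job(job_id):
    # Stand-in for the blocking provider call.
    time.sleep(0.1)
    return f"model-for-{job_id}"

def finetune(job_id):
    # Returns immediately; the blocking work happens on a worker thread.
    return _executor.submit(run_finetune_job, job_id)

future = finetune("job-1")
print(future.done())    # may be False while the job is still running
print(future.result())  # blocks until the job finishes
```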
```python
key_to_job[key] = lm.finetune(**finetune_kwargs)


key_to_lm = {}
for ind, (key, job) in enumerate(key_to_job.items()):
```
Do we need this chunk? We are waiting for all jobs to finish in the order of creation. Shall we create an event for each job thread?
Event loop is a good idea!
Here we don't mind waiting because we need all the predictors' LMs to be trained in order to have a complete, optimized program.
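Since every predictor's LM must finish before the optimized program is complete, waiting in creation order is equivalent to waiting on all jobs; a sketch with `concurrent.futures.wait` (the `train` function and keys are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor, wait

def train(key):
    # Stand-in for a blocking fine-tuning job.
    return f"lm-for-{key}"

executor = ThreadPoolExecutor()
key_to_job = {key: executor.submit(train, key) for key in ["pred0", "pred1"]}

# Block until every job finishes, regardless of completion order.
wait(key_to_job.values())
key_to_lm = {key: job.result() for key, job in key_to_job.items()}
```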
```python
class BootstrapFinetune(Teleprompter):

    # TODO(PR) check with team
```
Let's add a docstring to explain what these args do (they are not very self-explanatory).
Sounds great! Holding off for now in case we decide to change them; making a note for my next update.
```python
data = []
adapter = self.adapter[lm] or lm.infer_adapter()
data_format = infer_data_format(adapter)
```
Instead of having the constructor take in adapters and inferring the data format from the adapter, shall we set data_format in the constructor?
I think the interaction here could be improved, but I don't have concrete proposals.
Either way, we would need access to an "adapter" that may not be directly inferable from the data_format. (Here it can be inferred, but that is not true for all possible data formats we could have, say prompt, chosen, rejected.) Making a note of this for a future discussion.
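A sketch of why a data format alone may not determine the adapter: a preference-style format (`prompt`, `chosen`, `rejected`) could have been rendered by more than one adapter. The enum and mapping below are assumptions for illustration, not the actual DSPy `DataFormat`:

```python
from enum import Enum

class DataFormat(Enum):
    CHAT = "chat"               # {"messages": [...]}
    COMPLETION = "completion"   # {"prompt": ..., "completion": ...}
    PREFERENCE = "preference"   # {"prompt": ..., "chosen": ..., "rejected": ...}

def infer_data_format(adapter_name):
    # Each adapter maps to exactly one format...
    return {"ChatAdapter": DataFormat.CHAT}.get(adapter_name)

# ...but the reverse is not a function: a preference record says nothing
# about which adapter rendered its prompt, so the adapter must be supplied.
```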
```python
)
# TODO(PR): Should "trace" not be included in the lambda function?
_metric = metric if metric else lambda example, prediction: 1
evaluator(program, metric=_metric)
```
Why are we doing a full eval here?
I was using the evaluator to make use of the threaded sampling
```python
    prediction=prediction,
    trace=trace,
)
if metric:
```
Do we need to filter out examples that are below some threshold?
This function leaves that to the caller and just worries about collecting the data -- BootstrapFinetune filters the examples (but other optimizers may not).
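That division of labor can be sketched as: the collector keeps every trace with its score, and the optimizer decides the cutoff (all names and the threshold value are hypothetical):

```python
def collect_traces(examples, metric):
    # Collector: record everything and attach the metric score; no filtering.
    return [{"example": ex, "score": metric(ex)} for ex in examples]

def filter_for_finetuning(traces, threshold=1.0):
    # Optimizer (e.g. BootstrapFinetune): keep only high-scoring traces.
    return [t["example"] for t in traces if t["score"] >= threshold]

traces = collect_traces([{"q": "a"}, {"q": "b"}],
                        metric=lambda ex: 1.0 if ex["q"] == "a" else 0.0)
print(filter_for_finetuning(traces))  # [{'q': 'a'}]
```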
I accidentally clicked on the approval button...
```python
messages = self.format(signature, demos, inputs)


# Add the assistant message
role = "assistant"
```
nit: just pass the keyword=value to the function directly
```python
from dspy.utils.logging import logger


class TrainingJob(Future):
```
We need a docstring that says why we inherit from Future and what we are trying to gain from that here.
Sounds good!
(It allows us to use the following methods to check on the training status of a job: done(), result(), and set_result().)
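A minimal sketch of that pattern -- subclassing `concurrent.futures.Future` so the training thread publishes its result via `set_result()` while callers use `done()`/`result()`. The `start`/`_run` methods and the returned model name are illustrative, not the PR's actual code:

```python
import threading
from concurrent.futures import Future

class TrainingJob(Future):
    """A Future so callers get done()/result() and the worker calls set_result()."""

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Stand-in for the blocking fine-tuning call.
        self.set_result("ft:model-123")

job = TrainingJob()
job.start()
print(job.result())  # blocks until the worker thread calls set_result()
```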
Force-pushed 6040baf to 218b71e.