
Conversation

@dilarasoylu (Collaborator)

No description provided.

@okhat (Collaborator) commented Oct 27, 2024:

Note to self: we may need to think about the role of adapter retries at (i) bootstrapping time and (ii) inference time with the FT'ed model. My basic intuition is that (i) is fine but that (ii) needs the adapter with which finetuning was conducted to be attached to the program permanently and saved with it, etc.

if not data_format:
    adapter = self.infer_adapter()
    data_format = infer_data_format(adapter)
validate_data_format(data=train_data, data_format=data_format)
@dilarasoylu (Collaborator, Author):
Data validation applicable to all finetunes
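As a sketch of what validation applicable to all finetunes might look like (the `DataFormat` values and required keys here are illustrative assumptions, not DSPy's actual definitions):

```python
from enum import Enum


class DataFormat(Enum):
    CHAT = "chat"
    COMPLETION = "completion"


def validate_data_format(data, data_format):
    """Check that every training example matches the expected schema."""
    if data_format == DataFormat.CHAT:
        required_keys = {"messages"}
    elif data_format == DataFormat.COMPLETION:
        required_keys = {"prompt", "completion"}
    else:
        raise ValueError(f"Unsupported data format: {data_format}")

    for i, entry in enumerate(data):
        missing = required_keys - set(entry)
        if missing:
            raise ValueError(f"Entry {i} is missing keys: {missing}")
```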

@chenmoneygithub (Collaborator) left a comment:

Awesome work! Can we create a notebook (Colab or other format) to demonstrate our use case from end to end? We may want to ship a code example together with this PR.

return format_turn(signature, values, role, incomplete)

# TODO(PR): Looks ok?
def format_finetune_data(self, signature, demos, inputs, outputs):
Collaborator:

I wonder if this is generally true for all finetuning services?

Collaborator:

This function looks good to me. The provider class for an unusual service can just reformat the data.

@dilarasoylu (Collaborator, Author):

In our case, DataFormat is the object that represents the format for fine-tuning. The purpose of this adapter method is to reassemble the actual call that was made, with its prompt and completion. We could rename it to make this clear.
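A simplified sketch of that idea, with a hypothetical signature (the real method takes `signature`, `demos`, `inputs`, and `outputs`): the adapter rebuilds the prompt-side messages exactly as they were sent to the LM, then appends the completion as the assistant turn:

```python
def format_finetune_data(format_prompt, inputs, outputs):
    """Reassemble a finetuning example from the original LM call.

    `format_prompt` is a stand-in for the adapter logic that rebuilds
    the prompt-side messages; the completion becomes the assistant turn.
    """
    messages = format_prompt(inputs)
    messages.append({"role": "assistant", "content": outputs})
    return {"messages": messages}
```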

from dspy.clients.finetune import (
    FinetuneJob,
    TrainingMethod,
    # TrainingMethod,
Collaborator:

Is this expected?

@dilarasoylu (Collaborator, Author):

Yes, this is a ruff fix change. The anyscale.py file needs to be separately updated to match our new interface.

Collaborator:

Hmm, should we remove this TrainingMethod instead of commenting it out?

@dilarasoylu (Collaborator, Author):

Oh, yes!

# Remember to update LM.copy() if you modify the constructor!
if launch_kwargs is not None or provider is not None:
    import dspy
    assert dspy.settings.experimental, "The `launch_kwargs` and `provider` arguments are experimental. Set `dspy.settings.experimental` to `True` to use them."
Collaborator:

In general, assert statements are discouraged: https://google.github.io/styleguide/pyguide.html#244-decision. For this case, we can raise a ValueError.
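A sketch of the suggested change as a standalone helper (the helper name and the `experimental_enabled` parameter are illustrative; in the real code the flag is `dspy.settings.experimental`):

```python
def check_experimental(experimental_enabled, launch_kwargs=None, provider=None):
    """Raise instead of assert, so the check survives `python -O`,
    which strips assert statements."""
    if launch_kwargs is not None or provider is not None:
        if not experimental_enabled:
            raise ValueError(
                "The `launch_kwargs` and `provider` arguments are experimental. "
                "Set `dspy.settings.experimental` to `True` to use them."
            )
```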

Collaborator:

It's fine for now.

return job

def _run_finetune_job(self, job: TrainingJob):
    # TODO(enhance): We should listen for keyboard interrupts somewhere.
Collaborator:

Do we need it? I was assuming finetune will return immediately.

@dilarasoylu (Collaborator, Author):

That's a good point -- finetune returns immediately after starting a thread, which runs this blocking function.

It would be nicer to switch all of this to async; perhaps we should discuss that for the next iteration.
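A minimal sketch of the pattern described above (names are illustrative): `finetune` returns a job handle immediately, while the blocking work runs on a background thread:

```python
import threading
from concurrent.futures import Future


class TrainingJob(Future):
    """A handle for an in-flight finetuning run (illustrative)."""


def finetune(run_job):
    """Return a TrainingJob immediately; the blocking work runs on a thread."""
    job = TrainingJob()

    def target():
        try:
            job.set_result(run_job(job))  # the blocking call happens here
        except Exception as err:
            job.set_exception(err)

    threading.Thread(target=target, daemon=True).start()
    return job
```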

key_to_job[key] = lm.finetune(**finetune_kwargs)

key_to_lm = {}
for ind, (key, job) in enumerate(key_to_job.items()):
Collaborator:

Do we need this chunk? We are waiting for all jobs to finish in the order of creation. Shall we create an event for each job thread?

@dilarasoylu (Collaborator, Author) commented Oct 29, 2024:

Event loop is a good idea!

Here we don't mind waiting because we need all the predictors' LMs to be trained in order to have a complete, optimized program.
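Since the jobs are `Future`s, order-independent waiting could lean on `concurrent.futures.wait`; a sketch with hypothetical names:

```python
from concurrent.futures import wait


def wait_for_jobs(key_to_job, timeout=None):
    """Block until every finetuning job finishes, then map keys to results."""
    wait(key_to_job.values(), timeout=timeout)
    return {key: job.result() for key, job in key_to_job.items()}
```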


class BootstrapFinetune(Teleprompter):

# TODO(PR) check with team
Collaborator:

Let's add a docstring to explain what these args do (they are not very self-explanatory).

@dilarasoylu (Collaborator, Author):

Sounds great! Holding off for now in case we decide to change them; making a note for my next update.


data = []
adapter = self.adapter[lm] or lm.infer_adapter()
data_format = infer_data_format(adapter)
Collaborator:

Instead of having the constructor take in adapters and inferring the data format from the adapter, shall we set data_format in the constructor?

@dilarasoylu (Collaborator, Author):

I think the interaction here could be improved, but I don't have concrete proposals.

Either way, we would need access to an adapter, which may not be directly inferable from the data_format. (Here it can be inferred, but that is not true for all possible data formats we could have, say prompt, chosen, rejected.) Making a note of this for a future discussion.
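For context, an illustrative preference-style record of the kind mentioned (`prompt`, `chosen`, `rejected`; the field values are made up): unlike a prompt/completion pair, it cannot be reassembled from a single LM call, which is why the adapter and the data format stay distinct:

```python
# Illustrative preference-style record, not a format DSPy necessarily uses.
preference_example = {
    "prompt": "Summarize the article in one sentence.",
    "chosen": "The article argues that finetuning benefits from clean data.",
    "rejected": "idk",
}
```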

)
# TODO(PR): Should "trace" not be included in the lambda function?
_metric = metric if metric else lambda example, prediction: 1
evaluator(program, metric=_metric)
Collaborator:

Why are we doing a full eval here?

@dilarasoylu (Collaborator, Author):

I was using the evaluator to make use of its threaded sampling.

    prediction=prediction,
    trace=trace,
)
if metric:
Collaborator:

Do we need to filter out examples that are below some threshold?

@dilarasoylu (Collaborator, Author):

This function is leaving that to the caller and just worrying about collecting the data; BootstrapFinetune filters the examples (but other optimizers may not).
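A sketch of that division of labor (the helper name and threshold are illustrative): collection keeps every trace, and the optimizer filters afterwards:

```python
def filter_bootstrapped_data(data, metric, threshold=1.0):
    """Keep only examples whose metric score meets the threshold.

    `data` is assumed to be a list of dicts with `example`, `prediction`,
    and `trace` keys, as collected by the bootstrapping step.
    """
    kept = []
    for item in data:
        score = metric(item["example"], item["prediction"])
        if score >= threshold:
            kept.append(item)
    return kept
```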

@chenmoneygithub chenmoneygithub self-requested a review October 29, 2024 21:23
@chenmoneygithub chenmoneygithub dismissed their stale review October 29, 2024 21:24

I accidentally clicked on the approval button...

messages = self.format(signature, demos, inputs)

# Add the assistant message
role = "assistant"
Collaborator:

Nit: just pass keyword=value to the function directly.

from dspy.utils.logging import logger


class TrainingJob(Future):
Collaborator:

We need a docstring that says why we inherit from Future and what we are trying to gain from that here.

@dilarasoylu (Collaborator, Author):

Sounds good!

@dilarasoylu (Collaborator, Author):

(It allows us to use the following methods to check on the training status of a job: done(), result(), and set_result().)
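A small demonstration of the inherited `Future` API (the result string is made up):

```python
from concurrent.futures import Future


class TrainingJob(Future):
    """Inherits done(), result(), and set_result() from Future."""


job = TrainingJob()
assert not job.done()            # training still in progress
job.set_result("ft:model-123")   # called by the worker on completion
assert job.done()
assert job.result() == "ft:model-123"
```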

@okhat okhat self-requested a review November 7, 2024 18:04
@okhat okhat force-pushed the dev_finetune_update branch 3 times, most recently from 6040baf to 218b71e Compare November 7, 2024 18:49
@okhat okhat merged commit 1e539b0 into main Nov 7, 2024
3 of 4 checks passed