Collaborator

@chenmoneygithub chenmoneygithub commented Nov 7, 2024

Add Databricks finetuning support.

Most of the complexity comes from orchestrating the Databricks jobs. This integration handles the following:

  • Create a directory in Databricks Unity Catalog and upload the training data to the directory.
  • Kick off the finetuning job.
  • After finetuning is done, automatically create or update the endpoint on the Databricks model serving platform.
  • The created endpoint's name is returned and used to override the original LM.
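
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeDatabricks`, `run_finetune_pipeline`, and the `train.jsonl` file name are hypothetical stand-ins, and no real Databricks API is called. Only the endpoint naming (dots replaced with underscores) mirrors the diff in this PR, and the `register_to` value in the usage line is inferred from the endpoint name in the log.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDatabricks:
    """Hypothetical in-memory stand-in for Databricks; not the real SDK."""
    directories: set = field(default_factory=set)
    files: dict = field(default_factory=dict)
    endpoints: set = field(default_factory=set)


def run_finetune_pipeline(client, volume_dir, train_data, register_to):
    # Step 1: create the Unity Catalog directory if needed, then upload data.
    if volume_dir not in client.directories:
        client.directories.add(volume_dir)
    # "train.jsonl" is an assumed file name, used here only for illustration.
    client.files[f"{volume_dir}/train.jsonl"] = train_data

    # Step 2: kick off finetuning. Stubbed out here; the real call blocks
    # until the finetuning run completes.

    # Step 3: create or update the serving endpoint. The name is derived
    # as in the PR diff: dots in the registered model name become underscores.
    endpoint_name = register_to.replace(".", "_")
    client.endpoints.add(endpoint_name)

    # Step 4: return the endpoint name so the caller can override the LM.
    return endpoint_name


client = FakeDatabricks()
endpoint = run_finetune_pipeline(
    client,
    "/Volumes/main/chenmoney/testing/dspy_testing/classification",
    [{"prompt": "hypothetical", "response": "hypothetical"}],
    "main.chenmoney.finetuned_model_classification",  # inferred from the log
)
```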

Similar to the OpenAI provider, we make blocking calls in the finetune() method, which means the method blocks until the serving endpoint is successfully created (or crashes partway through). The blocking method is wrapped by LM.finetune(), which runs it in a background thread. But because the optimizer waits for finetuning to finish before proceeding, this is still effectively a blocking call.
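
To illustrate why this is still effectively blocking, here is a small sketch using the standard library. The names `provider_finetune` and `lm_finetune` are hypothetical and only loosely model the provider's finetune() and LM.finetune(); the actual threading mechanism in DSPy may differ.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def provider_finetune(register_to: str) -> str:
    # Stand-in for the blocking provider call: the short sleep represents
    # the finetuning run and the wait for the endpoint to become ready.
    time.sleep(0.1)
    return register_to.replace(".", "_")


def lm_finetune(register_to: str) -> Future:
    # Run the blocking call in a background thread and return immediately.
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(provider_finetune, register_to)
    # Do not wait here; the already-submitted task still runs to completion.
    executor.shutdown(wait=False)
    return future


future = lm_finetune("catalog.schema.model")  # returns right away
# The optimizer needs the endpoint before it can proceed, so it waits here.
endpoint = future.result()
```

The call to `lm_finetune` returns immediately, but the caller still has to wait on `future.result()` before it can use the endpoint, which is the situation described above.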

Below is the full log from running examples/finetune/databricks_finetune.py:

(dspy) (base) *[databricks-finetuning][~/Documents/mlflow_team/dspy]$ python3 bsft_tmp.py
/Users/chen.qian/miniconda3/envs/dspy/lib/python3.12/site-packages/datasets/load.py:1454: FutureWarning: The repository for PolyAI/banking77 contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/PolyAI/banking77
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
Try the original model:  Prediction(
    reasoning='The user is inquiring about the status of their card, which they are still waiting to receive. This suggests that the user is experiencing an issue with their card delivery, and is seeking an update on when they can expect to receive it.',
    answer=11
)
[BootstrapFinetune] Preparing the student and teacher programs...
Ensuring that the student is not compiled
No teacher provided. Using a copy of the student program as the teacher.
[BootstrapFinetune] Bootstrapping data...
Average Metric: 10 / 10  (100.0): 100%|██████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 729.08it/s]
2024/11/07 15:59:19 INFO dspy.evaluate.evaluate: Average Metric: 10 / 10 (100.0%)
[BootstrapFinetune] Preparing the train data...
[BootstrapFinetune] Collected data for 10 examples
[BootstrapFinetune] After filtering for score, 10 examples remain
Using 10 data points for fine-tuning the model: databricks/databricks-meta-llama-3-1-70b-instruct
[BootstrapFinetune] Starting LM fine-tuning...
[BootstrapFinetune] 1 fine-tuning job(s) to start
[BootstrapFinetune] Starting 1 fine-tuning jobs...
2024/11/07 15:59:20 INFO dspy.clients.databricks: Directory /Volumes/main/chenmoney/testing/dspy_testing/classification already exists, skip creating it.
2024/11/07 15:59:21 INFO dspy.clients.databricks: Starting finetuning on Databricks... this might take a few minutes to finish.
2024/11/07 16:10:32 INFO dspy.clients.databricks: Finetuning run completed successfully!
2024/11/07 16:10:34 INFO dspy.clients.databricks: Creating serving endpoint main_chenmoney_finetuned_model_classification on Databricks model serving!
2024/11/07 16:10:37 INFO dspy.clients.databricks: Successfully started creating/updating serving endpoint main_chenmoney_finetuned_model_classification on Databricks model serving!
2024/11/07 16:10:37 INFO dspy.clients.databricks: Waiting for serving endpoint main_chenmoney_finetuned_model_classification to be ready, this might take a few minutes... You can check the status of the endpoint at https://e2-dogfood.staging.cloud.databricks.com/ml/endpoints/main_chenmoney_finetuned_model_classification
2024/11/07 16:22:45 INFO dspy.clients.databricks: Databricks model serving endpoint main_chenmoney_finetuned_model_classification is ready!
Job 1/1 completed.
[BootstrapFinetune] Updating the student program with the fine-tuned LMs...
[BootstrapFinetune] BootstrapFinetune has finished compiling the student program

@chenmoneygithub chenmoneygithub marked this pull request as draft November 7, 2024 00:26
job.launch_started = True
model_to_deploy = train_kwargs.get("register_to")
job.endpoint_name = model_to_deploy.replace(".", "_")
DatabricksProvider.deploy_finetuned_model(model_to_deploy, data_format, databricks_host, databricks_token)
Collaborator

Should this be here or inside launch?

Collaborator Author

Ideally it should be inside launch, but the current implementation of LM and the optimizer requires finetune() to return a ready-to-use endpoint. This is actually one thing I would like to improve, but I decided to stick to the current design to speed things up.

I will work on the general improvements in my spare time afterwards.

small fix

some fixes
@chenmoneygithub chenmoneygithub changed the base branch from dev_finetune_update to main November 7, 2024 21:08
@chenmoneygithub chenmoneygithub marked this pull request as ready for review November 8, 2024 00:43
@chenmoneygithub chenmoneygithub changed the title [WIP] Databricks finetuning integration Databricks finetuning integration Nov 8, 2024
@okhat okhat merged commit cdee3b4 into stanfordnlp:main Nov 8, 2024
4 checks passed
@chenmoneygithub chenmoneygithub deleted the databricks-finetuning branch December 27, 2024 22:03