Databricks finetuning integration #1770

chenmoneygithub · 2024-11-07T00:26:00Z

Add Databricks finetuning support.

Most of the complexity comes from orchestrating the Databricks jobs. This integration handles the following steps:

Create a directory in Databricks Unity Catalog and upload the training data to the directory.
Kick off the finetuning job.
After finetuning finishes, automatically create or update the endpoint on the Databricks model serving platform.
The created endpoint's name is returned and used to override the original LM.
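
The steps above can be sketched as a single sequential function. This is only an illustration of the control flow, not the code in this PR: `client`, its method names, `wait_until`, and `FakeClient` are all hypothetical placeholders, and no real Databricks API is called. The only detail taken from the PR is that the endpoint name is derived from the `register_to` value by replacing dots with underscores.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], timeout_s: float, poll_s: float) -> None:
    """Poll is_ready() until it returns True; raise TimeoutError after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError("not ready before the timeout elapsed")
        time.sleep(poll_s)


def run_finetune_pipeline(client, train_data, volume_dir, register_to,
                          timeout_s=3600.0, poll_s=10.0) -> str:
    """Run the steps in order. `client` is a hypothetical wrapper, not a real SDK."""
    # Step 1: make sure the target directory exists, then upload the data.
    client.ensure_directory(volume_dir)
    client.upload(train_data, volume_dir)
    # Step 2: start the finetuning run and block until it finishes.
    run_id = client.start_finetuning(volume_dir, register_to)
    wait_until(lambda: client.run_finished(run_id), timeout_s, poll_s)
    # Step 3: create or update the serving endpoint and block until it is ready.
    endpoint_name = register_to.replace(".", "_")
    client.create_or_update_endpoint(endpoint_name, register_to)
    wait_until(lambda: client.endpoint_ready(endpoint_name), timeout_s, poll_s)
    # Step 4: return the endpoint name so the caller can point the LM at it.
    return endpoint_name


class FakeClient:
    """In-memory stand-in used only to exercise the control flow above."""

    def __init__(self, polls_until_ready=2):
        self.calls = []
        self._polls_left = {"run": polls_until_ready, "endpoint": polls_until_ready}

    def _poll(self, key):
        # Report "ready" only after the configured number of polls.
        self._polls_left[key] -= 1
        return self._polls_left[key] <= 0

    def ensure_directory(self, path):
        self.calls.append("ensure_directory")

    def upload(self, data, path):
        self.calls.append("upload")

    def start_finetuning(self, path, register_to):
        self.calls.append("start_finetuning")
        return "run-1"

    def run_finished(self, run_id):
        return self._poll("run")

    def create_or_update_endpoint(self, name, model):
        self.calls.append("create_or_update_endpoint")

    def endpoint_ready(self, name):
        return self._poll("endpoint")


client = FakeClient()
endpoint = run_finetune_pipeline(
    client, [{"text": "hi", "label": 0}], "/Volumes/catalog/schema/volume/demo",
    "catalog.schema.model", timeout_s=5.0, poll_s=0.0)
print(endpoint, client.calls)
```

Injecting the client keeps the ordering and the two wait points testable without a workspace; a real implementation would replace `FakeClient` with calls to the actual services.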

Similar to the OpenAI provider, the finetune() method makes blocking calls: it blocks until the serving endpoint is successfully created (or fails partway through). LM.finetune() wraps this blocking method and runs it in a background thread. However, because the optimizer waits for finetuning to finish before proceeding, the call is still effectively blocking.
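
A generic sketch of why a background thread does not change this from the caller's point of view. It uses Python's standard `concurrent.futures` and is not DSPy's implementation: `blocking_finetune` and `finetune_in_background` are hypothetical stand-ins, and the short sleep substitutes for the real finetuning and deployment time.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def blocking_finetune(duration_s: float) -> str:
    """Stand-in for a provider call that returns only once the endpoint is ready."""
    time.sleep(duration_s)
    return "example_endpoint"


def finetune_in_background(executor: ThreadPoolExecutor, duration_s: float) -> Future:
    """Submit the blocking call to a worker thread and return a Future immediately."""
    return executor.submit(blocking_finetune, duration_s)


with ThreadPoolExecutor(max_workers=1) as executor:
    start = time.monotonic()
    future = finetune_in_background(executor, 0.2)
    submit_elapsed = time.monotonic() - start  # submission itself does not wait
    endpoint = future.result()                 # the caller waits here for the job
    total_elapsed = time.monotonic() - start

print(f"submitted after {submit_elapsed:.3f}s, result after {total_elapsed:.3f}s")
```

Submission returns quickly, but any caller that needs the endpoint name must still wait on the result, which is the situation the optimizer is in.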

Below is the full log from running examples/finetune/databricks_finetune.py:

(dspy) (base) *[databricks-finetuning][~/Documents/mlflow_team/dspy]$ python3 bsft_tmp.py
/Users/chen.qian/miniconda3/envs/dspy/lib/python3.12/site-packages/datasets/load.py:1454: FutureWarning: The repository for PolyAI/banking77 contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/PolyAI/banking77
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
Try the original model:  Prediction(
    reasoning='The user is inquiring about the status of their card, which they are still waiting to receive. This suggests that the user is experiencing an issue with their card delivery, and is seeking an update on when they can expect to receive it.',
    answer=11
)
[BootstrapFinetune] Preparing the student and teacher programs...
Ensuring that the student is not compiled
No teacher provided. Using a copy of the student program as the teacher.
[BootstrapFinetune] Bootstrapping data...
Average Metric: 10 / 10  (100.0): 100%|██████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 729.08it/s]
2024/11/07 15:59:19 INFO dspy.evaluate.evaluate: Average Metric: 10 / 10 (100.0%)
[BootstrapFinetune] Preparing the train data...
[BootstrapFinetune] Collected data for 10 examples
[BootstrapFinetune] After filtering for score, 10 examples remain
Using 10 data points for fine-tuning the model: databricks/databricks-meta-llama-3-1-70b-instruct
[BootstrapFinetune] Starting LM fine-tuning...
[BootstrapFinetune] 1 fine-tuning job(s) to start
[BootstrapFinetune] Starting 1 fine-tuning jobs...
2024/11/07 15:59:20 INFO dspy.clients.databricks: Directory /Volumes/main/chenmoney/testing/dspy_testing/classification already exists, skip creating it.
2024/11/07 15:59:21 INFO dspy.clients.databricks: Starting finetuning on Databricks... this might take a few minutes to finish.
2024/11/07 16:10:32 INFO dspy.clients.databricks: Finetuning run completed successfully!
2024/11/07 16:10:34 INFO dspy.clients.databricks: Creating serving endpoint main_chenmoney_finetuned_model_classification on Databricks model serving!
2024/11/07 16:10:37 INFO dspy.clients.databricks: Successfully started creating/updating serving endpoint main_chenmoney_finetuned_model_classification on Databricks model serving!
2024/11/07 16:10:37 INFO dspy.clients.databricks: Waiting for serving endpoint main_chenmoney_finetuned_model_classification to be ready, this might take a few minutes... You can check the status of the endpoint at https://e2-dogfood.staging.cloud.databricks.com/ml/endpoints/main_chenmoney_finetuned_model_classification
2024/11/07 16:22:45 INFO dspy.clients.databricks: Databricks model serving endpoint main_chenmoney_finetuned_model_classification is ready!
Job 1/1 completed.
[BootstrapFinetune] Updating the student program with the fine-tuned LMs...
[BootstrapFinetune] BootstrapFinetune has finished compiling the student program

okhat · 2024-11-07T01:04:58Z

dspy/clients/databricks.py

+        job.launch_started = True
+        model_to_deploy = train_kwargs.get("register_to")
+        job.endpoint_name = model_to_deploy.replace(".", "_")
+        DatabricksProvider.deploy_finetuned_model(model_to_deploy, data_format, databricks_host, databricks_token)


Should this be here or inside launch?

Ideally it should, but the current implementation of the LM and optimizer requires finetune() to return a ready-to-use endpoint. This is one thing I would like to improve, but I decided to stick with the current design to speed things up.

I will work on the general improvements in my spare time afterwards.
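
For reference, the endpoint-name derivation in the snippet under review can be isolated as below. This is a minimal sketch, not the merged code: `endpoint_name_for` is a hypothetical helper, and the explicit check for a missing `register_to` is an added assumption (as written in the diff, a missing key would make `.get()` return `None` and the `.replace()` call raise `AttributeError`). The `register_to` value in the usage line is inferred from the endpoint name in the log above and is not stated in the PR.

```python
def endpoint_name_for(train_kwargs: dict) -> str:
    """Derive a serving endpoint name from the model's registration path."""
    model_to_deploy = train_kwargs.get("register_to")
    if not model_to_deploy:
        # Added guard (assumption): fail with a clear message instead of AttributeError.
        raise ValueError("train_kwargs must include a non-empty 'register_to'")
    # Same transformation as in the diff: dots become underscores.
    return model_to_deploy.replace(".", "_")


# Assumed input, inferred from the endpoint name that appears in the log.
name = endpoint_name_for({"register_to": "main.chenmoney.finetuned_model_classification"})
print(name)
```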

small fix some fixes

chenmoneygithub marked this pull request as draft November 7, 2024 00:26

okhat reviewed Nov 7, 2024

chenmoneygithub force-pushed the databricks-finetuning branch from 89fbda8 to d468f4f Compare November 7, 2024 01:38

Databricks finetuning

3c0554c

small fix some fixes

chenmoneygithub force-pushed the databricks-finetuning branch from d468f4f to 3c0554c Compare November 7, 2024 21:07

chenmoneygithub changed the base branch from dev_finetune_update to main November 7, 2024 21:08

chenmoneygithub force-pushed the databricks-finetuning branch from 1d73ca8 to 669c9b1 Compare November 8, 2024 00:42

chenmoneygithub marked this pull request as ready for review November 8, 2024 00:43

chenmoneygithub changed the title ~~[WIP] Databricks finetuning integration~~ Databricks finetuning integration Nov 8, 2024

add examples

882e9ec

chenmoneygithub force-pushed the databricks-finetuning branch from 669c9b1 to 882e9ec Compare November 8, 2024 00:44

okhat merged commit cdee3b4 into stanfordnlp:main Nov 8, 2024
4 checks passed

chenmoneygithub deleted the databricks-finetuning branch December 27, 2024 22:03
