
[GSoC] Create LLM Hyperparameters Optimization API Proposal #2333

Merged
merged 22 commits into kubeflow:master
Jul 25, 2024

Conversation

helenxie-bit
Contributor

What this PR does / why we need it:
Give users the ability to tune hyperparameters of LLMs using simple Python SDK APIs.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@andreyvelich
Member

/area gsoc

@andreyvelich
Member

Ref issue: #2291

Member

@andreyvelich andreyvelich left a comment


Thank you for this @helenxie-bit.
I left initial questions.
Please take a look @kubeflow/wg-training-leads @deepanker13 @droctothorpe.

docs/proposals/llm-hyperparameters-tuning-api.md (5 outdated review threads, resolved)
@helenxie-bit helenxie-bit changed the title Create LLM Hyperparameters Tuning API Proposal [GSoC]Create LLM Hyperparameters Tuning API Proposal Jun 10, 2024
@andreyvelich andreyvelich mentioned this pull request Jun 14, 2024
10 tasks
@helenxie-bit helenxie-bit changed the title [GSoC]Create LLM Hyperparameters Tuning API Proposal [GSoC]Create LLM Hyperparameters Optimization API Proposal Jun 16, 2024
docs/proposals/llm-hyperparameter-optimization-api.md (outdated review thread, resolved)
docs/images/design_api.jpg (outdated review thread, resolved)
docs/proposals/llm-hyperparameters-tuning-api.md (3 outdated review threads, resolved)
@helenxie-bit
Contributor Author

@andreyvelich I have removed the objective function and modified the API design based on our discussion.

In the updated version, users must provide parameters such as model_provider_parameters, dataset_provider_parameters, and trainer_parameters to the tune API. The hyperparameter search space is now defined within trainer_parameters for ease of use. The tune API will then download the pretrained models and datasets using storage_initializer, and create the experiment and trials automatically.

Please review the changes and let me know if you have any feedback or suggestions!
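For context, a minimal sketch of what a call to the proposed tune API could look like under this design. It is illustrative only: the parameter class names (HuggingFaceModelParams, HuggingFaceDatasetParams, HuggingFaceTrainerParams), the model and dataset identifiers, and the exact arguments are assumptions based on the description above, not the final implementation.

import transformers
import kubeflow.katib as katib
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceDatasetParams,
    HuggingFaceTrainerParams,
)

client = katib.KatibClient(namespace="kubeflow")

client.tune(
    name="llm-hp-optimization",
    # Pretrained model, downloaded by the storage initializer.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://google/bert_uncased_L-2_H-128_A-2",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Dataset, downloaded by the storage initializer.
    dataset_provider_parameters=HuggingFaceDatasetParams(repo_id="yelp_review_full"),
    # The hyperparameter search space is defined inside trainer_parameters.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="results",
            learning_rate=katib.search.double(min=1e-05, max=5e-05),
            per_device_train_batch_size=katib.search.categorical([8, 16, 32]),
        ),
    ),
    objective_metric_name="train_loss",
    objective_type="minimize",
    max_trial_count=10,
)

With this shape, the tune API would create the Experiment, substitute concrete hyperparameter values into each Trial, and rely on the storage initializer to fetch the model and dataset, as described above.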

helenxie-bit and others added 9 commits June 18, 2024 21:31
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
…rom tune api

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
…ation' part

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
@johnugeorge
Member

Other than the naming consistency, it looks good.

…ive function

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
"storage_class": None,
"access_modes": constants.PVC_DEFAULT_ACCESS_MODES,
},
objective: Optional[Callable] = None,
Contributor Author


In addition to using the Storage Initializer to download models and datasets from HuggingFace or S3, I added objective and base_image parameters to allow users to define their own objective function. This approach enables us to modify the existing tune API directly instead of developing another one. What do you think? @andreyvelich @johnugeorge
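As a rough illustration of this alternative path, the snippet below follows the style of the existing Katib tune API with a user-defined objective function and a base image. The objective body, parameter names, and image are made up for the example.

import kubeflow.katib as katib

def objective(parameters):
    # Toy objective; a real one would fine-tune the model and report a metric.
    loss = (parameters["lr"] - 0.003) ** 2
    # Katib's default metrics collector parses "name=value" lines from stdout.
    print(f"loss={loss}")

client = katib.KatibClient(namespace="kubeflow")
client.tune(
    name="custom-objective-example",
    objective=objective,
    parameters={"lr": katib.search.double(min=1e-4, max=1e-2)},
    base_image="docker.io/python:3.11-slim",
    objective_metric_name="loss",
    objective_type="minimize",
    max_trial_count=5,
)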

Member


Yes, we should validate that the user sets one of the objective or trainer_parameters arguments in the tune API. Another idea might be to give users the ability to set HuggingFaceTrainerParams directly in the objective argument. E.g.:

# Assumed imports for this snippet:
import transformers
import kubeflow.katib as katib
from peft import LoraConfig
from kubeflow.storage_initializer.hugging_face import HuggingFaceTrainerParams

objective = HuggingFaceTrainerParams(
    training_parameters=transformers.TrainingArguments(
        output_dir="test_trainer",
        save_strategy="no",
        eval_strategy="epoch",
        disable_tqdm=True,
        log_level="info",
        learning_rate=katib.search.double(min=1e-05, max=5e-05),
        per_device_train_batch_size=katib.search.categorical([8, 16, 32]),
        num_train_epochs=katib.search.int(min=1, max=10),
        weight_decay=katib.search.double(min=0.0, max=1.0),
    ),
    # Set LoRA config to reduce the number of trainable model parameters.
    lora_config=LoraConfig(
        r=katib.search.int(min=8, max=32),
        lora_alpha=8,
        lora_dropout=0.1,
        bias="none",
    ),
)

I understand that this will be inconsistent with the train API in the training client, but it will help us avoid the unnecessary "one of" validation.
WDYT @helenxie-bit @johnugeorge?

Contributor Author


From my perspective, it's better to define objective and trainer_parameters separately. In this way, we can keep the trainer_parameters argument (instead of setting HuggingFaceTrainerParams directly in the objective argument, which would render trainer_parameters useless), and this approach is also clearer for users. Users will have two options:

  1. Use external models and datasets: Users need to set model_provider_parameters, dataset_provider_parameters, storage_config, and trainer_parameters.
  2. Define a custom objective function: Users need to set objective and trainer_parameters.

We can add validation for this logic, and it should not be difficult.
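For reference, a minimal sketch of what that validation could look like inside tune. The function and argument names below are illustrative, not the proposed implementation.

def _validate_tune_args(objective, model_provider_parameters,
                        dataset_provider_parameters, trainer_parameters):
    # trainer_parameters carries the search space, so it is required in both modes.
    if trainer_parameters is None:
        raise ValueError("trainer_parameters must be set")

    uses_external_assets = (
        model_provider_parameters is not None
        and dataset_provider_parameters is not None
    )
    uses_custom_objective = objective is not None

    # Exactly one of the two options described above must be configured.
    if uses_external_assets == uses_custom_objective:
        raise ValueError(
            "Set either model_provider_parameters and dataset_provider_parameters "
            "(option 1) or a custom objective function (option 2), but not both."
        )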

Contributor Author


@andreyvelich @johnugeorge According to today's discussion, I will keep these parameters as they are for now. We can make adjustments in the next version.

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
)
```

_#WIP_
Member


We can remove this.

Contributor Author

@helenxie-bit helenxie-bit Jul 15, 2024


@andreyvelich I have removed the "WIP" tag. Please take a final look at the proposal to identify any other issues.

"storage_class": None,
"access_modes": constants.PVC_DEFAULT_ACCESS_MODES,
},
objective: Optional[Callable] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should validate that user sets one of objective or trainer_parameters argument in the tune API. Other idea might be to give user ability to set HuggingFaceTrainerParams directly in objective argument. E.g.:

objective = HuggingFaceTrainerParams(
		training_parameters = transformers.TrainingArguments(
			output_dir = "test_trainer",
			save_strategy = "no",
			eval_strategy = "epoch",
			disable_tqdm = True,
			log_level = "info",
			learning_rate = katib.search.double(min=1e-05, max=5e-05),
			per_device_train_batch_size = katib.search.categorical([8, 16, 32]),
			num_train_epochs = katib.search.int(min=1, max=10),
			weight_decay = katib.search.double(min=0.0, max=1.0),
		),
		# Set LoRA config to reduce number of trainable model parameters.
		lora_config = LoraConfig(
			r = katib.search.int(min=8, max=32),
			lora_alpha = 8,
			lora_dropout = 0.1,
			bias = "none",
		),
	),	

I understand that this will be in-consistent with train API in training client, but it will help us avoid unnecessary validation for one of
WDYT @helenxie-bit @johnugeorge ?

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
@helenxie-bit helenxie-bit changed the title [GSoC]Create LLM Hyperparameters Optimization API Proposal [GSoC] Create LLM Hyperparameters Optimization API Proposal Jul 21, 2024
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
@andreyvelich
Member

Related: #2339

Member

@andreyvelich andreyvelich left a comment


I think we can merge it.
We can create follow-up PRs for additional changes.
Thanks for driving this @helenxie-bit!
/lgtm
/approve


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 2c57522 into kubeflow:master Jul 25, 2024
60 checks passed
@helenxie-bit
Contributor Author

@andreyvelich Of course! Thank you for reviewing!

4 participants