
[GSoC] Create LLM Hyperparameters Optimization API Proposal #2333

Merged
merged 22 commits into kubeflow:master
Jul 25, 2024

Conversation

helenxie-bit
Contributor

What this PR does / why we need it:
Give users the ability to tune hyperparameters of LLMs using simple Python SDK APIs.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@andreyvelich
Member

/area gsoc

@andreyvelich
Member

Ref issue: #2291

Member

@andreyvelich andreyvelich left a comment


Thank you for this @helenxie-bit.
I left initial questions.
Please take a look @kubeflow/wg-training-leads @deepanker13 @droctothorpe.

docs/proposals/llm-hyperparameters-tuning-api.md (5 outdated review threads, resolved)
@helenxie-bit helenxie-bit changed the title Create LLM Hyperparameters Tuning API Proposal [GSoC]Create LLM Hyperparameters Tuning API Proposal Jun 10, 2024
@andreyvelich andreyvelich mentioned this pull request Jun 14, 2024
10 tasks
@helenxie-bit helenxie-bit changed the title [GSoC]Create LLM Hyperparameters Tuning API Proposal [GSoC]Create LLM Hyperparameters Optimization API Proposal Jun 16, 2024
docs/proposals/llm-hyperparameter-optimization-api.md (outdated review thread, resolved)
docs/images/design_api.jpg (outdated review thread, resolved)
docs/proposals/llm-hyperparameters-tuning-api.md (3 outdated review threads, resolved)
@helenxie-bit
Contributor Author

@andreyvelich I have removed the objective function and modified the API design based on our discussion.

In the updated version, users must provide parameters such as model_provider_parameters, dataset_provider_parameters, and trainer_parameters to the tune API. The hyperparameter search space is now defined within trainer_parameters for ease of use. The tune API will then download the pretrained models and datasets using storage_initializer, and create the experiment and trials automatically.

Please review the changes and let me know if you have any feedback or suggestions!
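For context, a minimal sketch of what a call to the proposed tune API could look like under this design. It is illustrative only: the parameter class names (HuggingFaceModelParams, HuggingFaceDatasetParams, HuggingFaceTrainerParams), the model and dataset identifiers, and the exact arguments are assumptions based on the description above, not the final implementation.

import transformers
import kubeflow.katib as katib
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceDatasetParams,
    HuggingFaceTrainerParams,
)

client = katib.KatibClient(namespace="kubeflow")

client.tune(
    name="llm-hp-optimization",
    # Pretrained model, downloaded by the storage initializer.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://google/bert_uncased_L-2_H-128_A-2",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Dataset, downloaded by the storage initializer.
    dataset_provider_parameters=HuggingFaceDatasetParams(repo_id="yelp_review_full"),
    # The hyperparameter search space is defined inside trainer_parameters.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="results",
            learning_rate=katib.search.double(min=1e-05, max=5e-05),
            per_device_train_batch_size=katib.search.categorical([8, 16, 32]),
        ),
    ),
    objective_metric_name="train_loss",
    objective_type="minimize",
    max_trial_count=10,
)

With this shape, the tune API would create the Experiment, substitute concrete hyperparameter values into each Trial, and rely on the storage initializer to fetch the model and dataset, as described above.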

helenxie-bit and others added 9 commits June 18, 2024 21:31
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
…rom tune api

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
…ation' part

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
@johnugeorge
Member

Other than the naming consistency, it looks good.

…ive function

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
"storage_class": None,
"access_modes": constants.PVC_DEFAULT_ACCESS_MODES,
},
objective: Optional[Callable] = None,
Contributor Author


In addition to using the Storage Initializer to download models and datasets from HuggingFace or S3, I added objective and base_image parameters to allow users to define their own objective function. This approach enables us to modify the existing tune API directly instead of developing another one. What do you think? @andreyvelich @johnugeorge
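As a rough illustration of this alternative path, the snippet below follows the style of the existing Katib tune API with a user-defined objective function and a base image. The objective body, parameter names, and image are made up for the example.

import kubeflow.katib as katib

def objective(parameters):
    # Toy objective; a real one would fine-tune the model and report a metric.
    loss = (parameters["lr"] - 0.003) ** 2
    # Katib's default metrics collector parses "name=value" lines from stdout.
    print(f"loss={loss}")

client = katib.KatibClient(namespace="kubeflow")
client.tune(
    name="custom-objective-example",
    objective=objective,
    parameters={"lr": katib.search.double(min=1e-4, max=1e-2)},
    base_image="docker.io/python:3.11-slim",
    objective_metric_name="loss",
    objective_type="minimize",
    max_trial_count=5,
)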

Member


Yes, we should validate that the user sets one of the objective or trainer_parameters arguments in the tune API. Another idea might be to give users the ability to set HuggingFaceTrainerParams directly in the objective argument. E.g.:

# Assumed imports for this snippet:
import transformers
import kubeflow.katib as katib
from peft import LoraConfig
from kubeflow.storage_initializer.hugging_face import HuggingFaceTrainerParams

objective = HuggingFaceTrainerParams(
    training_parameters=transformers.TrainingArguments(
        output_dir="test_trainer",
        save_strategy="no",
        eval_strategy="epoch",
        disable_tqdm=True,
        log_level="info",
        learning_rate=katib.search.double(min=1e-05, max=5e-05),
        per_device_train_batch_size=katib.search.categorical([8, 16, 32]),
        num_train_epochs=katib.search.int(min=1, max=10),
        weight_decay=katib.search.double(min=0.0, max=1.0),
    ),
    # Set LoRA config to reduce the number of trainable model parameters.
    lora_config=LoraConfig(
        r=katib.search.int(min=8, max=32),
        lora_alpha=8,
        lora_dropout=0.1,
        bias="none",
    ),
)

I understand that this will be inconsistent with the train API in the training client, but it will help us avoid the unnecessary "one of" validation.
WDYT @helenxie-bit @johnugeorge?

Contributor Author


From my perspective, it's better to define objective and trainer_parameters separately. In this way, we can keep the trainer_parameters argument (instead of setting HuggingFaceTrainerParams directly in the objective argument, which would render trainer_parameters useless), and this approach is also clearer for users. Users will have two options:

  1. Use external models and datasets: Users need to set model_provider_parameters, dataset_provider_parameters, storage_config, and trainer_parameters.
  2. Define a custom objective function: Users need to set objective and trainer_parameters.

We can add validation for this logic, and it should not be difficult.
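For reference, a minimal sketch of what that validation could look like inside tune. The function and argument names below are illustrative, not the proposed implementation.

def _validate_tune_args(objective, model_provider_parameters,
                        dataset_provider_parameters, trainer_parameters):
    # trainer_parameters carries the search space, so it is required in both modes.
    if trainer_parameters is None:
        raise ValueError("trainer_parameters must be set")

    uses_external_assets = (
        model_provider_parameters is not None
        and dataset_provider_parameters is not None
    )
    uses_custom_objective = objective is not None

    # Exactly one of the two options described above must be configured.
    if uses_external_assets == uses_custom_objective:
        raise ValueError(
            "Set either model_provider_parameters and dataset_provider_parameters "
            "(option 1) or a custom objective function (option 2), but not both."
        )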

Contributor Author


@andreyvelich @johnugeorge According to today's discussion, I will keep these parameters as they are for now. We can make adjustments in the next version.

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
)
```

_#WIP_
Member


We can remove this.

Contributor Author

@helenxie-bit helenxie-bit Jul 15, 2024


@andreyvelich I have removed the "WIP" tag. Please take a final look at the proposal to identify any other issues.

"storage_class": None,
"access_modes": constants.PVC_DEFAULT_ACCESS_MODES,
},
objective: Optional[Callable] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should validate that user sets one of objective or trainer_parameters argument in the tune API. Other idea might be to give user ability to set HuggingFaceTrainerParams directly in objective argument. E.g.:

objective = HuggingFaceTrainerParams(
		training_parameters = transformers.TrainingArguments(
			output_dir = "test_trainer",
			save_strategy = "no",
			eval_strategy = "epoch",
			disable_tqdm = True,
			log_level = "info",
			learning_rate = katib.search.double(min=1e-05, max=5e-05),
			per_device_train_batch_size = katib.search.categorical([8, 16, 32]),
			num_train_epochs = katib.search.int(min=1, max=10),
			weight_decay = katib.search.double(min=0.0, max=1.0),
		),
		# Set LoRA config to reduce number of trainable model parameters.
		lora_config = LoraConfig(
			r = katib.search.int(min=8, max=32),
			lora_alpha = 8,
			lora_dropout = 0.1,
			bias = "none",
		),
	),	

I understand that this will be in-consistent with train API in training client, but it will help us avoid unnecessary validation for one of
WDYT @helenxie-bit @johnugeorge ?

Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
@helenxie-bit helenxie-bit changed the title [GSoC]Create LLM Hyperparameters Optimization API Proposal [GSoC] Create LLM Hyperparameters Optimization API Proposal Jul 21, 2024
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
Signed-off-by: helenxie-bit <helenxiehz@gmail.com>
@andreyvelich
Member

Related: #2339

Member

@andreyvelich andreyvelich left a comment


I think we can merge it.
We can create follow-up PRs for additional changes.
Thanks for driving this @helenxie-bit!
/lgtm
/approve


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 2c57522 into kubeflow:master Jul 25, 2024
60 checks passed
@helenxie-bit
Contributor Author

@andreyvelich Of course! Thank you for reviewing!

4 participants