Conversation
Signed-off-by: Saad Zaher <szaher@redhat.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Pull Request Test Coverage Report for Build 21919346642Details
💛 - Coveralls |
There was a problem hiding this comment.
Pull request overview
This PR introduces a comprehensive design proposal for specialized trainer abstractions in the Kubeflow SDK. The proposal addresses current limitations in the SDK's trainer subsystem by introducing framework-aware trainer classes and a dedicated runtime configuration system.
Changes:
- Proposes a
BaseTrainerabstract interface enabling polymorphic handling of trainers across the SDK - Defines Tier 1 framework-specific trainers (TorchTrainer, MPITrainer, JAXTrainer, XGBoostTrainer) with automatic runtime discovery
- Introduces
RuntimeConfigdataclass to separate runtime environment settings from training logic - Provides comprehensive design details including auto-discovery, validation, backward compatibility, test plans, and implementation phases
|
|
||
| from abc import ABC, abstractmethod | ||
| from dataclasses import dataclass, field | ||
| from typing import Callable, Optional |
There was a problem hiding this comment.
Missing import for ClassVar which is used on line 240. Add from typing import ClassVar or include it in the existing typing import.
| from typing import Callable, Optional | |
| from typing import Callable, Optional, ClassVar |
| index_urls: list[str] = field( | ||
| default_factory=lambda: list(constants.DEFAULT_PIP_INDEX_URLS) | ||
| ) | ||
| quiet: bool = True |
There was a problem hiding this comment.
The quiet field has a default value of True, but the guideline notes that booleans should default to False unless there's a specific reason. Consider whether suppressing pip output by default is the desired behavior, or if it should default to False for better debugging visibility.
| """Trainer for HuggingFace Transformers training. | ||
|
|
||
| Wraps HuggingFace's Trainer API and maps to a PyTorch runtime. |
There was a problem hiding this comment.
Inconsistent spelling of "Hugging Face". The official company/product name is "Hugging Face" (two words), not "HuggingFace". This appears in multiple locations including class names, comments, and documentation. While HuggingFaceTrainer as a class name (PascalCase without spaces) is acceptable, the documentation text should use "Hugging Face" (two words).
| """Trainer for HuggingFace Transformers training. | |
| Wraps HuggingFace's Trainer API and maps to a PyTorch runtime. | |
| """Trainer for Hugging Face Transformers training. | |
| Wraps Hugging Face's Trainer API and maps to a PyTorch runtime. |
| """Trainer for HuggingFace Transformers training. | ||
|
|
||
| Wraps HuggingFace's Trainer API and maps to a PyTorch runtime. |
There was a problem hiding this comment.
Inconsistent spelling of "Hugging Face". The official company/product name is "Hugging Face" (two words), not "HuggingFace". Use "Hugging Face" in documentation text.
| """Trainer for HuggingFace Transformers training. | |
| Wraps HuggingFace's Trainer API and maps to a PyTorch runtime. | |
| """Trainer for Hugging Face Transformers training. | |
| Wraps Hugging Face's Trainer API and maps to a PyTorch runtime. |
| 3. **Deprecating `CustomTrainer` or `BuiltinTrainer`.** Both remain supported. | ||
| Specialized trainers are an additional option, not a replacement. | ||
| 4. **Tier 2 trainer implementations.** This proposal defines the extension mechanism | ||
| and interface. Concrete Tier 2 implementations (HuggingFace, DeepSpeed, Unsloth, |
There was a problem hiding this comment.
Inconsistent spelling of "Hugging Face". The official company/product name is "Hugging Face" (two words), not "HuggingFace". Use "Hugging Face" in documentation text.
| and interface. Concrete Tier 2 implementations (HuggingFace, DeepSpeed, Unsloth, | |
| and interface. Concrete Tier 2 implementations (Hugging Face, DeepSpeed, Unsloth, |
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...format, will close the issue(s) when PR gets merged):Fixes #
Checklist: