chore: Specialized Trainers Proposal #286

Closed

szaher wants to merge 1 commit into kubeflow:main

Member

szaher commented Feb 11, 2026

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #

Checklist:

Docs included if any changes are user facing


Specialized Trainers Proposal

fc53bdc

Signed-off-by: Saad Zaher <szaher@redhat.com>

Copilot AI review requested due to automatic review settings

February 11, 2026 19:13

google-oss-prow bot requested a review from astefanutti

February 11, 2026 19:13

Contributor

google-oss-prow bot commented Feb 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow bot requested a review from Electronic-Waste

February 11, 2026 19:13

szaher changed the title ~~Specialized Trainers Proposal~~ docs: Specialized Trainers Proposal

google-oss-prow bot added the size/XL label

Copilot started reviewing on behalf of szaher

February 11, 2026 19:13

szaher changed the title ~~docs: Specialized Trainers Proposal~~ feat: Specialized Trainers Proposal

szaher changed the title ~~feat: Specialized Trainers Proposal~~ chore: Specialized Trainers Proposal

szaher closed this

szaher deleted the KEP-285 branch

February 11, 2026 19:15

coveralls commented Feb 11, 2026

Pull Request Test Coverage Report for Build 21919346642

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 67.948%

Totals
Change from base Build 21919105330:	0.0%
Covered Lines:	2828
Relevant Lines:	4162


Copilot AI reviewed


Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a comprehensive design proposal for specialized trainer abstractions in the Kubeflow SDK. The proposal addresses current limitations in the SDK's trainer subsystem by introducing framework-aware trainer classes and a dedicated runtime configuration system.

Changes:

- Proposes a BaseTrainer abstract interface enabling polymorphic handling of trainers across the SDK
- Defines Tier 1 framework-specific trainers (TorchTrainer, MPITrainer, JAXTrainer, XGBoostTrainer) with automatic runtime discovery
- Introduces a RuntimeConfig dataclass to separate runtime environment settings from training logic
- Provides design details including auto-discovery, validation, backward compatibility, test plans, and implementation phases
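The proposal text itself is not reproduced in this thread, so the following is only a rough sketch of how the pieces listed above might fit together. The class names BaseTrainer, TorchTrainer, and RuntimeConfig come from the summary; every field, method, and the `select_runtime` helper are assumptions made for illustration, not the proposal's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, ClassVar, Optional


@dataclass
class RuntimeConfig:
    """Runtime environment settings, kept apart from training logic.

    The fields here are hypothetical placeholders.
    """
    packages: list[str] = field(default_factory=list)
    env: dict[str, str] = field(default_factory=dict)


class BaseTrainer(ABC):
    """Common interface so SDK code can treat all trainers uniformly."""

    # Framework label used for runtime lookup (assumed mechanism).
    framework: ClassVar[str]

    @abstractmethod
    def get_train_func(self) -> Optional[Callable]:
        """Return the user's training function, if any."""


@dataclass
class TorchTrainer(BaseTrainer):
    framework: ClassVar[str] = "torch"
    func: Optional[Callable] = None
    num_nodes: int = 1

    def get_train_func(self) -> Optional[Callable]:
        return self.func


def select_runtime(trainer: BaseTrainer, runtimes: dict[str, str]) -> Optional[str]:
    """Pick the first runtime whose framework label matches the trainer.

    `runtimes` maps a runtime name to its framework label (hypothetical shape).
    """
    for name, framework in runtimes.items():
        if framework == trainer.framework:
            return name
    return None


trainer = TorchTrainer(func=lambda: None, num_nodes=2)
chosen = select_runtime(trainer, {"mpi-example": "mpi", "torch-example": "torch"})
```

Because `select_runtime` only depends on the `framework` attribute of the base interface, the same lookup would work for any other trainer subclass without changes.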

docs/proposals/285-specialized-trainers/README.md

+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+from typing import Callable, Optional

Copilot AI Feb 11, 2026

Missing import for ClassVar which is used on line 240. Add from typing import ClassVar or include it in the existing typing import.

Suggested change

-from typing import Callable, Optional
+from typing import Callable, Optional, ClassVar
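As a minimal illustration of why the import matters, the sketch below uses `ClassVar` inside a dataclass. The class and attribute names are hypothetical and not taken from the proposal. A `ClassVar` annotation marks a class-level constant, and `dataclasses` leaves it out of the generated fields; if the name is not imported, evaluating the annotation typically fails with a `NameError` when the class is defined.

```python
from dataclasses import dataclass, fields
from typing import Callable, ClassVar, Optional


@dataclass
class ExampleTrainer:  # hypothetical class for illustration only
    # Class-level constant: shared by all instances, not an __init__ argument.
    supported_frameworks: ClassVar[tuple[str, ...]] = ("torch",)
    # Regular dataclass field.
    func: Optional[Callable] = None


# Only regular fields are reported; the ClassVar attribute is skipped.
field_names = [f.name for f in fields(ExampleTrainer)]
```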

Copilot uses AI. Check for mistakes.

docs/proposals/285-specialized-trainers/README.md

+    index_urls: list[str] = field(
+        default_factory=lambda: list(constants.DEFAULT_PIP_INDEX_URLS)
+    )
+    quiet: bool = True

Copilot AI Feb 11, 2026

The quiet field has a default value of True, but the guideline notes that booleans should default to False unless there's a specific reason. Consider whether suppressing pip output by default is the desired behavior, or if it should default to False for better debugging visibility.
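To make the trade-off concrete, here is a small sketch of how such options might translate into a pip command line. The `PipOptions` class, the `build_pip_command` helper, and the value of `DEFAULT_PIP_INDEX_URLS` are assumptions for illustration; only the `index_urls` and `quiet` fields come from the snippet under review. The pip flags used (`--quiet`, `--index-url`, `--extra-index-url`) are standard `pip install` options.

```python
from dataclasses import dataclass, field

# Assumed stand-in for constants.DEFAULT_PIP_INDEX_URLS; the real value is not shown here.
DEFAULT_PIP_INDEX_URLS = ("https://pypi.org/simple",)


@dataclass
class PipOptions:  # hypothetical container name
    packages: list[str] = field(default_factory=list)
    index_urls: list[str] = field(
        default_factory=lambda: list(DEFAULT_PIP_INDEX_URLS)
    )
    quiet: bool = True


def build_pip_command(opts: PipOptions) -> list[str]:
    """Translate the options into an argument list for `pip install`."""
    cmd = ["pip", "install"]
    if opts.quiet:
        # Suppresses most pip output, which also hides resolver diagnostics.
        cmd.append("--quiet")
    if opts.index_urls:
        # First URL is the primary index; the rest are extra indexes.
        cmd += ["--index-url", opts.index_urls[0]]
        for url in opts.index_urls[1:]:
            cmd += ["--extra-index-url", url]
    cmd += opts.packages
    return cmd


quiet_cmd = build_pip_command(PipOptions(packages=["numpy"]))
verbose_cmd = build_pip_command(PipOptions(packages=["numpy"], quiet=False))
```

With `quiet=False` the only difference is the missing `--quiet` flag, so switching the default would change log verbosity but not which packages or indexes are used.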

Copilot generated this review using guidance from repository custom instructions.

docs/proposals/285-specialized-trainers/README.md

Comment on lines +473 to +475

		"""Trainer for HuggingFace Transformers training.

		Wraps HuggingFace's Trainer API and maps to a PyTorch runtime.

Copilot AI Feb 11, 2026

Inconsistent spelling of "Hugging Face". The official company/product name is "Hugging Face" (two words), not "HuggingFace". This appears in multiple locations including class names, comments, and documentation. While HuggingFaceTrainer as a class name (PascalCase without spaces) is acceptable, the documentation text should use "Hugging Face" (two words).

Suggested change

-    """Trainer for HuggingFace Transformers training.
-
-    Wraps HuggingFace's Trainer API and maps to a PyTorch runtime.
+    """Trainer for Hugging Face Transformers training.
+
+    Wraps Hugging Face's Trainer API and maps to a PyTorch runtime.



docs/proposals/285-specialized-trainers/README.md

+. **Deprecating `CustomTrainer` or `BuiltinTrainer`.** Both remain supported.
+                 Specialized trainers are an additional option, not a replacement.
+. **Tier 2 trainer implementations.** This proposal defines the extension mechanism
+                 and interface. Concrete Tier 2 implementations (HuggingFace, DeepSpeed, Unsloth,

Copilot AI Feb 11, 2026

Inconsistent spelling of "Hugging Face". The official company/product name is "Hugging Face" (two words), not "HuggingFace". Use "Hugging Face" in documentation text.

Suggested change

-and interface. Concrete Tier 2 implementations (HuggingFace, DeepSpeed, Unsloth,
+and interface. Concrete Tier 2 implementations (Hugging Face, DeepSpeed, Unsloth,
