This repository was archived by the owner on Nov 16, 2023. It is now read-only.
Merged
4 changes: 2 additions & 2 deletions README.md
@@ -57,7 +57,7 @@ A Framework represents an application with a set of Tasks:
5. Support to specify how to [classify and summarize Pod failures](doc/user-manual.md#PodFailureClassification)
6. Support to expose [Framework and Pod history snapshots](doc/user-manual.md#FrameworkPodHistory) to external systems
7. Easy to leverage [FrameworkBarrier](doc/user-manual.md#FrameworkBarrier) to achieve light-weight Gang Execution and Service Discovery
-8. Easy to leverage [HivedScheduler](doc/user-manual.md#HivedScheduler) to achieve GPU Topology-Aware, Multi-Tenant, Priority and Gang Scheduling
+8. Easy to leverage [HiveDScheduler](doc/user-manual.md#HiveDScheduler) to achieve GPU Topology-Aware, Multi-Tenant, Priority and Gang Scheduling
9. Compatible with other Kubernetes features, such as Kubernetes [Service](https://kubernetes.io/docs/concepts/services-networking/service), [Gpu Scheduling](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus), [Volume](https://kubernetes.io/docs/concepts/storage/volumes/), [Logging](https://kubernetes.io/docs/concepts/cluster-administration/logging)
10. Idiomatic with Kubernetes official controllers, such as [Pod Spec](https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/#pod-templates)
11. Aligned with Kubernetes [Controller Design Guidelines](https://github.com/kubernetes/community/blob/f0dd87ad477e1e91c53866902adf7832c32ce543/contributors/devel/sig-api-machinery/controllers.md) and [API Conventions](https://github.com/kubernetes/community/blob/a2cdce51a0bbbc214f0e8813e0a877176ad3b6c9/contributors/devel/sig-architecture/api-conventions.md)
@@ -87,7 +87,7 @@ A specialized wrapper can be built on top of FrameworkController to optimize for
### Recommended Kubernetes Scheduler
FrameworkController can directly leverage many [Kubernetes Schedulers](https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers) and among them we recommend these best fits:
* [Kubernetes Default Scheduler](https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/#kube-scheduler): A General-Purpose Kubernetes Scheduler
-* [HivedScheduler](doc/user-manual.md#HivedScheduler): A Kubernetes Scheduler Extender optimized for GPUs
+* [HiveDScheduler](doc/user-manual.md#HiveDScheduler): A Kubernetes Scheduler Extender optimized for AI applications

### Similar Offering On Other Cluster Manager
* [YARN FrameworkLauncher](https://github.com/microsoft/pai/blob/master/subprojects/frameworklauncher/yarn): Similar offering natively supports [Apache YARN](http://hadoop.apache.org)
8 changes: 4 additions & 4 deletions doc/user-manual.md
@@ -14,7 +14,7 @@
- [Framework Consistency vs Availability](#FrameworkConsistencyAvailability)
- [Controller Extension](#ControllerExtension)
- [FrameworkBarrier](#FrameworkBarrier)
-- [HivedScheduler](#HivedScheduler)
+- [HiveDScheduler](#HiveDScheduler)
- [Best Practice](#BestPractice)

## <a name="FrameworkInterop">Framework Interop</a>
@@ -464,9 +464,9 @@ See more in:
1. [Usage](../pkg/barrier/barrier.go)
2. Example: [FrameworkBarrier Example](../example/framework/extension/frameworkbarrier.yaml), [TensorFlow Example](../example/framework/scenario/tensorflow), [etc](../example/framework/scenario).

-### <a name="HivedScheduler">HivedScheduler</a>
-1. [Usage](https://github.com/microsoft/pai/tree/master/subprojects/hivedscheduler)
-2. Example: [TensorFlow Example](../example/framework/scenario/tensorflow/gpu/tensorflowdistributedtrainingwithhivedscheduledgpu.yaml), [etc](https://github.com/microsoft/pai/blob/master/subprojects/GOPATH/src/github.com/microsoft/hivedscheduler/example/request/design/request.yaml).
+### <a name="HiveDScheduler">HiveDScheduler</a>
+1. [Usage](https://github.com/microsoft/hivedscheduler)
+2. Example: [TensorFlow Example](../example/framework/scenario/tensorflow/gpu/tensorflowdistributedtrainingwithhivedscheduledgpu.yaml), [etc](https://github.com/microsoft/hivedscheduler/example/request/design/request.yaml).

## <a name="BestPractice">Best Practice</a>
[Best Practice](../pkg/apis/frameworkcontroller/v1/types.go)
@@ -34,7 +34,7 @@ spec:
affinityGroup: null
spec:
# [PREREQUISITE]
-# Do not specify the schedulerName if the HivedScheduler is directly
+# Do not specify the schedulerName if the HiveDScheduler is directly
# called by the k8s default scheduler.
schedulerName: hivedscheduler
restartPolicy: Never
@@ -76,8 +76,8 @@ spec:
resources:
limits:
# [PREREQUISITE]
-# User needs to setup HivedScheduler for the k8s cluster.
-# See https://github.com/microsoft/pai/tree/master/subprojects/hivedscheduler
+# User needs to setup HiveDScheduler for the k8s cluster.
+# See https://github.com/microsoft/hivedscheduler
hivedscheduler.microsoft.com/pod-scheduling-enable: 1
cpu: 3
memory: 96Gi
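The change above renames only the mixed-case display name (`HivedScheduler` to `HiveDScheduler`) and leaves lowercase identifiers such as `schedulerName: hivedscheduler` and the `hivedscheduler.microsoft.com/...` resource key untouched. A minimal sketch of a case-sensitive check for leftover old spellings might look like the following; the helper name `find_stale_names` and the sample text are hypothetical and are not part of this repository.

```python
import re

# Old mixed-case display name that the diff replaces. The match is
# case-sensitive, so lowercase identifiers such as
# `schedulerName: hivedscheduler` are intentionally not reported.
OLD_NAME = re.compile(r"HivedScheduler")


def find_stale_names(text):
    """Return (line_number, line) pairs that still contain the old spelling."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_NAME.search(line)
    ]


# Hypothetical sample mixing one stale comment with already-updated lines.
sample = "\n".join([
    "# Do not specify the schedulerName if the HivedScheduler is directly",
    "# called by the k8s default scheduler.",
    "schedulerName: hivedscheduler",
    "# User needs to setup HiveDScheduler for the k8s cluster.",
])

stale = find_stale_names(sample)
for number, line in stale:
    print(f"line {number}: {line}")
```

Keeping the match case-sensitive is the design choice that matters here: a case-insensitive search would also flag the scheduler name and resource key, which this change deliberately keeps in lowercase.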