27 changes: 27 additions & 0 deletions setup.RHOAI-v2.13/CLUSTER-SETUP.md
@@ -88,6 +88,33 @@ kueue-controller-manager's log:

```

## Autopilot

Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.

- Add the Autopilot Helm repository

```bash
helm repo add autopilot https://ibm.github.io/autopilot/
helm repo update
```

- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional; omit `-f your-config.yml` to install with the defaults.

```bash
helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
```
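
Since the values file is optional, the install step can be wrapped so that `-f` is passed only when a file is given. The following is a minimal sketch, not part of Autopilot: the function name `install_autopilot` and the `DRY_RUN` variable are hypothetical, while the Helm command itself is the one shown above.

```bash
# Hypothetical helper (not part of Autopilot): wraps the install command above.
# Usage: install_autopilot [values-file]
install_autopilot() {
  values_file="${1:-}"

  # Base command, identical to the documented one minus the values file.
  set -- helm upgrade autopilot autopilot/autopilot --install \
    --namespace=autopilot --create-namespace

  # Pass -f only when a values file was given.
  if [ -n "$values_file" ]; then
    set -- "$@" -f "$values_file"
  fi

  # DRY_RUN=1 (an assumption of this sketch) prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 install_autopilot your-config.yml` prints the full command, and `install_autopilot` with no argument installs with the chart defaults.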

### Enabling Prometheus metrics

After completing the installation, label the namespace manually so that Prometheus can scrape the metrics:

```bash
oc label ns autopilot openshift.io/cluster-monitoring=true
```

The `ServiceMonitor` labeling is not required.

## Kueue Configuration

Create Kueue's default flavor:
27 changes: 27 additions & 0 deletions setup.RHOAI-v2.16/CLUSTER-SETUP.md
@@ -76,6 +76,33 @@ AI configuration as follows:



## Autopilot

Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.

- Add the Autopilot Helm repository

```bash
helm repo add autopilot https://ibm.github.io/autopilot/
helm repo update
```

- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional; omit `-f your-config.yml` to install with the defaults.

```bash
helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
```

### Enabling Prometheus metrics

After completing the installation, label the namespace manually so that Prometheus can scrape the metrics:

```bash
oc label ns autopilot openshift.io/cluster-monitoring=true
```

The `ServiceMonitor` labeling is not required.
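
To confirm that the namespace label was applied, the namespace can be queried for it. This is a minimal sketch, assuming the `oc` CLI is logged in to the cluster; the function name `monitoring_enabled` is hypothetical.

```bash
# Hypothetical helper: succeeds if the autopilot namespace carries
# the openshift.io/cluster-monitoring=true label.
monitoring_enabled() {
  # Dots inside the label key are escaped for JSONPath.
  value=$(oc get ns autopilot \
    -o jsonpath='{.metadata.labels.openshift\.io/cluster-monitoring}')
  [ "$value" = "true" ]
}
```

It can be used as `monitoring_enabled && echo "cluster monitoring is enabled for autopilot"`.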

## Kueue Configuration

Create Kueue's default flavor:
27 changes: 27 additions & 0 deletions setup.RHOAI-v2.17/CLUSTER-SETUP.md
@@ -76,6 +76,33 @@ AI configuration as follows:



## Autopilot

Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.

- Add the Autopilot Helm repository

```bash
helm repo add autopilot https://ibm.github.io/autopilot/
helm repo update
```

- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional; omit `-f your-config.yml` to install with the defaults.

```bash
helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
```
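
After the install, the release state can be checked with `helm status`, which exits non-zero when the release does not exist. This is a minimal sketch; the function name `autopilot_installed` is hypothetical.

```bash
# Hypothetical helper: succeeds if a Helm release named "autopilot"
# exists in the "autopilot" namespace.
autopilot_installed() {
  helm status autopilot --namespace autopilot >/dev/null 2>&1
}
```

It can be used as `autopilot_installed && echo "Autopilot release found"`. The pods in the namespace can then be listed with `oc get pods -n autopilot`.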

### Enabling Prometheus metrics

After completing the installation, label the namespace manually so that Prometheus can scrape the metrics:

```bash
oc label ns autopilot openshift.io/cluster-monitoring=true
```

The `ServiceMonitor` labeling is not required.

## Kueue Configuration

Create Kueue's default flavor:
29 changes: 29 additions & 0 deletions setup.k8s/CLUSTER-SETUP.md
@@ -7,6 +7,7 @@ The cluster setup installs and configures the following components:
+ Kueue
+ AppWrappers
+ Cluster roles and priority classes
+ Autopilot

## Priorities

@@ -73,6 +74,34 @@ operators as follows:
- `queueName` is set to `default-queue`,
- pod priorities, resource requests and limits have been adjusted.

## Autopilot

Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.

- Add the Autopilot Helm repository

```bash
helm repo add autopilot https://ibm.github.io/autopilot/
helm repo update
```

- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional; omit `-f your-config.yml` to install with the defaults.

```bash
helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
```

### Enabling Prometheus metrics

The `ServiceMonitor` object enables Prometheus to scrape the metrics produced by Autopilot.
For Prometheus to find the right objects, the `ServiceMonitor` must be labeled with the Prometheus release name. This is usually `prometheus`, which is the default added in the Autopilot release.
If your cluster uses a different name, find the correct release label by inspecting the `ServiceMonitor` of Prometheus itself or the name of the Prometheus Helm chart.
Then label Autopilot's `ServiceMonitor` with the following command:

```bash
kubectl label servicemonitors.monitoring.coreos.com -n autopilot autopilot-metrics-monitor release=<prometheus-release-name> --overwrite
```
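
The release label can often be read from the labels on the existing `ServiceMonitor` objects, for example with `kubectl get servicemonitors.monitoring.coreos.com -A --show-labels`. The sketch below wraps the labeling command so that the release name falls back to `prometheus` when none is given; the function name `label_autopilot_monitor` and the `DRY_RUN` variable are hypothetical.

```bash
# Hypothetical helper: labels Autopilot's ServiceMonitor with the
# Prometheus release name.
# Usage: label_autopilot_monitor [prometheus-release-name]
label_autopilot_monitor() {
  release="${1:-prometheus}"   # falls back to the usual default

  set -- kubectl label servicemonitors.monitoring.coreos.com -n autopilot \
    autopilot-metrics-monitor "release=${release}" --overwrite

  # DRY_RUN=1 (an assumption of this sketch) prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 label_autopilot_monitor my-release` prints the command that would be executed, which is a cheap way to check the release name before changing the object.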

## Kueue Configuration

Create Kueue's default flavor:
39 changes: 39 additions & 0 deletions setup.tmpl/CLUSTER-SETUP.md.tmpl
@@ -12,6 +12,7 @@ The cluster setup installs and configures the following components:
+ Kueue
+ AppWrappers
+ Cluster roles and priority classes
+ Autopilot

{{- end }}

@@ -154,6 +155,44 @@ operators as follows:

{{- end }}

## Autopilot

Helm chart values and customization instructions can be found [in the official documentation](https://github.com/IBM/autopilot/blob/main/helm-charts/autopilot/README.md). With the default values, Autopilot runs on GPU nodes.

- Add the Autopilot Helm repository

```bash
helm repo add autopilot https://ibm.github.io/autopilot/
helm repo update
```

- Install the chart (the command is idempotent). The config file customizes the Helm values and is optional; omit `-f your-config.yml` to install with the defaults.

```bash
helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
```

### Enabling Prometheus metrics

{{ if .OPENSHIFT -}}
After completing the installation, label the namespace manually so that Prometheus can scrape the metrics:

```bash
{{ .KUBECTL }} label ns autopilot openshift.io/cluster-monitoring=true
```

The `ServiceMonitor` labeling is not required.
{{- else -}}
The `ServiceMonitor` object enables Prometheus to scrape the metrics produced by Autopilot.
For Prometheus to find the right objects, the `ServiceMonitor` must be labeled with the Prometheus release name. This is usually `prometheus`, which is the default added in the Autopilot release.
If your cluster uses a different name, find the correct release label by inspecting the `ServiceMonitor` of Prometheus itself or the name of the Prometheus Helm chart.
Then label Autopilot's `ServiceMonitor` with the following command:

```bash
{{ .KUBECTL }} label servicemonitors.monitoring.coreos.com -n autopilot autopilot-metrics-monitor release=<prometheus-release-name> --overwrite
```
{{- end }}

## Kueue Configuration

Create Kueue's default flavor: