Commit

Merge pull request #1738 from tkatila/xpu-sidcar-tls-note
xpumanager sidecar: add note about using HTTPS with xpum
mythi committed May 14, 2024
2 parents 6a01e75 + 7caba39 commit 3da1292
Showing 2 changed files with 85 additions and 15 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -23,7 +23,7 @@ Table of Contents
* [DLB device plugin](#dlb-device-plugin)
* [IAA device plugin](#iaa-device-plugin)
* [Device Plugins Operator](#device-plugins-operator)
* [XeLink XPU-Manager sidecar](#xelink-xpu-manager-sidecar)
* [XeLink XPU Manager sidecar](#xelink-xpu-manager-sidecar)
* [Demos](#demos)
* [Workload Authors](#workload-authors)
* [Developers](#developers)
@@ -194,11 +194,11 @@ The [Device plugins operator README](cmd/operator/README.md) gives the installat

The [Device plugins Operator for OpenShift](https://github.com/intel/intel-technology-enabling-for-openshift) gives the installation and usage details for the operator available on [Red Hat OpenShift Container Platform](https://catalog.redhat.com/software/operators/detail/61e9f2d7b9cdd99018fc5736).

## XeLink XPU-Manager Sidecar
## XeLink XPU Manager Sidecar

To support interconnected GPUs in Kubernetes, XeLink sidecar is needed.

The [XeLink XPU-Manager sidecar README](cmd/xpumanager_sidecar/README.md) gives information how the sidecar functions and how to use it.
The [XeLink XPU Manager sidecar README](cmd/xpumanager_sidecar/README.md) gives information how the sidecar functions and how to use it.

## Demos

94 changes: 82 additions & 12 deletions cmd/xpumanager_sidecar/README.md
@@ -5,9 +5,10 @@ Table of Contents
* [Introduction](#introduction)
* [Modes and Configuration Options](#modes-and-configuration-options)
* [Installation](#installation)
* [Install XPU-Manager with the Sidecar](#install-xpu-manager-with-the-sidecar)
* [Install Sidecar to an Existing XPU-Manager](#install-sidecar-to-an-existing-xpu-manager)
* [Install XPU Manager with the Sidecar](#install-xpu-manager-with-the-sidecar)
* [Install Sidecar to an Existing XPU Manager](#install-sidecar-to-an-existing-xpu-manager)
* [Verify Sidecar Functionality](#verify-sidecar-functionality)
* [Use HTTPS with XPU Manager](#use-https-with-xpu-manager)

## Introduction

@@ -21,14 +22,14 @@ Intel GPUs can be interconnected via an XeLink. In some workloads it is benefici
| -interval | int | 10 | Interval for XeLink topology fetching and label writing (seconds, >= 1) |
| -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
| -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels. i.e. **gpu.intel.com**/xe-links |
| -allow-subdeviceless-links | bool | false | Include xelinks that are not on subdevices |
| -use-https | bool | false | Use HTTPS protocol when connecting to XPU-Manager |
| -allow-subdeviceless-links | bool | false | Include xelinks also for devices that do not have subdevices |
| -use-https | bool | false | Use HTTPS protocol when connecting to XPU Manager |

The sidecar also accepts a number of other arguments. Please use the -h option to see the complete list of options.

## Installation

The following sections detail how to obtain, deploy and test the XPU-Manager XeLink sidecar.
The following sections detail how to obtain, deploy and test the XPU Manager XeLink sidecar.

### Pre-built Images

@@ -44,31 +45,100 @@ Note: Replace `<RELEASE_VERSION>` with the desired [release tag](https://github.

See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin.

#### Install XPU-Manager with the Sidecar
#### Install XPU Manager with the Sidecar

Install XPU-Manager daemonset with the XeLink sidecar
Install XPU Manager daemonset with the XeLink sidecar

```bash
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref=<RELEASE_VERSION>'
```

Please see XPU-Manager Kubernetes files for additional info on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes).
Please see XPU Manager Kubernetes files for additional info on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes).

#### Install Sidecar to an Existing XPU-Manager
#### Install Sidecar to an Existing XPU Manager

Use patch to add sidecar into the XPU-Manager daemonset.
Use patch to add sidecar into the XPU Manager daemonset.

```bash
$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml?ref=<RELEASE_VERSION>'
```

NOTE: The sidecar patch will remove other resources from the XPU-Manager container. If your XPU-Manager daemonset is using, for example, the smarter device manager resources, those will be removed.
NOTE: The sidecar patch will remove other resources from the XPU Manager container. If your XPU Manager daemonset is using, for example, the smarter device manager resources, those will be removed.

#### Verify Sidecar Functionality
### Verify Sidecar Functionality

You can verify the sidecar's functionality by checking the node's xe-links labels:

```bash
$ kubectl get nodes -A -o=jsonpath="{range .items[*]}{.metadata.name},{.metadata.labels.gpu\.intel\.com\/xe-links}{'\n'}{end}"
master,0.0-1.0_0.1-1.1
```
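
To inspect individual links, the label value can be split into one entry per line. This is a minimal sketch, assuming the value is an underscore-separated list of link entries, as the sample output above suggests. The function name `split_xe_links` is hypothetical and not part of the sidecar.

```bash
# Hypothetical helper (not part of the sidecar): print one link entry per line.
# Assumes link entries in the label value are separated by underscores.
split_xe_links() {
  printf '%s\n' "$1" | tr '_' '\n'
}

# Sample value taken from the kubectl output above.
split_xe_links "0.0-1.0_0.1-1.1"
```

The function only reformats the string it is given, so the label value can be piped in from the `kubectl get nodes` command or pasted by hand.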

### Use HTTPS with XPU Manager

XPU Manager can be configured to use HTTPS on the metrics interface. For the gunicorn sidecar, the certificate and key files have to be added to the command:

```
- command:
  - gunicorn
  ...
  - --certfile=/certs/tls.crt
  - --keyfile=/certs/tls.key
  ...
  - xpum_rest_main:main()
```

The tls.crt and tls.key files must also be available inside the gunicorn container. For example:

```
  containers:
  - name: python-exporter
    volumeMounts:
    - mountPath: /certs
      name: certs
      readOnly: true
  volumes:
  - name: certs
    secret:
      defaultMode: 420
      secretName: xpum-server-cert
```

In this case, the secret providing the certificate and key is called `xpum-server-cert`.

The certificate and key can be [added manually to a secret](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/). Another way to create the secret is to use [cert-manager](https://cert-manager.io/).
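
For the manual route, the following is a minimal sketch built on `kubectl create secret tls`. The helper name `create_xpum_tls_secret` and the file paths are hypothetical. The sketch also assumes that the current kubectl namespace is the one where the XPU Manager daemonset runs.

```bash
# Hypothetical helper: create the xpum-server-cert secret from existing files.
# Assumes the current kubectl namespace is the one used by XPU Manager.
create_xpum_tls_secret() {
  cert="$1"
  key="$2"
  # Fail early if either file is missing, before contacting the cluster.
  if [ ! -f "$cert" ] || [ ! -f "$key" ]; then
    echo "certificate or key file not found" >&2
    return 1
  fi
  kubectl create secret tls xpum-server-cert --cert="$cert" --key="$key"
}

# Example invocation (paths are placeholders):
# create_xpum_tls_secret ./tls.crt ./tls.key
```

A secret of type `kubernetes.io/tls` stores the files under the keys `tls.crt` and `tls.key`, which matches the file names expected under `/certs` in the gunicorn command.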

<details>
<summary>Example cert-manager objects</summary>

cert-manager creates a self-signed certificate and its private key, and stores them in a secret called `xpum-server-cert`.

```
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serving-cert
spec:
  dnsNames:
  - xpum.svc
  - xpum.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  secretName: xpum-server-cert
```

</details>

For the XPU Manager sidecar, `-use-https` has to be added to the arguments. The sidecar will then use HTTPS when connecting to the metrics interface.

```
  args:
    - -v=2
    - -use-https
```
