diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 000000000..e69de29bb diff --git a/0.27/.buildinfo b/0.27/.buildinfo new file mode 100644 index 000000000..e33ab616c --- /dev/null +++ b/0.27/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: 527d0961f8224af12e66dd0bbe3563fe +tags: 645f666f9bcd5a90fca523b33c5a78b7 diff --git a/0.27/DEVEL.html b/0.27/DEVEL.html new file mode 100644 index 000000000..6a83bef05 --- /dev/null +++ b/0.27/DEVEL.html @@ -0,0 +1,458 @@ + + + + + + + Instructions for Device Plugin Development and Maintenance — Intel® Device Plugins for Kubernetes documentation + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+
    +
+
+
+
+
+ +
+

Instructions for Device Plugin Development and Maintenance

+

Table of Contents

+ +
+

Day-to-day Development How to’s

+
+

Get the Source Code

+

With git installed on the system, just clone the repository:

+
$ export INTEL_DEVICE_PLUGINS_SRC=/path/to/intel-device-plugins-for-kubernetes
+$ git clone https://github.com/intel/intel-device-plugins-for-kubernetes ${INTEL_DEVICE_PLUGINS_SRC}
+
+
+
+
+

Build and Run Plugin Binaries

+

With a Go development environment installed on the system, build the plugin:

+
$ cd ${INTEL_DEVICE_PLUGINS_SRC}
+$ make <plugin-build-target>
+
+
+

Note: All the available plugin build targets are roughly the output of ls ${INTEL_DEVICE_PLUGINS_SRC}/cmd.

+

To test the plugin binary on the development system, run as administrator:

+
$ sudo -E ${INTEL_DEVICE_PLUGINS_SRC}/cmd/<plugin-build-target>/<plugin-build-target>
+
+
+
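For example, assuming the GPU plugin (gpu_plugin is just one of the targets under cmd/; substitute the plugin you are working on), the steps might look like:

$ cd ${INTEL_DEVICE_PLUGINS_SRC}
$ make gpu_plugin
$ sudo -E ${INTEL_DEVICE_PLUGINS_SRC}/cmd/gpu_plugin/gpu_plugin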
+
+

Build Container Images

+

The dockerfiles are generated on the fly from .in suffixed files and .docker include-snippets, which are stitched together with the cpp preprocessor. You need to install cpp for that; e.g., on Ubuntu it is provided by build-essential (sudo apt install build-essential). Don't edit the generated dockerfiles; edit the inputs.

+

The simplest way to build all the docker images is:

+
$ make images
+
+
+

But it is very slow. You can drastically speed it up by first running once:

+
$ make vendor
+
+
+

This brings the libraries into the builder container without downloading them again and again for each plugin.

+

But it is still slow. You can further speed it up by first running once:

+
$ make licenses
+
+
+

This pre-creates the go-licenses for all plugins, instead of re-creating them for each built plugin every time.

+

Building all the images is still rather slow, and unnecessary if you iterate on just one. Instead, build only the one you are iterating on, for example:

+
$ make <image-build-target>
+
+
+

Note: All the available image build targets are roughly the output of ls ${INTEL_DEVICE_PLUGINS_SRC}/build/docker/*.Dockerfile.

+

If you iterate on only one plugin and you know what its target cmd is (see folder cmd/), you can opt to pre-create just its licenses, for example:

+
$ make licenses/<plugin-build-target>
+
+
+

The container image target names in the Makefile are derived from the .Dockerfile.in suffixed filenames under folder build/docker/templates/.

+

Recap:

+
$ make vendor
+$ make licenses (or just make licenses/<plugin-build-target>)
+$ make <image-build-target>
+
+
+
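As a concrete example, assuming the GPU plugin (whose cmd is gpu_plugin and whose image target is presumably intel-gpu-plugin), one iteration could look like:

$ make vendor
$ make licenses/gpu_plugin
$ make intel-gpu-plugin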

Repeat the last step only, unless you change library dependencies. If you pull in new sources, start again from make vendor.

+

Note: The image build tool can be changed from the default docker by setting the BUILDER argument +to the Makefile: make <image-build-target> BUILDER=<builder>. Supported values are docker, buildah, and podman.

+
+
+

Build Against a Newer Version of Kubernetes

+

First, you need to update module dependencies. The easiest way is to use +scripts/upgrade_k8s.sh copied from a k/k issue:

+

Just run it inside the repo’s root, e.g.

+
$ ${INTEL_DEVICE_PLUGINS_SRC}/scripts/upgrade_k8s.sh <k8s version>
+
+
+

Finally, run:

+
$ make generate
+$ make test
+
+
+

and fix all new compilation issues.

+
+
+

Work with Intel Device Plugins Operator Modifications

+

There are a few useful steps when working with changes to the Device Plugins CRDs and controllers:

+
  1. Install controller-gen: GO111MODULE=on go get -u sigs.k8s.io/controller-tools/cmd/controller-gen@<release ver>, e.g., v0.4.1
  2. Generate CRD and Webhook artifacts: make generate
  3. Test local changes using envtest: make envtest
  4. Build a custom operator image: make intel-deviceplugin-operator
  5. (Un)deploy operator: kubectl [apply|delete] -k deployments/operator/default
+
+

Publish a New Version of the Intel Device Plugins Operator to operatorhub.io

+

Check if the fields mentioned below in the base CSV manifest file have the correct values. If not, fix them manually (operator-sdk does not support updating these fields in any other way).

+
  • spec.version
  • spec.replaces
  • metadata.annotations.containerImage
  • metadata.annotations.createdAT

Fork the Community Operators repo and clone it:

+
$ git clone https://github.com/<GitHub Username>/community-operators
+
+
+

Generate bundle and build bundle image:

+
$ make bundle TAG=0.X.Y CHANNELS=alpha DEFAULT_CHANNEL=alpha
+$ make bundle-build
+
+
+

Push the image to a registry:

+
  • If pushing to the Docker hub, specify docker.io/ in front of the image name for running bundle.
  • If pushing to the local registry, put the option --use-http for running bundle.

Verify the operator deployment works OK via OLM in your development cluster:

+
$ operator-sdk olm install
+$ kubectl create namespace testoperator
+$ operator-sdk run bundle <Registry>:<Tag> -n testoperator
+# do verification checks
+...
+# do clean up
+$ operator-sdk cleanup intel-device-plugins-operator --namespace testoperator
+$ kubectl delete namespace testoperator
+$ operator-sdk olm uninstall
+
+
+

Commit files:

+
$ cd community-operators
+$ git add operators/intel-device-plugins-operator/0.X.Y
+$ git commit -am 'operators intel-device-plugins-operator (0.X.Y)' -s
+
+
+

Submit a PR to Community Operators repo.

+

Check operator page +https://operatorhub.io/operator/intel-device-plugins-operator +after PR is merged.

+
+
+

Run E2E Tests

+

Currently, the E2E tests require a Kubernetes cluster already configured on nodes with the hardware required by the device plugins. Also, all the container images with the executables under test must be available in the cluster. If these two conditions are satisfied, run the tests with:

+
$ go test -v ./test/e2e/...
+
+
+

In case you want to run only certain tests, e.g., QAT ones, run:

+
$ go test -v ./test/e2e/... -args -ginkgo.focus "QAT"
+
+
+

If you need to specify the path to a custom kubeconfig containing embedded authentication info, add the -kubeconfig argument:

+
$ go test -v ./test/e2e/... -args -kubeconfig /path/to/kubeconfig
+
+
+

The full list of available options can be obtained with:

+
$ go test ./test/e2e/... -args -help
+
+
+

It is also possible to run the tests which don’t depend on hardware +without a pre-configured Kubernetes cluster. Just make sure you have +Kind installed on your host and run:

+
$ make test-with-kind
+
+
+
+
+

Run Controller Tests with a Local Control Plane

+

The controller-runtime library provides a package for integration testing by +starting a local control plane. The package is called +envtest. The +operator uses this package for its integration testing.

+

For setting up the environment for testing, setup-envtest can be used:

+
$ go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
+$ setup-envtest use <K8S_VERSION>
+$ KUBEBUILDER_ASSETS=$(setup-envtest use -i -p path <K8S_VERSION>) make envtest
+
+
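For example, targeting Kubernetes 1.27 (the version string below is only an example):

$ setup-envtest use 1.27.x
$ KUBEBUILDER_ASSETS=$(setup-envtest use -i -p path 1.27.x) make envtest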
+
+
+
+

How to Develop Simple Device Plugins

+

To create a simple device plugin without the hassle of developing your own gRPC +server, you can use a package included in this repository called +github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.

+

All you have to do is instantiate a deviceplugin.Manager and call +its Run() method:

+
func main() {
+    ...
+
+    manager := dpapi.NewManager(namespace, plugin)
+    manager.Run()
+}
+
+
+

The manager’s constructor accepts two parameters:

+
  1. namespace, which is a string like "color.example.com". All your devices will be exposed under this namespace, e.g. "color.example.com/yellow". Please note that one device plugin can register many such "colors". The manager will instantiate multiple gRPC servers, one for every registered "color".
  2. plugin, which is a reference to an object implementing one mandatory interface, deviceplugin.Scanner.

deviceplugin.Scanner defines one method, Scan(), which is called only once for every device plugin by deviceplugin.Manager in a goroutine and operates in an infinite loop. A Scan() implementation scans the host for devices and sends all found devices to a deviceplugin.Notifier instance. The deviceplugin.Notifier is implemented and provided by the deviceplugin package itself. The found devices are organized into an instance of deviceplugin.DeviceTree. The object is filled in with its AddDevice() method:

+
func (dp *devicePlugin) Scan(notifier deviceplugin.Notifier) error {
+    for {
+        devTree := deviceplugin.NewDeviceTree()
+        ...
+        devTree.AddDevice("yellow", devID, deviceplugin.DeviceInfo{
+            State: health,
+            Nodes: []pluginapi.DeviceSpec{
+                {
+                    HostPath:      devPath,
+                    ContainerPath: devPath,
+                    Permissions:   "rw",
+                },
+            },
+        })
+        ...
+        notifier.Notify(devTree)
+    }
+}
+
+
+

Optionally, your device plugin may also implement the +deviceplugin.PostAllocator interface. If implemented, its method +PostAllocate() modifies pluginapi.AllocateResponse responses just +before they are sent to kubelet. To see an example, refer to the FPGA +plugin which implements this interface to annotate its responses.

+
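As an illustrative sketch only (the receiver type and the annotation key are made up for this example; check the deviceplugin package for the exact interface signature), a PostAllocate() implementation could look like:

func (dp *devicePlugin) PostAllocate(response *pluginapi.AllocateResponse) error {
    for _, containerResponse := range response.ContainerResponses {
        if containerResponse.Annotations == nil {
            containerResponse.Annotations = map[string]string{}
        }
        // "color.example.com/allocated" is a hypothetical annotation key.
        containerResponse.Annotations["color.example.com/allocated"] = "true"
    }
    return nil
}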

In case you want to implement the whole allocation functionality in your device plugin, you can implement the optional deviceplugin.Allocator interface. In this case, PostAllocate() is not called. If, however, your deviceplugin.Allocator implementation decides to resort to the default allocation functionality, it should return an error of the type deviceplugin.UseDefaultMethodError.

+
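For illustration, a minimal (hypothetical) Allocate() that always falls back to the default behaviour might look like the following; verify the exact error type and signature against the deviceplugin package:

func (dp *devicePlugin) Allocate(request *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
    // Delegate to the framework's default allocation implementation.
    return nil, &dpapi.UseDefaultMethodError{}
}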
+

Logging

+

The framework uses klog as its logging +framework. It is encouraged for plugins to also use klog to maintain uniformity +in the logs and command line options.

+

The framework initialises klog, so further calls to klog.InitFlags() by +plugins should not be necessary. This does add a number of log configuration +options to your plugin, which can be viewed with the -h command line option of your +plugin.

+

The framework tries to adhere to the Kubernetes Logging Conventions. The advice is to use the V() levels for Info() calls, as calling Info() with no set level will make configuration and filtering of logging via the command line more difficult.

+

The default is to not log Info() calls. This can be changed using the plugin command +line -v parameter. The additional annotations prepended to log lines by ‘klog’ can be disabled +with the -skip_headers option.

+
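For illustration (the verbosity level 4, the variable names and the messages are arbitrary examples):

// Logged only when the plugin is run with -v=4 or higher.
klog.V(4).Infof("scanned %d devices", len(devices))

// Warnings and errors are logged regardless of the -v level.
klog.Warningf("device %s reported unhealthy: %v", devID, err)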
+
+

Error Conventions

+

The framework has a convention for producing and logging errors. Ideally plugins will also adhere +to the convention.

+

Errors generated within the framework and plugins are instantiated with the New() and +Errorf() functions of the errors package:

+
    return errors.New("error message")
+
+
+

Errors generated from outside the plugins and framework are augmented with their stack dump with code such as

+
    return errors.WithStack(err)
+
+
+

or

+
    return errors.Wrap(err, "some additional error message")
+
+
+

These errors are then logged using a default struct value format like:

+
    klog.Errorf("Example of an internal error death: %+v", err)
+
+
+

at the point where it is certain that the error can neither be passed further up nor handled gracefully. Otherwise, they can be logged as simple values:

+
    klog.Warningf("Example of a warning due to an external error: %v", err)
+
+
+
+
+
+

Checklist for New Device Plugins

+

For new device plugins contributed to this repository, below is a checklist to get the plugin on par, feature- and quality-wise, with the others:

+
  1. Plugin binary available in cmd/, its corresponding Dockerfile in build/docker/ and deployment Kustomization/YAMLs in deployments/.
  2. Plugin binary Go unit tests implemented and passing with >80% coverage: make test WHAT=./cmd/<plugin>.
  3. Plugin binary linter checks passing: make lint.
  4. Plugin e2e tests implemented in test/e2e/ and passing: go test -v ./test/e2e/... -args -ginkgo.focus "<plugin>".
  5. Plugin CRD API added to pkg/apis/deviceplugin/v1 and CRDs generated: make generate.
  6. Plugin CRD validation tests implemented in test/envtest/ and passing: make envtest.
  7. Plugin CRD controller implemented in pkg/controllers/ and added to the manager in cmd/operator/main.go.
  8. Plugin documentation written in cmd/<plugin>/README.md and optionally end-to-end demos created in demo.
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/0.27/INSTALL.html b/0.27/INSTALL.html new file mode 100644 index 000000000..015b51dc6 --- /dev/null +++ b/0.27/INSTALL.html @@ -0,0 +1,232 @@ + + + + + + + Installing device plugins to cluster — Intel® Device Plugins for Kubernetes documentation + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Installing device plugins to cluster

+
+

Install device plugins via a DaemonSet

+

Each plugin can be installed via a DaemonSet. The install changes slightly based on the desired plugin. See install instructions per plugin.

+

Installing plugins via DaemonSets deploys them to the default (or currently active) namespace. Use kubectl's --namespace argument to change the deployment namespace.

+
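For example, a hypothetical GPU plugin install into a custom namespace could look like this (the kustomization path and ref are illustrative; follow the plugin's own README for the exact command):

$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin?ref=v0.27.1' --namespace my-intel-plugins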
+
+

Install device plugins via device plugin operator

+

A more advanced install method is via the device plugin operator. The operator configures plugin deployments based on the supplied device plugin CRDs (Custom Resource Definitions). See the installation instructions in the operator README.

+

The operator installs device plugins into the same namespace where the operator itself is deployed. The default operator namespace is inteldeviceplugins-system.

+
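For illustration, a custom resource for the GPU plugin might look roughly like the following; treat the field values as placeholders and consult the operator README and CRDs for the authoritative schema:

apiVersion: deviceplugin.intel.com/v1
kind: GpuDevicePlugin
metadata:
  name: gpudeviceplugin-sample
spec:
  image: intel/intel-gpu-plugin:0.27.1
  sharedDevNum: 1
  logLevel: 4
  nodeSelector:
    intel.feature.node.kubernetes.io/gpu: "true"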
+
+

Install with HELM charts

+

Device plugins can also be installed to a cluster using the device plugin operator Helm chart (which depends on cert-manager and NFD). Individual plugin charts are under https://github.com/intel/helm-charts/tree/main/charts/.

+

These steps will install the device plugin operator and plugins under the inteldeviceplugins-system namespace. It's possible to change the target namespace by changing the --namespace value in the helm install command.

+
+

Installing HELM repositories

+
helm repo add jetstack https://charts.jetstack.io # for cert-manager
+helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts # for NFD
+helm repo add intel https://intel.github.io/helm-charts/ # for device-plugin-operator and plugins
+helm repo update
+
+
+
+
+

Installing cert-manager

+
helm install --wait \
+  cert-manager jetstack/cert-manager \
+  --namespace cert-manager \
+  --create-namespace \
+  --version v1.11.0 \
+  --set installCRDs=true
+
+
+

NOTE: cert-manager install takes a while to complete.

+
+
+

Installing NFD

+
helm install nfd nfd/node-feature-discovery \
+  --namespace node-feature-discovery --create-namespace --version 0.12.1 \
+  --set 'master.extraLabelNs={gpu.intel.com,sgx.intel.com}' \
+  --set 'master.resourceLabels={gpu.intel.com/millicores,gpu.intel.com/memory.max,gpu.intel.com/tiles,sgx.intel.com/epc}'
+
+
+
+
+

Installing operator

+
helm install dp-operator intel/intel-device-plugins-operator --namespace inteldeviceplugins-system --create-namespace
+
+
+
+
+

Installing specific plugins

+

Replace PLUGIN with the desired plugin name. At least the following plugins are supported: gpu, sgx, qat, dlb, dsa & iaa.

+
helm install <PLUGIN> intel/intel-device-plugins-<PLUGIN> --namespace inteldeviceplugins-system --create-namespace \
+  --set nodeFeatureRule=true
+
+
+
+
+

Listing available versions

+

Use helm’s search functionality to list available versions.

+
helm search repo intel/intel-device-plugins-operator --versions
+helm search repo intel/intel-device-plugins-<plugin> --versions
+
+
+

For example, to list operator chart versions with development versions included:

+
$ helm search repo intel/intel-device-plugins-operator --versions --devel
+NAME                               	CHART VERSION	APP VERSION	DESCRIPTION
+intel/intel-device-plugins-operator	0.26.0       	0.26.0     	A Helm chart for Intel Device Plugins Operator ...
+intel/intel-device-plugins-operator	0.25.1       	0.25.1     	A Helm chart for Intel Device Plugins Operator ...
+intel/intel-device-plugins-operator	0.25.1-helm.0	0.25.0     	A Helm chart for Intel Device Plugins Operator ...
+intel/intel-device-plugins-operator	0.25.0       	0.25.0     	A Helm chart for Intel Device Plugins Operator ...
+intel/intel-device-plugins-operator	0.24.1       	0.24.1     	A Helm chart for Intel Device Plugins Operator ...
+intel/intel-device-plugins-operator	0.24.1-helm.0	0.24.0     	A Helm chart for Intel Device Plugins Operator ...
+intel/intel-device-plugins-operator	0.24.0       	0.24.0     	A Helm chart for Intel Device Plugins Operator ...
+
+
+
+
+

Customizing plugins

+

To customize plugin features, see the available chart values:

+
helm show values intel/intel-device-plugins-<PLUGIN>
+
+
+

For example, the QAT plugin has these values:

+
$ helm show values intel/intel-device-plugins-qat
+name: qatdeviceplugin-sample
+
+image:
+  hub: intel
+  tag: ""
+
+initImage:
+  hub: intel
+  tag: ""
+
+dpdkDriver: vfio-pci
+kernelVfDrivers:
+  - c6xxvf
+  - 4xxxvf
+maxNumDevices: 128
+logLevel: 4
+
+nodeSelector:
+  intel.feature.node.kubernetes.io/qat: 'true'
+
+nodeFeatureRule: true
+
+
+
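For example, individual values can be overridden at install time with --set (the value names come from the QAT chart output above):

helm install qat intel/intel-device-plugins-qat --namespace inteldeviceplugins-system --create-namespace \
  --set logLevel=2 --set maxNumDevices=64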
+
+

Uninstall

+

Uninstall each installed component with helm uninstall:

+
# repeat first step as many times as there are plugins installed
+helm uninstall -n inteldeviceplugins-system <PLUGIN>
+helm uninstall -n inteldeviceplugins-system dp-operator
+helm uninstall -n node-feature-discovery nfd
+helm uninstall -n cert-manager cert-manager
+
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/0.27/README.html b/0.27/README.html new file mode 100644 index 000000000..70af973b7 --- /dev/null +++ b/0.27/README.html @@ -0,0 +1,526 @@ + + + + + + + Overview — Intel® Device Plugins for Kubernetes documentation + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Overview

+

Build Status +Go Report Card +GoDoc

+

This repository contains a framework for developing plugins for the Kubernetes +device plugins framework, +along with a number of device plugin implementations utilizing that framework.

+

The v0.27.1 release +is the latest feature release with its documentation available here.

+

Table of Contents

+ +
+

Prerequisites

+

Prerequisites for building and running these device plugins include:

  • Appropriate hardware and drivers
  • A fully configured Kubernetes cluster
  • A working Go environment, of at least version v1.16
+
+

Plugins

+

The below sections detail existing plugins developed using the framework.

+
+

GPU Device Plugin

+

The GPU device plugin provides access to +discrete and integrated Intel GPU device files.

+

The demo subdirectory contains both a GPU plugin demo video +and an OpenCL sample deployment (intelgpu-job.yaml).

+
+
+

FPGA Device Plugin

+

The FPGA device plugin supports FPGA passthrough for +the following hardware:

+
  • Intel® Arria® 10 devices
  • Intel® Stratix® 10 devices

The FPGA plugin comes as three parts.

  • the device plugin
  • the admission controller
  • the CRI-O prestart hook

Refer to each individual sub-component's documentation for more details. Brief overviews of the sub-components are below.

+

The demo subdirectory contains a +video showing deployment +and use of the FPGA plugin. Sources relating to the demo can be found in the +opae-nlb-demo subdirectory.

+
+

Device Plugin

+

The FPGA device plugin is responsible for +discovering and reporting FPGA devices to kubelet.

+
+
+

Admission Controller

+

The FPGA admission controller webhook +is responsible for performing mapping from user-friendly function IDs to the +Interface ID and Bitstream ID that are required for FPGA programming. It also +implements access control by namespacing FPGA configuration information.

+
+
+

CRI-O Prestart Hook

+

The FPGA prestart CRI-O hook performs discovery +of the requested FPGA function bitstream and programs FPGA devices based on the +environment variables in the workload description.

+
+
+
+

QAT Device Plugin

+

The QAT plugin provides a device plugin for Intel QAT adapters, and includes code showing deployment via DPDK.

+

The demo subdirectory includes details of both a +QAT DPDK demo +and a QAT OpenSSL demo. +Source for the OpenSSL demo can be found in the relevant subdirectory.

+

Details for integrating the QAT device plugin into Kata Containers +can be found in the +Kata Containers documentation repository.

+
+
+

VPU Device Plugin

+

The VPU device plugin supports the Intel VCAC-A card (https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/media-analytics-vcac-a-accelerator-card-by-celestica-datasheet.pdf). The card has:

+
  • 1 Intel Core i3-7100U processor
  • 12 MyriadX VPUs
  • 8GB DDR4 memory

The demo subdirectory includes details of an OpenVINO deployment and use of the VPU plugin. Sources can be found in openvino-demo.

+
+
+

SGX Device Plugin

+

The SGX device plugin allows workloads to use +Intel® Software Guard Extensions (Intel® SGX) on +platforms with SGX Flexible Launch Control enabled, e.g.,:

+
  • 3rd Generation Intel® Xeon® Scalable processor family, code-named “Ice Lake”
  • Intel® Xeon® E3 processor
  • Intel® NUC Kit NUC7CJYH

The Intel SGX plugin comes in three parts.

  • the device plugin
  • the admission webhook
  • the SGX EPC memory registration

The demo subdirectory contains a video showing the deployment +and use of the Intel SGX device plugin. Sources relating to the demo can be found in the +sgx-sdk-demo and sgx-aesmd-demo subdirectories.

+

Brief overviews of the Intel SGX sub-components are given below.

+

+
+

device plugin

+

The SGX device plugin is responsible for discovering +and reporting Intel SGX device nodes to kubelet.

+

Containers requesting Intel SGX resources in the cluster should not use the device plugin's resources directly.

+
+
+

Intel SGX Admission Webhook

+

The Intel SGX admission webhook is responsible for performing Pod mutations based on +the sgx.intel.com/quote-provider pod annotation set by the user. The purpose +of the webhook is to hide the details of setting the necessary device resources +and volume mounts for using Intel SGX remote attestation in the cluster. Furthermore, +the Intel SGX admission webhook is responsible for writing a pod/sandbox +sgx.intel.com/epc annotation that is used by Kata Containers to dynamically +adjust its virtualized Intel SGX encrypted page cache (EPC) bank(s) size.

+

The Intel SGX admission webhook is available as part of +Intel Device Plugin Operator or +as a standalone SGX Admission webhook image.

+
+
+

Intel SGX EPC memory registration

+

The Intel SGX EPC memory available on each node is registered as a Kubernetes extended resource using +node-feature-discovery (NFD). An NFD Node Feature Rule is installed as part of +SGX device plugin +operator deployment and NFD is configured to register the Intel SGX EPC memory +extended resource.

+

Containers requesting Intel SGX EPC resources in the cluster use +sgx.intel.com/epc resource which is of +type memory.

+
+
+
+

DSA Device Plugin

+

The DSA device plugin supports acceleration using the Intel Data Streaming Accelerator (DSA).

+
+
+

DLB Device Plugin

+

The DLB device plugin supports the Intel Dynamic Load Balancer accelerator (DLB).

+
+
+

IAA Device Plugin

+

The IAA device plugin supports acceleration using the Intel Analytics Accelerator (IAA).

+
+
+
+

Device Plugins Operator

+

To simplify the deployment of the device plugins, a unified device plugins +operator is implemented.

+

Currently the operator has support for the DSA, DLB, FPGA, GPU, IAA, QAT, and +Intel SGX device plugins. Each device plugin has its own custom resource +definition (CRD) and the corresponding controller that watches CRUD operations +to those custom resources.

+

The Device plugins operator README gives the installation and usage details for the community operator available on operatorhub.io.

+

The Device plugins Operator for OCP gives the installation and usage details for the operator available on Red Hat OpenShift Container Platform.

+
+ +
+

Demos

+

The demo subdirectory contains a number of demonstrations for +a variety of the available plugins.

+
+
+

Workload Authors

+

For workloads to get access to devices managed by the plugins, the Pod spec must specify the hardware resources needed:

+
spec:
+  containers:
+    - name: demo-container
+      image: <registry>/<image>:<version>
+      resources:
+        limits:
+          <device namespace>/<resource>: X
+
+
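For example, a container requesting a single Intel GPU would use the gpu.intel.com/i915 resource registered by the GPU plugin (the image below is a placeholder):

spec:
  containers:
    - name: demo-container
      image: docker.io/library/ubuntu:22.04
      resources:
        limits:
          gpu.intel.com/i915: 1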
+

The summary of resources available via plugins in this repository is given in the list below.

+

Device Namespace : Registered Resource(s)

  • dlb.intel.com : pf or vf
  • dsa.intel.com : wq-user-[shared or dedicated]
  • fpga.intel.com : custom, see mappings
  • gpu.intel.com : i915
  • iaa.intel.com : wq-user-[shared or dedicated]
  • qat.intel.com : generic or cy/dc
  • sgx.intel.com : epc
  • vpu.intel.com : hddl
+
+

Developers

+

For information on how to develop a new plugin using the framework or work on development tasks in this repository, see the Developers Guide.

+
+
+

Supported Kubernetes Versions

+

Releases are made under the github releases area. Supported releases and +matching Kubernetes versions are listed below:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
BranchKubernetes branch/versionStatus
release-0.27Kubernetes 1.27 branch v1.27.xsupported
release-0.26Kubernetes 1.26 branch v1.26.xsupported
release-0.25Kubernetes 1.25 branch v1.25.xsupported
release-0.24Kubernetes 1.24 branch v1.24.xunsupported
release-0.23Kubernetes 1.23 branch v1.23.xunsupported
release-0.22Kubernetes 1.22 branch v1.22.xunsupported
release-0.21Kubernetes 1.21 branch v1.21.xunsupported
release-0.20Kubernetes 1.20 branch v1.20.xunsupported
release-0.19Kubernetes 1.19 branch v1.19.xunsupported
release-0.18Kubernetes 1.18 branch v1.18.xunsupported
release-0.17Kubernetes 1.17 branch v1.17.xunsupported
release-0.15Kubernetes 1.15 branch v1.15.xunsupported
release-0.11Kubernetes 1.11 branch v1.11.xunsupported
+
+

Pre-built plugin images

+

Pre-built images of the plugins are available on the Docker hub. These images +are automatically built and uploaded to the hub from the latest main branch of +this repository.

+

Release tagged images of the components are also available on the Docker hub, +tagged with their release version numbers in the format x.y.z, corresponding to +the branches and releases in this repository.

+

Note: the default deployment files and operators are configured with +imagePullPolicy +IfNotPresent and can be changed with scripts/set-image-pull-policy.sh.

+
+
+

License

+

All of the source code required to build intel-device-plugins-for-kubernetes +is available under Open Source licenses. The source code files identify external Go +modules used. Binaries are distributed as container images on +DockerHub*. Those images contain license texts and source code under /licenses.

+
+

Helm Charts

+

Device Plugins Helm Charts are located in the Intel Helm Charts repository. This is another way of distributing the Kubernetes resources of the device plugins framework.

+

To add repo:

+
helm repo add intel https://intel.github.io/helm-charts
+
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/0.27/SECURITY.html b/0.27/SECURITY.html new file mode 100644 index 000000000..91af8d4aa --- /dev/null +++ b/0.27/SECURITY.html @@ -0,0 +1,120 @@ + + + + + + + <no title> — Intel® Device Plugins for Kubernetes documentation + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +

Reporting a Potential Security Vulnerability: If you have discovered a potential security vulnerability in this project, please send an e-mail to secure@intel.com. Encrypt sensitive information using our PGP public key.

+

Please provide as much information as possible, including:

+
  • The projects and versions affected
  • Detailed description of the vulnerability
  • Information on known exploits

A member of the Intel Product Security Team will review your e-mail and +contact you to collaborate on resolving the issue. For more information on +how Intel works to resolve security issues, see Vulnerability Handling Guidelines.

+ + +
+
+ +
+
+
+
+ + + + \ No newline at end of file diff --git a/0.27/_images/FPGA-af.png b/0.27/_images/FPGA-af.png new file mode 100644 index 000000000..64934fdfa Binary files /dev/null and b/0.27/_images/FPGA-af.png differ diff --git a/0.27/_images/FPGA-region.png b/0.27/_images/FPGA-region.png new file mode 100644 index 000000000..b57d9a1ed Binary files /dev/null and b/0.27/_images/FPGA-region.png differ diff --git a/0.27/_images/SGX-BIOS.PNG b/0.27/_images/SGX-BIOS.PNG new file mode 100644 index 000000000..f8510ecec Binary files /dev/null and b/0.27/_images/SGX-BIOS.PNG differ diff --git a/0.27/_images/verify-operator.PNG b/0.27/_images/verify-operator.PNG new file mode 100644 index 000000000..55adad1b5 Binary files /dev/null and b/0.27/_images/verify-operator.PNG differ diff --git a/0.27/_sources/DEVEL.md.txt b/0.27/_sources/DEVEL.md.txt new file mode 100644 index 000000000..900dc6337 --- /dev/null +++ b/0.27/_sources/DEVEL.md.txt @@ -0,0 +1,369 @@ +# Instructions for Device Plugin Development and Maintenance + +Table of Contents + +* [Day-to-day Development How to's](#day-to-day-development) + * [Get the Source Code](#get-the-source-code) + * [Build and Run Plugin Binaries](#build-and-run-plugin-binaries) + * [Build Container Images](#build-container-images) + * [Build Against a Newer Version of Kubernetes](#build-against-a-newer-version-of-kubernetes) + * [Work with Intel Device Plugins Operator Modifications](#work-with-intel-device-plugins-operator-modifications) + * [Publish a New Version of the Intel Device Plugins Operator to operatorhub.io](#publish-a-new-version-of-the-intel-device-plugins-operator-to-operatorhubio) + * [Run E2E Tests](#run-e2e-tests) + * [Run Controller Tests with a Local Control Plane](#run-controller-tests-with-a-local-control-plane) +* [How to Develop Simple Device Plugins](#how-to-develop-simple-device-plugins) + * [Logging](#logging) + * [Error Conventions](#error-conventions) +* [Checklist for New Device Plugins](#checklist-for-new-device-plugins) + +## Day-to-day Development How to's +### Get the Source Code + +With `git` installed on the system, just clone the repository: + +```bash +$ export INTEL_DEVICE_PLUGINS_SRC=/path/to/intel-device-plugins-for-kubernetes +$ git clone https://github.com/intel/intel-device-plugins-for-kubernetes ${INTEL_DEVICE_PLUGINS_SRC} +``` + +### Build and Run Plugin Binaries + +With `go` development environment installed on the system, build the plugin: + +```bash +$ cd ${INTEL_DEVICE_PLUGINS_SRC} +$ make +``` + +**Note:** All the available plugin build targets is roughly the output of `ls ${INTEL_DEVICE_PLUGINS_SRC}/cmd`. + +To test the plugin binary on the development system, run as administrator: + +```bash +$ sudo -E ${INTEL_DEVICE_PLUGINS_SRC}/cmd// +``` + +### Build Container Images + +The dockerfiles are generated on the fly from `.in` suffixed files and `.docker` include-snippets which are stitched together with +cpp preprocessor. You need to install cpp for that, e.g. in ubuntu it is found from build-essential (sudo apt install build-essential). +Don't edit the generated dockerfiles. Edit the inputs. + +The simplest way to build all the docker images, is: +``` +$ make images +``` + +But it is very slow. You can drastically speed it up by first running once: +``` +$ make vendor +``` + +Which brings the libraries into the builder container without downloading them again and again for each plugin. + +But it is still slow. 
You can further speed it up by first running once: +``` +$ make licenses +``` + +Which pre-creates the go-licenses for all plugins, instead of re-creating them for each built plugin, every time. + +But it is still rather slow to build all the images, and unnecessary, if you iterate on just one. Instead, build just the one you are iterating on, example: + +``` +$ make +``` + +**Note:** All the available image build targets is roughly the output of `ls ${INTEL_DEVICE_PLUGINS_SRC}/build/docker/*.Dockerfile`. + +If you iterate on only one plugin and if you know what its target cmd is (see folder `cmd/`), you can opt to pre-create just its licenses, example: +``` +$ make licenses/ +``` + +The container image target names in the Makefile are derived from the `.Dockerfile.in` suffixed filenames under folder `build/docker/templates/`. + +Recap: +``` +$ make vendor +$ make licenses (or just make licenses/) +$ make +``` + +Repeat the last step only, unless you change library dependencies. If you pull in new sources, start again from `make vendor`. + +**Note:** The image build tool can be changed from the default `docker` by setting the `BUILDER` argument +to the [`Makefile`](Makefile): `make BUILDER=`. Supported values are `docker`, `buildah`, and `podman`. + +### Build Against a Newer Version of Kubernetes + +First, you need to update module dependencies. The easiest way is to use +`scripts/upgrade_k8s.sh` copied [from a k/k issue](https://github.com/kubernetes/kubernetes/issues/79384#issuecomment-521493597): + +Just run it inside the repo's root, e.g. + +``` +$ ${INTEL_DEVICE_PLUGINS_SRC}/scripts/upgrade_k8s.sh +``` +Finally, run: + +``` +$ make generate +$ make test +``` + +and fix all new compilation issues. + +### Work with Intel Device Plugins Operator Modifications + +There are few useful steps when working with changes to Device Plugins CRDs and controllers: + +1. Install controller-gen: `GO111MODULE=on go get -u sigs.k8s.io/controller-tools/cmd/controller-gen@, e.g, v0.4.1` +2. Generate CRD and Webhook artifacts: `make generate` +3. Test local changes using [envtest](https://book.kubebuilder.io/reference/envtest.html): `make envtest` +4. Build a custom operator image: `make intel-deviceplugin-operator` +5. (Un)deploy operator: `kubectl [apply|delete] -k deployments/operator/default` + +### Publish a New Version of the Intel Device Plugins Operator to operatorhub.io + +Check if the fields mentioned below in the [base CSV manifest file](deployments/operator/manifests/bases/intel-device-plugins-operator.clusterserviceversion.yaml) have the correct values. If not, fix them manually (operator-sdk does not support updating these fields in any other way). +- spec.version +- spec.replaces +- metadata.annotations.containerImage +- metadata.annotations.createdAT + +Fork the [Community Operators](https://github.com/k8s-operatorhub/community-operators) repo and clone it: +``` +$ git clone https://github.com//community-operators +``` + +Generate bundle and build bundle image: +``` +$ make bundle TAG=0.X.Y CHANNELS=alpha DEFAULT_CHANNEL=alpha +$ make bundle-build +``` + +Push the image to a registry: +- If pushing to the Docker hub, specify `docker.io/` in front of the image name for running bundle. +- If pushing to the local registry, put the option `--use-http` for running bundle. 
+ +Verify the operator deployment works OK via OLM in your development cluster: +``` +$ operator-sdk olm install +$ kubectl create namespace testoperator +$ operator-sdk run bundle : -n testoperator +# do verification checks +... +# do clean up +$ operator-sdk cleanup intel-device-plugins-operator --namespace testoperator +$ kubectl delete namespace testoperator +$ operator-sdk olm uninstall +``` + +Commit files: +``` +$ cd community-operators +$ git add operators/intel-device-plugins-operator/0.X.Y +$ git commit -am 'operators intel-device-plugins-operator (0.X.Y)' -s +``` + +Submit a PR to [Community Operators](https://github.com/k8s-operatorhub/community-operators) repo. + +Check operator page +https://operatorhub.io/operator/intel-device-plugins-operator +after PR is merged. + +### Run E2E Tests + +Currently the E2E tests require having a Kubernetes cluster already configured +on the nodes with the hardware required by the device plugins. Also all the +container images with the executables under test must be available in the +cluster. If these two conditions are satisfied, run the tests with: + +```bash +$ go test -v ./test/e2e/... +``` + +In case you want to run only certain tests, e.g., QAT ones, run: + +```bash +$ go test -v ./test/e2e/... -args -ginkgo.focus "QAT" +``` + +If you need to specify paths to your custom `kubeconfig` containing +embedded authentication info then add the `-kubeconfig` argument: + +```bash +$ go test -v ./test/e2e/... -args -kubeconfig /path/to/kubeconfig +``` + +The full list of available options can be obtained with: + +```bash +$ go test ./test/e2e/... -args -help +``` + +It is also possible to run the tests which don't depend on hardware +without a pre-configured Kubernetes cluster. Just make sure you have +[Kind](https://kind.sigs.k8s.io/) installed on your host and run: + +``` +$ make test-with-kind +``` + +### Run Controller Tests with a Local Control Plane + +The controller-runtime library provides a package for integration testing by +starting a local control plane. The package is called +[envtest](https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/envtest). The +operator uses this package for its integration testing. + +For setting up the environment for testing, `setup-envtest` can be used: + +```bash +$ go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest +$ setup-envtest use +$ KUBEBUILDER_ASSETS=$(setup-envtest use -i -p path ) make envtest +``` +## How to Develop Simple Device Plugins + +To create a simple device plugin without the hassle of developing your own gRPC +server, you can use a package included in this repository called +`github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin`. + +All you have to do is instantiate a `deviceplugin.Manager` and call +its `Run()` method: + +```go +func main() { + ... + + manager := dpapi.NewManager(namespace, plugin) + manager.Run() +} +``` + +The manager's constructor accepts two parameters: + +1. `namespace` which is a string like "color.example.com". All your devices + will be exposed under this name space, e.g. "color.example.com/yellow". + Please note that one device plugin can register many such "colors". + The manager will instantiate multiple gRPC servers for every registered "color". +2. `plugin` which is a reference to an object implementing one mandatory + interface `deviceplugin.Scanner`. 
+ +`deviceplugin.Scanner` defines one method `Scan()` which is called only once +for every device plugin by `deviceplugin.Manager` in a goroutine and operates +in an infinite loop. A `Scan()` implementation scans the host for devices and +sends all found devices to a `deviceplugin.Notifier` instance. The +`deviceplugin.Notifier` is implemented and provided by the `deviceplugin` +package itself. The found devices are organized in an instance of +`deviceplugin.DeviceTree` object. The object is filled in with its +`AddDevice()` method: + +```go +func (dp *devicePlugin) Scan(notifier deviceplugin.Notifier) error { + for { + devTree := deviceplugin.NewDeviceTree() + ... + devTree.AddDevice("yellow", devID, deviceplugin.DeviceInfo{ + State: health, + Nodes: []pluginapi.DeviceSpec{ + { + HostPath: devPath, + ContainerPath: devPath, + Permissions: "rw", + }, + }, + }) + ... + notifier.Notify(devTree) + } +} +``` + +Optionally, your device plugin may also implement the +`deviceplugin.PostAllocator` interface. If implemented, its method +`PostAllocate()` modifies `pluginapi.AllocateResponse` responses just +before they are sent to `kubelet`. To see an example, refer to the FPGA +plugin which implements this interface to annotate its responses. + +In case you want to implement the whole allocation functionality in your +device plugin, you can implement the optional `deviceplugin.Allocator` +interface. In this case `PostAllocate()` is not called. But if you decide in your +implementation of `deviceplugin.Allocator` that you need to resort to the default +implementation of the allocation functionality then return an error of the type +`deviceplugin.UseDefaultMethodError`. + +### Logging + +The framework uses [`klog`](https://github.com/kubernetes/klog) as its logging +framework. It is encouraged for plugins to also use `klog` to maintain uniformity +in the logs and command line options. + +The framework initialises `klog`, so further calls to `klog.InitFlags()` by +plugins should not be necessary. This does add a number of log configuration +options to your plugin, which can be viewed with the `-h` command line option of your +plugin. + +The framework tries to adhere to the Kubernetes +[Logging Conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md). +The advise is to use the `V()` levels for `Info()` calls, as calling `Info()` +with no set level will make configuration and filtering of logging via the command +line more difficult. + +The default is to not log `Info()` calls. This can be changed using the plugin command +line `-v` parameter. The additional annotations prepended to log lines by 'klog' can be disabled +with the `-skip_headers` option. + +### Error Conventions + +The framework has a convention for producing and logging errors. Ideally plugins will also adhere +to the convention. 
+ +Errors generated within the framework and plugins are instantiated with the `New()` and +`Errorf()` functions of the [errors package](https://golang.org/pkg/errors/): + +```golang + return errors.New("error message") +``` + +Errors generated from outside the plugins and framework are augmented with their stack dump with code such as + +```golang + return errors.WithStack(err) +``` + +or + +```golang + return errors.Wrap(err, "some additional error message") +``` + +These errors are then logged using a default struct value format like: + +```golang + klog.Errorf("Example of an internal error death: %+v", err) +``` + +at the line where it's certain that the error cannot be passed out farther nor handled gracefully. +Otherwise, they can be logged as simple values: + +```golang + klog.Warningf("Example of a warning due to an external error: %v", err) +``` + +## Checklist for New Device Plugins + +For new device plugins contributed to this repository, below is a +checklist to get the plugin on par feature and quality wise with +others: + +1. Plugin binary available in [`cmd/`](cmd), its corresponding Dockerfile in [`build/docker/`](build/docker) and deployment Kustomization/YAMLs in [`deployments/`](deployments). +2. Plugin binary Go unit tests implemented and passing with >80% coverage: `make test WHAT=./cmd/`. +3. Plugin binary linter checks passing: `make lint`. +4. Plugin e2e tests implemented in [`test/e2e/`](test/e2e) and passing: `go test -v ./test/e2e/... -args -ginkgo.focus ""`. +5. Plugin CRD API added to [`pkg/apis/deviceplugin/v1`](pkg/apis/deviceplugin/v1) and CRDs generated: `make generate`. +6. Plugin CRD validation tests implemented in [`test/envtest/`](test/envtest) and passing: `make envtest`. +7. Plugin CRD controller implemented in [`pkg/controllers/`](pkg/controllers) and added to the manager in `cmd/operator/main.go`. +8. Plugin documentation written `cmd//README.md` and optionally end to end demos created in [`demo`](demo). diff --git a/0.27/_sources/INSTALL.md.txt b/0.27/_sources/INSTALL.md.txt new file mode 100644 index 000000000..087cb6f5a --- /dev/null +++ b/0.27/_sources/INSTALL.md.txt @@ -0,0 +1,132 @@ +# Installing device plugins to cluster + +## Install device plugins via a DaemonSet + +Each plugin can be installed via a DaemonSet. The install changes slightly based on the desired plugin. See install instructions per [plugin](README.md#plugins). + +Installing plugins via DaemonSets deployes them to the ```default``` (or currently active) namespace. Use kubectl's ```--namespace``` argument to change the deployment namespace. + +## Install device plugins via device plugin operator + +A more advanced install method is via device plugin operator. Operator configures plugin deployments based on the supplied device plugin CRDs (Custom Resource Definitions). See installation instructions in the [operator README](cmd/operator/README.md#installation). + +Operator installs device plugins to the same namespace where the operator itself is deployed. The default operator namespace is ```inteldeviceplugins-system```. + +## Install with HELM charts + +Device plugins can also be installed to a cluster using the device plugin [operator Helm chart](https://github.com/intel/helm-charts/tree/main/charts/device-plugin-operator) (depending on cert-manager and NFD). Individual plugin projects are under https://github.com/intel/helm-charts/tree/main/charts/. + +These steps will install device plugin operator and plugins under ```inteldeviceplugins-system``` namespace. 
It's possible to change the target namespace by changing the ```--namespace``` value in the helm install command. + +### Installing HELM repositories + +```bash +helm repo add jetstack https://charts.jetstack.io # for cert-manager +helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts # for NFD +helm repo add intel https://intel.github.io/helm-charts/ # for device-plugin-operator and plugins +helm repo update +``` + +### Installing cert-manager + +```bash +helm install --wait \ + cert-manager jetstack/cert-manager \ + --namespace cert-manager \ + --create-namespace \ + --version v1.11.0 \ + --set installCRDs=true +``` + +NOTE: cert-manager install takes a while to complete. + +### Installing NFD + +```bash +helm install nfd nfd/node-feature-discovery \ + --namespace node-feature-discovery --create-namespace --version 0.12.1 \ + --set 'master.extraLabelNs={gpu.intel.com,sgx.intel.com}' \ + --set 'master.resourceLabels={gpu.intel.com/millicores,gpu.intel.com/memory.max,gpu.intel.com/tiles,sgx.intel.com/epc}' +``` + +### Installing operator + +```bash +helm install dp-operator intel/intel-device-plugins-operator --namespace inteldeviceplugins-system --create-namespace +``` + +### Installing specific plugins + +Replace PLUGIN with the desired plugin name. At least the following plugins are supported: **gpu, sgx, qat, dlb, dsa & iaa**. + +```bash +helm install intel/intel-device-plugins- --namespace inteldeviceplugins-system --create-namespace \ + --set nodeFeatureRule=true +``` + +### Listing available versions + +Use helm's search functionality to list available versions. + +```bash +helm search repo intel/intel-device-plugins-operator --versions +helm search repo intel/intel-device-plugins- --versions +``` + +For example, operator chart versions with development versions included. +```bash +$ helm search repo intel/intel-device-plugins-operator --versions --devel +NAME CHART VERSION APP VERSION DESCRIPTION +intel/intel-device-plugins-operator 0.26.0 0.26.0 A Helm chart for Intel Device Plugins Operator ... +intel/intel-device-plugins-operator 0.25.1 0.25.1 A Helm chart for Intel Device Plugins Operator ... +intel/intel-device-plugins-operator 0.25.1-helm.0 0.25.0 A Helm chart for Intel Device Plugins Operator ... +intel/intel-device-plugins-operator 0.25.0 0.25.0 A Helm chart for Intel Device Plugins Operator ... +intel/intel-device-plugins-operator 0.24.1 0.24.1 A Helm chart for Intel Device Plugins Operator ... +intel/intel-device-plugins-operator 0.24.1-helm.0 0.24.0 A Helm chart for Intel Device Plugins Operator ... +intel/intel-device-plugins-operator 0.24.0 0.24.0 A Helm chart for Intel Device Plugins Operator ... 
+``` + +### Customizing plugins + +To customize plugin features, see the available chart values: +```bash +helm show values intel/intel-device-plugins- +``` + +For example, qat plugin has these values: +```bash +$ helm show values intel/intel-device-plugins-qat +name: qatdeviceplugin-sample + +image: + hub: intel + tag: "" + +initImage: + hub: intel + tag: "" + +dpdkDriver: vfio-pci +kernelVfDrivers: + - c6xxvf + - 4xxxvf +maxNumDevices: 128 +logLevel: 4 + +nodeSelector: + intel.feature.node.kubernetes.io/qat: 'true' + +nodeFeatureRule: true +``` + +### Uninstall + +Uninstall each installed component with ```helm uninstall```: + +```bash +# repeat first step as many times as there are plugins installed +helm uninstall -n inteldeviceplugins-system +helm uninstall -n inteldeviceplugins-system dp-operator +helm uninstall -n node-feature-discovery nfd +helm uninstall -n cert-manager cert-manager +``` \ No newline at end of file diff --git a/0.27/_sources/README.md.txt b/0.27/_sources/README.md.txt new file mode 100644 index 000000000..8712b84d9 --- /dev/null +++ b/0.27/_sources/README.md.txt @@ -0,0 +1,310 @@ +# Overview +[![Build Status](https://github.com/intel/intel-device-plugins-for-kubernetes/workflows/CI/badge.svg?branch=main)](https://github.com/intel/intel-device-plugins-for-kubernetes/actions?query=workflow%3ACI) +[![Go Report Card](https://goreportcard.com/badge/github.com/intel/intel-device-plugins-for-kubernetes)](https://goreportcard.com/report/github.com/intel/intel-device-plugins-for-kubernetes) +[![GoDoc](https://godoc.org/github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin?status.svg)](https://godoc.org/github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin) + +This repository contains a framework for developing plugins for the Kubernetes +[device plugins framework](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/), +along with a number of device plugin implementations utilizing that framework. + +The [v0.27.1 release](https://github.com/intel/intel-device-plugins-for-kubernetes/releases/latest) +is the latest feature release with its documentation available [here](https://intel.github.io/intel-device-plugins-for-kubernetes/0.27/). + +Table of Contents + +* [Prerequisites](#prerequisites) +* [Plugins](#plugins) + * [GPU device plugin](#gpu-device-plugin) + * [FPGA device plugin](#fpga-device-plugin) + * [QAT device plugin](#qat-device-plugin) + * [VPU device plugin](#vpu-device-plugin) + * [SGX device plugin](#sgx-device-plugin) + * [DSA device plugin](#dsa-device-plugin) + * [DLB device plugin](#dlb-device-plugin) + * [IAA device plugin](#iaa-device-plugin) +* [Device Plugins Operator](#device-plugins-operator) +* [XeLink XPU-Manager sidecar](#xelink-xpu-manager-sidecar) +* [Demos](#demos) +* [Workload Authors](#workload-authors) +* [Developers](#developers) +* [Supported Kubernetes versions](#supported-kubernetes-versions) +* [Pre-built plugin images](#pre-built-plugin-images) +* [License](#license) +* [Helm charts](#helm-charts) + +## Prerequisites + +Prerequisites for building and running these device plugins include: + +- Appropriate hardware and drivers +- A fully configured [Kubernetes cluster] +- A working [Go environment], of at least version v1.16. + +## Plugins + +The below sections detail existing plugins developed using the framework. + +### GPU Device Plugin + +The [GPU device plugin](cmd/gpu_plugin/README.md) provides access to +discrete and integrated Intel GPU device files. 
+ +The demo subdirectory contains both a [GPU plugin demo video](demo/readme.md#intel-gpu-device-plugin-demo-video) +and an OpenCL sample deployment (`intelgpu-job.yaml`). + +### FPGA Device Plugin + +The [FPGA device plugin](cmd/fpga_plugin/README.md) supports FPGA passthrough for +the following hardware: + +- Intel® Arria® 10 devices +- Intel® Stratix® 10 devices + +The FPGA plugin comes as three parts. + +- the [device plugin](#device-plugin) +- the [admission controller](#admission-controller) +- the [CRIO-O prestart hook](#cri-o-prestart-hook) + +Refer to each individual sub-components documentation for more details. +Brief overviews of the sub-components are below. + +The demo subdirectory contains a +[video](demo/readme.md#intel-fpga-device-plugin-demo-video) showing deployment +and use of the FPGA plugin. Sources relating to the demo can be found in the +[opae-nlb-demo](demo/opae-nlb-demo) subdirectory. + +#### Device Plugin + +The [FPGA device plugin](cmd/fpga_plugin/README.md) is responsible for +discovering and reporting FPGA devices to `kubelet`. + +#### Admission Controller + +The [FPGA admission controller webhook](cmd/fpga_admissionwebhook/README.md) +is responsible for performing mapping from user-friendly function IDs to the +Interface ID and Bitstream ID that are required for FPGA programming. It also +implements access control by namespacing FPGA configuration information. + +#### CRI-O Prestart Hook + +The [FPGA prestart CRI-O hook](cmd/fpga_crihook/README.md) performs discovery +of the requested FPGA function bitstream and programs FPGA devices based on the +environment variables in the workload description. + +### [QAT](https://developer.intel.com/quickassist) Device Plugin + +The [QAT plugin](cmd/qat_plugin/README.md) supports device plugin for Intel QAT adapters, and includes +code [showing deployment](cmd/qat_plugin/dpdkdrv) via [DPDK](https://doc.dpdk.org/guides/cryptodevs/qat.html). + +The demo subdirectory includes details of both a +[QAT DPDK demo](demo/readme.md#intel-quickassist-technology-device-plugin-with-dpdk-demo-video) +and a [QAT OpenSSL demo](demo/readme.md#intel-quickassist-technology-device-plugin-openssl-demo-video). +Source for the OpenSSL demo can be found in the [relevant subdirectory](demo/openssl-qat-engine). + +Details for integrating the QAT device plugin into [Kata Containers](https://katacontainers.io/) +can be found in the +[Kata Containers documentation repository](https://github.com/kata-containers/kata-containers/blob/main/docs/use-cases/using-Intel-QAT-and-kata.md). + +### VPU Device Plugin + +The [VPU device plugin](cmd/vpu_plugin/README.md) supports Intel VCAC-A card +(https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/media-analytics-vcac-a-accelerator-card-by-celestica-datasheet.pdf) +the card has: +- 1 Intel Core i3-7100U processor +- 12 MyriadX VPUs +- 8GB DDR4 memory + +The demo subdirectory includes details of a OpenVINO deployment and use of the +VPU plugin. Sources can be found in [openvino-demo](demo/ubuntu-demo-openvino). + +### SGX Device Plugin + +The [SGX device plugin](cmd/sgx_plugin/README.md) allows workloads to use +Intel® Software Guard Extensions (Intel® SGX) on +platforms with SGX Flexible Launch Control enabled, e.g.,: + +- 3rd Generation Intel® Xeon® Scalable processor family, code-named “Ice Lake” +- Intel® Xeon® E3 processor +- Intel® NUC Kit NUC7CJYH + +The Intel SGX plugin comes in three parts. 
+ +- the [device plugin](#sgx-plugin) +- the [admission webhook](#sgx-admission-webhook) +- the [SGX EPC memory registration](#sgx-epc-memory-registration) + +The demo subdirectory contains a [video](demo/readme.md#intel-sgx-device-plugin-demo-video) showing the deployment +and use of the Intel SGX device plugin. Sources relating to the demo can be found in the +[sgx-sdk-demo](demo/sgx-sdk-demo) and [sgx-aesmd-demo](demo/sgx-aesmd-demo) subdirectories. + +Brief overviews of the Intel SGX sub-components are given below. + + +#### device plugin + +The [SGX device plugin](cmd/sgx_plugin/README.md) is responsible for discovering +and reporting Intel SGX device nodes to `kubelet`. + +Containers requesting Intel SGX resources in the cluster should not use the +device plugins resources directly. + +#### Intel SGX Admission Webhook + +The Intel SGX admission webhook is responsible for performing Pod mutations based on +the `sgx.intel.com/quote-provider` pod annotation set by the user. The purpose +of the webhook is to hide the details of setting the necessary device resources +and volume mounts for using Intel SGX remote attestation in the cluster. Furthermore, +the Intel SGX admission webhook is responsible for writing a pod/sandbox +`sgx.intel.com/epc` annotation that is used by Kata Containers to dynamically +adjust its virtualized Intel SGX encrypted page cache (EPC) bank(s) size. + +The Intel SGX admission webhook is available as part of +[Intel Device Plugin Operator](cmd/operator/README.md) or +as a standalone [SGX Admission webhook image](cmd/sgx_admissionwebhook/README.md). + +#### Intel SGX EPC memory registration + +The Intel SGX EPC memory available on each node is registered as a Kubernetes extended resource using +node-feature-discovery (NFD). An NFD Node Feature Rule is installed as part of +[SGX device plugin](cmd/sgx_plugin/README.md) +operator deployment and NFD is configured to register the Intel SGX EPC memory +extended resource. + +Containers requesting Intel SGX EPC resources in the cluster use +`sgx.intel.com/epc` resource which is of +type [memory](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory). + +### DSA Device Plugin + +The [DSA device plugin](cmd/dsa_plugin/README.md) supports acceleration using +the Intel Data Streaming accelerator(DSA). + +### DLB Device Plugin + +The [DLB device plugin](cmd/dlb_plugin/README.md) supports Intel Dynamic Load +Balancer accelerator(DLB). + +### IAA Device Plugin + +The [IAA device plugin](cmd/iaa_plugin/README.md) supports acceleration using +the Intel Analytics accelerator(IAA). + +## Device Plugins Operator + +To simplify the deployment of the device plugins, a unified device plugins +operator is implemented. + +Currently the operator has support for the DSA, DLB, FPGA, GPU, IAA, QAT, and +Intel SGX device plugins. Each device plugin has its own custom resource +definition (CRD) and the corresponding controller that watches CRUD operations +to those custom resources. + +The [Device plugins operator README](cmd/operator/README.md) gives the installation and usage details for the community operator available on [operatorhub.io](https://operatorhub.io/operator/intel-device-plugins-operator). + +The [Device plugins Operator for OCP](cmd/operator/ocp_quickstart_guide/README.md) gives the installation and usage details for the operator available on [Red Hat OpenShift Container Platform](https://catalog.redhat.com/software/operators/detail/61e9f2d7b9cdd99018fc5736). 
+ +## XeLink XPU-Manager Sidecar + +To support interconnected GPUs in Kubernetes, XeLink sidecar is needed. + +The [XeLink XPU-Manager sidecar README](cmd/xpumanager_sidecar/README.md) gives information how the sidecar functions and how to use it. + +## Demos + +The [demo subdirectory](demo/readme.md) contains a number of demonstrations for +a variety of the available plugins. + +## Workload Authors + +For workloads to get accesss to devices managed by the plugins, the +`Pod` spec must specify the hardware resources needed: + +``` +spec: + containers: + - name: demo-container + image: /: + resources: + limits: + /: X +``` + +The summary of resources available via plugins in this repository is given in the list below. + +**Device Namespace : Registered Resource(s)** + * `dlb.intel.com` : `pf` or `vf` + * [dlb-libdlb-demo-pod.yaml](demo/dlb-libdlb-demo-pod.yaml) + * `dsa.intel.com` : `wq-user-[shared or dedicated]` + * [dsa-accel-config-demo-pod.yaml](demo/dsa-accel-config-demo-pod.yaml) + * `fpga.intel.com` : custom, see [mappings](cmd/fpga_admissionwebhook/README.md#mappings) + * [intelfpga-job.yaml](demo/intelfpga-job.yaml) + * `gpu.intel.com` : `i915` + * [intelgpu-job.yaml](demo/intelgpu-job.yaml) + * `iaa.intel.com` : `wq-user-[shared or dedicated]` + * [iaa-qpl-demo-pod.yaml](demo/iaa-qpl-demo-pod.yaml) + * `qat.intel.com` : `generic` or `cy`/`dc` + * [crypto-perf-dpdk-pod-requesting-qat.yaml](deployments/qat_dpdk_app/base/crypto-perf-dpdk-pod-requesting-qat.yaml) + * `sgx.intel.com` : `epc` + * [intelsgx-job.yaml](deployments/sgx_enclave_apps/base/intelsgx-job.yaml) + * `vpu.intel.com` : `hddl` + * [intelvpu-job.yaml](demo/intelvpu-job.yaml) + +## Developers + +For information on how to develop a new plugin using the framework or work on development task in +this repository, see the [Developers Guide](DEVEL.md). + +## Supported Kubernetes Versions + +Releases are made under the github [releases area](https://github.com/intel/intel-device-plugins-for-kubernetes/releases). Supported releases and +matching Kubernetes versions are listed below: + +| Branch | Kubernetes branch/version | Status | +|:------------------|:-------------------------------|:------------| +| release-0.27 | Kubernetes 1.27 branch v1.27.x | supported | +| release-0.26 | Kubernetes 1.26 branch v1.26.x | supported | +| release-0.25 | Kubernetes 1.25 branch v1.25.x | supported | +| release-0.24 | Kubernetes 1.24 branch v1.24.x | unsupported | +| release-0.23 | Kubernetes 1.23 branch v1.23.x | unsupported | +| release-0.22 | Kubernetes 1.22 branch v1.22.x | unsupported | +| release-0.21 | Kubernetes 1.21 branch v1.21.x | unsupported | +| release-0.20 | Kubernetes 1.20 branch v1.20.x | unsupported | +| release-0.19 | Kubernetes 1.19 branch v1.19.x | unsupported | +| release-0.18 | Kubernetes 1.18 branch v1.18.x | unsupported | +| release-0.17 | Kubernetes 1.17 branch v1.17.x | unsupported | +| release-0.15 | Kubernetes 1.15 branch v1.15.x | unsupported | +| release-0.11 | Kubernetes 1.11 branch v1.11.x | unsupported | + +[Go environment]: https://golang.org/doc/install +[Kubernetes cluster]: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ + +## Pre-built plugin images + +Pre-built images of the plugins are available on the Docker hub. These images +are automatically built and uploaded to the hub from the latest main branch of +this repository. 
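For example, the development image of the GPU plugin built from `main` can be pulled directly for inspection (a sketch assuming Docker as the local container tool and that the `devel` tag is the one published from `main`, as noted for the individual plugin images later in this document):

```bash
# Pull the GPU plugin image built from the latest main branch
$ docker pull intel/intel-gpu-plugin:devel
```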
+ +Release tagged images of the components are also available on the Docker hub, +tagged with their release version numbers in the format x.y.z, corresponding to +the branches and releases in this repository. + +**Note:** the default deployment files and operators are configured with +[imagePullPolicy](https://kubernetes.io/docs/concepts/containers/images/#updating-images) +```IfNotPresent``` and can be changed with ```scripts/set-image-pull-policy.sh```. + +## License + +All of the source code required to build intel-device-plugins-for-kubernetes +is available under Open Source licenses. The source code files identify external Go +modules used. Binaries are distributed as container images on +DockerHub*. Those images contain license texts and source code under `/licenses`. + +### Helm Charts + +Device Plugins Helm Charts are located in Intel Helm Charts repository [Intel Helm Charts](https://github.com/intel/helm-charts). This is another way of distributing Kubernetes resources of the device plugins framework. + +To add repo: +``` +helm repo add intel https://intel.github.io/helm-charts +``` diff --git a/0.27/_sources/SECURITY.md.txt b/0.27/_sources/SECURITY.md.txt new file mode 100644 index 000000000..e342bcacb --- /dev/null +++ b/0.27/_sources/SECURITY.md.txt @@ -0,0 +1,13 @@ +**Reporting a Potential Security Vulnerability**: If you have discovered +potential security vulnerability in this project, please send an e-mail to +secure@intel.com. Encrypt sensitive information using our +[PGP public key](https://www.intel.com/content/www/us/en/security-center/pgp-public-key.html). + +Please provide as much information as possible, including: + - The projects and versions affected + - Detailed description of the vulnerability + - Information on known exploits + +A member of the Intel Product Security Team will review your e-mail and +contact you to collaborate on resolving the issue. For more information on +how Intel works to resolve security issues, see [Vulnerability Handling Guidelines](https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html). diff --git a/0.27/_sources/cmd/dlb_plugin/README.md.txt b/0.27/_sources/cmd/dlb_plugin/README.md.txt new file mode 100644 index 000000000..d9abe2ef0 --- /dev/null +++ b/0.27/_sources/cmd/dlb_plugin/README.md.txt @@ -0,0 +1,228 @@ +# Intel DLB device plugin for Kubernetes + +Table of Contents + +* [Introduction](#introduction) +* [Installation](#installation) + * [Pre-built Images](#pre-built-images) + * [Verify Plugin Registration](#verify-plugin-registration) +* [Testing and Demos](#testing-and-demos) + +## Introduction + +This Intel DLB device plugin provides support for [Intel DLB](https://builders.intel.com/docs/networkbuilders/SKU-343247-001US-queue-management-and-load-balancing-on-intel-architecture.pdf) devices under Kubernetes. + +### DLB2 driver configuration for PFs +The DLB device plugin requires a Linux Kernel DLB driver to be installed and enabled to operate. Get [DLB software release](https://www.intel.com/content/www/us/en/download/686372/intel-dynamic-load-balancer.html), build and load the dlb2 driver module following the instruction of 'DLB_Driver_User_Guide.pdf' in the directory 'dlb/docs'. + +After successfully loading the module, available dlb device nodes are visible in devfs. +```bash +$ ls -1 /dev/dlb* +/dev/dlb0 /dev/dlb1 /dev/dlb2 ... 
+``` + +### VF configuration using a DPDK tool (but with dlb2 driver) +If you configure SR-IOV/VF (virtual functions), continue the following configurations. This instruction uses DPDK tool to check eventdev devices, unbind a VF device, and bind dlb2 driver to a VF device. + +Patch dpdk sources to work with DLB: +```bash +$ wget -q https://fast.dpdk.org/rel/dpdk-21.11.tar.xz -O- | tar -Jx +$ wget -q https://downloadmirror.intel.com/763709/dlb_linux_src_release8.0.0.txz -O- | tar -Jx +$ cd ./dpdk-*/ && patch -p1 < ../dlb/dpdk/dpdk_dlb_*_diff.patch +$ sed -i 's/270b,2710,2714/270b,2710,2711,2714/g' ./usertools/dpdk-devbind.py +``` + +List eventdev devices: +```bash +$ ./usertools/dpdk-devbind.py -s | grep -A10 ^Eventdev +Eventdev devices using kernel driver +==================================== +0000:6d:00.0 'Device 2710' drv=dlb2 unused= +0000:72:00.0 'Device 2710' drv=dlb2 unused= +... +``` + +Enable virtual functions: +```bash +$ echo 4 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/sriov_numvfs +``` +> **Note:**: If it fails saying "No such file or directory," it may be bound to vfio-pci driver. Bind the device to dlb2 driver. + +Check if new dlb device nodes appear: +```bash +$ ls -1 /dev/dlb* +/dev/dlb0 /dev/dlb1 /dev/dlb10 /dev/dlb11 ... /dev/dlb8 /dev/dlb9 +``` + +Check that new eventdev devices appear: +```bash +$ ./usertools/dpdk-devbind.py -s | grep -A14 ^Eventdev +Eventdev devices using kernel driver +==================================== +0000:6d:00.0 'Device 2710' drv=dlb2 unused= +0000:6d:00.1 'Device 2711' drv=dlb2 unused= +0000:6d:00.2 'Device 2711' drv=dlb2 unused= +0000:6d:00.3 'Device 2711' drv=dlb2 unused= +0000:6d:00.4 'Device 2711' drv=dlb2 unused= +0000:72:00.0 'Device 2710' drv=dlb2 unused= +... +``` + +Assign PF resources to VF: +> **Note:**: The process below is only for the first vf resource among 4 resources. Repeat for other vfN_resources in /sys/bus/pci/devices/0000\:6d\:00.0/, and then bind dlb2 driver to 0000:6d:00.M that corresponds to vfN_resources. + +- Unbind driver from the VF device before configuring it. +```bash +$ sudo ./usertools/dpdk-devbind.py --unbind 0000:6d:00.1 +``` + +- Assign PF resources to VF: +```bash +$ echo 2048 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_atomic_inflights && + echo 2048 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_dir_credits && + echo 64 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_dir_ports && + echo 2048 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_hist_list_entries && + echo 8192 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_ldb_credits && + echo 64 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_ldb_ports && + echo 32 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_ldb_queues && + echo 32 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_sched_domains && + echo 2 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_sn0_slots && + echo 2 | sudo tee -a /sys/bus/pci/devices/0000\:6d\:00.0/vf0_resources/num_sn1_slots +``` + +- Bind driver back to the VF device: +```bash +$ sudo ./usertools/dpdk-devbind.py --bind dlb2 0000:6d:00.1 +``` + + +### Verification of well-configured devices: +Run libdlb example app: +> **Note:**: Alternative way is to use this [Dockerfile](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/demo/dlb-libdlb-demo/Dockerfile) for running tests. 
+ +```bash +$ ls +dlb dpdk-21.11 +$ cd ./dlb/libdlb/ && make && sudo LD_LIBRARY_PATH=$PWD ./examples/dir_traffic -n 128 -d 1 +# For running test for /dev/dlbN, replace 1 with N. +``` + +Run dpdk example app: +> **Note:**: Alternative way is to use this [Dockerfile](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/demo/dlb-dpdk-demo/Dockerfile) for patching and building DPDK and running tests. + +- Install build dependencies and build dpdk: +```bash +$ sudo apt-get update && sudo apt-get install build-essential meson python3-pyelftools libnuma-dev python3-pip && sudo pip install ninja +# This configuration is based on Ubuntu/Debian distribution. For other distributions that do not use apt, install the dependencies using another way. +$ ls +dlb dpdk-21.11 +$ cd ./dpdk-* && meson setup --prefix $(pwd)/installdir builddir && ninja -C builddir install +``` + +- Run eventdev test +```bash +sudo ./builddir/app/dpdk-test-eventdev --no-huge --vdev='dlb2_event,dev_id=1' -- --test=order_queue --nb_flows 64 --nb_pkts 512 --plcores 1 --wlcores 2-7 +# For running test for /dev/dlbN, replace 1 with N. +``` + +## Installation + +The following sections detail how to obtain, build, deploy and test the DLB device plugin. + +Examples are provided showing how to deploy the plugin either using a DaemonSet or by hand on a per-node basis. + +### Pre-built Images + +[Pre-built images](https://hub.docker.com/r/intel/intel-dlb-plugin) +of this component are available on the Docker hub. These images are automatically built and uploaded +to the hub from the latest main branch of this repository. + +Release tagged images of the components are also available on the Docker hub, tagged with their +release version numbers in the format `x.y.z`, corresponding to the branches and releases in this +repository. Thus the easiest way to deploy the plugin in your cluster is to run this command + +```bash +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/dlb_plugin?ref=' +daemonset.apps/intel-dlb-plugin created +``` + +Where `` needs to be substituted with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images. + +Nothing else is needed. See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin. + +### Verify Plugin Registration + +You can verify the plugin has been registered with the expected nodes by searching for the relevant +resource allocation status on the nodes: + +```bash +$ kubectl get nodes -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{range $k,$v:=.status.allocatable}}{{" "}}{{$k}}{{": "}}{{$v}}{{"\n"}}{{end}}{{end}}' | grep '^\([^ ]\)\|\( dlb\)' +master + dlb.intel.com/pf: 7 + dlb.intel.com/vf: 4 +``` + +## Testing and Demos + +We can test the plugin is working by deploying the provided example test images (dlb-libdlb-demo and dlb-dpdk-demo). + +1. Build a Docker image and create a pod running unit tests off the local Docker image: + + ```bash + $ make dlb-libdlb-demo + ... + Successfully tagged intel/dlb-libdlb-demo:devel + + $ kubectl apply -f ${INTEL_DEVICE_PLUGINS_SRC}/demo/dlb-libdlb-demo-pod.yaml + pod/dlb-libdlb-demo-pod created + ``` + + ```bash + $ make dlb-dpdk-demo + ... + Successfully tagged intel/dlb-dpdk-demo:devel + + $ kubectl apply -f ${INTEL_DEVICE_PLUGINS_SRC}/demo/dlb-dpdk-demo-pod.yaml + pod/dlb-dpdk-demo-pod created + ``` + +1. 
Wait until pod is completed: + + ```bash + $ kubectl get pods | grep dlb-.*-demo + NAME READY STATUS RESTARTS AGE + dlb-dpdk-demo 0/2 Completed 0 79m + dlb-libdlb-demo 0/2 Completed 0 18h + ``` + +1. Review the job's logs: + + ```bash + $ kubectl logs dlb-libdlb-demo + + ``` + + ```bash + $ kubectl logs dlb-dpdk-demo + + ``` + + If the pod did not successfully launch, possibly because it could not obtain the DLB + resource, it will be stuck in the `Pending` status: + + ```bash + $ kubectl get pods + NAME READY STATUS RESTARTS AGE + dlb-dpdk-demo 0/2 Pending 0 3s + dlb-libdlb-demo 0/2 Pending 0 10s + ``` + + This can be verified by checking the Events of the pod: + + ```bash + $ kubectl describe pod dlb-libdlb-demo | grep -A3 Events: + Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Warning FailedScheduling 85s default-scheduler 0/1 nodes are available: 1 Insufficient dlb.intel.com/pf, 1 Insufficient dlb.intel.com/vf. + ``` diff --git a/0.27/_sources/cmd/dsa_plugin/README.md.txt b/0.27/_sources/cmd/dsa_plugin/README.md.txt new file mode 100644 index 000000000..416ddf9cc --- /dev/null +++ b/0.27/_sources/cmd/dsa_plugin/README.md.txt @@ -0,0 +1,135 @@ +# Intel DSA device plugin for Kubernetes + +Table of Contents + +* [Introduction](#introduction) +* [Installation](#installation) + * [Pre-built Images](#pre-built-images) + * [Verify Plugin Registration](#verify-plugin-registration) +* [Testing and Demos](#testing-and-demos) + +## Introduction + +The DSA device plugin for Kubernetes supports acceleration using the Intel Data Streaming accelerator(DSA). + +The DSA plugin discovers DSA work queues and presents them as a node resources. + +The DSA plugin and operator optionally support provisioning of DSA devices and workqueues with the help of [accel-config](https://github.com/intel/idxd-config) utility through initcontainer. + +## Installation + +The following sections detail how to use the DSA device plugin. + +### Pre-built Images + +[Pre-built images](https://hub.docker.com/r/intel/intel-dsa-plugin) +of this component are available on the Docker hub. These images are automatically built and uploaded +to the hub from the latest main branch of this repository. + +Release tagged images of the components are also available on the Docker hub, tagged with their +release version numbers in the format `x.y.z`, corresponding to the branches and releases in this +repository. Thus the easiest way to deploy the plugin in your cluster is to run this command + +```bash +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/dsa_plugin?ref=' +daemonset.apps/intel-dsa-plugin created +``` + +Where `` needs to be substituted with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images. + +Nothing else is needed. See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin. 
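As a concrete example, the deployment command above with the release tag filled in could look like the following; the `v0.27.0` tag is an assumption here, so substitute whichever tag from the releases page you actually want to track:

```bash
# Deploy the DSA plugin pinned to a specific release tag
$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/dsa_plugin?ref=v0.27.0'
```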
+ +#### Automatic Provisioning + +There's a sample [idxd initcontainer](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/build/docker/intel-idxd-config-initcontainer.Dockerfile) included that provisions DSA devices and workqueues (1 engine / 1 group / 1 wq (user/dedicated)), to deploy: + +```bash +$ kubectl apply -k deployments/dsa_plugin/overlays/dsa_initcontainer/ +``` + +The provisioning [script](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/demo/idxd-init.sh) and [template](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/master/demo/dsa.conf) are available for customization. + +The provisioning config can be optionally stored in the ProvisioningConfig configMap which is then passed to initcontainer through the volume mount. + +There's also a possibility for a node specific congfiguration through passing a nodename via NODE_NAME into initcontainer's environment and passing a node specific profile via configMap volume mount. + +To create a custom provisioning config: + +```bash +$ kubectl create configmap --namespace=inteldeviceplugins-system intel-dsa-config --from-file=demo/dsa.conf +``` + +### Verify Plugin Registration +You can verify the plugin has been registered with the expected nodes by searching for the relevant +resource allocation status on the nodes: + +```bash +$ kubectl get nodes -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{range $k,$v:=.status.allocatable}}{{" "}}{{$k}}{{": "}}{{$v}}{{"\n"}}{{end}}{{end}}' | grep '^\([^ ]\)\|\( dsa\)' +master + dsa.intel.com/wq-user-dedicated: 2 + dsa.intel.com/wq-user-shared: 8 +node1 + dsa.intel.com/wq-user-dedicated: 4 + dsa.intel.com/wq-user-shared: 20 +``` + +## Testing and Demos + +We can test the plugin is working by deploying the provided example accel-config test image. + +1. Build a Docker image with an accel-config tests: + + ```bash + $ make accel-config-demo + ... + Successfully tagged accel-config-demo:devel + ``` + +1. Create a pod running unit tests off the local Docker image: + + ```bash + $ kubectl apply -f ${INTEL_DEVICE_PLUGINS_SRC}/demo/dsa-accel-config-demo-pod.yaml + pod/dsa-accel-config-demo created + ``` + +1. Wait until pod is completed: + + ```bash + $ kubectl get pods |grep dsa-accel-config-demo + dsa-accel-config-demo 0/1 Completed 0 31m + +1. Review the job's logs: + + ```bash + $ kubectl logs dsa-accel-config-demo | tail + [debug] PF in sub-task[6], consider as passed + [debug] PF in sub-task[7], consider as passed + [debug] PF in sub-task[8], consider as passed + [debug] PF in sub-task[9], consider as passed + [debug] PF in sub-task[10], consider as passed + [debug] PF in sub-task[11], consider as passed + [debug] PF in sub-task[12], consider as passed + [debug] PF in sub-task[13], consider as passed + [debug] PF in sub-task[14], consider as passed + [debug] PF in sub-task[15], consider as passed + ``` + + If the pod did not successfully launch, possibly because it could not obtain the DSA + resource, it will be stuck in the `Pending` status: + + ```bash + $ kubectl get pods + NAME READY STATUS RESTARTS AGE + dsa-accel-config-demo 0/1 Pending 0 7s + ``` + + This can be verified by checking the Events of the pod: + + ```bash + + $ kubectl describe pod dsa-accel-config-demo | grep -A3 Events: + Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Warning FailedScheduling 2m26s default-scheduler 0/1 nodes are available: 1 Insufficient dsa.intel.com/wq-user-dedicated, 1 Insufficient dsa.intel.com/wq-user-shared. 
+ ``` diff --git a/0.27/_sources/cmd/fpga_admissionwebhook/README.md.txt b/0.27/_sources/cmd/fpga_admissionwebhook/README.md.txt new file mode 100644 index 000000000..89daa3b0d --- /dev/null +++ b/0.27/_sources/cmd/fpga_admissionwebhook/README.md.txt @@ -0,0 +1,186 @@ +# Intel FPGA admission controller for Kubernetes + +Table of Contents + +* [Introduction](#introduction) +* [Dependencies](#dependencies) +* [Installation](#installation) + * [Pre-requisites](#pre-requisites) + * [Mappings](#mappings) + * [Deployment](#deployment) + * [Webhook deployment](#webhook-deployment) + * [Mappings deployment](#mappings-deployment) +* [Next steps](#next-steps) + +## Introduction + +The FPGA admission controller is one of the components used to add support for Intel FPGA +devices to Kubernetes. + +> **NOTE:** Installation of the FPGA admission controller can be skipped if the +> [FPGA device plugin](../fpga_plugin/README.md) is operated with the Intel Device Plugins Operator +> since it integrates the controller's functionality. + +The FPGA admission controller webhook is responsible for performing mapping from user-friendly +function IDs to the Interface ID and Bitstream ID that are required for FPGA programming by +the [FPGA CRI-O hook](../fpga_crihook/README.md). + +Mappings are stored in namespaced custom resource definition (CRD) objects, therefore the admission +controller also performs access control, determining which bitstream can be used for which namespace. +More details can be found in the [Mappings](#mappings) section. + +The admission controller also keeps the user from bypassing namespaced mapping restrictions, +by denying admission of any pods that are trying to use internal knowledge of InterfaceID or +Bitstream ID environment variables used by the prestart hook. + +## Dependencies + +This component is one of a set of components that work together. You may also want to +install the following: + +- [FPGA device plugin](../fpga_plugin/README.md) +- [FPGA prestart CRI-O hook](../fpga_crihook/README.md) + +All components have the same basic dependencies as the +[generic plugin framework dependencies](../../README.md#about) + +## Installation + +The following sections detail how to obtain, build and deploy the admission +controller webhook plugin. + +### Pre-requisites + +The default webhook deployment depends on having [cert-manager](https://cert-manager.io/) +installed. See its installation instructions [here](https://cert-manager.io/docs/installation/kubectl/). + +Also if your cluster operates behind a corporate proxy make sure that the API +server is configured not to send requests to cluster services through the +proxy. You can check that with the following command: + +```bash +$ kubectl describe pod kube-apiserver --namespace kube-system | grep -i no_proxy | grep "\.svc" +``` + +In case there's no output and your cluster was deployed with `kubeadm` open +`/etc/kubernetes/manifests/kube-apiserver.yaml` at the control plane nodes and +append `.svc` and `.svc.cluster.local` to the `no_proxy` environment variable: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + ... +spec: + containers: + - command: + - kube-apiserver + - --advertise-address=10.237.71.99 + ... + env: + - name: http_proxy + value: http://proxy.host:8080 + - name: https_proxy + value: http://proxy.host:8433 + - name: no_proxy + value: 127.0.0.1,localhost,.example.com,10.0.0.0/8,.svc,.svc.cluster.local + ... 
+``` + +**Note:** To build clusters using `kubeadm` with the right `no_proxy` settings from the very beginning, +set the cluster service names to `$no_proxy` before `kubeadm init`: + +``` +$ export no_proxy=$no_proxy,.svc,.svc.cluster.local +``` + +## Mappings + +Mappings is a an essential part of the setup that gives a flexible instrument to a cluster +administrator to manage FPGA bitstreams and to control access to them. Being a set of +custom resource definitions they are used to configure the way FPGA resource requests get +translated into actual resources provided by the cluster. + +For the following mapping + +```yaml +apiVersion: fpga.intel.com/v2 +kind: AcceleratorFunction +metadata: + name: arria10.dcp1.2-nlb0-preprogrammed +spec: + afuId: d8424dc4a4a3c413f89e433683f9040b + interfaceId: 69528db6eb31577a8c3668f9faa081f6 + mode: af +``` + +requested FPGA resources are translated to AF resources. For example, +`fpga.intel.com/arria10.dcp1.2-nlb0-preprogrammed` is translated to +`fpga.intel.com/af-695.d84.aVKNtusxV3qMNmj5-qCB9thCTcSko8QT-J5DNoP5BAs` where the `af-` +prefix indicates the plugin's mode (`af`), `695` is the first three characters of +the region interface ID, `d84` is the first three characters of the accelerator function ID +and the last part `aVKNtusxV3qMNmj5-qCB9thCTcSko8QT-J5DNoP5BAs` is a base64-encoded concatenation +of the full region interface ID and accelerator function ID. +The format of resource names (e.g. `arria10.dcp1.2-nlb0-preprogrammed`) can be any and is up +to a cluster administrator. + +The same mapping, but with its mode field set to `region`, would translate +`fpga.intel.com/arria10.dcp1.2-nlb0-preprogrammed` to `fpga.intel.com/region-69528db6eb31577a8c3668f9faa081f6`, +and the corresponding AF IDs are set in environment variables for the container. +Though in this case the cluster administrator would probably want to rename +the mapping `arria10.dcp1.2-nlb0-preprogrammed` to something like `arria10.dcp1.2-nlb0-orchestrated` +to reflect its mode. The [FPGA CRI-O hook](../fpga_crihook/README.md) then loads the requested +bitstream to a region before the container is started. + +Mappings of resource names are configured with objects of `AcceleratorFunction` and +`FpgaRegion` custom resource definitions found respectively in +[`./deployment/fpga_admissionwebhook/crd/bases/fpga.intel.com_af.yaml`](/deployments/fpga_admissionwebhook/crd/bases/fpga.intel.com_acceleratorfunctions.yaml) +and [`./deployment/fpga_admissionwebhook/crd/bases/fpga.intel.com_region.yaml`](/deployments/fpga_admissionwebhook/crd/bases/fpga.intel.com_fpgaregions.yaml). + +Example mappings between 'names' and 'ID's are controlled by the admission controller mappings collection file found in +[`./deployments/fpga_admissionwebhook/mappings-collection.yaml`](/deployments/fpga_admissionwebhook/mappings-collection.yaml). 
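To make the translation concrete, the sketch below shows a container requesting the resource name from the `AcceleratorFunction` example above; the admission webhook rewrites it into the ID-based resource name (for instance `fpga.intel.com/af-695.d84.…` in `af` mode) before scheduling. The image is a placeholder, and the workload is assumed to run in a namespace where this mapping has been deployed.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fpga-nlb0-demo
spec:
  containers:
  - name: nlb0-app
    image: <opae-nlb-demo-image>   # placeholder workload image
    resources:
      limits:
        # User-friendly name; translated by the admission webhook to the
        # interface/AF ID based resource name at admission time.
        fpga.intel.com/arria10.dcp1.2-nlb0-preprogrammed: 1
```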
+ + +### Deployment + +#### Webhook deployment + +To deploy the webhook, run + +```bash +$ kubectl apply -k https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/fpga_admissionwebhook/default?ref=main +namespace/intelfpgawebhook-system created +customresourcedefinition.apiextensions.k8s.io/acceleratorfunctions.fpga.intel.com created +customresourcedefinition.apiextensions.k8s.io/fpgaregions.fpga.intel.com created +mutatingwebhookconfiguration.admissionregistration.k8s.io/intelfpgawebhook-mutating-webhook-configuration created +clusterrole.rbac.authorization.k8s.io/intelfpgawebhook-manager-role created +clusterrolebinding.rbac.authorization.k8s.io/intelfpgawebhook-manager-rolebinding created +service/intelfpgawebhook-webhook-service created +deployment.apps/intelfpgawebhook-webhook created +certificate.cert-manager.io/intelfpgawebhook-serving-cert created +issuer.cert-manager.io/intelfpgawebhook-selfsigned-issuer created +``` + +#### Mappings deployment + +Mappings deployment is a mandatory part of the webhook deployment. You should +prepare and deploy mappings that describe FPGA bitstreams available in your cluster. + +Example mappings collection [`./deployments/fpga_admissionwebhook/mappings-collection.yaml`](/deployments/fpga_admissionwebhook/mappings-collection.yaml) +can be used as an example for cluster mappings. This collection is not intended to be deployed as is, +it should be used as a reference and example of your own cluster mappings. + +To deploy the mappings, run + +```bash +$ kubectl apply -f + +``` + +Note that the mappings are scoped to the namespaces they were created in +and they are applicable to pods created in the corresponding namespaces. + + +## Next steps + +Continue with [FPGA prestart CRI-O hook](../fpga_crihook/README.md). diff --git a/0.27/_sources/cmd/fpga_crihook/README.md.txt b/0.27/_sources/cmd/fpga_crihook/README.md.txt new file mode 100644 index 000000000..59c26fd8a --- /dev/null +++ b/0.27/_sources/cmd/fpga_crihook/README.md.txt @@ -0,0 +1,49 @@ +# Intel FPGA prestart CRI-O webhook for Kubernetes + +Table of Contents + +* [Introduction](#introduction) +* [Dependencies](#dependencies) +* [Configuring CRI-O](#configuring-cri-o) + +## Introduction + +The FPGA CRI-O webhook is one of the components used to add support for Intel FPGA +devices to Kubernetes. + +The FPGA prestart CRI-O hook is triggered by container annotations, such as set by the +[FPGA device plugin](../fpga_plugin/README.md). It performs discovery of the requested FPGA +function bitstream and then programs FPGA devices based on the environment variables +in the workload description. + +The CRI-O prestart hook is only *required* when the +[FPGA admission webhook](../fpga_admissionwebhook/README.md) is configured for orchestration +programmed mode, and is benign (un-used) otherwise. + +> **Note:** The fpga CRI-O webhook is usually installed by the same DaemonSet as the +> FPGA device plugin. If building and installing the CRI-O webhook by hand, it is +> recommended you reference the +> [fpga plugin DaemonSet YAML](/deployments/fpga_plugin/base/intel-fpga-plugin-daemonset.yaml ) for +> more details. + +## Dependencies + +This component is one of a set of components that work together. 
You may also want to +install the following: + +- [FPGA device plugin](../fpga_plugin/README.md) +- [FPGA admission controller](../fpga_admissionwebhook/README.md) + +All components have the same basic dependencies as the +[generic plugin framework dependencies](../../README.md#about) + +See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the CRI hook. + +## Configuring CRI-O + +Recent versions of [CRI-O](https://github.com/cri-o/cri-o) are shipped with default configuration +file that prevents CRI-O to discover and configure hooks automatically. +For FPGA orchestration programmed mode, the OCI hooks are the key component. +Please ensure that your `/etc/crio/crio.conf` parameter `hooks_dir` is either unset +(to enable default search paths for OCI hooks configuration) or contains the directory +`/etc/containers/oci/hooks.d`. diff --git a/0.27/_sources/cmd/fpga_plugin/README.md.txt b/0.27/_sources/cmd/fpga_plugin/README.md.txt new file mode 100644 index 000000000..b726e461c --- /dev/null +++ b/0.27/_sources/cmd/fpga_plugin/README.md.txt @@ -0,0 +1,228 @@ +# Intel FPGA device plugin for Kubernetes + +Table of Contents + +* [Introduction](#introduction) + * [Component Overview](#component-overview) +* [Modes and Configuration Options](#modes-and-configuration-options) +* [Installation](#installation) + * [Prerequisites](#prerequisites) + * [Pre-built Images](#pre-built-images) + * [Verify Plugin Registration](#verify-plugin-registration) + +## Introduction + +This FPGA device plugin is part of a collection of Kubernetes components found within this +repository that enable integration of Intel FPGA hardware into Kubernetes. + +The following hardware platforms are supported: + +- Intel Arria 10 +- Intel Stratix 10 + +The components support the [Open Programmable Acceleration Engine (OPAE)](https://opae.github.io/latest/index.html) +interface. + +The components together implement the following features: + +- discovery of pre-programmed accelerator functions +- discovery of programmable regions +- orchestration of FPGA programming +- access control for FPGA hardware + +### Component Overview + +The following components are part of this repository, and work together to support Intel FPGAs under +Kubernetes: + +- [FPGA device plugin](README.md) (this component) + + A Kubernetes [device plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/) + that discovers available FPGA resources on a node and advertises them to the Kubernetes control plane + via the node kubelet. + +- [FPGA admission controller webhook](../fpga_admissionwebhook/README.md) + + A Kubernetes [admission controller webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) + which can be used to dynamically convert logical resource names in pod specifications into actual FPGA + resource names, as advertised by the device plugin. + + The webhook can also set environment variables to instruct the CRI-O prestart hook to program the FPGA + before launching the container. + + > **NOTE:** Installation of the [FPGA admission controller webhook](../fpga_admissionwebhook/README.md) can be skipped if the + > FPGA device plugin is operated with the Intel Device Plugins Operator + > since it integrates the controller's functionality. + > However, [the mappings](../fpga_admissionwebhook/README.md#mappings-deployment) still must be deployed." 
+ +- [FPGA CRI-O prestart hook](../fpga_crihook/README.md) + + A [CRI-O](https://github.com/cri-o/cri-o) prestart hook that, upon instruction from the FPGA admission + controller, allocates and programs the FPGA before the container is launched. + +The repository also contains an [FPGA helper tool](../fpga_tool/README.md) that may be useful during +development, initial deployment and debugging. + +### Modes and Configuration Options + +The FPGA plugin set can run in one of two modes: + +- `region` mode, where the plugins locate and advertise + regions of the FPGA, and facilitate programing of those regions with the + requested bistreams. +- `af` mode, where the FPGA bitstreams are already loaded + onto the FPGA, and the plugins discover and advertises the existing + Accelerator Functions (AF). + +The example YAML deployments described in this document only currently support +`af` mode. To utilise `region` mode, either modify the existing YAML appropriately, +or deploy 'by hand'. + +Overview diagrams of `af` and `region` modes are below: + +region mode: + +![Overview of `region` mode](pictures/FPGA-region.png) + +af mode: + +![Overview of `af` mode](pictures/FPGA-af.png) + +## Installation + +The below sections cover how to use this component. + +### Prerequisites + +All components have the same basic dependencies as the +[generic plugin framework dependencies](../../README.md#about) + +To obtain a fully operational FPGA enabled cluster, you must install all three +major components: + +- [FPGA device plugin](README.md) (this component) +- [FPGA admission controller webhook](../fpga_admissionwebhook/README.md) +- [FPGA prestart CRI-O hook](../fpga_crihook/README.md) + +The CRI-O hook is only *required* if `region` mode is being used, but is installed by default by the +[FPGA plugin DaemonSet YAML](/deployments/fpga_plugin/base/intel-fpga-plugin-daemonset.yaml), and is benign +in `af` mode. + +If using the `af` mode, and therefore *not* using the +CRI-O prestart hook, runtimes other than CRI-O can be used (that is, the CRI-O hook presently +*only* works with the CRI-O runtime). + +The FPGA device plugin requires a Linux Kernel FPGA driver to be installed and enabled to +operate. The plugin supports the use of either of following two drivers, and auto detects +which is present and thus to use: + +- The Linux Kernel in-tree [DFL](https://www.kernel.org/doc/html/latest/fpga/dfl.html) driver +- The out of tree [OPAE](https://opae.github.io/latest/docs/drv_arch/drv_arch.html) driver + +Install this component (FPGA device plugin) first, and then follow the links +and instructions to install the other components. + +The FPGA webhook deployment depends on having [cert-manager](https://cert-manager.io/) +installed. See its installation instructions [here](https://cert-manager.io/docs/installation/kubectl/). + +```bash +$ kubectl get pods -n cert-manager +NAME READY STATUS RESTARTS AGE +cert-manager-7747db9d88-bd2nl 1/1 Running 0 1m +cert-manager-cainjector-87c85c6ff-59sb5 1/1 Running 0 1m +cert-manager-webhook-64dc9fff44-29cfc 1/1 Running 0 1m + +``` + +### Pre-built Images + +Pre-built images of the components are available on the [Docker hub](https://hub.docker.com/u/intel). +These images are automatically built and uploaded to the hub from the latest `main` branch of +this repository. + +Release tagged images of the components are also available on the Docker hub, tagged with their +release version numbers (of the form `x.y.z`, matching the branch/tag release number in this repo). 
+ +The following images are available on the Docker hub: + +- [The FPGA plugin](https://hub.docker.com/r/intel/intel-fpga-plugin) +- [The FPGA admisson webhook](https://hub.docker.com/r/intel/intel-fpga-admissionwebhook) +- [The FPGA CRI-O prestart hook (in the `initcontainer` image)](https://hub.docker.com/r/intel/intel-fpga-initcontainer) + +Depending on the FPGA mode, run either +```bash +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/fpga_plugin/overlays/af?ref=' +namespace/intelfpgaplugin-system created +customresourcedefinition.apiextensions.k8s.io/acceleratorfunctions.fpga.intel.com created +customresourcedefinition.apiextensions.k8s.io/fpgaregions.fpga.intel.com created +mutatingwebhookconfiguration.admissionregistration.k8s.io/intelfpgaplugin-mutating-webhook-configuration created +clusterrole.rbac.authorization.k8s.io/intelfpgaplugin-manager-role created +clusterrole.rbac.authorization.k8s.io/intelfpgaplugin-node-getter created +clusterrolebinding.rbac.authorization.k8s.io/intelfpgaplugin-get-nodes created +clusterrolebinding.rbac.authorization.k8s.io/intelfpgaplugin-manager-rolebinding created +service/intelfpgaplugin-webhook-service created +deployment.apps/intelfpgaplugin-webhook created +daemonset.apps/intelfpgaplugin-fpgadeviceplugin created +certificate.cert-manager.io/intelfpgaplugin-serving-cert created +issuer.cert-manager.io/intelfpgaplugin-selfsigned-issuer created +``` +or +```bash +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/fpga_plugin/overlays/region?ref=' +namespace/intelfpgaplugin-system created +customresourcedefinition.apiextensions.k8s.io/acceleratorfunctions.fpga.intel.com created +customresourcedefinition.apiextensions.k8s.io/fpgaregions.fpga.intel.com created +mutatingwebhookconfiguration.admissionregistration.k8s.io/intelfpgaplugin-mutating-webhook-configuration created +clusterrole.rbac.authorization.k8s.io/intelfpgaplugin-manager-role created +clusterrole.rbac.authorization.k8s.io/intelfpgaplugin-node-getter created +clusterrolebinding.rbac.authorization.k8s.io/intelfpgaplugin-get-nodes created +clusterrolebinding.rbac.authorization.k8s.io/intelfpgaplugin-manager-rolebinding created +service/intelfpgaplugin-webhook-service created +deployment.apps/intelfpgaplugin-webhook created +daemonset.apps/intelfpgaplugin-fpgadeviceplugin created +certificate.cert-manager.io/intelfpgaplugin-serving-cert created +issuer.cert-manager.io/intelfpgaplugin-selfsigned-issuer created +``` + +Where `` needs to be substituted with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images. + +The command should result in two pods running: +```bash +$ kubectl get pods -n intelfpgaplugin-system +NAME READY STATUS RESTARTS AGE +intelfpgaplugin-fpgadeviceplugin-skcw5 1/1 Running 0 57s +intelfpgaplugin-webhook-7d6bcb8b57-k52b9 1/1 Running 0 57s +``` + +If you need the FPGA plugin on some nodes to operate in a different mode then add this +annotation to the nodes: + +```bash +$ kubectl annotate node 'fpga.intel.com/device-plugin-mode=region' +``` +or +```bash +$ kubectl annotate node 'fpga.intel.com/device-plugin-mode=af' +``` +And restart the pods on the nodes. 
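One way to do the restart is to roll the DaemonSet created by the deployment above; this sketch restarts the plugin on every node, so on larger clusters you may prefer to delete only the pods on the re-annotated nodes:

```bash
$ kubectl -n intelfpgaplugin-system rollout restart daemonset/intelfpgaplugin-fpgadeviceplugin
```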
+ +> **Note:** The FPGA plugin [DaemonSet YAML](/deployments/fpga_plugin/base/intel-fpga-plugin-daemonset.yaml) +> also deploys the [FPGA CRI-O hook](../fpga_crihook/README.md) `initcontainer` image, but it will be +> benign (un-used) when running the FPGA plugin in `af` mode. + +#### Verify Plugin Registration + +Verify the FPGA plugin has been deployed on the nodes. The below shows the output +you can expect in `region` mode, but similar output should be expected for `af` +mode: + +```bash +$ kubectl describe nodes | grep fpga.intel.com +fpga.intel.com/region-ce48969398f05f33946d560708be108a: 1 +fpga.intel.com/region-ce48969398f05f33946d560708be108a: 1 +``` + +> **Note:** The FPGA plugin [DaemonSet YAML](/deployments/fpga_plugin/fpga_plugin.yaml) +> also deploys the [FPGA CRI-O hook](../fpga_crihook/README.md) `initcontainer` image as well. You may +> also wish to build that image locally before deploying the FPGA plugin to avoid deploying +> the Docker hub default image. diff --git a/0.27/_sources/cmd/fpga_tool/README.md.txt b/0.27/_sources/cmd/fpga_tool/README.md.txt new file mode 100644 index 000000000..39752c890 --- /dev/null +++ b/0.27/_sources/cmd/fpga_tool/README.md.txt @@ -0,0 +1,29 @@ +# Intel FPGA test tool + +## Introduction + +This directory contains an FPGA test tool that can be used to locate, examine and program Intel +FPGAs. + +### Command line and usage + +The tool has the following command line arguments: + +```bash +info, fpgainfo, install, list, fmeinfo, portinfo, list-fme, list-port, pr, release, assign +``` + +and the following command line options: + +```bash +Usage of ./fpga_tool: + -b string + Path to bitstream file (GBS or AOCX) + -d string + Path to device node (FME or Port) + -dry-run + Don't write/program, just validate and log + -force + Force overwrite operation for installing bitstreams + -q Quiet mode. Only errors will be reported +``` \ No newline at end of file diff --git a/0.27/_sources/cmd/gpu_fakedev/README.md.txt b/0.27/_sources/cmd/gpu_fakedev/README.md.txt new file mode 100644 index 000000000..94938ea41 --- /dev/null +++ b/0.27/_sources/cmd/gpu_fakedev/README.md.txt @@ -0,0 +1,47 @@ +# Fake (GPU) device file generator + +Table of Contents +* [Introduction](#introduction) +* [Configuration](#configuration) +* [Potential improvements](#potential-improvements) +* [Related tools](#related-tools) + +## Introduction + +This is a tool for generating (large number of) fake device files for +k8s device scheduling scalability testing. But it can also be used +just to test (GPU) device plugin functionality without having +corresponding device HW. + +Its "intel-gpu-fakedev" container is intended to be run as first init +container in a device plugin pod, so that device plugin (and its NFD +labeler) see the fake (sysfs + devfs) files generated by the tool, +instead of real host sysfs and devfs content. + +## Configuration + +[Configs](configs/) subdirectory contains example JSON configuration +file(s) for the generator. Currently there's only one example JSON +file, but each new device variant adding feature(s) that have specific +support in device plugin, could have their own fake device config. + +## Potential improvements + +If support for mixed device environment is needed, tool can be updated +to use node / configuration file mapping. Such mappings could be e.g. +in configuration files themselves as node name include / exlude lists, +and tool would use first configuration file matching the node it's +running on. 
For now, one would need to use different pod / config +specs for different nodes to achieve that... + +Currently JSON config file options and the generated files are tied to +what GPU plugin uses, but if needed, they could be changed to fake +also sysfs + devfs device files used by other plugins. + +## Related tools + +[fakedev-exporter](#https://github.com/intel/fakedev-exporter) project +can be used to schedule suitably configured fake workloads on the fake +devices, and to provide provide fake activity metrics for them to +Prometheus, that look like they were reported by real Prometheus +metric exporters for real workloads running on real devices. diff --git a/0.27/_sources/cmd/gpu_nfdhook/README.md.txt b/0.27/_sources/cmd/gpu_nfdhook/README.md.txt new file mode 100644 index 000000000..2735f8470 --- /dev/null +++ b/0.27/_sources/cmd/gpu_nfdhook/README.md.txt @@ -0,0 +1,92 @@ +# Intel GPU NFD hook + +Table of Contents + +* [Introduction](#introduction) +* [GPU memory](#gpu-memory) +* [Default labels](#default-labels) +* [PCI-groups (optional)](#pci-groups-optional) +* [Capability labels (optional)](#capability-labels-optional) +* [Limitations](#limitations) + +## Introduction + +This is the [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) +binary hook implementation for the Intel GPUs. The intel-gpu-initcontainer (which +is built with the other images) can be used as part of the gpu-plugin deployment +to copy hook to the host systems on which gpu-plugin itself is deployed. + +When NFD worker runs this hook, it will add a number of labels to the nodes, +which can be used for example to deploy services to nodes with specific GPU +types. Selected numeric labels can be turned into kubernetes extended resources +by the NFD, allowing for finer grained resource management for GPU-using PODs. + +In the NFD deployment, the hook requires `/host-sys` -folder to have the host `/sys`-folder content mounted. Write access is not necessary. + +## GPU memory + +GPU memory amount is read from sysfs `gt/gt*` files and turned into a label. +There are two supported environment variables named `GPU_MEMORY_OVERRIDE` and +`GPU_MEMORY_RESERVED`. Both are supposed to hold numeric byte amounts. For systems with +older kernel drivers or GPUs which do not support reading the GPU memory +amount, the `GPU_MEMORY_OVERRIDE` environment variable value is turned into a GPU +memory amount label instead of a read value. `GPU_MEMORY_RESERVED` value will be +scoped out from the GPU memory amount found from sysfs. + +## Default labels + +Following labels are created by default. You may turn numeric labels into extended resources with NFD. + +name | type | description| +-----|------|------| +|`gpu.intel.com/millicores`| number | node GPU count * 1000. Can be used as a finer grained shared execution fraction. +|`gpu.intel.com/memory.max`| number | sum of detected [GPU memory amounts](#gpu-memory) in bytes OR environment variable value * GPU count +|`gpu.intel.com/cards`| string | list of card names separated by '`.`'. The names match host `card*`-folders under `/sys/class/drm/`. Deprecated, use `gpu-numbers`. +|`gpu.intel.com/gpu-numbers`| string | list of numbers separated by '`.`'. The numbers correspond to device file numbers for the primary nodes of given GPUs in kernel DRI subsystem, listed as `/dev/dri/card` in devfs, and `/sys/class/drm/card` in sysfs. +|`gpu.intel.com/tiles`| number | sum of all detected GPU tiles in the system. 
+|`gpu.intel.com/numa-gpu-map`| string | list of numa node to gpu mappings. + +If the value of the `gpu-numbers` label would not fit into the 63 character length limit, you will also get labels `gpu-numbers2`, +`gpu-numbers3`... until all the gpu numbers have been labeled. + +The tile count `gpu.intel.com/tiles` describes the total amount of tiles on the system. System is expected to be homogeneous, and thus the number of tiles per GPU can be calculated by dividing the tile count with GPU count. + +The `numa-gpu-map` label is a list of numa to gpu mapping items separated by `_`. Each list item has a numa node id combined with a list of gpu indices. e.g. 0-1.2.3 would mean: numa node 0 has gpus 1, 2 and 3. More complex example would be: 0-0.1_1-3.4 where numa node 0 would have gpus 0 and 1, and numa node 1 would have gpus 3 and 4. As with `gpu-numbers`, this label will be extended to multiple labels if the length of the value exceeds the max label length. + +## PCI-groups (optional) + +GPUs which share the same pci paths under `/sys/devices/pci*` can be grouped into a label. GPU nums are separated by '`.`' and +groups are separated by '`_`'. The label is created only if environment variable named `GPU_PCI_GROUPING_LEVEL` has a value greater +than zero. GPUs are considered to belong to the same group, if as many identical folder names are found for the GPUs, as is the value +of the environment variable. Counting starts from the folder name which starts with `pci`. + +For example, the SG1 card has 4 GPUs, which end up sharing pci-folder names under `/sys/devices`. With a `GPU_PCI_GROUPING_LEVEL` +of 3, a node with two such SG1 cards could produce a `pci-groups` label with a value of `0.1.2.3_4.5.6.7`. + +name | type | description| +-----|------|------| +|`gpu.intel.com/pci-groups`| string | list of pci-groups separated by '`_`'. GPU numbers in the groups are separated by '`.`'. The numbers correspond to device file numbers for the primary nodes of given GPUs in kernel DRI subsystem, listed as `/dev/dri/card` in devfs, and `/sys/class/drm/card` in sysfs. + +If the value of the `pci-groups` label would not fit into the 63 character length limit, you will also get labels `pci-groups2`, +`pci-groups3`... until all the pci groups have been labeled. + +## Capability labels (optional) + +Capability labels are created from information found inside debugfs, and therefore +unfortunately require running the NFD worker as root. Due to coming from debugfs, +which is not guaranteed to be stable, these are not guaranteed to be stable either. +If you do not need these, simply do not run NFD worker as root, that is also more secure. +Depending on your kernel driver, running the NFD hook as root may introduce following labels: + +name | type | description| +-----|------|------| +|`gpu.intel.com/platform_gen`| string | GPU platform generation name, typically an integer. Deprecated. +|`gpu.intel.com/media_version`| string | GPU platform Media pipeline generation name, typically a number. Deprecated. +|`gpu.intel.com/graphics_version`| string | GPU platform graphics/compute pipeline generation name, typically a number. Deprecated. +|`gpu.intel.com/platform_.count`| number | GPU count for the named platform. +|`gpu.intel.com/platform_.tiles`| number | GPU tile count in the GPUs of the named platform. +|`gpu.intel.com/platform_.present`| string | "true" for indicating the presense of the GPU platform. 
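A quick way to check which of the above labels (and resulting extended resources) actually ended up on a node is to grep the node description; the node name is a placeholder:

```bash
# Show all Intel GPU related labels and resources on a node
$ kubectl describe node <node-name> | grep gpu.intel.com/
```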
+ +## Limitations + +For the above to work as intended, GPUs on the same node must be identical in their capabilities. diff --git a/0.27/_sources/cmd/gpu_plugin/README.md.txt b/0.27/_sources/cmd/gpu_plugin/README.md.txt new file mode 100644 index 000000000..484c09e2f --- /dev/null +++ b/0.27/_sources/cmd/gpu_plugin/README.md.txt @@ -0,0 +1,402 @@ +# Intel GPU device plugin for Kubernetes + +Table of Contents + +* [Introduction](#introduction) +* [Modes and Configuration Options](#modes-and-configuration-options) +* [Operation modes for different workload types](#operation-modes-for-different-workload-types) +* [Installation](#installation) + * [Prerequisites](#prerequisites) + * [Drivers for discrete GPUs](#drivers-for-discrete-gpus) + * [Kernel driver](#kernel-driver) + * [Intel DKMS packages](#intel-dkms-packages) + * [Upstream kernel](#upstream-kernel) + * [GPU Version](#gpu-version) + * [GPU Firmware](#gpu-firmware) + * [User-space drivers](#user-space-drivers) + * [Drivers for older (integrated) GPUs](#drivers-for-older-integrated-gpus) + * [Pre-built Images](#pre-built-images) + * [Install to all nodes](#install-to-all-nodes) + * [Install to nodes with Intel GPUs with NFD](#install-to-nodes-with-intel-gpus-with-nfd) + * [Install to nodes with NFD, Monitoring and Shared-dev](#install-to-nodes-with-nfd-monitoring-and-shared-dev) + * [Install to nodes with Intel GPUs with Fractional resources](#install-to-nodes-with-intel-gpus-with-fractional-resources) + * [Fractional resources details](#fractional-resources-details) + * [Verify Plugin Registration](#verify-plugin-registration) +* [Testing and Demos](#testing-and-demos) +* [Issues with media workloads on multi-GPU setups](#issues-with-media-workloads-on-multi-gpu-setups) + * [Workaround for QSV and VA-API](#workaround-for-qsv-and-va-api) + + +## Introduction + +Intel GPU plugin facilitates Kubernetes workload offloading by providing access to +discrete (including Intel® Data Center GPU Flex Series) and integrated Intel GPU devices +supported by the host kernel. + +Use cases include, but are not limited to: +- Media transcode +- Media analytics +- Cloud gaming +- High performance computing +- AI training and inference + +For example containers with Intel media driver (and components using that), can offload +video transcoding operations, and containers with the Intel OpenCL / oneAPI Level Zero +backend libraries can offload compute operations to GPU. + +## Modes and Configuration Options + +| Flag | Argument | Default | Meaning | +|:---- |:-------- |:------- |:------- | +| -enable-monitoring | - | disabled | Enable 'i915_monitoring' resource that provides access to all Intel GPU devices on the node | +| -resource-manager | - | disabled | Enable fractional resource management, [see also dependencies](#fractional-resources) | +| -shared-dev-num | int | 1 | Number of containers that can share the same GPU device | +| -allocation-policy | string | none | 3 possible values: balanced, packed, none. For shared-dev-num > 1: _balanced_ mode spreads workloads among GPU devices, _packed_ mode fills one GPU fully before moving to next, and _none_ selects first available device from kubelet. Default is _none_. Allocation policy does not have an effect when resource manager is enabled. | + +The plugin also accepts a number of other arguments (common to all plugins) related to logging. +Please use the -h option to see the complete list of logging related options. 
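As an example of how the flags in the table are used in practice, the snippet below is an illustrative excerpt of the plugin container in a DaemonSet, enabling GPU sharing between two containers and the monitoring resource; the container name and image tag are assumptions, and the rest of the DaemonSet is omitted:

```yaml
# Excerpt only - not a complete DaemonSet manifest
containers:
- name: intel-gpu-plugin
  image: intel/intel-gpu-plugin:devel
  args:
  - "-shared-dev-num=2"     # allow 2 containers to share one GPU device
  - "-enable-monitoring"    # expose the i915_monitoring resource
```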
+ +## Operation modes for different workload types + +Intel GPU-plugin supports a few different operation modes. Depending on the workloads the cluster is running, some modes make more sense than others. Below is a table that explains the differences between the modes and suggests workload types for each mode. Mode selection applies to the whole GPU plugin deployment, so it is a cluster wide decision. + +| Mode | Sharing | Intended workloads | Suitable for time critical workloads | +|:---- |:-------- |:------- |:------- | +| shared-dev-num == 1 | No, 1 container per GPU | Workloads using all GPU capacity, e.g. AI training | Yes | +| shared-dev-num > 1 | Yes, >1 containers per GPU | (Batch) workloads using only part of GPU resources, e.g. inference, media transcode/analytics, or CPU bound GPU workloads | No | +| shared-dev-num > 1 && resource-management | Yes and no, 1>= containers per GPU | Any. For best results, all workloads should declare their expected GPU resource usage (memory, millicores). Requires [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling). See also [fractional use](#fractional-resources-details) | Yes. 1000 millicores = exclusive GPU usage. See note below. | + +> **Note**: Exclusive GPU usage with >=1000 millicores requires that also *all other GPU containers* specify (non-zero) millicores resource usage. + +## Installation + +The following sections detail how to obtain, build, deploy and test the GPU device plugin. + +Examples are provided showing how to deploy the plugin either using a DaemonSet or by hand on a per-node basis. + +### Prerequisites + +Access to a GPU device requires firmware, kernel and user-space +drivers supporting it. Firmware and kernel driver need to be on the +host, user-space drivers in the GPU workload containers. + +Intel GPU devices supported by the current kernel can be listed with: +``` +$ grep i915 /sys/class/drm/card?/device/uevent +/sys/class/drm/card0/device/uevent:DRIVER=i915 +/sys/class/drm/card1/device/uevent:DRIVER=i915 +``` + +#### Drivers for discrete GPUs + +> **Note**: Kernel (on host) and user-space drivers (in containers) +> should be installed from the same repository as there are some +> differences between DKMS and upstream GPU driver uAPI. + +##### Kernel driver + +###### Intel DKMS packages + +`i915` GPU driver DKMS[^dkms] package is recommended for Intel +discrete GPUs, until their support in upstream is complete. DKMS +package(s) can be installed from Intel package repositories for a +subset of older kernel versions used in enterprise / LTS +distributions: +https://dgpu-docs.intel.com/installation-guides/index.html + +[^dkms]: [intel-gpu-i915-backports](https://github.com/intel-gpu/intel-gpu-i915-backports). + +###### Upstream kernel + +Upstream Linux kernel 6.2 or newer is needed for Intel discrete GPU +support. For now, upstream kernel is still missing support for a few +of the features available in DKMS kernels (e.g. Level-Zero Sysman API +GPU error counters). + +##### GPU Version + +PCI IDs for the Intel GPUs on given host can be listed with: +``` +$ lspci | grep -e VGA -e Display | grep Intel +88:00.0 Display controller: Intel Corporation Device 56c1 (rev 05) +8d:00.0 Display controller: Intel Corporation Device 56c1 (rev 05) +``` + +(`lspci` lists GPUs with display support as "VGA compatible controller", +and server GPUs without display support, as "Display controller".) 
+ +Mesa "Iris" 3D driver header provides a mapping between GPU PCI IDs and their Intel brand names: +https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/include/pci_ids/iris_pci_ids.h + +###### GPU Firmware + +If your kernel build does not find the correct firmware version for +a given GPU from the host (see `dmesg | grep i915` output), latest +firmware versions are available in upstream: +https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915 + +##### User-space drivers + +Until new enough user-space drivers (supporting also discrete GPUs) +are available directly from distribution package repositories, they +can be installed to containers from Intel package repositories. See: +https://dgpu-docs.intel.com/installation-guides/index.html + +Example container is listed in [Testing and demos](#testing-and-demos). + +Validation status against *upstream* kernel is listed in the user-space drivers release notes: +* Media driver: https://github.com/intel/media-driver/releases +* Compute driver: https://github.com/intel/compute-runtime/releases + +#### Drivers for older (integrated) GPUs + +For the older (integrated) GPUs, new enough firmware and kernel driver +are typically included already with the host OS, and new enough +user-space drivers (for the GPU containers) are in the host OS +repositories. + +### Pre-built Images + +[Pre-built images](https://hub.docker.com/r/intel/intel-gpu-plugin) +of this component are available on the Docker hub. These images are automatically built and uploaded +to the hub from the latest main branch of this repository. + +Release tagged images of the components are also available on the Docker hub, tagged with their +release version numbers in the format `x.y.z`, corresponding to the branches and releases in this +repository. Thus the easiest way to deploy the plugin in your cluster is to run this command + +> **Note**: Replace `` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images. + +> **Note**: Add ```--dry-run=client -o yaml``` to the ```kubectl``` commands below to visualize the yaml content being applied. + +See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin. + +#### Install to all nodes + +Simplest option to enable use of Intel GPUs in Kubernetes Pods. + +```bash +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin?ref=' +``` + +#### Install to nodes with Intel GPUs with NFD + +Deploying GPU plugin to only nodes that have Intel GPU attached. [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) is required to detect the presence of Intel GPUs. + +```bash +# Start NFD - if your cluster doesn't have NFD installed yet +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd?ref=' + +# Create NodeFeatureRules for detecting GPUs on nodes +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd/overlays/node-feature-rules?ref=' + +# Create GPU plugin daemonset +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin/overlays/nfd_labeled_nodes?ref=' +``` + +#### Install to nodes with NFD, Monitoring and Shared-dev + +Same as above, but configures GPU plugin with logging, [monitoring and shared-dev](#modes-and-configuration-options) features enabled. 
This option is useful when there is a desire to retrieve GPU metrics from nodes, for example with [XPU-Manager](https://github.com/intel/xpumanager/) or [collectd](https://github.com/collectd/collectd/tree/collectd-6.0).
+
+```bash
+# Start NFD - if your cluster doesn't have NFD installed yet
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd?ref='
+
+# Create NodeFeatureRules for detecting GPUs on nodes
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd/overlays/node-feature-rules?ref='
+
+# Create GPU plugin daemonset
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin/overlays/monitoring_shared-dev_nfd/?ref='
+```
+
+#### Install to nodes with Intel GPUs with Fractional resources
+
+With the experimental fractional resource feature you can use additional Kubernetes extended
+resources, such as GPU memory, which can then be consumed by deployments. Pods will then only
+be scheduled onto nodes that have sufficient amounts of the extended resources for their containers.
+
+(For this to work properly, all GPUs in a given node should provide an equal amount of resources,
+i.e. heterogeneous GPU nodes are not supported.)
+
+Enabling the fractional resource feature is not as simple as just enabling the related
+command line flag. The DaemonSet needs additional RBAC permissions
+and access to the kubelet podresources gRPC service, and there are other dependencies to
+take care of, which are explained below. For the RBAC permissions, gRPC service access and
+flag enablement, it is recommended to use the provided kustomization by running:
+
+```bash
+# Start NFD with GPU related configuration changes
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd/overlays/gpu?ref='
+
+# Create NodeFeatureRules for detecting GPUs on nodes
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd/overlays/node-feature-rules?ref='
+
+# Create GPU plugin daemonset
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/gpu_plugin/overlays/fractional_resources?ref='
+```
+
+##### Fractional resources details
+
+Usage of these fractional GPU resources requires that the cluster has node
+extended resources with the name prefix `gpu.intel.com/`. Those can be created with NFD
+by running the [hook](/cmd/gpu_nfdhook/) installed by the plugin initcontainer. When fractional resources are
+enabled, the plugin lets a [scheduler extender](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling)
+make card selection decisions based on resource availability and the amount of extended
+resources requested in the [pod spec](https://github.com/intel/platform-aware-scheduling/blob/master/gpu-aware-scheduling/docs/usage.md#pods).
+
+The scheduler extender then needs to annotate the pod objects with unique,
+increasing numeric timestamps in the `gas-ts` annotation and container card selections in
+the `gas-container-cards` annotation. The latter uses '`|`' as the container separator and
+'`,`' as the card separator. For example, for a pod with two containers where both containers get two cards:
+`gas-container-cards:card0,card1|card2,card3`. Enabling fractional-resource support
+in the plugin without running such an annotation-adding scheduler extender in the cluster
+will only slow down GPU deployments, so do not enable this feature unnecessarily.
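+
+For reference, workloads in such a cluster declare their expected GPU usage with `gpu.intel.com/`-prefixed extended resources in the pod spec. A hypothetical container resources fragment (resource names follow the GAS usage documentation linked above; the amounts are placeholders, and Kubernetes requires requests and limits to match for extended resources):
+
+```yaml
+resources:
+  requests:
+    gpu.intel.com/i915: 1
+    gpu.intel.com/millicores: 500
+    gpu.intel.com/memory.max: 4G
+  limits:
+    gpu.intel.com/i915: 1
+    gpu.intel.com/millicores: 500
+    gpu.intel.com/memory.max: 4G
+```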
+ +In multi-tile systems, containers can request individual tiles to improve GPU resource usage. +Tiles targeted for containers are specified to pod via `gas-container-tiles` annotation where the the annotation +value describes a set of card and tile combinations. For example in a two container pod, the annotation +could be `gas-container-tiles:card0:gt0+gt1|card1:gt1,card2:gt0`. Similarly to `gas-container-cards`, the container +details are split via `|`. In the example above, the first container gets tiles 0 and 1 from card 0, +and the second container gets tile 1 from card 1 and tile 0 from card 2. + +> **Note**: It is also possible to run the GPU device plugin using a non-root user. To do this, +the nodes' DAC rules must be configured to device plugin socket creation and kubelet registration. +Furthermore, the deployments `securityContext` must be configured with appropriate `runAsUser/runAsGroup`. + +### Verify Plugin Registration + +You can verify the plugin has been registered with the expected nodes by searching for the relevant +resource allocation status on the nodes: + +```bash +$ kubectl get nodes -o=jsonpath="{range .items[*]}{.metadata.name}{'\n'}{' i915: '}{.status.allocatable.gpu\.intel\.com/i915}{'\n'}" +master + i915: 1 +``` + +## Testing and Demos + +The GPU plugin functionality can be verified by deploying an [OpenCL image](../../demo/intel-opencl-icd/) which runs `clinfo` outputting the GPU capabilities (detected by driver installed to the image). + +1. Make the image available to the cluster: + + Build image: + + ```bash + $ make intel-opencl-icd + ``` + + Tag and push the `intel-opencl-icd` image to a repository available in the cluster. Then modify the `intelgpu-job.yaml`'s image location accordingly: + + ```bash + $ docker tag intel/intel-opencl-icd:devel /intel/intel-opencl-icd:latest + $ docker push /intel/intel-opencl-icd:latest + $ $EDITOR ${INTEL_DEVICE_PLUGINS_SRC}/demo/intelgpu-job.yaml + ``` + + If you are running the demo on a single node cluster, and do not have your own registry, you can add image to node image cache instead. For example, to import docker image to containerd cache: + + ```bash + $ IMAGE_NAME=opencl-icd.tar + $ docker save -o $IMAGE_NAME intel/intel-opencl-icd:devel + $ ctr -n=k8s.io images import $IMAGE_NAME + $ rm $IMAGE_NAME + ``` + +1. Create a job: + + ```bash + $ kubectl apply -f ${INTEL_DEVICE_PLUGINS_SRC}/demo/intelgpu-job.yaml + job.batch/intelgpu-demo-job created + ``` + +1. Review the job's logs: + + ```bash + $ kubectl get pods | fgrep intelgpu + # substitute the 'xxxxx' below for the pod name listed in the above + $ kubectl logs intelgpu-demo-job-xxxxx + + ``` + + If the pod did not successfully launch, possibly because it could not obtain + the requested GPU resource, it will be stuck in the `Pending` status: + + ```bash + $ kubectl get pods + NAME READY STATUS RESTARTS AGE + intelgpu-demo-job-xxxxx 0/1 Pending 0 8s + ``` + + This can be verified by checking the Events of the pod: + + ```bash + $ kubectl describe pod intelgpu-demo-job-xxxxx + ... + Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Warning FailedScheduling default-scheduler 0/1 nodes are available: 1 Insufficient gpu.intel.com/i915. + ``` + + +## Issues with media workloads on multi-GPU setups + +OneVPL media API, 3D and compute APIs provide device discovery +functionality for applications and work fine in multi-GPU setups. +VA-API and legacy QSV (MediaSDK) media APIs do not, and do not +provide (e.g. 
environment variable) override for their _default_ +device file. + +As result, media applications using VA-API or QSV, fail to locate the +correct GPU device file unless it is the first ("renderD128") one, or +device file name is explictly specified with an application option. + +Kubernetes device plugins expose only requested number of device +files, and their naming matches host device file names (for several +reasons unrelated to media). Therefore, on multi-GPU hosts, the only +GPU device file mapped to the media container can differ from +"renderD128", and media applications using VA-API or QSV need to be +explicitly told which one to use. + +These options differ from application to application. Relevant FFmpeg +options are documented here: +* VA-API: https://trac.ffmpeg.org/wiki/Hardware/VAAPI +* QSV: https://github.com/Intel-Media-SDK/MediaSDK/wiki/FFmpeg-QSV-Multi-GPU-Selection-on-Linux + + +### Workaround for QSV and VA-API + +[Render device](render-device.sh) shell script locates and outputs the +correct device file name. It can be added to the container and used +to give device file name for the application. + +Use it either from another script invoking the application, or +directly from the Pod YAML command line. In latter case, it can be +used either to add the device file name to the end of given command +line, like this: + +```bash +command: ["render-device.sh", "vainfo", "--display", "drm", "--device"] + +=> /usr/bin/vainfo --display drm --device /dev/dri/renderDXXX +``` + +Or inline, like this: + +```bash +command: ["/bin/sh", "-c", + "vainfo --device $(render-device.sh 1) --display drm" + ] +``` + +If device file name is needed for multiple commands, one can use shell variable: + +```bash +command: ["/bin/sh", "-c", + "dev=$(render-device.sh 1) && vainfo --device $dev && " + ] +``` + +With argument N, script outputs name of the Nth suitable GPU device +file, which can be used when more than one GPU resource was requested. diff --git a/0.27/_sources/cmd/iaa_plugin/README.md.txt b/0.27/_sources/cmd/iaa_plugin/README.md.txt new file mode 100644 index 000000000..d2f067a1f --- /dev/null +++ b/0.27/_sources/cmd/iaa_plugin/README.md.txt @@ -0,0 +1,120 @@ +# Intel IAA device plugin for Kubernetes + +Table of Contents + +* [Introduction](#introduction) +* [Installation](#installation) + * [Pre-built images](#pre-built-images) + * [Verify plugin registration](#verify-plugin-registration) +* [Testing and Demos](#testing-and-demos) + +## Introduction + +The IAA device plugin for Kubernetes supports acceleration using the Intel Analytics accelerator(IAA). + +The IAA plugin discovers IAA work queues and presents them as a node resources. + +The IAA plugin and operator optionally support provisioning of IAA devices and workqueues with the help of [accel-config](https://github.com/intel/idxd-config) utility through initcontainer. + +## Installation + +The following sections detail how to use the IAA device plugin. + +### Pre-built Images + +[Pre-built images](https://hub.docker.com/r/intel/intel-iaa-plugin) +of this component are available on the Docker hub. These images are automatically built and uploaded +to the hub from the latest main branch of this repository. + +Release tagged images of the components are also available on the Docker hub, tagged with their +release version numbers in the format `x.y.z`, corresponding to the branches and releases in this +repository. 
Thus the easiest way to deploy the plugin in your cluster is to run this command + +```bash +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/iaa_plugin?ref=' +daemonset.apps/intel-iaa-plugin created +``` + +Where `` needs to be substituted with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images. + +Nothing else is needed. See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin. + +#### Automatic Provisioning + +There's a sample [idxd initcontainer](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/build/docker/intel-idxd-config-initcontainer.Dockerfile) included that provisions IAA devices and workqueues (1 engine / 1 group / 1 wq (user/dedicated)), to deploy: + +```bash +$ kubectl apply -k deployments/iaa_plugin/overlays/iaa_initcontainer/ +``` + +The provisioning [script](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/demo/idxd-init.sh) and [template](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/demo/iaa.conf) are available for customization. + +The provisioning config can be optionally stored in the ProvisioningConfig configMap which is then passed to initcontainer through the volume mount. + +There's also a possibility for a node specific congfiguration through passing a nodename via NODE_NAME into initcontainer's environment and passing a node specific profile via configMap volume mount. + +To create a custom provisioning config: + +```bash +$ kubectl create configmap --namespace=inteldeviceplugins-system intel-iaa-config --from-file=demo/iaa.conf +``` + +### Verify Plugin Registration + +You can verify the plugin has been registered with the expected nodes by searching for the relevant +resource allocation status on the nodes: + +```bash +$ kubectl get nodes -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{range $k,$v:=.status.allocatable}}{{" "}}{{$k}}{{": "}}{{$v}}{{"\n"}}{{end}}{{end}}' | grep '^\([^ ]\)\|\( iaa\)' +master + iaa.intel.com/wq-user-dedicated: 2 + iaa.intel.com/wq-user-shared: 10 +node1 + iaa.intel.com/wq-user-dedicated: 4 + iaa.intel.com/wq-user-shared: 30 +``` + +## Testing and Demos + +We can test the plugin is working by deploying the provided example iaa-qpl-demo test image. + +1. Build a Docker image with an accel-config tests: + + ```bash + $ make iaa-qpl-demo + ... + Successfully tagged intel/iaa-qpl-demo:devel + ``` + +1. Create a pod running unit tests off the local Docker image: + + ```bash + $ kubectl apply -f ./demo/iaa-qpl-demo-pod.yaml + pod/iaa-qpl-demo created + ``` + +1. Wait until pod is completed: + + ```bash + $ kubectl get pods |grep iaa-qpl-demo + iaa-qpl-demo 0/1 Completed 0 31m + + If the pod did not successfully launch, possibly because it could not obtain the IAA + resource, it will be stuck in the `Pending` status: + + ```bash + $ kubectl get pods + NAME READY STATUS RESTARTS AGE + iaa-qpl-demo 0/1 Pending 0 7s + ``` + + This can be verified by checking the Events of the pod: + + ```bash + + $ kubectl describe pod iaa-qpl-demo | grep -A3 Events: + Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Warning FailedScheduling 2m26s default-scheduler 0/1 nodes are available: 1 Insufficient iaa.intel.com/wq-user-dedicated, 1 Insufficient iaa.intel.com/wq-user-shared. 
+ ``` diff --git a/0.27/_sources/cmd/operator/README.md.txt b/0.27/_sources/cmd/operator/README.md.txt new file mode 100644 index 000000000..22da75113 --- /dev/null +++ b/0.27/_sources/cmd/operator/README.md.txt @@ -0,0 +1,189 @@ +# Intel Device Plugins Operator + +Table of Contents + +* [Introduction](#introduction) +* [Installation](#installation) +* [Upgrade](#upgrade) +* [Limiting Supported Devices](#limiting-supported-devices) +* [Known issues](#known-issues) + +## Introduction + +Intel Device Plugins Operator is a Kubernetes custom controller whose goal is to serve the +installation and lifecycle management of Intel device plugins for Kubernetes. +It provides a single point of control for GPU, QAT, SGX, FPGA, DSA and DLB devices to a cluster +administrators. + +## Installation + +The default operator deployment depends on NFD and cert-manager. Those components have to be installed to the cluster before the operator can be deployed. + +> **Note**: Operator can also be installed via Helm charts. See [INSTALL.md](../../INSTALL.md) for details. + +### NFD + +Install NFD (if it's not already installed) and node labelling rules (requires NFD v0.13+): + +``` +# deploy NFD +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd?ref=' +# deploy NodeFeatureRules +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/nfd/overlays/node-feature-rules?ref=' +``` +Make sure both NFD master and worker pods are running: + +``` +$ kubectl get pods -n node-feature-discovery +NAME READY STATUS RESTARTS AGE +nfd-master-599c58dffc-9wql4 1/1 Running 0 25h +nfd-worker-qqq4h 1/1 Running 0 25h +``` + +Note that labelling is not performed immediately. Give NFD 1 minute to pick up the rules and label nodes. + +As a result all found devices should have correspondent labels, e.g. for Intel DLB devices the label is +`intel.feature.node.kubernetes.io/dlb`: +``` +$ kubectl get no -o json | jq .items[].metadata.labels |grep intel.feature.node.kubernetes.io/dlb + "intel.feature.node.kubernetes.io/dlb": "true", +``` + +Full list of labels can be found in the deployments/operator/samples directory: +``` +$ grep -r feature.node.kubernetes.io/ deployments/operator/samples/ +deployments/operator/samples/deviceplugin_v1_dlbdeviceplugin.yaml: intel.feature.node.kubernetes.io/dlb: 'true' +deployments/operator/samples/deviceplugin_v1_qatdeviceplugin.yaml: intel.feature.node.kubernetes.io/qat: 'true' +deployments/operator/samples/deviceplugin_v1_sgxdeviceplugin.yaml: intel.feature.node.kubernetes.io/sgx: 'true' +deployments/operator/samples/deviceplugin_v1_gpudeviceplugin.yaml: intel.feature.node.kubernetes.io/gpu: "true" +deployments/operator/samples/deviceplugin_v1_fpgadeviceplugin.yaml: intel.feature.node.kubernetes.io/fpga-arria10: 'true' +deployments/operator/samples/deviceplugin_v1_dsadeviceplugin.yaml: intel.feature.node.kubernetes.io/dsa: 'true' +``` + +### Cert-Manager + +The default operator deployment depends on [cert-manager](https://cert-manager.io/) running in the cluster. +See installation instructions [here](https://cert-manager.io/docs/installation/kubectl/). 
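+
+As a rough sketch, cert-manager can usually be installed from its static release manifest roughly as below (substitute a cert-manager release version for `vX.Y.Z`; the linked instructions list the currently supported ones):
+
+```
+$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/vX.Y.Z/cert-manager.yaml
+```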
+ +Make sure all the pods in the `cert-manager` namespace are up and running: + +``` +$ kubectl get pods -n cert-manager +NAME READY STATUS RESTARTS AGE +cert-manager-7747db9d88-bd2nl 1/1 Running 0 21d +cert-manager-cainjector-87c85c6ff-59sb5 1/1 Running 0 21d +cert-manager-webhook-64dc9fff44-29cfc 1/1 Running 0 21d +``` + +### Device Plugin Operator + +Finally deploy the operator itself: + +``` +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/operator/default?ref=' +``` + +Now you can deploy the device plugins by creating corresponding custom resources. +The samples for them are available [here](/deployments/operator/samples/). + +### Device Plugin Custom Resource + +Deploy your device plugin by applying its custom resource, e.g. +`GpuDevicePlugin` with + +```bash +$ kubectl apply -f https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/main/deployments/operator/samples/deviceplugin_v1_gpudeviceplugin.yaml +``` + +Observe it is up and running: + +```bash +$ kubectl get GpuDevicePlugin +NAME DESIRED READY NODE SELECTOR AGE +gpudeviceplugin-sample 1 1 5s +``` + +## Upgrade + +The upgrade of the deployed plugins can be done by simply installing a new release of the operator. + +The operator auto-upgrades operator-managed plugins (CR images and thus corresponding deployed daemonsets) to the current release of the operator. + +During upgrade the tag in the image path is updated (e.g. docker.io/intel/intel-sgx-plugin:tag), but the rest of the path is left intact. + +No upgrade is done for: +- Non-operator managed deployments +- Operator deployments without numeric tags + +## Limiting Supported Devices + +In order to limit the deployment to a specific device type, +use one of kustomizations under `deployments/operator/device`. + +For example, to limit the deployment to FPGA, use: + +```bash +$ kubectl apply -k deployments/operator/device/fpga +``` + +Operator also supports deployments with multiple selected device types. +In this case, create a new kustomization with the necessary resources +that passes the desired device types to the operator using `--device` +command line argument multiple times. + +## Known issues + +### Cluster behind a proxy + +If your cluster operates behind a corporate proxy make sure that the API +server is configured not to send requests to cluster services through the +proxy. You can check that with the following command: + +```bash +$ kubectl describe pod kube-apiserver --namespace kube-system | grep -i no_proxy | grep "\.svc" +``` + +In case there's no output and your cluster was deployed with `kubeadm` open +`/etc/kubernetes/manifests/kube-apiserver.yaml` at the control plane nodes and +append `.svc` and `.svc.cluster.local` to the `no_proxy` environment variable: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + ... +spec: + containers: + - command: + - kube-apiserver + - --advertise-address=10.237.71.99 + ... + env: + - name: http_proxy + value: http://proxy.host:8080 + - name: https_proxy + value: http://proxy.host:8433 + - name: no_proxy + value: 127.0.0.1,localhost,.example.com,10.0.0.0/8,.svc,.svc.cluster.local + ... 
+``` + +**Note:** To build clusters using `kubeadm` with the right `no_proxy` settings from the very beginning, +set the cluster service names to `$no_proxy` before `kubeadm init`: + +``` +$ export no_proxy=$no_proxy,.svc,.svc.cluster.local +``` + +### Leader election enabled + +When the operator is run with leader election enabled, that is with the option +`--leader-elect`, make sure the cluster is not overloaded with excessive +number of pods. Otherwise a heart beat used by the leader election code may trigger +a timeout and crash. We are going to use different clients for the controller and +leader election code to alleviate the issue. See more details in +https://github.com/intel/intel-device-plugins-for-kubernetes/issues/476. + +In case the deployment is limited to specific device type(s), +the CRDs for other device types are still created, but no controllers +for them are registered. diff --git a/0.27/_sources/cmd/operator/ocp_quickstart_guide/README.md.txt b/0.27/_sources/cmd/operator/ocp_quickstart_guide/README.md.txt new file mode 100644 index 000000000..6c6bf0ebd --- /dev/null +++ b/0.27/_sources/cmd/operator/ocp_quickstart_guide/README.md.txt @@ -0,0 +1,66 @@ +# Intel® Device Plugins Operator for Red Hat OpenShift Container Platform + +## Table of Contents +* [Introduction](#introduction) +* [Minimum Hardware Requirements](#minimum-hardware-requirements) + * [Intel SGX Enabled Server](#intel-sgx-enabled-server) +* [Installation](#installation) + * [Prerequisites](#prerequisites) + * [Install Operator using OpenShift Web Console](#install-operator-using-openshift-web-console) + * [Verify Operator installation](#verify-operator-installation) +* [Deploying Intel Device Plugins](#deploying-intel-device-plugins) + * [Intel SGX Device Plugin](#intel-sgx-device-plugin) + +## Introduction +The Intel Device Plugins Operator for OpenShift Container Platform is a collection of device plugins advertising Intel specific hardware resources to the kubelet. It provides a single point of control for Intel® Software Guard Extensions (Intel® SGX), Intel GPUs, Intel® QuickAccess Technology (Intel® QAT), Intel® Data Streaming Accelerator (Intel® DSA), and Intel® In-Memory Analytics Accelerator (Intel® IAA) devices to cluster administrators. The [`v0.24.0`](https://github.com/intel/intel-device-plugins-for-kubernetes/releases/tag/v0.24.0) release of the operator only supports Intel SGX and Intel QAT device plugins. GPU, Intel DSA, Intel IAA, and other device plugins will be supported in future releases. + +## Minimum Hardware Requirements +### Intel SGX Enabled Server +- Third Generation Intel® Xeon® Scalable Platform, code-named “Ice Lake” or later +- Configure BIOS using below details + ![SGX Server BIOS](images/SGX-BIOS.PNG) + [**Note:** The BIOS configuration shown above is just for the reference. Please contact your BIOS vendor for details] + +## Installation +### Prerequisites +- Make sure Red Hat OpenShift Cluster is ready to use and the developing machine is RHEL and `oc` command is installed and configured properly. Please note that the following operation is verified on Red Hat OpenShift Cluster 4.11 and working machine RHEL-8.6 +- Install the `oc` command to your development machine +- Follow the [link](https://docs.openshift.com/container-platform/4.11/hardware_enablement/psap-node-feature-discovery-operator.html) to install **NFD operator** (if it's not already installed). + **Note:** Please only install the NFD operator and use steps below to create the NodeFeatureDiscovery instance. 
+ - Create the NodeFeatureDiscovery instance + ``` + $ oc apply -f https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/v0.24.0/deployments/nfd/overlays/node-feature-discovery/node-feature-discovery-openshift.yaml + ``` + - Create the NodeFeatureRule instance + ``` + $ oc apply -f https://raw.githubusercontent.com/intel/intel-device-plugins-for-kubernetes/v0.24.0/deployments/nfd/overlays/node-feature-rules/node-feature-rules-openshift.yaml + ``` +- Deploy SELinux Policy for OCP 4.10 - + The SGX device plugin and Init container run as a label `container_device_plugin_t` and `container_device_plugin_init_t` respectively. This requires a custom SELinux policy to be deployed before the SGX plugin can be run. To deploy this policy, run + ``` + $ oc apply -f https://raw.githubusercontent.com/intel/user-container-selinux/main/policy-deployment.yaml + ``` + +### Install Operator using OpenShift Web Console +1. In OpenShift web console navigate to **Operator** -> **OperatorHub** +2. Search for **Intel Device Plugins Operator ->** Click **Install** + + +### Verify Operator installation +1. Go to **Operator** -> **Installed Operators** +2. Verify the status of operator as **Succeeded** +3. Click **Intel Device Plugins Operator** to view the details + ![Verify Operator](images/verify-operator.PNG) + + +## Deploying Intel Device Plugins + +### Intel SGX Device Plugin +Follow the steps below to deploy Intel SGX Device Plugin Custom Resource +1. Go to **Operator** -> **Installed Operators** +2. Open **Intel Device Plugins Operator** +3. Navigate to tab **Intel Software Guard Extensions Device Plugin** +4. Click **Create SgxDevicePlugin ->** set correct parameters -> Click **Create** + OR for any customizations, please select `YAML view` and edit details. Once done, click **Create** +5. Verify CR by checking the status of DaemonSet **`intel-sgx-plugin`** +6. Now `SgxDevicePlugin` is ready to deploy any workloads diff --git a/0.27/_sources/cmd/qat_plugin/README.md.txt b/0.27/_sources/cmd/qat_plugin/README.md.txt new file mode 100644 index 000000000..15875480e --- /dev/null +++ b/0.27/_sources/cmd/qat_plugin/README.md.txt @@ -0,0 +1,256 @@ +# Intel QuickAssist Technology (QAT) device plugin for Kubernetes + +Table of Contents + +* [Introduction](#introduction) +* [Modes and Configuration Options](#modes-and-configuration-options) +* [Installation](#installation) + * [Prerequisites](#prerequisites) + * [Pre-built Images](#pre-built-images) + * [Verify Plugin Registration](#verify-plugin-registration) +* [Demos and Testing](#demos-and-testing) + * [DPDK QAT Demos](#dpdk-qat-demos) + * [DPDK Prerequisites](#dpdk-prerequisites) + * [Deploy the pod](#deploy-the-pod) + * [Manual test run](#manual-test-run) + * [Automated test run](#automated-test-run) + * [OpenSSL QAT Demo](#openssl-qat-demo) +* [Checking for Hardware](#checking-for-hardware) + +## Introduction + +This Intel QAT device plugin provides support for Intel QAT devices under Kubernetes. +The supported devices are determined by the VF device drivers available in your Linux +Kernel. See the [Prerequisites](#prerequisites) section for more details. + +Supported Devices include, but may not be limited to, the following: + +- [Intel® Xeon® with Intel® C62X Series Chipset][1] +- Intel® Xeon® with Intel® QAT Gen4 devices +- [Intel® Atom™ Processor C3000][2] +- [Intel® Communications Chipset 8925 to 8955 Series][3] + +The QAT device plugin provides access to QAT hardware accelerated cryptographic and compression features. 
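+
+In practice, a workload consumes the accelerator by requesting the QAT extended resource in its pod spec; in the default `dpdk` mode this is typically advertised as `qat.intel.com/generic` (the demo deployments later in this document contain complete pod specs). A hypothetical fragment:
+
+```yaml
+resources:
+  requests:
+    qat.intel.com/generic: 1
+  limits:
+    qat.intel.com/generic: 1
+```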
+Demonstrations are provided utilising [DPDK](https://doc.dpdk.org/) and [OpenSSL](https://www.openssl.org/). + +[Kata Containers](https://katacontainers.io/) QAT integration is documented in the +[Kata Containers documentation repository][6]. + +## Modes and Configuration Options + +The QAT plugin can take a number of command line arguments, summarised in the following table: + +| Flag | Argument | Meaning | +|:---- |:-------- |:------- | +| -dpdk-driver | string | DPDK Device driver for configuring the QAT device (default: `vfio-pci`) | +| -kernel-vf-drivers | string | Comma separated VF Device Driver of the QuickAssist Devices in the system. Devices supported: DH895xCC, C62x, C3xxx, 4xxx/401xx/402xx, C4xxx and D15xx (default: `c6xxvf,4xxxvf`) | +| -max-num-devices | int | maximum number of QAT devices to be provided to the QuickAssist device plugin (default: `64`) | +| -mode | string | plugin mode which can be either `dpdk` or `kernel` (default: `dpdk`) | +| -allocation-policy | string | 2 possible values: balanced and packed. Balanced mode spreads allocated QAT VF resources balanced among QAT PF devices, and packed mode packs one QAT PF device full of QAT VF resources before allocating resources from the next QAT PF. (There is no default.) | + +The plugin also accepts a number of other arguments related to logging. Please use the `-h` option to see +the complete list of logging related options. + +For more details on the `-dpdk-driver` choice, see +[DPDK Linux Driver Guide](http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html). + +> **Note:**: With Linux 5.9+ kernels the `vfio-pci` module must be loaded with +> `disable_denylist=1` parameter for the QAT device plugin to work correctly with +> devices prior to Gen4 (`4xxx`). + +For more details on the available options to the `-kernel-vf-drivers` option, see the list of +vf drivers available in the [Linux Kernel](https://github.com/torvalds/linux/tree/master/drivers/crypto/qat). + +If the `-mode` parameter is set to `kernel`, no other parameter documented above are valid, +except the `klog` logging related parameters. +`kernel` mode implements resource allocation based on system configured [logical instances][7]. + +> **Note**: `kernel` mode is excluded by default from all builds (including those hosted on the Docker hub), +> by default. See the [Build the plugin image](#build-the-plugin-image) section for more details. + +The `kernel` mode does not guarantee full device isolation between containers +and therefore it's not recommended. This mode will be deprecated and removed once `libqat` +implements non-UIO based device access. + +## Installation + +The below sections cover how to obtain, build and install this component. + +The component can be installed either using a DaemonSet or running 'by hand' on each node. + +### Prerequisites + +The component has the same basic dependancies as the +[generic plugin framework dependencies](../../README.md#about). + +You will also need [appropriate hardware installed](#checking-for-hardware). + +The QAT plugin requires Linux Kernel VF QAT drivers to be available. These drivers +are available via two methods. One of them must be installed and enabled: + +- [Linux Kernel upstream drivers](https://github.com/torvalds/linux/tree/master/drivers/crypto/qat) +- [Intel QuickAssist Technology software for Linux][9] + +The demonstrations have their own requirements, listed in their own specific sections. 
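+
+Before deploying the plugin, it can be useful to check that the node actually has QAT hardware and that a QAT kernel driver is loaded. For example (exact device and module names vary between QAT generations):
+
+```bash
+# QAT endpoints typically show up as "Co-processor" PCI devices
+$ lspci | grep Co-processor
+
+# Check that a QAT kernel driver (e.g. qat_c62x, qat_4xxx) is loaded
+$ lsmod | grep qat
+```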
+ +### Pre-built Images + +[Pre-built images](https://hub.docker.com/r/intel/intel-qat-plugin) +of this component are available on the Docker hub. These images are automatically built and uploaded +to the hub from the latest main branch of this repository. + +Release tagged images of the components are also available on the Docker hub, tagged with their +release version numbers in the format `x.y.z`, corresponding to the branches and releases in this +repository. Thus the easiest way to deploy the plugin in your cluster is to run this command + +```bash +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/qat_plugin?ref=' +``` + +Where `` needs to be substituted with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images. + +An alternative kustomization for deploying the plugin is with the debug mode switched on: + +```bash +$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/qat_plugin/overlays/debug?ref=' +``` + +> **Note**: It is also possible to run the QAT device plugin using a non-root user. To do this, +> the nodes' DAC rules must be configured to allow PCI driver unbinding/binding, device plugin +> socket creation and kubelet registration. Furthermore, the deployments `securityContext` must +> be configured with appropriate `runAsUser/runAsGroup`. + +#### Automatic Provisioning + +There's a sample [qat initcontainer](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/build/docker/intel-qat-initcontainer.Dockerfile). Regardless of device types, the script running inside the initcontainer enables QAT SR-IOV VFs. + +To deploy, run as follows: + +```bash +$ kubectl apply -k deployments/qat_plugin/overlays/qat_initcontainer/ +``` + +In addition to the default configuration, you can add device-specific configurations via ConfigMap. + +| Device | Possible Configuration | How To Customize | Options | Notes | +|:-------|:-----------------------|:-----------------|:--------|:------| +| 4xxx, 401xx,402xx | [cfg_services](https://github.com/torvalds/linux/blob/42e66b1cc3a070671001f8a1e933a80818a192bf/Documentation/ABI/testing/sysfs-driver-qat) reports the configured services (crypto services or compression services) of the QAT device. | `ServicesEnabled=` | compress:`dc`, crypto:`sym;asym` | Linux 6.0+ kernel is required. | + +To create a provisioning `configMap`, run the following command before deploying initcontainer: + +```bash +$ kubectl create configmap --namespace=inteldeviceplugins-system qat-config --from-file=/path/to/qat.conf +``` +or +```bash +$ kubectl create configmap --namespace=inteldeviceplugins-system --from-literal "qat.conf=ServicesEnabled=