Gpu levelzero sidecar #1803

Merged: 6 commits, Sep 19, 2024
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -14,6 +14,7 @@
/cmd/gpu_fakedev/ @tkatila @uniemimu @eero-t
/cmd/gpu_plugin/ @tkatila @bart0sh @uniemimu
/cmd/gpu_nfdhook/ @tkatila @bart0sh @uniemimu
/cmd/gpu_levelzero/ @tkatila @eero-t @uniemimu
/cmd/qat_plugin/ @hj-johannes-lee @mythi
/cmd/sgx_plugin/ @hj-johannes-lee @mythi
/cmd/dsa_plugin/ @hj-johannes-lee @ozhuraki @mythi
1 change: 1 addition & 0 deletions .github/workflows/lib-build.yaml
@@ -18,6 +18,7 @@ jobs:
- intel-gpu-fakedev
- intel-gpu-initcontainer
- intel-gpu-plugin
- intel-gpu-levelzero
- intel-fpga-plugin
- intel-qat-initcontainer
- intel-qat-plugin
1 change: 1 addition & 0 deletions .github/workflows/lib-e2e.yaml
@@ -55,6 +55,7 @@ jobs:
- intel-iaa-plugin
- crypto-perf
- intel-gpu-plugin
- intel-gpu-levelzero
- intel-sgx-plugin
- intel-sgx-initcontainer
- intel-sgx-admissionwebhook
3 changes: 2 additions & 1 deletion .github/workflows/lib-publish.yaml
@@ -11,7 +11,7 @@ on:
required: false
type: string
env:
no_base_check: "['intel-qat-plugin-kerneldrv', 'intel-idxd-config-initcontainer', 'crypto-perf', 'opae-nlb-demo']"
no_base_check: "['intel-qat-plugin-kerneldrv', 'intel-idxd-config-initcontainer', 'crypto-perf', 'opae-nlb-demo', 'intel-gpu-levelzero']"

permissions:
contents: read
@@ -48,6 +48,7 @@ jobs:
- intel-fpga-initcontainer
- intel-gpu-initcontainer
- intel-gpu-plugin
#- intel-gpu-levelzero
- intel-fpga-plugin
- intel-qat-initcontainer
- intel-qat-plugin
10 changes: 10 additions & 0 deletions .github/workflows/lib-validate.yaml
@@ -39,6 +39,10 @@ jobs:
with:
go-version-file: go.mod
check-latest: true
- name: install levelzero dev
run: |
sudo apt-get update
sudo apt-get install -y libze1 libze-dev
- name: golangci-lint
uses: golangci/golangci-lint-action@aaa42aa0628b4ae2578232a66b541047968fac86 # v6
with:
@@ -53,11 +57,17 @@ jobs:
with:
go-version-file: go.mod
check-latest: true
- name: install levelzero dev
run: |
sudo apt-get update
sudo apt-get install -y libze1 libze-dev
- name: Check Dockerfiles
run: make check-dockerfiles
- run: make go-mod-tidy
- run: make BUILDTAGS=kerneldrv
- run: make test BUILDTAGS=kerneldrv
env:
UNITTEST: 1
- run: make check-github-actions
#- name: Codecov report
# run: bash <(curl -s https://codecov.io/bash)
1 change: 0 additions & 1 deletion .gitignore
@@ -23,7 +23,6 @@ cmd/operator/operator

deployments/fpga_admissionwebhook/base/intel-fpga-webhook-certs-secret

*.h
*.gbs.*
*.aocx
*.aocx.*
7 changes: 7 additions & 0 deletions README.md
@@ -24,6 +24,7 @@ Table of Contents
* [IAA device plugin](#iaa-device-plugin)
* [Device Plugins Operator](#device-plugins-operator)
* [XeLink XPU Manager sidecar](#xelink-xpu-manager-sidecar)
* [Intel GPU Level-Zero sidecar](#intel-gpu-level-zero-sidecar)
* [Demos](#demos)
* [Workload Authors](#workload-authors)
* [Developers](#developers)
@@ -201,6 +202,12 @@ To support interconnected GPUs in Kubernetes, XeLink sidecar is needed.

The [XeLink XPU Manager sidecar README](cmd/xpumanager_sidecar/README.md) explains how the sidecar works and how to use it.

## Intel GPU Level-Zero sidecar

The sidecar uses the Level-Zero API to provide the GPU plugin with additional GPU information that the plugin cannot get through sysfs interfaces.

See the [Intel GPU Level-Zero sidecar README](cmd/gpu_levelzero/README.md) for more details.

## Demos

The [demo subdirectory](demo/readme.md) contains a number of demonstrations for
7 changes: 6 additions & 1 deletion build/docker/build-image.sh
@@ -29,7 +29,12 @@ if [ -d $(dirname $0)/../../vendor ] ; then
BUILD_ARGS="${BUILD_ARGS} --build-arg DIR=/go/src/github.com/intel/intel-device-plugins-for-kubernetes --build-arg GO111MODULE=off"
fi

BUILD_ARGS="${BUILD_ARGS} --build-arg FINAL_BASE=gcr.io/distroless/static"
BUILD_ARGS="${BUILD_ARGS} \
--build-arg FINAL_BASE=gcr.io/distroless/static \
--build-arg BUILD_BASE=golang:1.23-bookworm \
--build-arg FINAL_BASE_DYN=debian:unstable-slim \
--build-arg ROCKYLINUX=0"

if [ -z "${BUILDER}" -o "${BUILDER}" = 'docker' -o "${BUILDER}" = 'podman' ] ; then
${BUILDER} build --pull -t ${IMG}:${TAG} ${BUILD_ARGS} -f ${DOCKERFILE} .
elif [ "${BUILDER}" = 'buildah' ] ; then
91 changes: 91 additions & 0 deletions build/docker/intel-gpu-levelzero.Dockerfile
@@ -0,0 +1,91 @@
## This is a generated file, do not edit directly. Edit build/docker/templates/intel-gpu-levelzero.Dockerfile.in instead.
##
## Copyright 2022 Intel Corporation. All Rights Reserved.
##
## Licensed under the Apache License, Version 2.0 (the "License");
## you may not use this file except in compliance with the License.
## You may obtain a copy of the License at
##
## http://www.apache.org/licenses/LICENSE-2.0
##
## Unless required by applicable law or agreed to in writing, software
## distributed under the License is distributed on an "AS IS" BASIS,
## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
## See the License for the specific language governing permissions and
## limitations under the License.
###
ARG CMD=gpu_levelzero
ARG ROCKYLINUX=1
## FINAL_BASE_DYN can be used to configure the base image of the final image.
## 1) The project default, set by build-image.sh, is FINAL_BASE_DYN=debian:unstable-slim.
## 2) The default in this file is primarily used to build Red Hat Certified OpenShift Operator
## container images, which must be UBI based.
## The Red Hat build tool does not allow additional image build parameters.
ARG BUILD_BASE=rockylinux:9
ARG FINAL_BASE_DYN=registry.access.redhat.com/ubi9/ubi-minimal:9.3
###
FROM ${BUILD_BASE} as builder
ARG DIR=/intel-device-plugins-for-kubernetes
ENV CGO_CFLAGS="-pipe -fno-plt"
ENV CGO_LDFLAGS="-fstack-protector-strong -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now,-z,noexecstack,-z,defs,-s,-w"
ENV CGOFLAGS="-trimpath -mod=readonly -buildmode=pie"
ENV GCFLAGS="all=-spectre=all -N -l"
ENV ASMFLAGS="all=-spectre=all"
ENV LDFLAGS="all=-linkmode=external -s -w"
ARG GOLICENSES_VERSION
ARG CMD
ARG ROCKYLINUX
ARG CGO_VERSION=1.23
RUN mkdir /runtime
RUN if [ $ROCKYLINUX -eq 0 ]; then \
apt-get update && apt-get install --no-install-recommends -y wget libc6-dev ca-certificates ocl-icd-libopencl1 && \
cd /runtime && \
wget -q https://github.com/intel/compute-runtime/releases/download/24.26.30049.6/intel-level-zero-gpu_1.3.30049.6_amd64.deb && \
wget -q https://github.com/intel/compute-runtime/releases/download/24.26.30049.6/intel-opencl-icd_24.26.30049.6_amd64.deb && \
wget -q https://github.com/intel/compute-runtime/releases/download/24.26.30049.6/libigdgmm12_22.3.20_amd64.deb && \
wget -q https://github.com/oneapi-src/level-zero/releases/download/v1.17.6/level-zero-devel_1.17.6+u22.04_amd64.deb && \
wget -q https://github.com/oneapi-src/level-zero/releases/download/v1.17.6/level-zero_1.17.6+u22.04_amd64.deb && \
dpkg --ignore-depends=intel-igc-core,intel-igc-opencl -i *.deb && \
rm -rf /var/lib/apt/lists/\*; \
else \
source /etc/os-release && dnf install -y gcc jq wget 'dnf-command(config-manager)' && \
dnf config-manager --add-repo https://repositories.intel.com/gpu/rhel/${VERSION_ID}/lts/2350/unified/intel-gpu-${VERSION_ID}.repo && \
dnf install -y intel-opencl level-zero level-zero-devel intel-level-zero-gpu intel-gmmlib intel-ocloc && \
dnf clean all && \
LATEST_GO=$(curl --no-progress-meter https://go.dev/dl/?mode=json | jq ".[] | select(.version | startswith(\"go${CGO_VERSION}\")).version" | tr -d "\"") && \
wget -q https://go.dev/dl/$LATEST_GO.linux-amd64.tar.gz -O - | tar -xz -C /usr/local && \
cp -a /etc/OpenCL /usr/lib64/libocloc.so /usr/lib64/libze_intel_gpu.* /usr/lib64/libze_loader.* /usr/lib64/libigdgmm.* /runtime/ && \
mkdir /runtime/licenses/ && cd /usr/share/licenses/ && cp -a level-zero intel-gmmlib intel-level-zero-gpu intel-ocloc /runtime/licenses/; \
fi
ARG EP=/usr/local/bin/intel_gpu_levelzero
ARG CMD
WORKDIR ${DIR}
COPY . .
RUN export PATH=$PATH:/usr/local/go/bin/ && cd cmd/${CMD} && \
GO111MODULE=on CGO_ENABLED=1 go install $CGOFLAGS --gcflags="$GCFLAGS" --asmflags="$ASMFLAGS" --ldflags="$LDFLAGS"
RUN [ $ROCKYLINUX -eq 0 ] && install -D /go/bin/${CMD} /install_root${EP} || install -D /root/go/bin/${CMD} /install_root${EP}
RUN install -D ${DIR}/LICENSE /install_root/licenses/intel-device-plugins-for-kubernetes/LICENSE \
&& if [ ! -d "licenses/$CMD" ] ; then \
GO111MODULE=on go run github.com/google/go-licenses@${GOLICENSES_VERSION} save "./cmd/$CMD" \
--save_path /install_root/licenses/$CMD/go-licenses ; \
else mkdir -p /install_root/licenses/$CMD/go-licenses/ && cd licenses/$CMD && cp -r * /install_root/licenses/$CMD/go-licenses/ ; fi
FROM ${FINAL_BASE_DYN}
ARG CMD
ARG ROCKYLINUX
COPY --from=builder /runtime /runtime
RUN if [ $ROCKYLINUX -eq 0 ]; then \
apt-get update && apt-get install --no-install-recommends -y ocl-icd-libopencl1 && \
rm /runtime/level-zero-devel_*.deb && \
cd /runtime && dpkg --ignore-depends=intel-igc-core,intel-igc-opencl -i *.deb && rm -rf /runtime && \
rm "/lib/x86_64-linux-gnu/libze_validation"* && rm "/lib/x86_64-linux-gnu/libze_tracing_layer"*; \
else \
cp -a /runtime//*.so* /usr/lib64/ && cp -a /runtime/OpenCL /etc/ && cp -a /runtime/licenses/* /usr/share/licenses/; \
fi
COPY --from=builder /install_root /
ENTRYPOINT ["/usr/local/bin/intel_gpu_levelzero"]
LABEL vendor='Intel®'
LABEL version='devel'
LABEL release='1'
LABEL name='intel-gpu-levelzero'
LABEL summary='Intel® GPU levelzero for Kubernetes'
LABEL description='The GPU levelzero container provides access to Levelzero API for the Intel GPU plugin'
87 changes: 87 additions & 0 deletions build/docker/templates/intel-gpu-levelzero.Dockerfile.in
@@ -0,0 +1,87 @@
#define _ENTRYPOINT_ /usr/local/bin/intel_gpu_levelzero

ARG CMD=gpu_levelzero
ARG ROCKYLINUX=1

## FINAL_BASE_DYN can be used to configure the base image of the final image.
## 1) The project default, set by build-image.sh, is FINAL_BASE_DYN=debian:unstable-slim.
## 2) The default in this file is primarily used to build Red Hat Certified OpenShift Operator
## container images, which must be UBI based.
## The Red Hat build tool does not allow additional image build parameters.
ARG BUILD_BASE=rockylinux:9
ARG FINAL_BASE_DYN=registry.access.redhat.com/ubi9/ubi-minimal:9.3
###

FROM ${BUILD_BASE} as builder

ARG DIR=/intel-device-plugins-for-kubernetes

ENV CGO_CFLAGS="-pipe -fno-plt"
ENV CGO_LDFLAGS="-fstack-protector-strong -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now,-z,noexecstack,-z,defs,-s,-w"
ENV CGOFLAGS="-trimpath -mod=readonly -buildmode=pie"
ENV GCFLAGS="all=-spectre=all -N -l"
ENV ASMFLAGS="all=-spectre=all"
ENV LDFLAGS="all=-linkmode=external -s -w"

ARG GOLICENSES_VERSION
ARG CMD
ARG ROCKYLINUX
ARG CGO_VERSION=1.23

RUN mkdir /runtime

RUN if [ $ROCKYLINUX -eq 0 ]; then \N
apt-get update && apt-get install --no-install-recommends -y wget libc6-dev ca-certificates ocl-icd-libopencl1 && \N
cd /runtime && \N
wget -q https://github.com/intel/compute-runtime/releases/download/24.26.30049.6/intel-level-zero-gpu_1.3.30049.6_amd64.deb && \N
wget -q https://github.com/intel/compute-runtime/releases/download/24.26.30049.6/intel-opencl-icd_24.26.30049.6_amd64.deb && \N
wget -q https://github.com/intel/compute-runtime/releases/download/24.26.30049.6/libigdgmm12_22.3.20_amd64.deb && \N
wget -q https://github.com/oneapi-src/level-zero/releases/download/v1.17.6/level-zero-devel_1.17.6+u22.04_amd64.deb && \N
wget -q https://github.com/oneapi-src/level-zero/releases/download/v1.17.6/level-zero_1.17.6+u22.04_amd64.deb && \N
dpkg --ignore-depends=intel-igc-core,intel-igc-opencl -i *.deb && \N
rm -rf /var/lib/apt/lists/\*; \N
else \N
source /etc/os-release && dnf install -y gcc jq wget 'dnf-command(config-manager)' && \N
dnf config-manager --add-repo https://repositories.intel.com/gpu/rhel/${VERSION_ID}/lts/2350/unified/intel-gpu-${VERSION_ID}.repo && \N
dnf install -y intel-opencl level-zero level-zero-devel intel-level-zero-gpu intel-gmmlib intel-ocloc && \N
dnf clean all && \N
LATEST_GO=$(curl --no-progress-meter https://go.dev/dl/?mode=json | jq ".[] | select(.version | startswith(\"go${CGO_VERSION}\")).version" | tr -d "\"") && \N
wget -q https://go.dev/dl/$LATEST_GO.linux-amd64.tar.gz -O - | tar -xz -C /usr/local && \N
cp -a /etc/OpenCL /usr/lib64/libocloc.so /usr/lib64/libze_intel_gpu.* /usr/lib64/libze_loader.* /usr/lib64/libigdgmm.* /runtime/ && \N
mkdir /runtime/licenses/ && cd /usr/share/licenses/ && cp -a level-zero intel-gmmlib intel-level-zero-gpu intel-ocloc /runtime/licenses/; \N
fi

ARG EP=_ENTRYPOINT_
ARG CMD

WORKDIR ${DIR}
COPY . .

RUN export PATH=$PATH:/usr/local/go/bin/ && cd cmd/${CMD} && \N
GO111MODULE=on CGO_ENABLED=1 go install $CGOFLAGS --gcflags="$GCFLAGS" --asmflags="$ASMFLAGS" --ldflags="$LDFLAGS"
RUN [ $ROCKYLINUX -eq 0 ] && install -D /go/bin/${CMD} /install_root${EP} || install -D /root/go/bin/${CMD} /install_root${EP}

#include "default_licenses.docker"

FROM ${FINAL_BASE_DYN}

ARG CMD
ARG ROCKYLINUX

COPY --from=builder /runtime /runtime

RUN if [ $ROCKYLINUX -eq 0 ]; then \N
apt-get update && apt-get install --no-install-recommends -y ocl-icd-libopencl1 && \N
rm /runtime/level-zero-devel_*.deb && \N
cd /runtime && dpkg --ignore-depends=intel-igc-core,intel-igc-opencl -i *.deb && rm -rf /runtime && \N
rm "/lib/x86_64-linux-gnu/libze_validation"* && rm "/lib/x86_64-linux-gnu/libze_tracing_layer"*; \N
else \N
cp -a /runtime//*.so* /usr/lib64/ && cp -a /runtime/OpenCL /etc/ && cp -a /runtime/licenses/* /usr/share/licenses/; \N
fi

#include "default_end.docker"
#include "default_labels.docker"

LABEL name='intel-gpu-levelzero'
LABEL summary='Intel® GPU levelzero for Kubernetes'
LABEL description='The GPU levelzero container provides access to Levelzero API for the Intel GPU plugin'
38 changes: 38 additions & 0 deletions cmd/gpu_levelzero/README.md
@@ -0,0 +1,38 @@
# Intel GPU Level-Zero sidecar

Table of Contents

* [Introduction](#introduction)
* [Modes and Configuration Options](#modes-and-configuration-options)
* [Install](#install)

## Introduction

The Intel GPU Level-Zero sidecar extends the Intel GPU plugin by querying additional GPU details from the oneAPI Level-Zero API. Because Level-Zero is a C/C++ API, the GPU plugin itself is kept as-is and the additional functionality lives in the sidecar. The GPU plugin is configured to use the sidecar through an overlay; see [Install](#install).

The Intel GPU plugin and the Level-Zero sidecar communicate via gRPC over a local Unix socket that is visible only to their containers.

> **NOTE**: The Intel Device Plugins Operator does not yet support enabling the Level-Zero sidecar in the GPU CR object.

## Modes and Configuration Options

| Flag | Argument | Default | Meaning |
|:---- |:-------- |:------- |:------- |
| -socket | unix socket path | /var/lib/levelzero/server.sock | Path of the Unix socket on which the server listens. |
| -wsl | - | disabled | Adapt the sidecar to run in the WSL environment. |
| -v | verbosity | 1 | Log verbosity. |
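
As a rough illustration of how these three flags and their defaults fit together, the sketch below parses them with Go's standard `flag` package. The `options` struct and `parseOptions` function are hypothetical names introduced for this example; the actual sidecar may register its flags differently, for instance by wiring `-v` through a logging library.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the three flags from the table.
// The type and field names are hypothetical, not taken from the sidecar's source.
type options struct {
	socket    string
	wsl       bool
	verbosity int
}

// parseOptions parses args using the defaults listed in the table.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("intel_gpu_levelzero", flag.ContinueOnError)
	fs.StringVar(&o.socket, "socket", "/var/lib/levelzero/server.sock", "Unix socket path")
	fs.BoolVar(&o.wsl, "wsl", false, "adapt the sidecar to the WSL environment")
	fs.IntVar(&o.verbosity, "v", 1, "log verbosity")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("socket=%s wsl=%t v=%d\n", o.socket, o.wsl, o.verbosity)
}
```

Running the sketch without arguments reports the defaults from the table; passing `-wsl -v 4` enables WSL mode and raises the verbosity.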

## Install

The sidecar is installed together with the GPU plugin through one of two overlays: [health](../../deployments/gpu_plugin/overlays/health/) or [wsl](../../deployments/gpu_plugin/overlays/wsl/).

The health overlay adds the sidecar to the base GPU plugin deployment and configures the GPU plugin to retrieve device health indicators from the Level-Zero API:

```bash
$ kubectl apply -k deployments/gpu_plugin/overlays/health
```

The WSL overlay enables Intel GPU detection in Kubernetes clusters running on WSL (Windows Subsystem for Linux). It also uses the Level-Zero sidecar:

```bash
$ kubectl apply -k deployments/gpu_plugin/overlays/wsl
```