
Jetson dockerfiles: OpenCV with GStreamer on JP62, add BuildKit cache mounts on JP62/JP71, bump JP51 OpenCV #2321

Open
alexnorell wants to merge 11 commits into main from fix/jp62-opencv-gstreamer-nvidia

Conversation

@alexnorell
Contributor

alexnorell commented May 11, 2026

Summary

  • JP62: build OpenCV 4.13.0 from source with WITH_GSTREAMER=ON + CUDA, and add the runtime GStreamer stack (python3-gi, GIR typelibs, base plugins) so downstream code can build hardware capture/encode pipelines. Fixes the 4K USB camera workflow that was running at ~1.3 inference FPS against ~21 camera FPS — pre-PR cv2 was CPU-only and cv2.getBuildInformation() reported GStreamer: NO.
  • JP62: drop the attempt to bake libgstnv*.so into the image. Those plugins live on the host JetPack BSP and are mounted in at runtime by nvidia-container-runtime via the CSVs under /etc/nvidia-container-runtime/host-files-for-container.d/. Container now ships only the cv2-facing half of the stack; the host provides the NVIDIA plugins.
  • JP62: fix libnvdla_compiler.so source path (tegra/ -> nvidia/); the one-line sketch appears just after this list. The original glob in Fix missing libnvdla_compiler.so in Jetson 6.x TRT runtime #2201 silently matched zero files, so onnxruntime kept falling back to CPU on Jetson 6.x. Same one-liner as Fix libnvdla_compiler.so COPY path for Jetson 6.x runtime #2306 for jp61.
  • JP62 + JP71: add BuildKit cache mounts to both Jetson dockerfiles — neither had any cache mounts before this PR, so warm rebuilds were doing the full 30-60 min OpenCV / PyTorch / torchvision / onnxruntime recompiles every run. New mounts cover apt plus per-build-tree directories for each heavy C++ compile (and the TRT .deb download on JP62). Each Jetson workflow also moves onto its own Depot project so the JP62 and JP71 builds stop evicting each other's NVMe state.
  • JP51: OpenCV 4.12.0 -> 4.13.0 to stay aligned with JP62.
  • All workflows: bump every GitHub Action in .github/workflows/ to its latest major (59 files touched). Done in one mechanical pass while we were already poking the Jetson workflows. See "Action version bumps" below.
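
The libnvdla path fix, in sketch form (the stage name `jetpack` and the destination directory are illustrative; the corrected source path is the one this PR ships):

```dockerfile
# Before (PR #2201): COPY --from=... /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so* ...
# That glob matched nothing: tegra/ is populated by the host at run time, not at build time.
# After: the library actually lives under nvidia/ in l4t-jetpack:r36.4.0.
COPY --from=jetpack /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so* \
     /usr/lib/aarch64-linux-gnu/nvidia/
```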

Why cv2 was the bottleneck (JP62)

cv2.getBuildInformation() in the pre-PR JP62 image reported GStreamer: NO. The pip opencv-python wheel ships without GStreamer, so every frame went through uncompressed YUYV capture and CPU YUV-to-BGR conversion, and the raw 4K stream was fed straight downstream to detection + visualization.

Even if cv2 had been compiled with GStreamer support, the runtime base l4t-cuda:12.6.11-runtime doesn't ship the JetPack multimedia stack, so the NVIDIA elements (nvv4l2decoder, nvvidconv, nvjpegenc/dec) wouldn't have been reachable anyway. That's why the fix is two-sided: build cv2 with GStreamer, and rely on the host BSP for the NVIDIA plugins.
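
To make the two-sided fix concrete, here is the kind of capture path it unlocks on a JP62 device. This is a hedged sketch: the device path and MJPEG caps are illustrative for a 4K USB camera, and nvjpegdec / nvvidconv resolve only inside a container started under nvidia-container-runtime with the host CSV mounts in place.

```sh
gst-inspect-1.0 nvjpegdec > /dev/null   # host-provided NVIDIA element resolves
python3 <<'EOF'
import cv2

# Hardware JPEG decode + hardware colorspace conversion; pipeline shape
# is illustrative, not the exact production pipeline.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "image/jpeg,width=3840,height=2160,framerate=30/1 ! "
    "nvjpegdec ! nvvidconv ! video/x-raw,format=BGRx ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink drop=true"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)  # requires GStreamer: YES
ok, frame = cap.read()
assert ok, "pipeline produced no frame"
print(frame.shape)
EOF
```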

JP71 ships the same pip cv2 wheel and would hit the same wall at 4K, but the GStreamer rebuild isn't ported there yet — only the build-perf and workflow changes apply to JP71 in this PR.

OpenCV build details (Dockerfile.onnx.jetson.6.2.0)

Builder:

  • OpenCV 4.13.0 + opencv_contrib from source, WITH_GSTREAMER=ON, WITH_FFMPEG=ON, WITH_LIBV4L=ON, WITH_CUDA=ON, CUDA_ARCH_BIN=8.7. Pip's opencv-python / -headless / -contrib-python are uninstalled first; the wheel built from python_loader/ is installed in their place.
  • Build-time cv2.getBuildInformation() assertion so any regression in detected flags fails the docker build, not the device.
  • Per-platform OpenCV values pulled into ARGs (OPENCV_CUDA_ARCH, OPENCV_PYTHON_INSTALL_PATH, OPENCV_PYTHON_INCLUDE_DIR, OPENCV_PYTHON_VERSION) so porting this block to jp5.1.1 / jp7.1.0 is four ARG edits rather than cmake surgery.
  • cv2 install-tree config rescue: ninja install writes correct-path config*.py, then pip install of the python_loader wheel overwrites them with BUILD-TREE paths that don't exist in the runtime stage and produce ImportError: recursion is detected during loading of "cv2". The fix snapshots the install-tree configs around the pip install and restores them after. A follow-up sanity check strips /build/ from sys.path and imports cv2 so this fails the docker build instead of the device. Both the rescue and the sanity check appear in the sketch below.
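
A condensed sketch of that builder block. The ARG values are the JP62 ones named above; the directory layout, stage wiring, and exact cmake flag set are illustrative rather than the verbatim Dockerfile (the heredoc RUNs assume the dockerfile:1.7 syntax this PR pins):

```dockerfile
ARG OPENCV_VERSION=4.13.0
ARG OPENCV_CUDA_ARCH=8.7
ARG OPENCV_PYTHON_INSTALL_PATH=/usr/local/lib/python3.10/dist-packages
ARG OPENCV_PYTHON_INCLUDE_DIR=/usr/include/python3.10
ARG OPENCV_PYTHON_VERSION=3.10   # consumed by the python_loader wheel step (not shown)

# Evict the pip wheels first so the from-source cv2 is the only one present.
RUN pip uninstall -y opencv-python opencv-python-headless opencv-contrib-python || true

RUN cd /build/opencv/opencv-${OPENCV_VERSION} && mkdir -p release && cd release && \
    cmake -GNinja \
      -DWITH_GSTREAMER=ON -DWITH_FFMPEG=ON -DWITH_LIBV4L=ON \
      -DWITH_CUDA=ON -DCUDA_ARCH_BIN=${OPENCV_CUDA_ARCH} \
      -DOPENCV_EXTRA_MODULES_PATH=/build/opencv/opencv_contrib/modules \
      -DOPENCV_PYTHON3_INSTALL_PATH=${OPENCV_PYTHON_INSTALL_PATH} \
      -DPYTHON3_INCLUDE_DIR=${OPENCV_PYTHON_INCLUDE_DIR} \
      .. && \
    ninja && ninja install

# Config rescue: snapshot the install-tree config*.py (correct paths) around the
# python_loader install, which would otherwise clobber them with build-tree paths
# that are dead in the runtime stage.
RUN mkdir -p /tmp/cv2-config && \
    cp ${OPENCV_PYTHON_INSTALL_PATH}/cv2/config*.py /tmp/cv2-config/ && \
    pip install /build/opencv/opencv-${OPENCV_VERSION}/release/python_loader/ && \
    cp /tmp/cv2-config/config*.py ${OPENCV_PYTHON_INSTALL_PATH}/cv2/

# Build-time assertions: fail the docker build, not the device.
RUN python3 <<'EOF'
import re, sys
# Simulate the runtime stage: drop builder-only paths before importing cv2.
sys.path = [p for p in sys.path if not p.startswith('/build')]
import cv2
info = cv2.getBuildInformation()
for flag in ('GStreamer', 'FFMPEG', 'CUDA'):
    assert re.search(flag + r':\s+YES', info), flag + ' regressed to NO'
EOF
```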

Runtime stage adds libgstreamer1.0-0, gstreamer1.0-plugins-{base,good,tools}, python3-gi, gir1.2-gstreamer-1.0, gir1.2-gst-plugins-base-1.0, plus an inline comment explaining where the NVIDIA plugins come from.
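
In Dockerfile terms, the runtime-stage addition is roughly the following (package names per this PR; the apt cache mounts from the next section are omitted here for brevity):

```dockerfile
# cv2-facing GStreamer runtime. NVIDIA's libgstnv*.so plugins are deliberately
# NOT installed: nvidia-container-runtime mounts them from the host BSP via the
# CSVs under /etc/nvidia-container-runtime/host-files-for-container.d/.
RUN apt-get update && apt-get install -y --no-install-recommends \
      libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good \
      gstreamer1.0-tools python3-gi gir1.2-gstreamer-1.0 \
      gir1.2-gst-plugins-base-1.0
```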

Build perf: cache mounts added (Dockerfile.onnx.jetson.6.2.0 + Dockerfile.onnx.jetson.7.1.0)

Neither Jetson dockerfile had BuildKit cache mounts before this PR. Adding them on both:

  • # syntax=docker/dockerfile:1.7 pinned so ARG interpolation in cache-mount id= works.
  • Apt cache mounts (/var/cache/apt + /var/lib/apt, sharing=locked) on every apt RUN, ids scoped per JetPack x (builder|runtime) x arch. docker-clean disabled and Keep-Downloaded-Packages "true" so the cache is actually used. The trailing rm -rf /var/lib/apt/lists/* came out since that path is now the cache mount.
  • Per-build-tree cache mounts on PyTorch, torchvision, onnxruntime (build/cuda12 on JP62, build/Linux on JP71), OpenCV (release/ on JP62), and the TRT .deb download on JP62. Each id is scoped by version + arch + SM so a version bump starts clean.
  • git clone is now a separate RUN from the cache-mounted build RUN. The cache mount auto-creates its parent directory before the RUN body executes, so a combined git clone && build failed with "destination path 'pytorch' already exists". Two RUNs now: clone (no mount), then build (cache mount overlays build/); see the sketch after this list.
  • OpenCV stays last in the JP62 builder on purpose. The upstream inference wheel installs aren't --no-deps, so moving OpenCV earlier risks opencv-python being reinstalled by a transitive dep and clobbering the from-source build. The build-dir cache mount makes the forced rebuild after COPY . . invalidation a fast ninja no-op rather than a 30-60 min recompile.
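
The pattern, sketched for the PyTorch step (version and cache-mount ids are illustrative; the real ids are scoped as described above):

```dockerfile
# syntax=docker/dockerfile:1.7
ARG PYTORCH_VERSION=2.5.0   # illustrative

# Make apt keep its downloads so the cache mounts actually pay off.
RUN rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
      > /etc/apt/apt.conf.d/keep-cache

RUN --mount=type=cache,id=apt-cache-jp62-builder-arm64,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,id=apt-lib-jp62-builder-arm64,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends build-essential ninja-build

# Clone in its own RUN -- no cache mount, so git sees an empty destination.
RUN git clone --recursive --branch v${PYTORCH_VERSION} \
      https://github.com/pytorch/pytorch.git /opt/pytorch

# Build in a second RUN whose cache mount overlays build/ on the cloned layer.
RUN --mount=type=cache,id=pytorch-${PYTORCH_VERSION}-arm64-sm87,target=/opt/pytorch/build \
    cd /opt/pytorch && python3 setup.py bdist_wheel
```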

Workflows

.github/workflows/docker.jetson.6.2.0.yml and .github/workflows/docker.jetson.7.1.0.yml now point at their own Depot projects (JP62 2rp7mfjw7q, JP71 v1xzfwkc4b) instead of sharing grl7ffzxd7. Sharing one project meant the two builds were evicting each other's NVMe layers and cache-mount state on every run.

Action version bumps

Everything in .github/workflows/ got bumped to its latest major in a single pass:

Action                                  Old            New
actions/cache                           v3             v5
actions/checkout                        v2 / v3 / v4   v6
actions/create-github-app-token         v1             v3
actions/setup-node                      v3             v6
actions/setup-python                    v2 / v5        v6
actions/upload-artifact                 v4             v7
aws-actions/amazon-ecr-login            v1             v2
aws-actions/configure-aws-credentials   v2             v6
dcarbone/install-jq-action              v2.1.0         v3.2.0
digicert/ssm-code-signing               v1.0.0         v1.2.1
docker/login-action                     v2 / v3        v4
docker/setup-buildx-action              v2             v4
docker/setup-qemu-action                v2             v4
google-github-actions/auth              v2             v3
google-github-actions/setup-gcloud      v2             v3

Left as-is:

  • depot/setup-action@v1 and depot/build-push-action@v1 already track the floating major (latest is v1.7.1 / v1.17.0 within v1).
  • actions/upload-release-asset@v1 — still latest major; the upstream is archived but the v1 release is what it is.
  • pypa/gh-action-pypi-publish@release/v1 — upstream's recommended stable-channel ref, kept verbatim.

Most bumps are pure Node-version updates (Node 20 -> 24) and won't change behaviour. The two worth eyeballing in CI before merge are actions/upload-artifact (v4 -> v7 is a multi-major jump, though we only use it with name / path / if-no-files-found / retention-days which have been stable) and aws-actions/configure-aws-credentials (v2 -> v6, used with aws-region + access-key inputs which are also stable).

Test plan

  • JP62 build succeeds and pushes a tag: gh workflow run docker.jetson.6.2.0.yml --ref fix/jp62-opencv-gstreamer-nvidia -f force_push=true -f custom_tag=jp62-gstreamer-nvidia
  • JP71 build succeeds: same dispatch against docker.jetson.7.1.0.yml
  • On a JP62 device with nvidia-container-runtime configured: python3 -c "import cv2; print(cv2.getBuildInformation())" reports GStreamer: YES, FFMPEG: YES, CUDA: YES
  • gst-inspect-1.0 nvv4l2decoder resolves inside the container (relies on host CSV mounts)
  • gst-inspect-1.0 nvvidconv resolves
  • gst-inspect-1.0 nvjpegenc resolves
  • python3 -c "from gi.repository import Gst; Gst.init(None)" succeeds
  • Inference FPS on the 4K USB camera + detection workflow comfortably above the 1.3 FPS baseline
  • Second JP62 build (warm cache) skips most apt downloads and finishes the OpenCV step in seconds rather than tens of minutes — actually came back even better: 2 min total, every step hit Depot's docker layer cache (#X CACHED), so the cache mounts didn't need to engage. Layer-cache eviction (a real source change touching COPY . .) is what will exercise the cache mounts; not yet observed.
  • Second JP71 build (warm cache) hits cached PyTorch / torchvision / onnxruntime build trees — same outcome as JP62: 3 min total, served from Depot layer cache.

@alexnorell force-pushed the fix/jp62-opencv-gstreamer-nvidia branch from f8e245f to 9ea044b on May 11, 2026 14:53
…o JP62

The pip opencv-python wheel ships without GStreamer, and the l4t-cuda runtime
base used for the JP62 image has no multimedia stack, so cv2.VideoCapture on
Jetson silently falls through to the plain v4l2 ioctl path -- pulling raw 4K
YUYV, doing CPU YUV->BGR, and never reaching NVIDIA's hardware engines.

Three pieces, all needed together:

1. Compile OpenCV from source in the builder stage with WITH_GSTREAMER=ON,
   WITH_FFMPEG=ON, WITH_LIBV4L=ON, WITH_CUDA=ON. Replaces the pip-installed
   opencv-python/opencv-contrib-python with a wheel built from python_loader.
   Adds a build-time cv2.getBuildInformation() check so a regression in the
   detected flags fails the build instead of silently shipping CPU-only cv2.

2. Add the GStreamer runtime + PyGObject stack to the runtime stage
   (libgstreamer1.0-0, plugins-base/good, python3-gi, gir typelibs) so that
   downstream Python code using "from gi.repository import Gst" to construct
   hardware capture/encode pipelines can import and run inside the container.

3. Copy NVIDIA's GStreamer plugins (libgstnv*.so) plus the tegra runtime libs
   from the JetPack builder stage into the runtime image, instead of relying
   on nvidia-container-passthrough host mounts. Self-contained -- works
   without any host-side changes deployed.

Branched off v1.2.7 to keep the diff minimal.
@alexnorell force-pushed the fix/jp62-opencv-gstreamer-nvidia branch from 9ea044b to 1f6f621 on May 11, 2026 15:02
CMake 4.x removed compatibility with cmake_minimum_required(VERSION <3.5).
OpenCV 4.10.0's OpenCVGenPkgconfig.cmake still uses the old declaration,
so the configure step aborts with:

  CMake Error at .../OpenCVGenPkgconfig.cmake:113 (cmake_minimum_required):
    Compatibility with CMake < 3.5 has been removed from CMake.

Same workaround the existing onnxruntime build step uses.

OpenCV 4.13.0 (released 2025-12-31) is the current latest stable; it builds
cleanly under CMake 4.x without the cmake_minimum_required workaround the
4.10.0 source needed. Aligns both Jetson images.

JP62-specific:
- Move the from-source OpenCV step to AFTER the inference packages install
  so the cv2 we install (built with GStreamer + FFmpeg) is the last thing
  to touch /usr/local/lib/python3.10/dist-packages/cv2/. Previously the
  inference_core/cli/sdk pip install would refetch opencv-python (as a
  transitive dep) and clobber our from-source cv2.
- Drop CMAKE_POLICY_VERSION_MINIMUM=3.5 -- not needed on 4.13.0.

The l4t-jetpack:r36.4.0 builder image does not ship the tegra runtime
libs as a regular populated /usr/lib/aarch64-linux-gnu/tegra directory.
Those libs are tied to the host JetPack BSP and only appear inside the
container at run time via nvidia-container-runtime's CSV-driven host
mounts. So `COPY --from=builder /usr/lib/aarch64-linux-gnu/tegra ...`
errors during the build with "not found".

This image now scopes to enabling the cv2 side of the GStreamer stack
(WITH_GSTREAMER=ON OpenCV, python-gi, base gstreamer runtime + GIR
typelibs). The host system has to provide the NVIDIA plugins for any
nv* pipeline element to resolve at run time.

BuildKit needs to lstat the source parent dir before evaluating a glob
COPY pattern, even when "no match" is allowed. The libnvdla fix in
PR #2201 copies tegra/libnvdla_compiler.so*, but l4t-jetpack:r36.4.0
doesn't always ship /usr/lib/aarch64-linux-gnu/tegra/ as a real
directory -- that path is normally populated by nvidia-container-runtime
at container start. Recent builds intermittently fail at this step
depending on layer cache state.

mkdir -p the path so BuildKit can process the COPY. The glob will copy
the real file if it's there, or silently no-op if not. Doesn't paper
over the underlying flakiness of the libnvdla source path, but lets
this PR's build progress.

The actual location in nvcr.io/nvidia/l4t-jetpack:r36.4.0 is
/usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so, not tegra/.
The original COPY in PR #2201 used the wrong path; the glob silently
matched zero files, so the libnvdla fix shipped an empty layer and
ONNX Runtime kept falling back to CPU on Jetson 6.x.

Same one-line correction as PR #2306. Bundling it here so this PR's
build can also actually ship libnvdla_compiler.so.

Without OPENCV_PYTHON3_INSTALL_PATH / PYTHON3_INCLUDE_DIR / PYTHON_VERSION
the python_loader config.py bakes the BUILDER path
(/build/opencv/<ver>/release/lib/python3) into sys.path. That path only
exists in the builder stage, so in the runtime image the loader stub
fails to find the native cv2 .so, falls back to importlib.import_module("cv2")
which re-enters the same stub package, and Python raises:

  ImportError: ERROR: recursion is detected during loading of "cv2"
               binary extensions. Check OpenCV installation.

JP51's PR #2100 passes these flags explicitly; replicating the same set
here so the loader references /usr/local/lib/python3.10/dist-packages.

Two related changes to make the from-source OpenCV build correct and
reusable across jp5.1 / jp6.2 / jp7.1:

1. Rescue the install-tree config files. After ninja install, the files
   at ${INSTALL_PATH}/cv2/config*.py have install-tree paths (correct).
   The pip install of the wheel built from /build/.../python_loader then
   overwrites them with BUILD-TREE config files (paths like
   /build/opencv/opencv-X.Y.Z/release/lib/python3) which only exist in
   the builder stage. In a multi-stage runtime image those paths are
   dead, so the loader's bootstrap can't find the native cv2 .so,
   re-imports the stub package, and Python raises:
       ImportError: recursion is detected during loading of "cv2"
   Fix by saving config*.py between ninja install and pip wheel/install,
   restoring after.

2. Pull per-platform variables out of the cmake invocation into ARGs at
   the top of the builder (OPENCV_CUDA_ARCH, OPENCV_PYTHON_INSTALL_PATH,
   OPENCV_PYTHON_INCLUDE_DIR, OPENCV_PYTHON_VERSION). Replicating this
   OpenCV block in jp5.1.1 / jp7.1.0 dockerfiles is now a copy-paste
   plus four ARG value changes; no surgery inside the cmake invocation.

Adds a build-time sanity check that simulates the runtime stage by
stripping /build/ from sys.path before importing cv2. If the install-
tree rescue ever regresses, the docker build fails here -- not 90
minutes later on the device.
@alexnorell force-pushed the fix/jp62-opencv-gstreamer-nvidia branch from 119562a to a781139 on May 11, 2026 20:05
Pins syntax to dockerfile:1.7 on both jp62 and jp71 so ARG interpolation
in cache-mount ids works. Adds apt cache mounts (sharing=locked, id scoped
per JetPack + builder/runtime + arch) to every apt RUN so .debs and lists
persist across builds. Disables docker-clean and turns on
Keep-Downloaded-Packages so the apt cache is actually used. Drops the
trailing rm of /var/lib/apt/lists/* since that path is now the cache mount.

Adds per-build-tree cache mounts on the expensive C++ compiles:
PyTorch, torchvision, onnxruntime (build/cuda12 on jp62, build/Linux on
jp71), OpenCV (release/ on jp62), with the TRT .deb download also cached
on jp62. Each id is scoped by version + arch + SM so a version bump
starts clean instead of reusing a stale build tree.

The OpenCV step on jp62 stays positioned last in the builder because the
inference wheel installs upstream are not --no-deps and an earlier
opencv-python pull would clobber the from-source build. The build-dir
cache mount makes the forced rebuild after COPY-invalidation a fast
ninja no-op rather than a 30-60 min recompile.

Points jp62 and jp71 workflows at the new dedicated Depot projects
(2rp7mfjw7q for jp62, v1xzfwkc4b for jp71) so they no longer share an
NVMe cache pool and evict each other.

The cache mount on build/ auto-creates its parent directory before the
RUN body executes, so a combined 'git clone ... && cd ... && make' fails
with 'destination path already exists and is not an empty directory'.

First jp62 attempt with these cache mounts failed at the PyTorch step:
  fatal: destination path 'pytorch' already exists and is not an empty
  directory.

Split each of PyTorch / torchvision / onnxruntime on jp62 and jp71 into
two RUNs: a clone RUN with no mount, then a build RUN whose cache mount
overlays build/ on the cloned source layer.
@alexnorell changed the title from "[DRAFT] JP62: build OpenCV from source with GStreamer + bake NVIDIA gst plugins" to "[DRAFT] JP62: build OpenCV from source with GStreamer + build perf" on May 12, 2026
@alexnorell marked this pull request as ready for review May 12, 2026 05:16
@alexnorell changed the title from "[DRAFT] JP62: build OpenCV from source with GStreamer + build perf" to "JP62: build OpenCV from source with GStreamer + build perf" on May 12, 2026
@alexnorell changed the title from "JP62: build OpenCV from source with GStreamer + build perf" to "Jetson dockerfiles: OpenCV with GStreamer on JP62, add BuildKit cache mounts on JP62/JP71, bump JP51 OpenCV" on May 12, 2026
Bumps:
- actions/cache v3 -> v5
- actions/checkout v2/v3/v4 -> v6
- actions/create-github-app-token v1 -> v3
- actions/setup-node v3 -> v6
- actions/setup-python v2/v5 -> v6
- actions/upload-artifact v4 -> v7
- aws-actions/amazon-ecr-login v1 -> v2
- aws-actions/configure-aws-credentials v2 -> v6
- dcarbone/install-jq-action v2.1.0 -> v3.2.0
- digicert/ssm-code-signing v1.0.0 -> v1.2.1
- docker/login-action v2/v3 -> v4
- docker/setup-buildx-action v2 -> v4
- docker/setup-qemu-action v2 -> v4
- google-github-actions/auth v2 -> v3
- google-github-actions/setup-gcloud v2 -> v3

depot/* already pinned to floating v1 (latest major).
actions/upload-release-asset and pypa/gh-action-pypi-publish kept
on their existing pins (v1 already latest; pypi-publish uses the
release/v1 stable-channel ref recommended by upstream).