Jetson dockerfiles: OpenCV with GStreamer on JP62, add BuildKit cache mounts on JP62/JP71, bump JP51 OpenCV #2321
Open
alexnorell wants to merge 11 commits into
Conversation
Force-pushed from f8e245f to 9ea044b
…o JP62

The pip opencv-python wheel ships without GStreamer, and the l4t-cuda runtime base used for the JP62 image has no multimedia stack, so cv2.VideoCapture on Jetson silently falls through to the plain v4l2 ioctl path -- pulling raw 4K YUYV, doing CPU YUV->BGR, and never reaching NVIDIA's hardware engines.

Three pieces, all needed together:

1. Compile OpenCV from source in the builder stage with WITH_GSTREAMER=ON, WITH_FFMPEG=ON, WITH_LIBV4L=ON, WITH_CUDA=ON. Replaces the pip-installed opencv-python/opencv-contrib-python with a wheel built from python_loader. Adds a build-time cv2.getBuildInformation() check so a regression in the detected flags fails the build instead of silently shipping CPU-only cv2.

2. Add the GStreamer runtime + PyGObject stack to the runtime stage (libgstreamer1.0-0, plugins-base/good, python3-gi, gir typelibs) so that downstream Python code using "from gi.repository import Gst" to construct hardware capture/encode pipelines can import and run inside the container.

3. Copy NVIDIA's GStreamer plugins (libgstnv*.so) plus the tegra runtime libs from the JetPack builder stage into the runtime image, instead of relying on nvidia-container-runtime host mounts.

Self-contained -- works without any host-side changes deployed. Branched off v1.2.7 to keep the diff minimal.
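A minimal sketch of what piece 1's builder step could look like (the WITH_* flags and the build-info check are from the commit message; the version, paths, and exact assertion are illustrative assumptions):

```dockerfile
# Sketch only -- version, source layout and assertion text are assumptions;
# the flag set and the fail-the-build check are what the commit describes.
ARG OPENCV_VERSION=4.13.0
RUN cmake -S /build/opencv/opencv-${OPENCV_VERSION} \
          -B /build/opencv/opencv-${OPENCV_VERSION}/release -G Ninja \
          -D WITH_GSTREAMER=ON -D WITH_FFMPEG=ON \
          -D WITH_LIBV4L=ON -D WITH_CUDA=ON \
 && ninja -C /build/opencv/opencv-${OPENCV_VERSION}/release install \
 && pip install /build/opencv/opencv-${OPENCV_VERSION}/release/python_loader \
 # Regression in detected flags fails the docker build, not the device.
 && python3 -c "import cv2, re; info = cv2.getBuildInformation(); assert re.search(r'GStreamer:\s+YES', info), info"
```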
Force-pushed from 9ea044b to 1f6f621
CMake 4.x removed compatibility with cmake_minimum_required(VERSION < 3.5).
OpenCV 4.10.0's OpenCVGenPkgconfig.cmake still uses the old declaration,
so the configure step aborts with:

    CMake Error at .../OpenCVGenPkgconfig.cmake:113 (cmake_minimum_required):
      Compatibility with CMake < 3.5 has been removed from CMake.

Same workaround the existing onnxruntime build step uses.
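The workaround's shape, sketched (the CMAKE_POLICY_VERSION_MINIMUM flag name comes from the follow-up commit that later drops it on 4.13.0; the rest of the invocation is assumed):

```dockerfile
# Sketch only: CMAKE_POLICY_VERSION_MINIMUM=3.5 lets CMake 4.x configure
# sources that still declare cmake_minimum_required(VERSION <3.5).
RUN cmake -S /build/opencv/opencv-4.10.0 \
          -B /build/opencv/opencv-4.10.0/release -G Ninja \
          -D CMAKE_POLICY_VERSION_MINIMUM=3.5
```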
OpenCV 4.13.0 (released 2025-12-31) is the current latest stable; it builds cleanly under CMake 4.x without the cmake_minimum_required workaround the 4.10.0 source needed. Aligns both Jetson images.

JP62-specific:

- Move the from-source OpenCV step to AFTER the inference packages install so the cv2 we install (built with GStreamer + FFmpeg) is the last thing to touch /usr/local/lib/python3.10/dist-packages/cv2/. Previously the inference_core/cli/sdk pip install would refetch opencv-python (as a transitive dep) and clobber our from-source cv2.
- Drop CMAKE_POLICY_VERSION_MINIMUM=3.5 -- not needed on 4.13.0.
The l4t-jetpack:r36.4.0 builder image does not ship the tegra runtime libs as a regular populated /usr/lib/aarch64-linux-gnu/tegra directory. Those libs are tied to the host JetPack BSP and only appear inside the container at run time via nvidia-container-runtime's CSV-driven host mounts. So `COPY --from=builder /usr/lib/aarch64-linux-gnu/tegra ...` errors during the build with "not found".

This image is now scoped down to the cv2 side of the GStreamer stack (WITH_GSTREAMER=ON OpenCV, python-gi, base gstreamer runtime + GIR typelibs). The host system has to provide the NVIDIA plugins for any nv* pipeline element to resolve at run time.
BuildKit needs to lstat the source parent dir before evaluating a glob COPY pattern, even when "no match" is allowed. The libnvdla fix in PR #2201 copies tegra/libnvdla_compiler.so*, but l4t-jetpack:r36.4.0 doesn't always ship /usr/lib/aarch64-linux-gnu/tegra/ as a real directory -- that path is normally populated by nvidia-container-runtime at container start. Recent builds intermittently fail at this step depending on layer cache state.

mkdir -p the path so BuildKit can process the COPY. The glob will copy the real file if it's there, or silently no-op if not. Doesn't paper over the underlying flakiness of the libnvdla source path, but lets this PR's build progress.
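A sketch of the fix's shape (the stage name and destination directory are assumptions):

```dockerfile
# Builder stage: guarantee the glob's parent directory exists even when the
# BSP libs aren't populated at image-build time.
RUN mkdir -p /usr/lib/aarch64-linux-gnu/tegra

# Runtime stage: the glob copies the real file if present, or no-ops.
COPY --from=builder /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so* \
     /usr/lib/aarch64-linux-gnu/
```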
The actual location in nvcr.io/nvidia/l4t-jetpack:r36.4.0 is /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so, not tegra/. The original COPY in PR #2201 used the wrong path; the glob silently matched zero files, so the libnvdla fix shipped an empty layer and ONNX Runtime kept falling back to CPU on Jetson 6.x. Same one-line correction as PR #2306. Bundling it here so this PR's build can also actually ship libnvdla_compiler.so.
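The one-line correction, sketched (the stage name and destination directory are assumptions):

```dockerfile
# Wrong path from #2201 -- the glob matched zero files, shipping an empty layer:
#   COPY --from=builder /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so* /usr/lib/aarch64-linux-gnu/
# Actual location in nvcr.io/nvidia/l4t-jetpack:r36.4.0:
COPY --from=builder /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so* \
     /usr/lib/aarch64-linux-gnu/
```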
Without OPENCV_PYTHON3_INSTALL_PATH / PYTHON3_INCLUDE_DIR / PYTHON_VERSION
the python_loader config.py bakes the BUILDER path
(/build/opencv/<ver>/release/lib/python3) into sys.path. That path only
exists in the builder stage, so in the runtime image the loader stub
fails to find the native cv2 .so, falls back to importlib.import_module("cv2"),
which re-enters the same stub package, and Python raises:

    ImportError: ERROR: recursion is detected during loading of "cv2"
    binary extensions. Check OpenCV installation.

JP51's PR #2100 passes these flags explicitly; replicating the same set
here so the loader references /usr/local/lib/python3.10/dist-packages.
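The flag set, sketched against JP62's Python 3.10 layout (variable names are the ones from the commit message; the values and source/build paths are illustrative):

```dockerfile
# Sketch: pin the python_loader to install-tree paths so config.py bakes
# runtime-image paths into sys.path instead of /build/... builder paths.
RUN cmake -S /build/opencv/opencv-4.13.0 \
          -B /build/opencv/opencv-4.13.0/release -G Ninja \
          -D OPENCV_PYTHON3_INSTALL_PATH=/usr/local/lib/python3.10/dist-packages \
          -D PYTHON3_INCLUDE_DIR=/usr/include/python3.10 \
          -D PYTHON_VERSION=3.10
```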
Two related changes to make the from-source OpenCV build correct and
reusable across jp5.1 / jp6.2 / jp7.1:
1. Rescue the install-tree config files. After ninja install, the files
at ${INSTALL_PATH}/cv2/config*.py have install-tree paths (correct).
The pip install of the wheel built from /build/.../python_loader then
overwrites them with BUILD-TREE config files (paths like
/build/opencv/opencv-X.Y.Z/release/lib/python3) which only exist in
the builder stage. In a multi-stage runtime image those paths are
dead, so the loader's bootstrap can't find the native cv2 .so,
re-imports the stub package, and Python raises:
ImportError: recursion is detected during loading of "cv2"
Fix by saving config*.py between ninja install and pip wheel/install,
restoring after.
2. Pull per-platform variables out of the cmake invocation into ARGs at
the top of the builder (OPENCV_CUDA_ARCH, OPENCV_PYTHON_INSTALL_PATH,
OPENCV_PYTHON_INCLUDE_DIR, OPENCV_PYTHON_VERSION). Replicating this
OpenCV block in jp5.1.1 / jp7.1.0 dockerfiles is now a copy-paste
plus four ARG value changes; no surgery inside the cmake invocation.
Adds a build-time sanity check that simulates the runtime stage by
stripping /build/ from sys.path before importing cv2. If the install-
tree rescue ever regresses, the docker build fails here -- not 90
minutes later on the device.
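A sketch of the rescue-plus-check sequence (the INSTALL_PATH value, wheel location, and backup directory are assumptions; the ordering is the point):

```dockerfile
ARG INSTALL_PATH=/usr/local/lib/python3.10/dist-packages
RUN ninja -C /build/opencv/release install \
 # 1. Save the install-tree config*.py that ninja install just wrote.
 && mkdir -p /tmp/cv2-config && cp ${INSTALL_PATH}/cv2/config*.py /tmp/cv2-config/ \
 # 2. pip install the python_loader wheel -- this clobbers them with
 #    build-tree paths that only exist in the builder stage.
 && pip install /build/opencv/release/python_loader \
 # 3. Restore the install-tree configs.
 && cp /tmp/cv2-config/config*.py ${INSTALL_PATH}/cv2/ \
 # 4. Simulate the runtime stage: drop /build entries from sys.path, then
 #    import cv2 -- a regression fails the docker build right here.
 && python3 -c "import sys; sys.path = [p for p in sys.path if '/build' not in p]; import cv2; print(cv2.__version__)"
```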
Force-pushed from 119562a to a781139
Pins syntax to dockerfile:1.7 on both jp62 and jp71 so ARG interpolation in cache-mount ids works.

Adds apt cache mounts (sharing=locked, id scoped per JetPack + builder/runtime + arch) to every apt RUN so .debs and lists persist across builds. Disables docker-clean and turns on Keep-Downloaded-Packages so the apt cache is actually used. Drops the trailing rm of /var/lib/apt/lists/* since that path is now the cache mount.

Adds per-build-tree cache mounts on the expensive C++ compiles: PyTorch, torchvision, onnxruntime (build/cuda12 on jp62, build/Linux on jp71), OpenCV (release/ on jp62), with the TRT .deb download also cached on jp62. Each id is scoped by version + arch + SM so a version bump starts clean instead of reusing a stale build tree.

The OpenCV step on jp62 stays positioned last in the builder because the inference wheel installs upstream are not --no-deps and an earlier opencv-python pull would clobber the from-source build. The build-dir cache mount makes the forced rebuild after COPY-invalidation a fast ninja no-op rather than a 30-60 min recompile.

Points jp62 and jp71 workflows at the new dedicated Depot projects (2rp7mfjw7q for jp62, v1xzfwkc4b for jp71) so they no longer share an NVMe cache pool and evict each other.
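The apt cache-mount pattern, sketched (the id strings are assumptions; the scoping scheme and the two apt.conf tweaks follow the commit message):

```dockerfile
# syntax=docker/dockerfile:1.7
# ARG interpolation inside a cache-mount id= requires dockerfile 1.7+.
ARG JETPACK=jp6.2
RUN --mount=type=cache,id=apt-cache-${JETPACK}-builder-arm64,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,id=apt-lists-${JETPACK}-builder-arm64,target=/var/lib/apt,sharing=locked \
    # Stop apt from deleting downloaded .debs, so the cache is actually used.
    rm -f /etc/apt/apt.conf.d/docker-clean \
 && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache \
 && apt-get update \
 && apt-get install -y --no-install-recommends libgstreamer1.0-0 python3-gi
```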
The cache mount on build/ auto-creates its parent directory before the RUN body executes, so a combined 'git clone ... && cd ... && make' fails with 'destination path already exists and is not an empty directory'. First jp62 attempt with these cache mounts failed at the PyTorch step: fatal: destination path 'pytorch' already exists and is not an empty directory. Split each of PyTorch / torchvision / onnxruntime on jp62 and jp71 into two RUNs: a clone RUN with no mount, then a build RUN whose cache mount overlays build/ on the cloned source layer.
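The split, sketched with PyTorch (repo ref, build command, and cache-mount id are illustrative):

```dockerfile
# RUN 1: clone with no cache mount, so the destination directory is empty.
RUN git clone --depth 1 --branch v2.5.0 --recursive \
    https://github.com/pytorch/pytorch.git /build/pytorch

# RUN 2: build with the cache mount overlaying build/ on the cloned layer;
# id scoped by version + arch + SM so a version bump starts clean.
RUN --mount=type=cache,id=pytorch-v2.5.0-arm64-sm87,target=/build/pytorch/build \
    cd /build/pytorch && python3 setup.py bdist_wheel
```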
Bumps:

- actions/cache v3 -> v5
- actions/checkout v2/v3/v4 -> v6
- actions/create-github-app-token v1 -> v3
- actions/setup-node v3 -> v6
- actions/setup-python v2/v5 -> v6
- actions/upload-artifact v4 -> v7
- aws-actions/amazon-ecr-login v1 -> v2
- aws-actions/configure-aws-credentials v2 -> v6
- dcarbone/install-jq-action v2.1.0 -> v3.2.0
- digicert/ssm-code-signing v1.0.0 -> v1.2.1
- docker/login-action v2/v3 -> v4
- docker/setup-buildx-action v2 -> v4
- docker/setup-qemu-action v2 -> v4
- google-github-actions/auth v2 -> v3
- google-github-actions/setup-gcloud v2 -> v3

depot/* already pinned to floating v1 (latest major). actions/upload-release-asset and pypa/gh-action-pypi-publish kept on their existing pins (v1 already latest; pypi-publish uses the release/v1 stable-channel ref recommended by upstream).
dkosowski87 approved these changes on May 12, 2026
PawelPeczek-Roboflow approved these changes on May 12, 2026
Summary
- Build cv2 (JP62) from source with `WITH_GSTREAMER=ON` + CUDA, and add the runtime GStreamer stack (python3-gi, GIR typelibs, base plugins) so downstream code can build hardware capture/encode pipelines. Fixes the 4K USB camera workflow that was running at ~1.3 inference FPS against ~21 camera FPS -- pre-PR cv2 was CPU-only and `cv2.getBuildInformation()` reported `GStreamer: NO`.
- Drop copying NVIDIA's GStreamer plugins (`libgstnv*.so`) into the image. Those plugins live on the host JetPack BSP and are mounted in at runtime by `nvidia-container-runtime` via the CSVs under `/etc/nvidia-container-runtime/host-files-for-container.d/`. The container now ships only the cv2-facing half of the stack; the host provides the NVIDIA plugins.
- Fix the `libnvdla_compiler.so` source path (`tegra/` -> `nvidia/`). The original glob in #2201 (Fix missing libnvdla_compiler.so in Jetson 6.x TRT runtime) silently matched zero files, so onnxruntime kept falling back to CPU on Jetson 6.x. Same one-liner as #2306 (Fix libnvdla_compiler.so COPY path for Jetson 6.x runtime) for jp61.
- Add BuildKit cache mounts on JP62 and JP71 (apt, the PyTorch/torchvision/onnxruntime/OpenCV build trees, plus the TRT `.deb` download on JP62). Each Jetson workflow also moves onto its own Depot project so the JP62 and JP71 builds stop evicting each other's NVMe state.
- Bump every action in `.github/workflows/` to its latest major (59 files touched). Done in one mechanical pass while we were already poking the Jetson workflows. See "Action version bumps" below.

Why cv2 was the bottleneck (JP62)
`cv2.getBuildInformation()` in the pre-PR JP62 image reported `GStreamer: NO`. The pip `opencv-python` wheel ships without GStreamer, so every frame went through YUYV uncompressed capture, CPU YUV-to-BGR, and the raw 4K stream got fed downstream to detection + visualization.

Even if cv2 had been compiled with GStreamer support, the runtime base `l4t-cuda:12.6.11-runtime` doesn't ship the JetPack multimedia stack, so the NVIDIA elements (`nvv4l2decoder`, `nvvidconv`, `nvjpegenc`/`dec`) wouldn't have been reachable anyway. That's why the fix is two-sided: build cv2 with GStreamer, and rely on the host BSP for the NVIDIA plugins.

JP71 ships the same pip cv2 wheel and would hit the same wall at 4K, but the GStreamer rebuild isn't ported there yet -- only the build-perf and workflow changes apply to JP71 in this PR.
OpenCV build details (Dockerfile.onnx.jetson.6.2.0)

- Builder: `WITH_GSTREAMER=ON`, `WITH_FFMPEG=ON`, `WITH_LIBV4L=ON`, `WITH_CUDA=ON`, `CUDA_ARCH_BIN=8.7`. Pip's `opencv-python`/`-headless`/`-contrib-python` are uninstalled first; the wheel built from `python_loader/` is installed in their place.
- A build-time `cv2.getBuildInformation()` assertion so any regression in detected flags fails the docker build, not the device.
- Per-platform values are hoisted into ARGs (`OPENCV_CUDA_ARCH`, `OPENCV_PYTHON_INSTALL_PATH`, `OPENCV_PYTHON_INCLUDE_DIR`, `OPENCV_PYTHON_VERSION`) so porting this block to jp5.1.1 / jp7.1.0 is four ARG edits rather than cmake surgery.
- `ninja install` writes correct-path `config*.py`, then `pip install` of the python_loader wheel overwrites them with BUILD-TREE paths that don't exist in the runtime stage and produce `ImportError: recursion is detected during loading of "cv2"`. The fix snapshots the install-tree configs around the pip install and restores them after. A follow-up sanity check strips `/build/` from `sys.path` and imports cv2 so this fails the docker build instead of the device.
- Runtime stage adds `libgstreamer1.0-0`, `gstreamer1.0-plugins-{base,good,tools}`, `python3-gi`, `gir1.2-gstreamer-1.0`, `gir1.2-gst-plugins-base-1.0`, plus an inline comment explaining where the NVIDIA plugins come from.
Build perf: cache mounts added (Dockerfile.onnx.jetson.6.2.0 + Dockerfile.onnx.jetson.7.1.0)

Neither Jetson dockerfile had BuildKit cache mounts before this PR. Adding them on both:

- `# syntax=docker/dockerfile:1.7` pinned so ARG interpolation in cache-mount `id=` works.
- Apt cache mounts (`/var/cache/apt` + `/var/lib/apt`, `sharing=locked`) on every apt RUN, ids scoped per JetPack x (builder|runtime) x arch. `docker-clean` disabled and `Keep-Downloaded-Packages "true"` so the cache is actually used. The trailing `rm -rf /var/lib/apt/lists/*` came out since that path is now the cache mount.
- Build-tree cache mounts on PyTorch, torchvision, onnxruntime (`build/cuda12` on JP62, `build/Linux` on JP71), OpenCV (`release/` on JP62), and the TRT `.deb` download on JP62. Each id is scoped by version + arch + SM so a version bump starts clean.
- `git clone` is now a separate RUN from the cache-mounted build RUN. The cache mount auto-creates its parent before the body executes, so a combined `git clone && build` hit `destination path 'pytorch' already exists`. Two RUNs now: clone (no mount), then build (cache mount overlays `build/`).
- The OpenCV step stays last in the builder: the inference wheel installs are not `--no-deps`, so moving it earlier risks `opencv-python` being reinstalled by a transitive dep and clobbering the from-source build. The build-dir cache mount makes the forced rebuild after `COPY . .` invalidation a fast ninja no-op rather than a 30-60 min recompile.

Workflows
.github/workflows/docker.jetson.6.2.0.yml and .github/workflows/docker.jetson.7.1.0.yml now point at their own Depot projects (JP62 `2rp7mfjw7q`, JP71 `v1xzfwkc4b`) instead of sharing `grl7ffzxd7`. Sharing one project meant the two builds were evicting each other's NVMe layers and cache-mount state on every run.

Action version bumps
Everything in .github/workflows/ got bumped to its latest major in a single pass:

- actions/cache
- actions/checkout
- actions/create-github-app-token
- actions/setup-node
- actions/setup-python
- actions/upload-artifact
- aws-actions/amazon-ecr-login
- aws-actions/configure-aws-credentials
- dcarbone/install-jq-action
- digicert/ssm-code-signing
- docker/login-action
- docker/setup-buildx-action
- docker/setup-qemu-action
- google-github-actions/auth
- google-github-actions/setup-gcloud

Left as-is:

- depot/setup-action@v1 and depot/build-push-action@v1 already track the floating major (latest is v1.7.1 / v1.17.0 within v1).
- actions/upload-release-asset@v1 -- still latest major; the upstream is archived but the v1 release is what it is.
- pypa/gh-action-pypi-publish@release/v1 -- upstream's recommended stable-channel ref, kept verbatim.

Most bumps are pure Node-version updates (Node 20 -> 24) and won't change behaviour. The two worth eyeballing in CI before merge are actions/upload-artifact (v4 -> v7 is a multi-major jump, though we only use it with name/path/if-no-files-found/retention-days, which have been stable) and aws-actions/configure-aws-credentials (v2 -> v6, used with aws-region + access-key inputs, which are also stable).

Test plan
- `gh workflow run docker.jetson.6.2.0.yml --ref fix/jp62-opencv-gstreamer-nvidia -f force_push=true -f custom_tag=jp62-gstreamer-nvidia`
- The same dispatch against `docker.jetson.7.1.0.yml`.
- On a Jetson with `nvidia-container-runtime` configured:
  - `python3 -c "import cv2; print(cv2.getBuildInformation())"` reports `GStreamer: YES`, `FFMPEG: YES`, `CUDA: YES`
  - `gst-inspect-1.0 nvv4l2decoder` resolves inside the container (relies on host CSV mounts)
  - `gst-inspect-1.0 nvvidconv` resolves
  - `gst-inspect-1.0 nvjpegenc` resolves
  - `python3 -c "from gi.repository import Gst; Gst.init(None)"` succeeds
- CI rebuilds so far hit the Docker layer cache (`#X CACHED`), so the cache mounts didn't need to engage. Layer-cache eviction (a real source change touching `COPY . .`) is what will exercise the cache mounts; not yet observed.