
Jetson dockerfiles: OpenCV with GStreamer on JP62, add BuildKit cache mounts on JP62/JP71, bump JP51 OpenCV #2321

Open
alexnorell wants to merge 11 commits into main from fix/jp62-opencv-gstreamer-nvidia

Conversation

@alexnorell
Contributor

alexnorell commented May 11, 2026

Summary

  • JP62: build OpenCV 4.13.0 from source with WITH_GSTREAMER=ON + CUDA, and add the runtime GStreamer stack (python3-gi, GIR typelibs, base plugins) so downstream code can build hardware capture/encode pipelines. Fixes the 4K USB camera workflow that was running at ~1.3 inference FPS against ~21 camera FPS — pre-PR cv2 was CPU-only and cv2.getBuildInformation() reported GStreamer: NO.
  • JP62: drop the attempt to bake libgstnv*.so into the image. Those plugins live on the host JetPack BSP and are mounted in at runtime by nvidia-container-runtime via the CSVs under /etc/nvidia-container-runtime/host-files-for-container.d/. Container now ships only the cv2-facing half of the stack; the host provides the NVIDIA plugins.
  • JP62: fix libnvdla_compiler.so source path (tegra/ -> nvidia/); the one-line sketch appears just after this list. The original glob in Fix missing libnvdla_compiler.so in Jetson 6.x TRT runtime #2201 silently matched zero files, so onnxruntime kept falling back to CPU on Jetson 6.x. Same one-liner as Fix libnvdla_compiler.so COPY path for Jetson 6.x runtime #2306 for jp61.
  • JP62 + JP71: add BuildKit cache mounts to both Jetson dockerfiles — neither had any cache mounts before this PR, so warm rebuilds were doing the full 30-60 min OpenCV / PyTorch / torchvision / onnxruntime recompiles every run. New mounts cover apt plus per-build-tree directories for each heavy C++ compile (and the TRT .deb download on JP62). Each Jetson workflow also moves onto its own Depot project so the JP62 and JP71 builds stop evicting each other's NVMe state.
  • JP51: OpenCV 4.12.0 -> 4.13.0 to stay aligned with JP62.
  • All workflows: bump every GitHub Action in .github/workflows/ to its latest major (59 files touched). Done in one mechanical pass while we were already poking the Jetson workflows. See "Action version bumps" below.
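
The libnvdla path fix, in sketch form (the stage name `jetpack` and the destination directory are illustrative; the corrected source path is the one this PR ships):

```dockerfile
# Before (PR #2201): COPY --from=... /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so* ...
# That glob matched nothing: tegra/ is populated by the host at run time, not at build time.
# After: the library actually lives under nvidia/ in l4t-jetpack:r36.4.0.
COPY --from=jetpack /usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so* \
     /usr/lib/aarch64-linux-gnu/nvidia/
```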

Why cv2 was the bottleneck (JP62)

cv2.getBuildInformation() in the pre-PR JP62 image reported GStreamer: NO. The pip opencv-python wheel ships without GStreamer, so every frame went through uncompressed YUYV capture and CPU YUV-to-BGR conversion, and the raw 4K stream was fed straight downstream to detection + visualization.

Even if cv2 had been compiled with GStreamer support, the runtime base l4t-cuda:12.6.11-runtime doesn't ship the JetPack multimedia stack, so the NVIDIA elements (nvv4l2decoder, nvvidconv, nvjpegenc/dec) wouldn't have been reachable anyway. That's why the fix is two-sided: build cv2 with GStreamer, and rely on the host BSP for the NVIDIA plugins.
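
To make the two-sided fix concrete, here is the kind of capture path it unlocks on a JP62 device. This is a hedged sketch: the device path and MJPEG caps are illustrative for a 4K USB camera, and nvjpegdec / nvvidconv resolve only inside a container started under nvidia-container-runtime with the host CSV mounts in place.

```sh
gst-inspect-1.0 nvjpegdec > /dev/null   # host-provided NVIDIA element resolves
python3 <<'EOF'
import cv2

# Hardware JPEG decode + hardware colorspace conversion; pipeline shape
# is illustrative, not the exact production pipeline.
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "image/jpeg,width=3840,height=2160,framerate=30/1 ! "
    "nvjpegdec ! nvvidconv ! video/x-raw,format=BGRx ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink drop=true"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)  # requires GStreamer: YES
ok, frame = cap.read()
assert ok, "pipeline produced no frame"
print(frame.shape)
EOF
```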

JP71 ships the same pip cv2 wheel and would hit the same wall at 4K, but the GStreamer rebuild isn't ported there yet — only the build-perf and workflow changes apply to JP71 in this PR.

OpenCV build details (Dockerfile.onnx.jetson.6.2.0)

Builder:

  • OpenCV 4.13.0 + opencv_contrib from source, WITH_GSTREAMER=ON, WITH_FFMPEG=ON, WITH_LIBV4L=ON, WITH_CUDA=ON, CUDA_ARCH_BIN=8.7. Pip's opencv-python / -headless / -contrib-python are uninstalled first; the wheel built from python_loader/ is installed in their place.
  • Build-time cv2.getBuildInformation() assertion so any regression in detected flags fails the docker build, not the device.
  • Per-platform OpenCV values pulled into ARGs (OPENCV_CUDA_ARCH, OPENCV_PYTHON_INSTALL_PATH, OPENCV_PYTHON_INCLUDE_DIR, OPENCV_PYTHON_VERSION) so porting this block to jp5.1.1 / jp7.1.0 is four ARG edits rather than cmake surgery.
  • cv2 install-tree config rescue: ninja install writes correct-path config*.py, then pip install of the python_loader wheel overwrites them with BUILD-TREE paths that don't exist in the runtime stage and produce ImportError: recursion is detected during loading of "cv2". The fix snapshots the install-tree configs around the pip install and restores them after. A follow-up sanity check strips /build/ from sys.path and imports cv2 so this fails the docker build instead of the device. Both the rescue and the sanity check appear in the sketch below.
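
A condensed sketch of that builder block. The ARG values are the JP62 ones named above; the directory layout, stage wiring, and exact cmake flag set are illustrative rather than the verbatim Dockerfile (the heredoc RUNs assume the dockerfile:1.7 syntax this PR pins):

```dockerfile
ARG OPENCV_VERSION=4.13.0
ARG OPENCV_CUDA_ARCH=8.7
ARG OPENCV_PYTHON_INSTALL_PATH=/usr/local/lib/python3.10/dist-packages
ARG OPENCV_PYTHON_INCLUDE_DIR=/usr/include/python3.10
ARG OPENCV_PYTHON_VERSION=3.10   # consumed by the python_loader wheel step (not shown)

# Evict the pip wheels first so the from-source cv2 is the only one present.
RUN pip uninstall -y opencv-python opencv-python-headless opencv-contrib-python || true

RUN cd /build/opencv/opencv-${OPENCV_VERSION} && mkdir -p release && cd release && \
    cmake -GNinja \
      -DWITH_GSTREAMER=ON -DWITH_FFMPEG=ON -DWITH_LIBV4L=ON \
      -DWITH_CUDA=ON -DCUDA_ARCH_BIN=${OPENCV_CUDA_ARCH} \
      -DOPENCV_EXTRA_MODULES_PATH=/build/opencv/opencv_contrib/modules \
      -DOPENCV_PYTHON3_INSTALL_PATH=${OPENCV_PYTHON_INSTALL_PATH} \
      -DPYTHON3_INCLUDE_DIR=${OPENCV_PYTHON_INCLUDE_DIR} \
      .. && \
    ninja && ninja install

# Config rescue: snapshot the install-tree config*.py (correct paths) around the
# python_loader install, which would otherwise clobber them with build-tree paths
# that are dead in the runtime stage.
RUN mkdir -p /tmp/cv2-config && \
    cp ${OPENCV_PYTHON_INSTALL_PATH}/cv2/config*.py /tmp/cv2-config/ && \
    pip install /build/opencv/opencv-${OPENCV_VERSION}/release/python_loader/ && \
    cp /tmp/cv2-config/config*.py ${OPENCV_PYTHON_INSTALL_PATH}/cv2/

# Build-time assertions: fail the docker build, not the device.
RUN python3 <<'EOF'
import re, sys
# Simulate the runtime stage: drop builder-only paths before importing cv2.
sys.path = [p for p in sys.path if not p.startswith('/build')]
import cv2
info = cv2.getBuildInformation()
for flag in ('GStreamer', 'FFMPEG', 'CUDA'):
    assert re.search(flag + r':\s+YES', info), flag + ' regressed to NO'
EOF
```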

Runtime stage adds libgstreamer1.0-0, gstreamer1.0-plugins-{base,good,tools}, python3-gi, gir1.2-gstreamer-1.0, gir1.2-gst-plugins-base-1.0, plus an inline comment explaining where the NVIDIA plugins come from.
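
In Dockerfile terms, the runtime-stage addition is roughly the following (package names per this PR; the apt cache mounts from the next section are omitted here for brevity):

```dockerfile
# cv2-facing GStreamer runtime. NVIDIA's libgstnv*.so plugins are deliberately
# NOT installed: nvidia-container-runtime mounts them from the host BSP via the
# CSVs under /etc/nvidia-container-runtime/host-files-for-container.d/.
RUN apt-get update && apt-get install -y --no-install-recommends \
      libgstreamer1.0-0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good \
      gstreamer1.0-tools python3-gi gir1.2-gstreamer-1.0 \
      gir1.2-gst-plugins-base-1.0
```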

Build perf: cache mounts added (Dockerfile.onnx.jetson.6.2.0 + Dockerfile.onnx.jetson.7.1.0)

Neither Jetson dockerfile had BuildKit cache mounts before this PR. Adding them on both:

  • # syntax=docker/dockerfile:1.7 pinned so ARG interpolation in cache-mount id= works.
  • Apt cache mounts (/var/cache/apt + /var/lib/apt, sharing=locked) on every apt RUN, ids scoped per JetPack x (builder|runtime) x arch. docker-clean disabled and Keep-Downloaded-Packages "true" so the cache is actually used. The trailing rm -rf /var/lib/apt/lists/* came out since that path is now the cache mount.
  • Per-build-tree cache mounts on PyTorch, torchvision, onnxruntime (build/cuda12 on JP62, build/Linux on JP71), OpenCV (release/ on JP62), and the TRT .deb download on JP62. Each id is scoped by version + arch + SM so a version bump starts clean.
  • git clone is now a separate RUN from the cache-mounted build RUN. The cache mount auto-creates its parent directory before the RUN body executes, so a combined git clone && build failed with "destination path 'pytorch' already exists". Two RUNs now: clone (no mount), then build (cache mount overlays build/); see the sketch after this list.
  • OpenCV stays last in the JP62 builder on purpose. The upstream inference wheel installs aren't --no-deps, so moving OpenCV earlier risks opencv-python being reinstalled by a transitive dep and clobbering the from-source build. The build-dir cache mount makes the forced rebuild after COPY . . invalidation a fast ninja no-op rather than a 30-60 min recompile.
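
The pattern, sketched for the PyTorch step (version and cache-mount ids are illustrative; the real ids are scoped as described above):

```dockerfile
# syntax=docker/dockerfile:1.7
ARG PYTORCH_VERSION=2.5.0   # illustrative

# Make apt keep its downloads so the cache mounts actually pay off.
RUN rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
      > /etc/apt/apt.conf.d/keep-cache

RUN --mount=type=cache,id=apt-cache-jp62-builder-arm64,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,id=apt-lib-jp62-builder-arm64,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends build-essential ninja-build

# Clone in its own RUN -- no cache mount, so git sees an empty destination.
RUN git clone --recursive --branch v${PYTORCH_VERSION} \
      https://github.com/pytorch/pytorch.git /opt/pytorch

# Build in a second RUN whose cache mount overlays build/ on the cloned layer.
RUN --mount=type=cache,id=pytorch-${PYTORCH_VERSION}-arm64-sm87,target=/opt/pytorch/build \
    cd /opt/pytorch && python3 setup.py bdist_wheel
```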

Workflows

.github/workflows/docker.jetson.6.2.0.yml and .github/workflows/docker.jetson.7.1.0.yml now point at their own Depot projects (JP62 2rp7mfjw7q, JP71 v1xzfwkc4b) instead of sharing grl7ffzxd7. Sharing one project meant the two builds were evicting each other's NVMe layers and cache-mount state on every run.

Action version bumps

Everything in .github/workflows/ got bumped to its latest major in a single pass:

Action                                  Old            New
actions/cache                           v3             v5
actions/checkout                        v2 / v3 / v4   v6
actions/create-github-app-token         v1             v3
actions/setup-node                      v3             v6
actions/setup-python                    v2 / v5        v6
actions/upload-artifact                 v4             v7
aws-actions/amazon-ecr-login            v1             v2
aws-actions/configure-aws-credentials   v2             v6
dcarbone/install-jq-action              v2.1.0         v3.2.0
digicert/ssm-code-signing               v1.0.0         v1.2.1
docker/login-action                     v2 / v3        v4
docker/setup-buildx-action              v2             v4
docker/setup-qemu-action                v2             v4
google-github-actions/auth              v2             v3
google-github-actions/setup-gcloud      v2             v3

Left as-is:

  • depot/setup-action@v1 and depot/build-push-action@v1 already track the floating major (latest is v1.7.1 / v1.17.0 within v1).
  • actions/upload-release-asset@v1 — still latest major; the upstream is archived but the v1 release is what it is.
  • pypa/gh-action-pypi-publish@release/v1 — upstream's recommended stable-channel ref, kept verbatim.

Most bumps are pure Node-version updates (Node 20 -> 24) and won't change behaviour. The two worth eyeballing in CI before merge are actions/upload-artifact (v4 -> v7 is a multi-major jump, though we only use it with name / path / if-no-files-found / retention-days which have been stable) and aws-actions/configure-aws-credentials (v2 -> v6, used with aws-region + access-key inputs which are also stable).

Test plan

  • JP62 build succeeds and pushes a tag: gh workflow run docker.jetson.6.2.0.yml --ref fix/jp62-opencv-gstreamer-nvidia -f force_push=true -f custom_tag=jp62-gstreamer-nvidia
  • JP71 build succeeds: same dispatch against docker.jetson.7.1.0.yml
  • On a JP62 device with nvidia-container-runtime configured: python3 -c "import cv2; print(cv2.getBuildInformation())" reports GStreamer: YES, FFMPEG: YES, CUDA: YES
  • gst-inspect-1.0 nvv4l2decoder resolves inside the container (relies on host CSV mounts)
  • gst-inspect-1.0 nvvidconv resolves
  • gst-inspect-1.0 nvjpegenc resolves
  • python3 -c "from gi.repository import Gst; Gst.init(None)" succeeds
  • Inference FPS on the 4K USB camera + detection workflow comfortably above the 1.3 FPS baseline
  • Second JP62 build (warm cache) skips most apt downloads and finishes the OpenCV step in seconds rather than tens of minutes — actually came back even better: 2 min total, every step hit Depot's docker layer cache (#X CACHED), so the cache mounts didn't need to engage. Layer-cache eviction (a real source change touching COPY . .) is what will exercise the cache mounts; not yet observed.
  • Second JP71 build (warm cache) hits cached PyTorch / torchvision / onnxruntime build trees — same outcome as JP62: 3 min total, served from Depot layer cache.

@alexnorell force-pushed the fix/jp62-opencv-gstreamer-nvidia branch from f8e245f to 9ea044b on May 11, 2026 14:53
…o JP62

The pip opencv-python wheel ships without GStreamer, and the l4t-cuda runtime
base used for the JP62 image has no multimedia stack, so cv2.VideoCapture on
Jetson silently falls through to the plain v4l2 ioctl path -- pulling raw 4K
YUYV, doing CPU YUV->BGR, and never reaching NVIDIA's hardware engines.

Three pieces, all needed together:

1. Compile OpenCV from source in the builder stage with WITH_GSTREAMER=ON,
   WITH_FFMPEG=ON, WITH_LIBV4L=ON, WITH_CUDA=ON. Replaces the pip-installed
   opencv-python/opencv-contrib-python with a wheel built from python_loader.
   Adds a build-time cv2.getBuildInformation() check so a regression in the
   detected flags fails the build instead of silently shipping CPU-only cv2.

2. Add the GStreamer runtime + PyGObject stack to the runtime stage
   (libgstreamer1.0-0, plugins-base/good, python3-gi, gir typelibs) so that
   downstream Python code using "from gi.repository import Gst" to construct
   hardware capture/encode pipelines can import and run inside the container.

3. Copy NVIDIA's GStreamer plugins (libgstnv*.so) plus the tegra runtime libs
   from the JetPack builder stage into the runtime image, instead of relying
   on nvidia-container-passthrough host mounts. Self-contained -- works
   without any host-side changes deployed.

Branched off v1.2.7 to keep the diff minimal.
@alexnorell force-pushed the fix/jp62-opencv-gstreamer-nvidia branch from 9ea044b to 1f6f621 on May 11, 2026 15:02
CMake 4.x removed compatibility with cmake_minimum_required(VERSION <3.5).
OpenCV 4.10.0's OpenCVGenPkgconfig.cmake still uses the old declaration,
so the configure step aborts with:

  CMake Error at .../OpenCVGenPkgconfig.cmake:113 (cmake_minimum_required):
    Compatibility with CMake < 3.5 has been removed from CMake.

Same workaround the existing onnxruntime build step uses.

OpenCV 4.13.0 (released 2025-12-31) is the current latest stable; it builds
cleanly under CMake 4.x without the cmake_minimum_required workaround the
4.10.0 source needed. Aligns both Jetson images.

JP62-specific:
- Move the from-source OpenCV step to AFTER the inference packages install
  so the cv2 we install (built with GStreamer + FFmpeg) is the last thing
  to touch /usr/local/lib/python3.10/dist-packages/cv2/. Previously the
  inference_core/cli/sdk pip install would refetch opencv-python (as a
  transitive dep) and clobber our from-source cv2.
- Drop CMAKE_POLICY_VERSION_MINIMUM=3.5 -- not needed on 4.13.0.

The l4t-jetpack:r36.4.0 builder image does not ship the tegra runtime
libs as a regular populated /usr/lib/aarch64-linux-gnu/tegra directory.
Those libs are tied to the host JetPack BSP and only appear inside the
container at run time via nvidia-container-runtime's CSV-driven host
mounts. So `COPY --from=builder /usr/lib/aarch64-linux-gnu/tegra ...`
errors during the build with "not found".

This image now scopes to enabling the cv2 side of the GStreamer stack
(WITH_GSTREAMER=ON OpenCV, python-gi, base gstreamer runtime + GIR
typelibs). The host system has to provide the NVIDIA plugins for any
nv* pipeline element to resolve at run time.

BuildKit needs to lstat the source parent dir before evaluating a glob
COPY pattern, even when "no match" is allowed. The libnvdla fix in
PR #2201 copies tegra/libnvdla_compiler.so*, but l4t-jetpack:r36.4.0
doesn't always ship /usr/lib/aarch64-linux-gnu/tegra/ as a real
directory -- that path is normally populated by nvidia-container-runtime
at container start. Recent builds intermittently fail at this step
depending on layer cache state.

mkdir -p the path so BuildKit can process the COPY. The glob will copy
the real file if it's there, or silently no-op if not. Doesn't paper
over the underlying flakiness of the libnvdla source path, but lets
this PR's build progress.

The actual location in nvcr.io/nvidia/l4t-jetpack:r36.4.0 is
/usr/lib/aarch64-linux-gnu/nvidia/libnvdla_compiler.so, not tegra/.
The original COPY in PR #2201 used the wrong path; the glob silently
matched zero files, so the libnvdla fix shipped an empty layer and
ONNX Runtime kept falling back to CPU on Jetson 6.x.

Same one-line correction as PR #2306. Bundling it here so this PR's
build can also actually ship libnvdla_compiler.so.

Without OPENCV_PYTHON3_INSTALL_PATH / PYTHON3_INCLUDE_DIR / PYTHON_VERSION
the python_loader config.py bakes the BUILDER path
(/build/opencv/<ver>/release/lib/python3) into sys.path. That path only
exists in the builder stage, so in the runtime image the loader stub
fails to find the native cv2 .so, falls back to importlib.import_module("cv2")
which re-enters the same stub package, and Python raises:

  ImportError: ERROR: recursion is detected during loading of "cv2"
               binary extensions. Check OpenCV installation.

JP51's PR #2100 passes these flags explicitly; replicating the same set
here so the loader references /usr/local/lib/python3.10/dist-packages.

Two related changes to make the from-source OpenCV build correct and
reusable across jp5.1 / jp6.2 / jp7.1:

1. Rescue the install-tree config files. After ninja install, the files
   at ${INSTALL_PATH}/cv2/config*.py have install-tree paths (correct).
   The pip install of the wheel built from /build/.../python_loader then
   overwrites them with BUILD-TREE config files (paths like
   /build/opencv/opencv-X.Y.Z/release/lib/python3) which only exist in
   the builder stage. In a multi-stage runtime image those paths are
   dead, so the loader's bootstrap can't find the native cv2 .so,
   re-imports the stub package, and Python raises:
       ImportError: recursion is detected during loading of "cv2"
   Fix by saving config*.py between ninja install and pip wheel/install,
   restoring after.

2. Pull per-platform variables out of the cmake invocation into ARGs at
   the top of the builder (OPENCV_CUDA_ARCH, OPENCV_PYTHON_INSTALL_PATH,
   OPENCV_PYTHON_INCLUDE_DIR, OPENCV_PYTHON_VERSION). Replicating this
   OpenCV block in jp5.1.1 / jp7.1.0 dockerfiles is now a copy-paste
   plus four ARG value changes; no surgery inside the cmake invocation.

Adds a build-time sanity check that simulates the runtime stage by
stripping /build/ from sys.path before importing cv2. If the install-
tree rescue ever regresses, the docker build fails here -- not 90
minutes later on the device.
@alexnorell force-pushed the fix/jp62-opencv-gstreamer-nvidia branch from 119562a to a781139 on May 11, 2026 20:05
Pins syntax to dockerfile:1.7 on both jp62 and jp71 so ARG interpolation
in cache-mount ids works. Adds apt cache mounts (sharing=locked, id scoped
per JetPack + builder/runtime + arch) to every apt RUN so .debs and lists
persist across builds. Disables docker-clean and turns on
Keep-Downloaded-Packages so the apt cache is actually used. Drops the
trailing rm of /var/lib/apt/lists/* since that path is now the cache mount.

Adds per-build-tree cache mounts on the expensive C++ compiles:
PyTorch, torchvision, onnxruntime (build/cuda12 on jp62, build/Linux on
jp71), OpenCV (release/ on jp62), with the TRT .deb download also cached
on jp62. Each id is scoped by version + arch + SM so a version bump
starts clean instead of reusing a stale build tree.

The OpenCV step on jp62 stays positioned last in the builder because the
inference wheel installs upstream are not --no-deps and an earlier
opencv-python pull would clobber the from-source build. The build-dir
cache mount makes the forced rebuild after COPY-invalidation a fast
ninja no-op rather than a 30-60 min recompile.

Points jp62 and jp71 workflows at the new dedicated Depot projects
(2rp7mfjw7q for jp62, v1xzfwkc4b for jp71) so they no longer share an
NVMe cache pool and evict each other.

The cache mount on build/ auto-creates its parent directory before the
RUN body executes, so a combined 'git clone ... && cd ... && make' fails
with 'destination path already exists and is not an empty directory'.

First jp62 attempt with these cache mounts failed at the PyTorch step:
  fatal: destination path 'pytorch' already exists and is not an empty
  directory.

Split each of PyTorch / torchvision / onnxruntime on jp62 and jp71 into
two RUNs: a clone RUN with no mount, then a build RUN whose cache mount
overlays build/ on the cloned source layer.
@alexnorell changed the title from "[DRAFT] JP62: build OpenCV from source with GStreamer + bake NVIDIA gst plugins" to "[DRAFT] JP62: build OpenCV from source with GStreamer + build perf" on May 12, 2026
@alexnorell marked this pull request as ready for review May 12, 2026 05:16
@alexnorell changed the title from "[DRAFT] JP62: build OpenCV from source with GStreamer + build perf" to "JP62: build OpenCV from source with GStreamer + build perf" on May 12, 2026
@alexnorell changed the title from "JP62: build OpenCV from source with GStreamer + build perf" to "Jetson dockerfiles: OpenCV with GStreamer on JP62, add BuildKit cache mounts on JP62/JP71, bump JP51 OpenCV" on May 12, 2026
Bumps:
- actions/cache v3 -> v5
- actions/checkout v2/v3/v4 -> v6
- actions/create-github-app-token v1 -> v3
- actions/setup-node v3 -> v6
- actions/setup-python v2/v5 -> v6
- actions/upload-artifact v4 -> v7
- aws-actions/amazon-ecr-login v1 -> v2
- aws-actions/configure-aws-credentials v2 -> v6
- dcarbone/install-jq-action v2.1.0 -> v3.2.0
- digicert/ssm-code-signing v1.0.0 -> v1.2.1
- docker/login-action v2/v3 -> v4
- docker/setup-buildx-action v2 -> v4
- docker/setup-qemu-action v2 -> v4
- google-github-actions/auth v2 -> v3
- google-github-actions/setup-gcloud v2 -> v3

depot/* already pinned to floating v1 (latest major).
actions/upload-release-asset and pypa/gh-action-pypi-publish kept
on their existing pins (v1 already latest; pypi-publish uses the
release/v1 stable-channel ref recommended by upstream).