perf(vehicle): SCM HIP E2E — direct hits map, buffer reserve, dense body reduce#756

Closed
amd-pratmish wants to merge 3 commits into projectchrono:main from amd-pratmish:perf/scm-vehicle-hip-e2e

amd-pratmish commented Jun 29, 2026

Summary

Follow-up to the optional SCM HIP backend. Improves end-to-end Chrono performance:

  • Pass ray-cast hits map directly to ComputeContactForcesGpu (no intermediate vector copy)
  • Call scm_gpu::PrimeBuffers() from all SCMTerrain::Initialize overloads
  • Dense per-body force accumulation in SCMTerrainGpu.cpp
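
The dense per-body accumulation could look roughly like the sketch below. This is only an illustration of the idea, not the code in `SCMTerrainGpu.cpp`: `HitForce`, `ReduceForcesDense`, and the assumption that each hit already carries a dense body index in `[0, num_bodies)` are all hypothetical, and torque accumulation is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-hit record; field names are illustrative, not Chrono's.
struct HitForce {
    std::size_t body_index;       // dense index in [0, num_bodies)
    std::array<double, 3> force;  // contact force in the world frame
};

// Sum per-hit forces into one slot per body. Indexing a dense vector by body
// avoids the hashing and node allocation of a map keyed by body pointer.
std::vector<std::array<double, 3>> ReduceForcesDense(const std::vector<HitForce>& hits,
                                                     std::size_t num_bodies) {
    std::vector<std::array<double, 3>> total(num_bodies, std::array<double, 3>{0.0, 0.0, 0.0});
    for (const HitForce& h : hits) {
        assert(h.body_index < num_bodies);  // the index must be dense and in range
        for (int k = 0; k < 3; ++k)
            total[h.body_index][k] += h.force[k];
    }
    return total;
}
```

Each entry of the result would then be applied to its body as a single load, so the number of loads scales with the number of bodies rather than the number of hits.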

Depends on

  • Merge first: #755 (v1 foundation)
  • Incremental diff on fork: amd-pratmish:perf/scm-vehicle-hip-e2e based on feat/scm-vehicle-hip-gfx942

Files in this PR

  • src/chrono_vehicle/terrain/SCMTerrain.h
  • src/chrono_vehicle/terrain/SCMTerrain.cpp
  • src/chrono_vehicle/terrain/SCMTerrainGpu.cpp

Rebuild external scm_gpu from matching amd-chronos ref (see SCM_GPU_EXTERNAL.md in amd-chronos ship kit).

Test plan

  • scm_parity_test --n-hits 65536 PASS
  • CHRONO_SCM_GPU=1 wheel --load — contact timer improved vs v1
  • CHRONO_SCM_GPU_PROFILE=1 — steady-state pack/gpu/scatter logged

### Summary

Adds an **opt-in HIP path** for the SCM Bekker / Mohr-Coulomb / Janosi contact-force loop in `SCMLoader::ComputeInternalForces()`. Ray casting and contact-patch BFS stay on CPU.
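
For readers unfamiliar with the three models named above, a host-side sketch of the per-hit stress evaluation might look like this. It uses the textbook forms of the Bekker pressure-sinkage law, the Mohr-Coulomb shear limit, and the Janosi-Hanamoto shear law; the struct and function names are hypothetical, and the sketch ignores anything else the real kernel may handle (for example elastic and damping terms).

```cpp
#include <cmath>

// Uniform soil parameters. Names are illustrative, not Chrono's.
struct SoilParams {
    double Kphi;      // Bekker frictional modulus
    double Kc;        // Bekker cohesive modulus
    double n;         // Bekker sinkage exponent
    double cohesion;  // Mohr-Coulomb cohesion
    double phi;       // Mohr-Coulomb friction angle [rad]
    double K;         // Janosi shear deformation modulus
};

struct HitStress {
    double normal;  // normal pressure
    double shear;   // tangential shear stress
};

// Evaluate the three laws for one hit.
//   sinkage      plastic sinkage z at the hit
//   shear_disp   accumulated shear displacement j
//   patch_width  characteristic contact-patch length b
HitStress ComputeHitStress(const SoilParams& s, double sinkage, double shear_disp,
                           double patch_width) {
    if (sinkage <= 0.0) return {0.0, 0.0};  // no penetration, no load
    // Bekker: sigma = (Kc / b + Kphi) * z^n
    double sigma = (s.Kc / patch_width + s.Kphi) * std::pow(sinkage, s.n);
    // Mohr-Coulomb: tau_max = c + sigma * tan(phi)
    double tau_max = s.cohesion + sigma * std::tan(s.phi);
    // Janosi-Hanamoto: tau = tau_max * (1 - exp(-j / K))
    double tau = tau_max * (1.0 - std::exp(-shear_disp / s.K));
    return {sigma, tau};
}
```

Because each hit depends only on its own inputs and the shared soil parameters, this loop body has no cross-hit dependencies, which is what makes it a natural candidate for a one-thread-per-hit kernel.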

- CMake: `CH_ENABLE_VEHICLE_SCM_GPU=ON` + `CHRONO_SCM_GPU_INCLUDE_DIR` / `CHRONO_SCM_GPU_LIB_DIR`
- Runtime: `CHRONO_SCM_GPU=1` (uniform soil, rigid `ChBody` contactables in v1)
- Auto-fallback to CPU when `hits.size() < CHRONO_SCM_GPU_MIN_HITS` (default **8192**)
- Host compiler stays **g++**; HIP device code compiles via CMake HIP language (`CMAKE_HIP_COMPILER` auto-detected). Set `-DCMAKE_HIP_ARCHITECTURES=gfx942` (MI300X) or `gfx90a` (MI210).
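
The runtime gating described above can be sketched as follows. The environment variable names and the 8192 default come from this description; the helper functions themselves (`EnvSizeOr`, `ShouldUseGpu`, `ShouldUseGpuFromEnv`) are hypothetical and only illustrate one way such a check could be written.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Read a non-negative integer from the environment. Returns `fallback` when
// the variable is unset, empty, or has trailing non-digit characters.
std::size_t EnvSizeOr(const char* name, std::size_t fallback) {
    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') return fallback;
    char* end = nullptr;
    unsigned long long v = std::strtoull(raw, &end, 10);
    return (*end == '\0') ? static_cast<std::size_t>(v) : fallback;
}

// Pure decision: the GPU path is taken only when it is enabled and the batch
// is large enough to amortize the host/device transfer.
bool ShouldUseGpu(bool enabled, std::size_t num_hits, std::size_t min_hits) {
    return enabled && num_hits >= min_hits;
}

// Convenience wrapper that reads both settings from the environment.
bool ShouldUseGpuFromEnv(std::size_t num_hits) {
    const char* flag = std::getenv("CHRONO_SCM_GPU");
    bool enabled = flag != nullptr && std::string(flag) == "1";
    std::size_t min_hits = EnvSizeOr("CHRONO_SCM_GPU_MIN_HITS", 8192);
    return ShouldUseGpu(enabled, num_hits, min_hits);
}
```

Keeping the decision in a pure function separate from the environment lookup makes the threshold logic easy to unit-test without touching process state.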

### OpenMP → HIP split (v1)

Porting pattern for CPU/OpenMP Chrono modules — only the dense inner loop moves to HIP:

```text
[CPU] Update active domains
[CPU] OpenMP ray cast → hits + patch_oob
[CPU] BFS contact patches
[CPU] Pack batch (OpenMP parallel for)
[GPU] scm_compute_forces_kernel  (Bekker + Mohr-Coulomb + Janosi per hit)
[CPU] Reduce per-body forces → ChLoadBodyForce
```

Ship kit: amd-chronos contrib/upstream_ready/phase2c/v1
amd-chronos ship-kit and others added 2 commits June 29, 2026 19:16
Upstream-facing playbook (no private integration references).

Co-authored-by: Cursor <cursoragent@cursor.com>
…ody reduce

### Summary

Follow-up to the optional SCM HIP backend. Improves end-to-end Chrono performance:

- Pass ray-cast `hits` map directly to `ComputeContactForcesGpu` (no intermediate vector copy)
- Call `scm_gpu::PrimeBuffers()` from all `SCMTerrain::Initialize` overloads
- Dense per-body force accumulation in `SCMTerrainGpu.cpp`
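
As a rough illustration of the first bullet, packing straight from the hits map into reusable flat arrays might look like the sketch below. The types are hypothetical stand-ins (the real hits map, record layout, and `scm_gpu` buffers are not shown in this PR text); the point is only that the map is iterated once and written directly into structure-of-arrays storage, with no temporary vector of records in between.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical per-cell hit record; not Chrono's actual layout.
struct HitRecord {
    double sinkage;
    int body_index;
};

// Structure-of-arrays batch that is kept alive across steps so its capacity
// is reused instead of reallocated every frame.
struct PackedBatch {
    std::vector<int> cell;
    std::vector<double> sinkage;
    std::vector<int> body_index;
};

// Fill the batch directly from the map in a single pass.
void PackHits(const std::unordered_map<int, HitRecord>& hits, PackedBatch& batch) {
    batch.cell.clear();
    batch.sinkage.clear();
    batch.body_index.clear();
    // clear() keeps capacity, so reserve() is a no-op once buffers are warm.
    batch.cell.reserve(hits.size());
    batch.sinkage.reserve(hits.size());
    batch.body_index.reserve(hits.size());
    for (const auto& kv : hits) {
        batch.cell.push_back(kv.first);
        batch.sinkage.push_back(kv.second.sinkage);
        batch.body_index.push_back(kv.second.body_index);
    }
}
```

Iteration order over an unordered map is unspecified, so the `cell` array is what ties each packed entry back to its grid cell when results are scattered.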

### Depends on

- **Merged:** `feat(vehicle): optional HIP SCM contact-force backend` (v1)

### Files in this PR

- `src/chrono_vehicle/terrain/SCMTerrain.h`
- `src/chrono_vehicle/terrain/SCMTerrain.cpp`
- `src/chrono_vehicle/terrain/SCMTerrainGpu.cpp`

Rebuild external `scm_gpu` from matching `amd-chronos` ref (see `SCM_GPU_EXTERNAL.md`).

Ship kit: amd-chronos contrib/upstream_ready/phase2c/v2
amd-pratmish force-pushed the perf/scm-vehicle-hip-e2e branch from c243acb to da729b6 on June 29, 2026 19:16
amd-pratmish commented

@DanNegrut

amd-pratmish commented

Superseded by combined SCM HIP PR #755 (maintainer requested single PR).
