Skip to content

Add GPU selection constructor overloads and migrate ownership to unique_ptr#59

Merged
Ruochun merged 6 commits intomainfrom
copilot/update-demsovler-gpu-selection
Mar 4, 2026
Merged

Add GPU selection constructor overloads and migrate ownership to unique_ptr#59
Ruochun merged 6 commits intomainfrom
copilot/update-demsovler-gpu-selection

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 2, 2026

Two improvements to DEMSolver: controllable GPU selection at construction time, and exception-safe ownership of internal worker objects via std::unique_ptr.

GPU selection flavors

Flavor 1 – default (existing behavior, now explicit): scans available GPUs, uses at most 2.

DEMSolver sim;         // or DEMSolver sim(2);

Flavor 2 – integer count: warns if nGPUs > 2, falls back to 1 GPU (both threads share device 0) when fewer than requested are available.

DEMSolver sim(1);      // both kT/dT on device 0
DEMSolver sim(4);      // warning: only 2 used

Flavor 3 – explicit device IDs:

DEMSolver sim({0, 1}); // kT on device 0, dT on device 1
DEMSolver sim({2});    // both threads on device 2
DEMSolver sim({0, 1, 2}); // warning: first 2 used
DEMSolver sim({5});    // throws if device 5 unavailable

Changes

  • GpuManager: new GpuManager(const std::vector<int>& device_ids) constructor; duplicate IDs intentionally produce multiple streams on one device (single-GPU mode). scanNumDevices() made static for pre-construction use.
  • DEMSolver: new DEMSolver(std::vector<int>) overload; integer constructor now validates/warns instead of silently ignoring nGPUs.
  • unique_ptr migration: kT, dT, kTMain_InteractionManager, dTMain_InteractionManager, dTkT_InteractionManager, dTkT_GpuManager are now std::unique_ptr. Destructor no longer requires manual delete calls; members are destroyed in safe reverse-declaration order (workers before managers).
  • packTransferPointers signature changed from T*& to T* (pointer-by-value); no semantic change, required for unique_ptr compatibility at call sites.

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Co-authored-by: Ruochun <24469442+Ruochun@users.noreply.github.com>
Copilot AI changed the title [WIP] Update DEMSolver class to allow GPU selection and management Add GPU selection constructor overloads and migrate ownership to unique_ptr Mar 2, 2026
@Ruochun
Copy link
Copy Markdown
Collaborator

Ruochun commented Mar 3, 2026

@copilot With the introduction of unique_ptr, I suspect we may introduced a memory leak potential, specifically at the destruction of the DEMSolver class. Can you have a check and figure out the possible cause?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 3, 2026

@copilot With the introduction of unique_ptr, I suspect we may introduced a memory leak potential, specifically at the destruction of the DEMSolver class. Can you have a check and fi...

The risk was real in the initial commit (c16d5f1): kT was declared before dT in the unique_ptr member list, meaning reverse-destruction would destroy dT first — but kT's worker thread directly accesses dT via kT->dT (e.g. transferArraysResize, sendToTheirBuffer in kT.cpp). A still-running kT thread accessing freed dT memory would be a use-after-free. The managers were also originally declared after kT/dT, which would have freed pSchedSupport/pPagerToMain before the threads were joined.

This was already caught and corrected in commit 1101464 ("Reorder unique_ptrs for extra safety"): kT and dT are now declared last so they're destroyed first (kT → dT), and the managers are declared first so they outlive both workers. The resulting destruction order precisely mirrors the original explicit delete sequence:

  1. kT destroyed (thread joined) — kT accesses dT/managers, both still alive ✓
  2. dT destroyed (thread exits immediately via dynamicShouldJoin check, never touches kT) ✓
    3–6. managers destroyed after all threads have exited ✓

No further changes are needed; the current code is safe.

@Ruochun Ruochun marked this pull request as ready for review March 4, 2026 16:39
@Ruochun Ruochun merged commit 77b225a into main Mar 4, 2026
@Ruochun Ruochun deleted the copilot/update-demsovler-gpu-selection branch March 4, 2026 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants