CANN: fix CPU memory leak in CANN backend #16549

noemotiovon · 2025-10-13T02:16:07Z

This commit fixes a CPU-side (host) memory leak in the CANN backend that occurred when intermediate aclTensorList objects were not released after operator execution. The leak showed up during repeated invocations of CANN ops (e.g., FlashAttention), causing host memory usage to grow steadily over time.

Proper resource cleanup (aclDestroyTensorList and related release logic) has been added to ensure that all temporary tensors are correctly freed.
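To make the failure mode concrete, here is a minimal, self-contained C++ sketch of the create/destroy pairing described above. It does not call the real ACL API, so it compiles without the CANN toolkit. `mock_tensor`, `mock_tensor_list`, `mock_create_tensor_list`, `mock_destroy_tensor_list`, `run_op_leaky`, and `run_op_fixed` are all hypothetical stand-ins. They loosely mirror a create call (assumed here to be `aclCreateTensorList`) paired with `aclDestroyTensorList`, and a counter of live lists plays the role of host memory. This is an illustration of the pattern, not the code changed in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter standing in for host memory held by live list handles.
static long g_live_lists = 0;

// Hypothetical stand-ins for aclTensor / aclTensorList.
struct mock_tensor {};
struct mock_tensor_list {
    std::vector<mock_tensor*> items;
};

// Mirrors a list-creation call (assumed: aclCreateTensorList); allocates on the host.
mock_tensor_list* mock_create_tensor_list(mock_tensor* const* tensors, std::size_t n) {
    ++g_live_lists;
    return new mock_tensor_list{std::vector<mock_tensor*>(tensors, tensors + n)};
}

// Mirrors aclDestroyTensorList: frees the host-side list object.
void mock_destroy_tensor_list(mock_tensor_list* list) {
    assert(list != nullptr);
    --g_live_lists;
    delete list;
}

// Hypothetical op launch that builds a temporary list (e.g. K/V inputs of an
// attention op) and never releases it.
void run_op_leaky() {
    mock_tensor k, v;
    mock_tensor* inputs[] = {&k, &v};
    mock_tensor_list* list = mock_create_tensor_list(inputs, 2);
    (void)list;  // ... op executes using `list` ...
    // BUG: `list` is never destroyed, so every call leaks one host object.
}

// Same op with the temporary list released after execution.
void run_op_fixed() {
    mock_tensor k, v;
    mock_tensor* inputs[] = {&k, &v};
    mock_tensor_list* list = mock_create_tensor_list(inputs, 2);
    // ... op executes using `list` ...
    mock_destroy_tensor_list(list);  // release the temporary once the op is done
}
```

Calling `run_op_leaky()` N times leaves N lists alive, which matches the steady host-memory growth seen under repeated invocations; `run_op_fixed()` brings the counter back to zero after every call.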

noemotiovon · 2025-10-13T02:21:39Z

I started llama-server and continuously sent requests. The observed CPU memory usage is as follows:

2025-10-11 12:24:08,8607006232,1744188
......
2025-10-12 10:32:13,8607031400,1771672
......
2025-10-13 01:17:50,8607038132,1778800

hipudding

Thank you for your fix. I think aclTensor should be designed using RAII to avoid missing releases.
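The RAII suggestion could look roughly like the sketch below, which wraps a C-style create/destroy pair in `std::unique_ptr` with a custom deleter so the handle is released automatically at scope exit, including early returns. All names (`mock_handle`, `mock_create`, `mock_destroy`, `mock_handle_ptr`, `make_handle`, `run_op`) are hypothetical stand-ins for `aclTensor`/`aclTensorList` and their destroy functions; the real ACL calls are not used so the sketch runs without the CANN toolkit, and it is not the design the backend actually adopted.

```cpp
#include <cassert>
#include <memory>

// Hypothetical counter of live handles, standing in for leaked host memory.
static long g_live_handles = 0;

// Hypothetical C-style handle API (stand-in for aclTensor / aclTensorList).
struct mock_handle {
    int id;
};

mock_handle* mock_create(int id) {
    ++g_live_handles;
    return new mock_handle{id};
}

void mock_destroy(mock_handle* h) {
    assert(h != nullptr);
    --g_live_handles;
    delete h;
}

// Deleter that forwards to the C-style destroy function.
// std::unique_ptr never invokes it for a null pointer.
struct mock_handle_deleter {
    void operator()(mock_handle* h) const { mock_destroy(h); }
};

// Owning pointer: the handle is destroyed when it goes out of scope.
using mock_handle_ptr = std::unique_ptr<mock_handle, mock_handle_deleter>;

mock_handle_ptr make_handle(int id) {
    return mock_handle_ptr(mock_create(id));
}

// Hypothetical op: no explicit release calls, and the early-return path
// cannot forget one.
bool run_op(bool fail_early) {
    mock_handle_ptr a = make_handle(1);
    mock_handle_ptr b = make_handle(2);
    if (fail_early) {
        return false;  // a and b are destroyed here automatically
    }
    // ... launch the op using a.get() and b.get() ...
    return a->id + b->id == 3;
}
```

The design choice is to move ownership into the type: once every temporary is held by a `mock_handle_ptr`, a missed release becomes a compile-visible ownership question rather than a silent leak.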

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Ascend NPU (issues specific to Ascend NPUs) labels on Oct 13, 2025

hipudding approved these changes Oct 13, 2025

hipudding merged commit 56fc38b into ggml-org:master Oct 13, 2025
70 checks passed