Caught amgx exception: Cannot allocate pinned memory #313

AnjaliSandip opened this issue Jun 14, 2024 · 3 comments
@AnjaliSandip

I am using amgx with PETSc (-pc_type amgx) to run multiphysics simulations. I am encountering this error even after having scaled down the problem size significantly.

Caught amgx exception: Cannot allocate pinned memory

I have attached the output and error log files for your reference. Thank you for any feedback you can provide.

outlog.docx
errlog.docx

Environment information:

  • MINT OS
  • CUDA runtime 12.3
  • OpenMPI 4
  • AMGX version 2.4
  • CUDA driver 12.4
  • NVIDIA V100
@AnjaliSandip AnjaliSandip changed the title [AMGX ] [Caught amgx exception: Cannot allocate pinned memory] Jun 14, 2024
@AnjaliSandip AnjaliSandip changed the title [Caught amgx exception: Cannot allocate pinned memory] Caught amgx exception: Cannot allocate pinned memory Jun 14, 2024
@marsaev
Collaborator

marsaev commented Jun 15, 2024

@AnjaliSandip
The error seems to indicate that the pinned memory pool could not be allocated:

Caught amgx exception: Cannot allocate pinned memory
 at: /home/anjali.sandip/ISSM/ISSM/externalpackages/petsc/src/arch-linux-c-opt/externalpackages/git.amgx/src/global_thread_handle.cu:374

Its size is currently fixed at 100 MB (https://github.com/NVIDIA/AMGX/blob/v2.4.0/src/global_thread_handle.cu#L51), regardless of the input data. This allocation happens during resource creation, at which point we don't yet know the problem size.

Is your process allowed to allocate page-locked memory? For example, for Docker containers you have to pass the corresponding ulimit flag: --ulimit memlock=-1
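
One quick way to see what the process is actually allowed to lock is to read `RLIMIT_MEMLOCK` from inside the same environment that launches the solver. The sketch below uses only the Python standard library. `POOL_BYTES` and `memlock_allows` are illustrative names, and 100 MB is taken here as 100 * 1024 * 1024 bytes, which is an assumption. Whether this limit is what makes the pinned allocation fail in this case has not been confirmed, so treat the result as a hint only.

```python
import resource

# Assumed size of the pinned pool mentioned above (100 MB taken as 100 MiB).
POOL_BYTES = 100 * 1024 * 1024


def describe(limit):
    """Render an rlimit value in a readable form."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


def memlock_allows(nbytes=POOL_BYTES):
    """Return True if the soft RLIMIT_MEMLOCK is at least nbytes."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY or soft >= nbytes


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("soft memlock limit:", describe(soft))
    print("hard memlock limit:", describe(hard))
    print("covers the assumed pool size:", memlock_allows())
```

Running it through the same launcher as the application (for example under `mpirun`, or inside the container) matters, because the limit is inherited per process and can differ from the one in an interactive shell.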

@AnjaliSandip
Author

AnjaliSandip commented Jun 17, 2024 via email

@marsaev
Collaborator

marsaev commented Jul 2, 2024

@AnjaliSandip sorry for the delayed reply.
I'm not familiar with PETSc internals, but unless the PETSc environment somehow hooks cudaMallocHost, its settings shouldn't affect AMGX, since AMGX calls the CUDA Runtime directly: https://github.com/NVIDIA/AMGX/blob/v2.4.0/src/global_thread_handle.cu#L378

You can try running an example that allocates the same amount of pinned memory to see whether it's an environment issue, something like this: https://godbolt.org/z/7ab86qc34
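
If a CUDA toolchain is not at hand, a rough host-only proxy is to ask the OS to page-lock a buffer of the same size with `mlock`. This is not the linked example and it does not call `cudaMallocHost`, so a success here does not guarantee that the CUDA allocation will succeed. It only shows whether the OS lets this process lock that much memory. The sketch assumes Linux, and `try_lock` and `POOL_BYTES` are illustrative names.

```python
import ctypes
import ctypes.util
import mmap
import os

# Assumed size of the pinned pool (100 MB taken as 100 MiB).
POOL_BYTES = 100 * 1024 * 1024


def try_lock(nbytes=POOL_BYTES):
    """Try to mlock an anonymous buffer of nbytes; return (ok, errno)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.restype = ctypes.c_int

    buf = mmap.mmap(-1, nbytes)               # anonymous, page-aligned mapping
    view = ctypes.c_char.from_buffer(buf)     # used only to obtain the address
    addr = ctypes.addressof(view)
    ctypes.set_errno(0)
    rc = libc.mlock(addr, nbytes)
    err = ctypes.get_errno() if rc != 0 else 0
    if rc == 0:
        libc.munlock(addr, nbytes)
    del view                                  # drop the buffer export before closing
    buf.close()
    return rc == 0, err


if __name__ == "__main__":
    ok, err = try_lock()
    if ok:
        print("mlock of the assumed pool size succeeded")
    else:
        print("mlock failed:", os.strerror(err))
```

A failure with "Cannot allocate memory" or "Resource temporarily unavailable" would point at the locked-memory limit of the environment rather than at AMGX or PETSc.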

If there is no obvious or easy fix for the page-locked memory problem, I would suggest opening a ticket with PETSc (https://gitlab.com/petsc/petsc/-/issues), as they are more knowledgeable about PETSc details that might be important here. You can link this issue for reference, and I can follow up if there are any further questions about AMGX.
