Support CUDA pinned memory in DataLoader #139

@colesbury

CUDA pinned memory is important for efficient execution because it allows for faster data transfers and non-blocking CUDA copies.

The copy from normal memory to pinned memory can take significant time. A 256x3x224x224 FloatTensor batch takes about 110 ms to copy on my computer. Currently we can only do the copy in the main process because inter-process shared Tensors/Storages are copied to non-page-locked shared memory. For small conv nets on fast GPUs, we probably need to do the copy in the background.
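To put the 110 ms figure in context, here is a back-of-the-envelope calculation of the batch size and the effective copy rate it implies. It assumes 4 bytes per element (FloatTensor is 32-bit) and takes the timing at face value; the variable names are illustrative only.

```python
# Shape and timing are taken from the issue text above.
batch, channels, height, width = 256, 3, 224, 224
bytes_per_float = 4    # FloatTensor elements are 32-bit floats
copy_seconds = 0.110   # "about 110 ms"

num_elements = batch * channels * height * width
num_bytes = num_elements * bytes_per_float
throughput_gb_s = num_bytes / copy_seconds / 1e9

print(f"{num_elements:,} elements, {num_bytes / 2**20:.0f} MiB")
# prints "38,535,168 elements, 147 MiB"
print(f"effective copy rate: {throughput_gb_s:.2f} GB/s")
# prints "effective copy rate: 1.40 GB/s"
```

At roughly 1.4 GB/s, the pageable-to-pinned copy alone costs about 0.43 ms per image, which is why it can dominate the step time for a small network on a fast GPU.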

I believe we can page-lock the shared memory via cudaHostRegister. We would probably need to unregister it via cudaHostUnregister before freeing the memory.

This would require some knowledge of CUDA in the shared memory code, or at least a free hook that calls cudaHostUnregister.
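A rough sketch of the register/unregister idea, written with ctypes against the CUDA runtime. Everything here is an assumption for illustration: `PinnedSharedBuffer`, `load_cudart`, and `buffer_address` are hypothetical names, an anonymous `mmap` region stands in for the real shared-memory storage, and `close()` plays the role of the free hook. It is not how the shared memory code is actually structured. If the CUDA runtime cannot be loaded or registration fails, the buffer simply stays unpinned.

```python
import ctypes
import ctypes.util
import mmap

CUDA_SUCCESS = 0                # numeric value of cudaSuccess
CUDA_HOST_REGISTER_DEFAULT = 0  # numeric value of cudaHostRegisterDefault


def load_cudart():
    """Return a ctypes handle to the CUDA runtime, or None if unavailable."""
    name = ctypes.util.find_library("cudart")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # cudaError_t cudaHostRegister(void* ptr, size_t size, unsigned int flags)
    lib.cudaHostRegister.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_uint]
    lib.cudaHostRegister.restype = ctypes.c_int
    # cudaError_t cudaHostUnregister(void* ptr)
    lib.cudaHostUnregister.argtypes = [ctypes.c_void_p]
    lib.cudaHostUnregister.restype = ctypes.c_int
    return lib


def buffer_address(buf):
    """Return the base address of a writable buffer.

    The temporary ctypes object is dropped when this function returns,
    so no buffer export is left behind to block a later close().
    """
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


class PinnedSharedBuffer:
    """Hypothetical wrapper: a shared mapping that is page-locked if possible."""

    def __init__(self, nbytes):
        self.nbytes = nbytes
        # Anonymous shared mapping; a stand-in for the real shared storage.
        self.buf = mmap.mmap(-1, nbytes)
        self._addr = buffer_address(self.buf)
        self._cudart = load_cudart()
        self.pinned = False
        if self._cudart is not None:
            err = self._cudart.cudaHostRegister(
                self._addr, nbytes, CUDA_HOST_REGISTER_DEFAULT
            )
            self.pinned = err == CUDA_SUCCESS

    def close(self):
        """Free hook: unregister from CUDA first, then release the mapping."""
        if self.pinned:
            self._cudart.cudaHostUnregister(self._addr)
            self.pinned = False
        self.buf.close()
```

A caller would create `PinnedSharedBuffer(nbytes)`, fill `buf` from a worker, and call `close()` when the storage is freed. The ordering in `close()` is the point of the sketch: the unregister call has to happen while the mapping is still valid.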
