This repository has been archived by the owner on Feb 7, 2023. It is now read-only.
Commit
Summary:
The memory pool implementation was written back when I only had one GPU, and as a result I overlooked two facts:

(1) CNMEM needs the same current device at allocation and at deallocation for both to take place correctly.
(2) cub needs the device id of the pointer passed in for proper deallocation.

Since C2 currently switches contexts very frequently, I added a global map that records which device each pointer belongs to, and use it for deallocation when we are in another context.

I have not tested the speed, but assuming that std::unordered_map is not too bad, this should be fairly fast.

Differential Revision: D4617300

fbshipit-source-id: e8bb366616cd93504e7d68b7f999011cd49caba5
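A rough sketch of the pointer-to-device map idea, not the actual code in this commit. All names here (`PointerDeviceRegistry`, `DeviceGuard`, `GetCurrentDevice`, `SetCurrentDevice`) are hypothetical, and the device switch and the allocation are simulated with a thread-local integer and `malloc`/`free` so the sketch runs without a GPU. In a real build these would presumably be replaced by `cudaGetDevice`/`cudaSetDevice` and the pool's own allocate and free calls.

```cpp
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-ins for cudaGetDevice / cudaSetDevice (assumption:
// simulated here so the example compiles and runs on a machine with no GPU).
static thread_local int g_current_device = 0;
int GetCurrentDevice() { return g_current_device; }
void SetCurrentDevice(int device) { g_current_device = device; }

// RAII helper: switch to `device` for the lifetime of the guard, then restore.
class DeviceGuard {
 public:
  explicit DeviceGuard(int device) : previous_(GetCurrentDevice()) {
    SetCurrentDevice(device);
  }
  ~DeviceGuard() { SetCurrentDevice(previous_); }
  DeviceGuard(const DeviceGuard&) = delete;
  DeviceGuard& operator=(const DeviceGuard&) = delete;

 private:
  int previous_;
};

// Global-map idea: remember which device each pointer was allocated on,
// and look it up again at deallocation time.
class PointerDeviceRegistry {
 public:
  // Allocates memory and records the current device as the pointer's owner.
  void* New(std::size_t nbytes) {
    void* ptr = std::malloc(nbytes);  // stand-in for a pool allocation
    if (ptr == nullptr) return nullptr;
    std::lock_guard<std::mutex> lock(mutex_);
    owner_[ptr] = GetCurrentDevice();
    return ptr;
  }

  // Frees `ptr` with its owning device made current.
  // Returns the owning device id, or -1 if the pointer is unknown.
  int Delete(void* ptr) {
    int device = -1;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = owner_.find(ptr);
      if (it == owner_.end()) return -1;
      device = it->second;
      owner_.erase(it);
    }
    DeviceGuard guard(device);  // switch to the owner, restore on scope exit
    assert(GetCurrentDevice() == device);
    std::free(ptr);  // stand-in for the pool's free call
    return device;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<void*, int> owner_;
};
```

The lookup adds one hash-map access under a mutex per allocation and per deallocation, which is the cost the summary refers to; whether that is negligible in practice would need measuring.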