Closed
Labels
high priority, oncall: pt2, triaged
Description
🐛 Describe the bug
For an internal model (S362716, for Meta employees), we found that the constant folding pass consumes a lot of memory. Here is what the memory snapshot looks like:
So even though these allocations are very short-lived, the caching allocator requests a lot of memory from CUDA, even though that memory could be reused shortly afterward. Later on, when we call a cuBLAS API, cuBLAS requests memory of its own, and that request fails because all of the memory is held by the caching allocator. cuBLAS cannot empty the cache, so the program fails.
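To make the failure mode concrete, here is a minimal toy model in plain Python. It is only a sketch: `ToyDevice`, `ToyCachingAllocator`, and `simulate` are hypothetical names, the sizes are arbitrary units, and none of this is PyTorch's actual allocator. It only illustrates how memory that is freed by tensors but still cached can starve a library that allocates directly from the device. In real PyTorch, `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` report the two quantities modeled here, and `torch.cuda.empty_cache()` returns unused cached blocks to the driver.

```python
class ToyDevice:
    """Hypothetical device with a fixed memory capacity (arbitrary units)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0  # memory currently handed out by the "driver"

    def malloc(self, size):
        if self.used + size > self.capacity:
            raise MemoryError(
                f"device OOM: requested {size}, free {self.capacity - self.used}"
            )
        self.used += size

    def free(self, size):
        self.used -= size


class ToyCachingAllocator:
    """Hypothetical caching allocator: freed memory stays reserved for reuse."""

    def __init__(self, device):
        self.device = device
        self.allocated = 0  # held by live tensors
        self.cached = 0     # freed by tensors, but not returned to the device

    @property
    def reserved(self):
        return self.allocated + self.cached

    def alloc(self, size):
        reuse = min(size, self.cached)
        self.device.malloc(size - reuse)  # ask the device only for the shortfall
        self.cached -= reuse
        self.allocated += size

    def free(self, size):
        # The tensor is gone, but the memory is kept in the cache.
        self.allocated -= size
        self.cached += size

    def empty_cache(self):
        # Return all cached memory to the device.
        self.device.free(self.cached)
        self.cached = 0


def simulate(release_cache):
    device = ToyDevice(capacity=100)
    allocator = ToyCachingAllocator(device)

    allocator.alloc(90)  # short-lived temporaries, e.g. from constant folding
    allocator.free(90)   # temporaries are dead, yet 90 units stay reserved

    if release_cache:
        allocator.empty_cache()

    try:
        # An external library (standing in for cuBLAS here) allocates
        # directly from the device and cannot see or empty the cache.
        device.malloc(20)
        return "ok"
    except MemoryError:
        return "oom"
```

In this toy model, `simulate(release_cache=False)` hits the out-of-memory path while `simulate(release_cache=True)` succeeds, which mirrors the situation described above: the peak is transient, but the reservation it leaves behind is not.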
Versions
top of trunk
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @wconstab @bdhirsh @anijain2305
yanboliang