PT2 constant folding is using a lot of memory and caused GPU OOM #108388

@xw285cornell

🐛 Describe the bug

For an internal model (S362716, for Meta employees), we find that the constant folding pass consumes a lot of memory. Here is what the memory snapshot looks like:
[Screenshot of the memory snapshot, 2023-08-31]

So even though these allocations are very short-lived, the caching allocator requests a lot of memory from CUDA, even though that memory could be reused shortly afterwards. Later, when we call a cuBLAS API, cuBLAS requests memory of its own, and that request fails because all of the memory is held by the caching allocator. The cache cannot be emptied at that point, so the program fails.
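The failure mode described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not PyTorch's actual allocator: `ToyCachingAllocator`, `external_alloc_fits`, and all sizes are hypothetical, and the model ignores block sizes and fragmentation. It shows how a short-lived peak leaves reserved memory high, so that an allocation made outside the cache no longer fits until the cache is emptied.

```python
class ToyCachingAllocator:
    """Toy model of a caching allocator (hypothetical, not PyTorch's implementation)."""

    def __init__(self, device_capacity):
        self.device_capacity = device_capacity  # total device memory, arbitrary units
        self.reserved = 0   # memory taken from the device and held by the cache
        self.allocated = 0  # memory currently handed out to live tensors

    def malloc(self, size):
        # Reuse cached-but-unused memory first; grow the reservation only if needed.
        free_in_cache = self.reserved - self.allocated
        if size > free_in_cache:
            need = size - free_in_cache
            if self.reserved + need > self.device_capacity:
                raise MemoryError("device out of memory")
            self.reserved += need
        self.allocated += size

    def free(self, size):
        # Freed memory goes back to the cache, not to the device.
        self.allocated -= size

    def empty_cache(self):
        # Return all unused cached memory to the device.
        self.reserved = self.allocated


def external_alloc_fits(allocator, size):
    """Would an allocation that bypasses the cache (e.g. a library workspace) fit?"""
    return allocator.device_capacity - allocator.reserved >= size


# Illustrative numbers only.
alloc = ToyCachingAllocator(device_capacity=100)
alloc.malloc(30)   # long-lived model state
alloc.malloc(60)   # short-lived temporaries, standing in for constant folding
alloc.free(60)     # temporaries are released almost immediately

peak_reserved = alloc.reserved                    # cache still holds the peak
fits_before = external_alloc_fits(alloc, 20)      # external request does not fit
alloc.empty_cache()
fits_after = external_alloc_fits(alloc, 20)       # fits once the cache is released

print(alloc.allocated, peak_reserved, fits_before, fits_after)
```

In real PyTorch, the corresponding counters are exposed by `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`, and `torch.cuda.empty_cache()` releases unused cached blocks. Whether calling it is an appropriate fix for this issue is not something the report establishes.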

Versions

top of trunk

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @wconstab @bdhirsh @anijain2305

Labels

high priority, oncall: pt2, triaged
