PT2 constant folding uses a lot of memory and causes GPU OOM #108388

### 🐛 Describe the bug

For an internal model (S362716, for Meta employees), we found that the constant folding pass consumes a lot of memory. Here is what the memory snapshot looks like:
<img width="1245" alt="Screenshot 2023-08-31 at 7 10 44 PM" src="https://github.com/pytorch/pytorch/assets/7795712/9d994742-77d1-4d96-983d-c5a1d1b0c05d">

The spike is very short-lived, but the caching allocator still requests a lot of memory from CUDA, even though that memory could be reused shortly afterwards. Later, when we call the cuBLAS API, cuBLAS requests memory of its own. That request fails because all of the free memory is held by the caching allocator, and cuBLAS cannot empty that cache, so the program fails.
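The failure mode above can be illustrated with a toy simulation. This is a hypothetical model, not PyTorch's real allocator: it uses a single memory pool instead of blocks, and the names `Device`, `CachingAllocator` and `run` are made up for illustration. As in the scenario described above, it assumes the cuBLAS request bypasses the caching allocator. In PyTorch itself, the "allocated" and "reserved" quantities correspond roughly to `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`.

```python
# Toy model (hypothetical, not PyTorch's real allocator) of the failure mode:
# a caching allocator keeps freed memory reserved instead of returning it to
# the device, so a consumer that allocates directly from the device (standing
# in for cuBLAS here, by assumption) can run out of memory.


class Device:
    """A GPU with a fixed amount of memory, in arbitrary units."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0

    def malloc(self, size: int) -> None:
        if self.used + size > self.capacity:
            raise MemoryError(
                f"OOM: requested {size}, free {self.capacity - self.used}"
            )
        self.used += size

    def free(self, size: int) -> None:
        self.used -= size


class CachingAllocator:
    """Reserves memory from the device and keeps freed memory cached."""

    def __init__(self, device: Device) -> None:
        self.device = device
        self.reserved = 0   # memory obtained from the device
        self.allocated = 0  # memory currently handed out to tensors

    def alloc(self, size: int) -> None:
        # Reuse cached (reserved but unallocated) memory before asking the device.
        cached = self.reserved - self.allocated
        if size > cached:
            self.device.malloc(size - cached)
            self.reserved += size - cached
        self.allocated += size

    def free(self, size: int) -> None:
        # Freed memory stays reserved; it is NOT returned to the device.
        self.allocated -= size

    def empty_cache(self) -> None:
        # Return all reserved-but-unused memory to the device.
        unused = self.reserved - self.allocated
        self.device.free(unused)
        self.reserved -= unused


def run(empty_cache_before_cublas: bool) -> str:
    gpu = Device(capacity=100)
    alloc = CachingAllocator(gpu)
    alloc.alloc(80)  # short-lived peak during constant folding
    alloc.free(80)   # folding finishes; the memory stays cached, not released
    alloc.alloc(10)  # later tensors reuse the cache
    if empty_cache_before_cublas:
        alloc.empty_cache()
    try:
        gpu.malloc(30)  # "cuBLAS" workspace allocated outside the caching allocator
        return "ok"
    except MemoryError:
        return "oom"


print(run(empty_cache_before_cublas=False))  # oom
print(run(empty_cache_before_cublas=True))   # ok
```

Only 10 units are actually in use when the cuBLAS request arrives, but the device still sees 80 units as taken, so the 30-unit request fails. Emptying the cache in the contrast case releases the 70 unused units and the same request succeeds.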

### Versions

top of trunk

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @wconstab @bdhirsh @anijain2305
