Card 0 OOM during block inference #988

@xin3he

Description

With PR #981, card 0 runs out of memory (OOM) during block inference in two setups:
OOM when quantizing block 11/61 for DeepSeek on 3x 80GB CUDA cards
OOM when quantizing block 14/80 for Llama 70B on 3x 24GB Intel GPU cards
