24G GPU runs out of memory #15
Comments
Try again? The code was just updated.
With the new code, memory usage: 17150MiB / 32510MiB
The real problem is that the team has not released a quantized model either... I opened #18 for that, hoping the maintainers see it...
Hi, the OOM may be because fp32 precision is used by default. Could you pull our latest code and then load the model in fp16? Like this: model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
Wang Peng just addressed the precision issue above; enabling fp16 is one option. Quantization is covered in the README (see the quantization section); you only need to pass quantization_config.
I re-downloaded the model and it still fails. Is the quantized version really the only option?...
Try adding this when initializing the model: torch_dtype=torch.float16
What if the fp16 parameter is added, like the one above?
Did you edit config.json? I can reproduce your error by modifying config.json. Re-download a fresh copy from Hugging Face.
17250MiB / 32510MiB |
Same error here, and I have already downloaded the latest config.json from Hugging Face.
Same error as above, also with 24G of VRAM; I have tried every suggested method. It is either OOM or an overflow.
My config.json
Setting bf16 to true will probably let it run; I just tested it. Specifying torch_dtype=torch.float16 has no effect: the loaded weights are still bf16. Oddly, the V100 does not support bf16, so I am not sure how this ran at all.
Could you share your working config.json?
This model seems to run only in bf16. So either set fp16 to false and bf16 to true in the config and pass nothing extra at initialization, or set both to false and pass torch_dtype=torch.bfloat16 at initialization. I tried both approaches; both run, with memory usage under 20G. {
"activation": "swiglu",
"apply_residual_connection_post_layernorm": false,
"architectures": [
"QWenLMHeadModel"
],
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"attn_pdrop": 0.0,
"bf16": true,
"bias_dropout_fusion": true,
"bos_token_id": 151643,
"embd_pdrop": 0.1,
"eos_token_id": 151643,
"ffn_hidden_size": 22016,
"fp16": false,
"initializer_range": 0.02,
"kv_channels": 128,
"layer_norm_epsilon": 1e-05,
"model_type": "qwen",
"n_embd": 4096,
"n_head": 32,
"n_layer": 32,
"n_positions": 6144,
"no_bias": true,
"onnx_safe": null,
"padded_vocab_size": 151936,
"params_dtype": "torch.bfloat16",
"pos_emb": "rotary",
"resid_pdrop": 0.1,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 2048,
"tie_word_embeddings": false,
"tokenizer_type": "QWenTokenizer",
"transformers_version": "4.31.0",
"use_cache": true,
"use_flash_attn": true,
"vocab_size": 151936,
"use_dynamic_ntk": false,
"use_logn_attn": false
}
Use this config and it should run.
Pulled the latest repo. GPU: 4090 24G
Confirmed: only bf16=True or quantization works; fp32 is not an option.
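The arithmetic backs this up. A rough sketch of weight memory alone (assuming roughly 7 billion parameters for Qwen-7B; the exact count differs slightly):

```python
# Back-of-envelope weight-memory estimate for a ~7B-parameter model.
# Parameter count is an assumption, not the exact figure.
params = 7_000_000_000
gib = 1024 ** 3
fp32_gib = params * 4 / gib  # 4 bytes per fp32 weight
bf16_gib = params * 2 / gib  # 2 bytes per bf16 weight
print(f"fp32 weights: {fp32_gib:.1f} GiB, bf16 weights: {bf16_gib:.1f} GiB")
```

Under these assumptions fp32 weights alone (~26 GiB) already exceed a 24 GiB card before activations or the KV cache, while bf16 (~13 GiB) leaves headroom, consistent with the ~17 GiB usage reported above.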
Thanks everyone for the feedback. The bug was that float(2**30) exceeds the representable range of fp16; the latest code fixes it. Please try again.
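The overflow described above is easy to reproduce. A minimal sketch using numpy's fp16 type (the model code itself uses torch tensors, but the dtype behavior is the same):

```python
import numpy as np

# fp16's largest finite value is 65504, so a large constant like
# float(2**30) overflows to inf when cast to fp16.
print(np.finfo(np.float16).max)  # 65504.0
print(np.float16(2.0 ** 30))     # inf
# bf16 keeps fp32's 8-bit exponent, so 2**30 stays representable there,
# which is why the model ran under bf16 but broke under fp16.
```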
Running your demo fails with out-of-GPU-memory. Is the quantized version really the only way?
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 23.65 GiB total capacity; 20.85 GiB already allocated; 1.26 GiB free; 20.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF