OOM when fine-tuning the 13B BELLE model on two 4090s, but not on a single card #24

Closed
starphantom666 opened this issue Jun 9, 2023 · 8 comments
Labels: solved (This problem has been already solved.)

Comments

@starphantom666

I don't hit this with single-GPU fine-tuning, but I do with multi-GPU. One of my cards already has 15 GB of VRAM occupied, leaving only about 8 GB free, so effectively I'm fine-tuning on 8 GB + 24 GB. Does that setup itself cause the problem, or is it something wrong with my configuration?

@hiyouga
Owner

hiyouga commented Jun 9, 2023

Multi-GPU training requires every card to have 24 GB of VRAM.
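If you want to confirm how much memory is actually free on each card before launching, here is a minimal PyTorch check (not specific to this project):

import torch

# Print free / total memory for every visible GPU, in GiB, so you can verify
# that each card really has ~24 GB available before starting multi-GPU training.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"cuda:{i}: {free / 1024**3:.1f} GiB free / {total / 1024**3:.1f} GiB total")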

@starphantom666
Author

Multi-GPU training requires every card to have 24 GB of VRAM.

Boss, I also have a problem on a single card: with the 13B model quantized to 4-bit, input length 512 and output length 512, fine-tuning OOMs no matter what I try o(╥﹏╥)o

@hiyouga
Owner

hiyouga commented Jun 9, 2023

How much free VRAM does the GPU have?
Try reducing 512 to 256 and see if that helps.

@starphantom666
Author

starphantom666 commented Jun 9, 2023

How much free VRAM does the GPU have? Try reducing 512 to 256 and see if that helps.

Reducing it works. I have one more question, though.

from transformers import LlamaForCausalLM, AutoTokenizer
import torch

ckpt = './bloom_13b/'
device = torch.device('cuda')

# Load the 13B checkpoint in 8-bit on GPU 0.
model = LlamaForCausalLM.from_pretrained(ckpt, device_map={"": 0}, load_in_8bit=True, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model.eval()

prompt = "XXXXXXXXXXXXXXXXXXXXXXXXXXX"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Greedy decoding on the raw prompt, following the official example.
generate_ids = model.generate(input_ids, max_new_tokens=500, do_sample=False, repetition_penalty=1.0,
                              eos_token_id=2, bos_token_id=1, pad_token_id=0)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
response = output[len(prompt):]  # strip the echoed prompt
print(response)

The output of this official example dialogue doesn't match what I get from the project's web demo (both in 8-bit with do_sample=False), and the web demo's results are much worse than the official example above. What could be causing this?

I have already changed the web demo's generation parameters in the backend, as follows:
gen_kwargs = {
    "input_ids": input_ids,
    "do_sample": False,
    "top_p": 0.01,
    "temperature": 0.99,
    "num_beams": 1,
    # "max_length": max_length,
    "max_new_tokens": 500,
    "repetition_penalty": 1.0,
    "logits_processor": get_logits_processor(),
    "streamer": streamer,
    "eos_token_id": 2,
    "bos_token_id": 1,
    "pad_token_id": 0,
}

Since do_sample=False with num_beams=1 means greedy decoding, top_p and temperature shouldn't even matter here, yet the outputs still differ.

@starphantom666
Author

The web demo's answers are extremely terse...

@starphantom666
Author

The web demo's answers are extremely terse...

Solved. It turns out the code automatically wraps the question, which makes the results inconsistent with the official example and also makes the answers terse.
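Roughly what was going on (the template below is only an illustration, not the project's actual one):

raw_prompt = "XXXXXXXXXXXXXXXXXXXXXXXXXXX"

# The web demo feeds the model something like this wrapped prompt,
# while the official snippet above feeds the raw string directly.
wrapped_prompt = f"Human: {raw_prompt}\n\nAssistant: "

# The two inputs tokenize differently, so even identical greedy
# generation settings produce different outputs.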

@hiyouga added the solved label on Jun 12, 2023
@dengfenglai321

The web demo's answers are extremely terse...

Solved. It turns out the code automatically wraps the question, which makes the results inconsistent with the official example and also makes the answers terse.

Hi, how should I modify this so that the project's answers are basically consistent with the official ones?

@hiyouga
Owner

hiyouga commented Jul 17, 2023

@yumulinfeng1 When using an instruction-tuned model, you should add the --prompt_template argument to the command-line arguments.
