
What hardware configuration is needed to run the FlagAlpha/Llama2-Chinese-13b-Chat-4bit model? #38

Closed
magictext opened this issue Jul 27, 2023 · 1 comment


magictext commented Jul 27, 2023

I tried running the sample program for the quantized model on Colab, but it fails to start because of a memory problem. This is the sample program I used:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the GPTQ 4-bit quantized 13B chat model onto the first GPU
model = AutoGPTQForCausalLM.from_quantized('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', use_fast=False)

# Build the prompt in the model's expected <s>Human: ...</s><s>Assistant: format
input_ids = tokenizer(['<s>Human: 怎么登上火星\n</s><s>Assistant: '], return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')

# Sampling settings for generation
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}

generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)

The configuration Colab allocated is 12.7 GB of RAM and a T4 GPU.
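
To pin down whether the failure is in system RAM or GPU memory, it helps to print what the runtime actually has before loading the model. A minimal sketch for the Colab side (psutil and torch are both preinstalled there; nothing here is specific to this model):

import psutil
import torch

# System RAM available to the Colab runtime
ram = psutil.virtual_memory()
print(f"RAM: {ram.total / 1e9:.1f} GB total, {ram.available / 1e9:.1f} GB free")

# GPU memory on the assigned card (a Colab T4 reports roughly 15 GB)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB VRAM")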

Rayrtfr (Collaborator) commented Jul 28, 2023

12 GB of VRAM looks like it should be enough. What error are you getting? With 12 GB of VRAM you are right at the edge of running out; try requesting a card with 24 GB of VRAM instead.
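
If the error turns out to be a CUDA out-of-memory during loading, one workaround short of a 24 GB card is to let Accelerate offload the layers that do not fit onto CPU RAM. This is only a sketch: it assumes auto_gptq's from_quantized forwards Accelerate-style device_map/max_memory arguments, and the memory budgets below are illustrative, not tuned:

from auto_gptq import AutoGPTQForCausalLM

# Assumption: from_quantized accepts Accelerate-style placement arguments.
# Layers that exceed the GPU budget are kept in CPU RAM (slower, but it loads).
model = AutoGPTQForCausalLM.from_quantized(
    'FlagAlpha/Llama2-Chinese-13b-Chat-4bit',
    device_map="auto",                       # let Accelerate place each layer
    max_memory={0: "10GiB", "cpu": "8GiB"},  # illustrative budgets for a T4 plus 12.7 GB RAM
)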

@Rayrtfr Rayrtfr closed this as completed Jul 31, 2023