
What hardware configuration is needed to run the FlagAlpha/Llama2-Chinese-13b-Chat-4bit model? #38

Closed
magictext opened this issue Jul 27, 2023 · 1 comment


magictext commented Jul 27, 2023

I tried running the sample program for the quantized model on Colab, but it fails to start because of a memory problem. This is the sample program I used:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the GPTQ 4-bit quantized 13B chat model onto the first GPU
model = AutoGPTQForCausalLM.from_quantized('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained('FlagAlpha/Llama2-Chinese-13b-Chat-4bit', use_fast=False)

# Build the prompt in the model's expected <s>Human: ...</s><s>Assistant: format
input_ids = tokenizer(['<s>Human: 怎么登上火星\n</s><s>Assistant: '], return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')

# Sampling settings for generation
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}

generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)

The configuration Colab allocated is 12.7 GB of RAM and a T4 GPU.
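
To pin down whether the failure is in system RAM or GPU memory, it helps to print what the runtime actually has before loading the model. A minimal sketch for the Colab side (psutil and torch are both preinstalled there; nothing here is specific to this model):

import psutil
import torch

# System RAM available to the Colab runtime
ram = psutil.virtual_memory()
print(f"RAM: {ram.total / 1e9:.1f} GB total, {ram.available / 1e9:.1f} GB free")

# GPU memory on the assigned card (a Colab T4 reports roughly 15 GB)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB VRAM")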

Rayrtfr (Collaborator) commented Jul 28, 2023

12 GB of VRAM looks like it should be enough. What error are you getting? With 12 GB of VRAM you are right at the edge of running out; try requesting a card with 24 GB of VRAM instead.
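
If the error turns out to be a CUDA out-of-memory during loading, one workaround short of a 24 GB card is to let Accelerate offload the layers that do not fit onto CPU RAM. This is only a sketch: it assumes auto_gptq's from_quantized forwards Accelerate-style device_map/max_memory arguments, and the memory budgets below are illustrative, not tuned:

from auto_gptq import AutoGPTQForCausalLM

# Assumption: from_quantized accepts Accelerate-style placement arguments.
# Layers that exceed the GPU budget are kept in CPU RAM (slower, but it loads).
model = AutoGPTQForCausalLM.from_quantized(
    'FlagAlpha/Llama2-Chinese-13b-Chat-4bit',
    device_map="auto",                       # let Accelerate place each layer
    max_memory={0: "10GiB", "cpu": "8GiB"},  # illustrative budgets for a T4 plus 12.7 GB RAM
)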

@Rayrtfr Rayrtfr closed this as completed Jul 31, 2023