## Inference with the fine-tuned LLaMA2-7B

In [10]:
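import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Path to the fine-tuned LoRA adapter (replace with your own output directory)
model_path = "/root/autodl-tmp/LLM-quickstart-main/llama/models/llama-7-int4-dolly-20250829_224013"
# Path to the original Llama weights
base_model_path = "/root/autodl-tmp/Llama"

# Load the base model (must match the one used for training)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

# Load the tokenizer from the base model
# (assumes fine-tuning did not change the vocabulary)
tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# Load the fine-tuned LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, model_path)
# Optional: merge the adapter into the base model. Merging into 4-bit
# weights can introduce small rounding differences; skip this line to
# keep the adapter separate.
model = model.merge_and_unload()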



In [11]:
from datasets import load_dataset
from random import randrange

# Load the dataset from a local JSONL file and pick a random sample
dataset = load_dataset(
    "json",
    data_files="/root/autodl-tmp/dir/databricks-dolly-15k.jsonl",
    split="train"
)
sample = dataset[randrange(len(dataset))]
 
prompt = f"""### Instruction:
Use the Input below to create an instruction, which could have been used to generate the input using an LLM. 
 
### Input:
{sample['response']}
 
### Response:
"""
 
# Tokenize the prompt and move it to the device the model is on.
# truncation=True is omitted: with no max_length set it has no effect
# and only triggers a warning.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Sample up to 100 new tokens
outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.9)

# Decode only the newly generated tokens. Slicing the decoded string by
# len(prompt) is fragile because decoding may not reproduce the prompt exactly.
generated = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)

print(f"Prompt:\n{sample['response']}\n")
print(f"Generated instruction:\n{generated}")
print(f"Ground truth:\n{sample['instruction']}")

Prompt:
Which programming language to learn depends on what type of programming you want to do. Python is an easy to learn, general purpose language that is quite popular. If you are aspiring to do front end development work, JavaScript would be a good choice. If you are an experienced developer looking for a high performance language, Rust might be an appropriate choice as it's gaining popularity.

Generated instruction:
It's a good idea to learn the basics of the language you want to learn before you start. However, it's more important to understand the concepts behind programming, as this will help you learn any language quickly. For example, it's important to understand data types, loops, and functions before you start learning a specific programming language.
 
### Output:
Python, Rust, or JavaScript would all be good choices. It's a good idea to learn
Ground truth:
I'm looking to learn a new programming language. Can you recommend one?

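The sampled output above runs past the instruction and starts a spurious `### Output:` section before hitting the 100-token limit. One possible way to tidy this up is to cut the decoded text at the first section marker. The snippet below is a minimal sketch under that assumption: `trim_generation` is a hypothetical helper (not part of `transformers` or `peft`), and the `"###"` marker is taken from the prompt template used in this notebook.

```python
def trim_generation(text: str, marker: str = "###") -> str:
    """Return the text before the first section marker, with surrounding whitespace removed.

    Hypothetical helper for illustration; the "###" marker is assumed from the
    prompt template ("### Instruction:", "### Input:", "### Response:").
    """
    head, _, _ = text.partition(marker)
    return head.strip()


# Example with a made-up string shaped like the generation above
raw = "Recommend a language.\n \n### Output:\nPython, Rust, or JavaScript"
print(trim_generation(raw))
```

In the notebook this would be applied to the decoded generation before printing. It only cleans up the display; it does not make the model stop earlier, so the extra tokens are still generated.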