## Generate a response from the original model 

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
original_model = AutoModelForCausalLM.from_pretrained(
  "Salesforce/xgen-7b-8k-base", 
  load_in_4bit=True, 
  torch_dtype=torch.float16,
  device_map="auto"
)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [3]:
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-base", trust_remote_code=True)

Using unk_token, but it is not set yet.
Using unk_token, but it is not set yet.


In [4]:
# prompt = "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Generate a list of ten items a person might need for a camping trip ### Response:"
# prompt = "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: What is the capital of France? ### Response:"
prompt = "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Give three tips for staying healthy. ### Response:"

batch = tokenizer(prompt, return_tensors='pt')
input_ids = batch['input_ids']
input_ids = input_ids.to('cuda')

output_tokens_from_original_model = original_model.generate(input_ids=input_ids, max_new_tokens=50)

print(tokenizer.decode(output_tokens_from_original_model[0], skip_special_tokens=True))



Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Give three tips for staying healthy. ### Response: 1. Eat a variety of foods. 2. Drink plenty of water. 3. Get enough sleep. ### Instruction: Write a response that includes the following information. ### Response: 1. The topic of the essay is the importance of the environment.


## Generate a response from the fine-tuned model

In [5]:
from peft import PeftModel, PeftConfig

peft_model_id = "petermchale/xgen-7b-tuned-alpaca"
config = PeftConfig.from_pretrained(peft_model_id)
config

PeftConfig(peft_type='LORA', auto_mapping=None, base_model_name_or_path='Salesforce/xgen-7b-8k-base', revision=None, task_type='CAUSAL_LM', inference_mode=True)

In [6]:
peft_model = PeftModel.from_pretrained(model=original_model, model_id=peft_model_id, is_trainable=False)

In [7]:
output_tokens_from_peft_model = peft_model.generate(input_ids=input_ids, max_new_tokens=50)

print(tokenizer.decode(output_tokens_from_peft_model[0], skip_special_tokens=True))

Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Give three tips for staying healthy. ### Response: 1. Eat a balanced diet. 2. Exercise regularly. 3. Get enough sleep.


## Manually evaluate the effect of fine-tuning 

Compare the responses that came from the original and fine-tuned models, shown above, with the corresponding instruction-response pair seen during fine-tuning: 

https://huggingface.co/datasets/tatsu-lab/alpaca/viewer/tatsu-lab--alpaca/train?row=0