# Test Fine-Tuned Model
Testing the `llama3.2-lora-final` model to evaluate the quality of its outputs.

In [1]:
import torch
from unsloth import FastLanguageModel

print(f"Using GPU: {torch.cuda.get_device_name(0)}")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Using GPU: NVIDIA GeForce RTX 4070


In [2]:
# Load the fine-tuned model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="llama3.2-lora-final",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

FastLanguageModel.for_inference(model)
print("Model loaded successfully!")

==((====))==  Unsloth 2026.1.4: Fast Llama patching. Transformers: 4.57.6.
   \\   /|    NVIDIA GeForce RTX 4070. Num GPUs = 2. Max memory: 11.595 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.9. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post1. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2026.1.4 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


Model loaded successfully!


In [3]:
# Test Case 1: Animation preference
test_prompt = """### Instruction:
Analyze the user's history to identify their preference. Compare two potential movies and explain which one is the better recommendation.

### Input:
History: Toy Story (1995), Lion King, The (1994), Beauty and the Beast (1991), Aladdin (1992). Option A: Shrek (2001) (Animation|Children's|Comedy). Option B: Terminator 2: Judgment Day (1991) (Action|Sci-Fi|Thriller).

### Response:
"""

inputs = tokenizer([test_prompt], return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

### Instruction:
Analyze the user's history to identify their preference. Compare two potential movies and explain which one is the better recommendation.

### Input:
History: Toy Story (1995), Lion King, The (1994), Beauty and the Beast (1991), Aladdin (1992). Option A: Shrek (2001) (Animation|Children's|Comedy). Option B: Terminator 2: Judgment Day (1991) (Action|Sci-Fi|Thriller).

### Response:
The user's history indicates a preference for the genres and themes of the provided candidates. Given this information, the best recommendation is Option B: Terminator 2: Judgment Day (1991) (Action|Sci-Fi|Thriller) because it aligns with the user's interest in the provided candidates. The user's history shows an interest in the genres and themes of the provided candidates, making Terminator 2: Judgment Day the most relevant choice.
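The `generate` call above uses `do_sample=True`, `temperature=0.7`, and `top_p=0.9`, so each run samples a different continuation and the option a test case recommends may change from run to run. As a rough illustration of what those two parameters do at a single decoding step, here is a plain-Python sketch of temperature scaling followed by nucleus (top-p) filtering. `nucleus_distribution` and `sample_token` are hypothetical names, and the code approximates rather than reproduces the Transformers implementation.

```python
import math
import random


def nucleus_distribution(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and top-p filtering.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: dividing logits by T < 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort tokens from most to least likely and keep the smallest prefix
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept tokens sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Draw one token index from the filtered distribution (hypothetical helper)."""
    dist = nucleus_distribution(logits, temperature, top_p)
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With logits `[2.0, 1.0, 0.0, -1.0]`, `temperature=1.0` keeps three candidate tokens at `top_p=0.9`, while `temperature=0.7` keeps only two, which shows how a lower temperature narrows the nucleus. For repeatable evaluation runs, passing `do_sample=False` to `model.generate` switches to greedy decoding; alternatively, calling `torch.manual_seed(...)` before a sampled call should make the draws repeatable on the same hardware and software.
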


In [4]:
# Test Case 2: Sci-fi preference
test_prompt_2 = """### Instruction:
Analyze the user's history to identify their preference. Compare two potential movies and explain which one is the better recommendation.

### Input:
History: Matrix, The (1999), Blade Runner (1982), Star Wars (1977), Aliens (1986). Option A: Star Trek: First Contact (1996) (Action|Adventure|Sci-Fi). Option B: Princess Bride, The (1987) (Action|Adventure|Comedy|Romance).

### Response:
"""

inputs = tokenizer([test_prompt_2], return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

### Instruction:
Analyze the user's history to identify their preference. Compare two potential movies and explain which one is the better recommendation.

### Input:
History: Matrix, The (1999), Blade Runner (1982), Star Wars (1977), Aliens (1986). Option A: Star Trek: First Contact (1996) (Action|Adventure|Sci-Fi). Option B: Princess Bride, The (1987) (Action|Adventure|Comedy|Romance).

### Response:
The most relevant movie is Option A: Star Trek: First Contact (1996) because it aligns with the user's interest in the genres and themes of their previous watches. Option B: Princess Bride, The (1987) is not as relevant because it falls outside the user's interest in the genres and themes of their previous watches.
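Both decoded outputs above repeat the entire prompt, because for a decoder-only model `outputs[0]` holds the prompt tokens followed by the generated ones. The two test cells also duplicate the same prompt template. A minimal sketch of two hypothetical helpers: `build_prompt` reproduces the `### Instruction:` / `### Input:` / `### Response:` layout used in the cells, and `extract_response` keeps only the text after the first `### Response:` marker, which is the one the prompt ends with.

```python
# Instruction text copied from the two test cells.
INSTRUCTION = (
    "Analyze the user's history to identify their preference. "
    "Compare two potential movies and explain which one is the better recommendation."
)


def build_prompt(history, option_a, option_b):
    """Build a prompt in the same layout as the test cells (hypothetical helper).

    history: list of titles, e.g. ["Toy Story (1995)", "Aladdin (1992)"]
    option_a / option_b: title plus genre string, e.g. "Shrek (2001) (Animation|Children's|Comedy)"
    """
    input_line = (
        f"History: {', '.join(history)}. "
        f"Option A: {option_a}. Option B: {option_b}."
    )
    return (
        f"### Instruction:\n{INSTRUCTION}\n\n"
        f"### Input:\n{input_line}\n\n"
        "### Response:\n"
    )


def extract_response(decoded_text, marker="### Response:"):
    """Return only the generated answer from a decoded prompt+completion string."""
    _, sep, tail = decoded_text.partition(marker)
    # Without the marker, fall back to the whole decoded text.
    return tail.strip() if sep else decoded_text.strip()
```

With the tokenizer loaded as above, an alternative that avoids string matching is to decode only the new tokens: `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.
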


## Evaluation Criteria

Check whether the model:
1. **Identifies user preferences** from viewing history
2. **Compares both options** (A and B)
3. **Provides clear reasoning** for the recommendation
4. **Selects the appropriate option** based on patterns in the viewing history

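Criteria 2 and 4 can be partly automated with string checks, while criteria 1 and 3 judge the quality of the reasoning and are easier to assess by reading. A rough sketch with a hypothetical `check_response` function; it assumes the first option named in a response is the one being recommended, a heuristic that breaks on answers that discuss the rejected option first.

```python
import re

# Matches "Option A" or "Option B" as whole words.
OPTION_RE = re.compile(r"\bOption ([AB])\b")


def check_response(response, expected_option):
    """Rule-based checks for criteria 2 and 4 (hypothetical helper)."""
    mentions = OPTION_RE.findall(response)
    # Heuristic: treat the first option named as the recommended one.
    chosen = mentions[0] if mentions else None
    return {
        "compares_both": {"A", "B"} <= set(mentions),  # criterion 2
        "chosen": chosen,
        "correct": chosen == expected_option,  # criterion 4
    }
```

Applied to the two responses above, with Option A as the expected answer in both cases, this flags Test Case 1 (only Option B is named, and it is the one recommended) and passes Test Case 2 (both options are named and Option A is chosen).
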
**Good output example:**
> "The better recommendation is Option A. Reasoning: The user has shown a strong preference for Animation|Children's films. Option A (Animation|Children's|Comedy) aligns with their viewing patterns, while Option B (Action|Sci-Fi|Thriller) does not."

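The reasoning in the good output example amounts to genre matching. As a hypothetical baseline for deciding which option is the "appropriate" one, the sketch below counts how many of the user's preferred genres each option shares. It assumes the preferred genres are known separately, since the prompts list genres only for the two options; the history titles would need genre lookups from whatever metadata the option genre strings come from.

```python
def genre_set(genre_string):
    """Split a pipe-separated genre string such as "Animation|Children's|Comedy"."""
    return {g.strip() for g in genre_string.split("|") if g.strip()}


def expected_option(preferred_genres, genres_a, genres_b):
    """Return "A" or "B" for the option with more overlapping genres, or None on a tie.

    Hypothetical baseline, not the model's or the dataset's labeling rule.
    """
    preferred = set(preferred_genres)
    score_a = len(preferred & genre_set(genres_a))
    score_b = len(preferred & genre_set(genres_b))
    if score_a == score_b:
        return None
    return "A" if score_a > score_b else "B"
```

With `{"Animation", "Children's"}` as the preferred genres, as in the good output example, this baseline returns `"A"` for Test Case 1. Returning `None` on ties keeps ambiguous pairs out of an automated pass/fail count.
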
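If the outputs are poor, for example if the model picks the option that does not match the viewing history (as Test Case 1 does by recommending Terminator 2 after four animated films), the model needs further fine-tuning.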