# Prompting for Hybrid Fairy-Tales
Comparing Prompt Engineering Strategies Across LLMs

## 1. Setup

### Import Libraries

In [None]:
! pip install -r requirements.txt

In [19]:
from openai import OpenAI
from math import exp
import numpy as np
from IPython.display import display, HTML
import os
import pandas as pd
import json
import sys
import torch
import shap
from transformers import AutoTokenizer, AutoModelForCausalLM
from llama_cpp import Llama
from dotenv import load_dotenv
from together import Together
from models.models import *  # local helper module with the project's query functions

load_dotenv()
openai_key = os.getenv("OPENAI_API_KEY")
together_key = os.getenv("TOGETHERAI_API_KEY")

---

### Load Prompt data

I created a JSON file containing one system prompt per prompting technique.

In [2]:
with open('prompts/prompts.json', 'r') as f:
    meta_prompts = json.load(f)

meta_prompts

[{'Technique': 'Zero-Shot',
  'System': 'You are an AI language model tasked with writing short and imaginative fairy tales. In each case, you will be asked to write a fairy tale that blends two contrasting narrative genres. Do not assume any prior context or style. Simply respond with a complete and coherent fairy tale based solely on the genres provided. Use clear language and avoid adding your own framing or interpretations.'},
 {'Technique': 'Role Prompting',
  'System': 'You are a professional fairy tale writer known for your mastery in combining unconventional literary styles. Your task is to write a short fairy tale that skillfully merges two contrasting genres, as specified by the user. Your storytelling should demonstrate creative discipline, blending the genres in a balanced way while remaining entertaining and coherent. Maintain a polished tone and ensure that the story reflects the unique narrative features of both genres.'},
 {'Technique': 'Few-Shot',
  'System': 'Below ar
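Every later cell indexes the entries by `"Technique"` and `"System"`, so a quick structural check on the loaded list can catch a malformed file early. The sketch below is a hypothetical helper (`validate_prompts` is not part of the project) that assumes the schema visible in the output above; in the notebook it would be called as `validate_prompts(meta_prompts)`.

```python
import json

# Assumed schema, inferred from the output above: a list of objects,
# each with a "Technique" name and a "System" prompt.
REQUIRED_KEYS = {"Technique", "System"}


def validate_prompts(prompts):
    """Check the structure of the loaded prompts and return the technique names."""
    if not isinstance(prompts, list):
        raise TypeError("prompts.json should contain a JSON list")
    techniques = []
    for i, entry in enumerate(prompts):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
        if not entry["System"].strip():
            raise ValueError(f"entry {i} ({entry['Technique']}) has an empty system prompt")
        techniques.append(entry["Technique"])
    if len(set(techniques)) != len(techniques):
        raise ValueError("duplicate technique names")
    return techniques


# Small inline stand-in with the same shape as prompts.json (placeholder prompts).
example = json.loads(
    '[{"Technique": "Zero-Shot", "System": "Write a fairy tale."},'
    ' {"Technique": "Role Prompting", "System": "You are a fairy tale writer."}]'
)
print(validate_prompts(example))  # ['Zero-Shot', 'Role Prompting']
```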

---

## 2. Prompting plan

For an overview, we convert the meta-prompts into a pandas DataFrame and display them.

In [20]:
system_prompts = meta_prompts  # list of {"Technique": ..., "System": ...} dicts
system_prompts_df = pd.DataFrame(system_prompts)
system_prompts_df


|   | Technique | System |
|---|---|---|
| 0 | Zero-Shot | You are an AI language model tasked with writi... |
| 1 | Role Prompting | You are a professional fairy tale writer known... |
| 2 | Few-Shot | Below are two examples of how to write creativ... |
| 3 | Style Prompting | Write the fairy tale using rich, evocative lan... |
| 4 | Emotion + Zero-Shot | Respond with a fairy tale that evokes emotion,... |
| 5 | Emotion + Role Prompting | You are a dream-weaver, a generative storytell... |


The two genres to be blended are **crime** + **fantasy**.

In [4]:
user_prompt = "The fairy tale will mix the two following genres: crime + fantasy"
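The user instruction is a fixed template in which only the genre pair varies. A hypothetical helper such as `make_user_prompt` (not part of the original notebook) would make it easy to rerun the same comparison on other pairs, for example horror + comedy:

```python
def make_user_prompt(genre_a, genre_b):
    """Fill the user-prompt template used in this notebook with a genre pair."""
    return f"The fairy tale will mix the two following genres: {genre_a} + {genre_b}"


print(make_user_prompt("crime", "fantasy"))
# The fairy tale will mix the two following genres: crime + fantasy
```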

---

## 3. Generation

In this section, we generate hybrid fairy tales using six different prompting strategies across three large language models (LLMs).
Each story is produced in response to a fixed user instruction, which asks the model to write a fairy tale blending two contrasting narrative genres, and is guided by a system prompt that embodies one of the six prompting techniques defined earlier.
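Concretely, each request pairs one technique's system prompt with the same user prompt. The sketch below shows that pairing in the chat-message format accepted by the OpenAI and TogetherAI chat endpoints; `build_messages` and the two shortened prompts are illustrative placeholders, not the actual contents of `prompts.json`.

```python
def build_messages(system_prompt, user_prompt):
    """Pair one technique's system prompt with the shared user prompt (chat format)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Hypothetical two-technique subset, standing in for prompts.json.
techniques = [
    {"Technique": "Zero-Shot", "System": "You are an AI language model tasked with writing short fairy tales."},
    {"Technique": "Role Prompting", "System": "You are a professional fairy tale writer."},
]
shared_user_prompt = "The fairy tale will mix the two following genres: crime + fantasy"

# One message list per technique; only the system message differs between them.
chat_inputs = {t["Technique"]: build_messages(t["System"], shared_user_prompt) for t in techniques}
for name, msgs in chat_inputs.items():
    print(name, [m["role"] for m in msgs])
```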

We use the following models:

- **GPT-4** (OpenAI): a strong commercial model, used as a qualitative reference for generation quality.
  ⚠️ *Closed weights: the API returns logprobs for the generated tokens only, so attention, hidden states, and prompt-token logprobs are not available.*

- **LLaMA 3.1 8B** (via the TogetherAI API): an open-source model.
  ✅ *Provides token-level logprobs and supports direct XAI-style analysis.*

- **Mistral 7B Instruct** (via the TogetherAI API): a second open-source model.
  ✅ *Also provides token-level logprobs.*


Each model receives the same pair of inputs:
1. A **system prompt** representing one of the six prompting techniques.
2. A **user prompt** that specifies the genres to be blended (here: "The fairy tale will mix the two following genres: crime + fantasy").

This setup allows us to:
- Observe how different prompting strategies shape the story generation process,
- Compare how each model interprets the same input,
- Inspect token-level logprobs from the open models to see where each model is confident or uncertain, as a starting point for asking which parts of the prompt influence the output most.
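Token logprobs are log-probabilities of the tokens the model actually produced, so on their own they measure confidence rather than prompt influence. A simple way to compare whole stories is to average them and report perplexity, i.e. `exp(-mean logprob)`. The sketch below assumes natural-log values (the usual convention) and uses a hypothetical helper name; the sample values are the first tokens of the Mistral 7B output printed later in this notebook.

```python
import math
from typing import Optional


def summarize_logprobs(token_logprobs: list[Optional[float]]) -> dict[str, float]:
    """Summarise per-token log-probabilities (natural log) of one generated text."""
    lps = [lp for lp in token_logprobs if lp is not None]  # skip tokens with no logprob
    mean_lp = sum(lps) / len(lps)
    return {
        "mean_logprob": mean_lp,
        "perplexity": math.exp(-mean_lp),  # exp of the average negative logprob
        "min_prob": math.exp(min(lps)),    # probability of the least likely token
    }


# " Once upon a time, in a world" from the Mistral 7B printout
sample = [-0.40, -0.00, -0.00, -0.00, -0.06, -0.08, -0.06, -2.38]
print(summarize_logprobs(sample))
```

Lower perplexity means the model found the text more predictable; comparing it across techniques is one cheap, if coarse, signal of how strongly each prompt constrains the generation.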

The following subsections are dedicated to the generation process for each model individually.

#### Generation: GPT-4 (Reference)

In [21]:
gpt4_outputs = []
os.makedirs("outputs", exist_ok=True)

# Loop over the prompting techniques
for entry in system_prompts:
    technique = entry["Technique"]
    system_prompt = entry["System"]

    # query_gpt4 takes a single string, so system and user prompts are concatenated here
    full_prompt = f"{system_prompt}\n\n{user_prompt}"
    output = query_gpt4(full_prompt)

    gpt4_outputs.append({
        "Technique": technique,
        "GeneratedText": output
    })

with open("outputs/gpt-4o_tales.json", "w") as f:
    json.dump(gpt4_outputs, f, indent=2)

In [6]:
gpt4_outputs

[{'Technique': 'Zero-Shot',
  'GeneratedText': "Once upon a time, in the mystical land of Ailuria, where fairies twinkled under the crescent moon and unicorns graced the verdant meadows, the harmony of this enchanted realm was suddenly disrupted. The precious Moonstone, the divine artifact which preserved Ailuria's magic, had been stolen. The disappearance of the Moonstone meant the depletion of magic, the doom of Ailuria, where the line between crime and fantasy began to blur.\n\nThe investigation was led by Detective Owliver, a wise owl with a silver badge pinned to his feathery chest and a pair of spectacles perched on his beak. Even though Owliver was knowledgeable about crimes, he was still a rookie when it came to magic and fantasy. However, the urgency of the situation led him to take the mission, with a shimmering wand in his wing.\n\nHis first suspect was the sly fox, Reynard, known around Ailuria for his tricks and deceit. Owliver flew to Reynard's den, which was filled with 

#### Generation + XAI: LLaMA 3.1 8B

In [25]:
# Client initialization (the key is read from TOGETHERAI_API_KEY in the setup cell)
client = Together(api_key=together_key)

# Model selection
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"

llama_outputs = []

for entry in system_prompts:
    technique = entry["Technique"]
    system_prompt = entry["System"]

    prompt = {
        "model": model_name,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        "max_tokens": 700,
        "temperature": 0.8,
        "logprobs": 5  # request logprobs (top 5 alternatives per token)
    }

    # Generation
    resp = client.chat.completions.create(**prompt)

    # Extract and save output
    choice = resp.choices[0]
    generated = choice.message.content
    log = choice.logprobs  # holds tokens, token_logprobs, top_logprobs

    result = {
        "model": model_name,
        "technique": technique,
        "generated": generated,
        "tokens": log.tokens,
        "token_logprobs": log.token_logprobs,
        "top_logprobs": log.top_logprobs
    }
    llama_outputs.append(result)

with open("outputs/together_llama_test.json", "w") as f:
    json.dump(llama_outputs, f, indent=2)
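The GPT-4 cell stores results under `"Technique"`/`"GeneratedText"`, while the TogetherAI cells use `"technique"`/`"generated"`. Before the comparative analysis, it may help to map both layouts onto one schema. A minimal sketch with a hypothetical `normalize_outputs` helper (the `n_words` field is an added convenience, not something the notebook computes):

```python
def normalize_outputs(records, model):
    """Map one model's saved outputs onto a shared (model, technique, text) schema.

    Handles the two key layouts used in this notebook:
    {"Technique", "GeneratedText"} (GPT-4 cell) and {"technique", "generated"} (TogetherAI cells).
    """
    rows = []
    for r in records:
        text = r.get("GeneratedText") or r.get("generated") or ""
        rows.append({
            "model": model,
            "technique": r.get("Technique", r.get("technique")),
            "text": text,
            "n_words": len(text.split()),  # rough length, for quick comparisons
        })
    return rows


# Stand-in records with the two layouts (placeholder texts).
gpt_like = [{"Technique": "Zero-Shot", "GeneratedText": "Once upon a time there was a thief."}]
together_like = [{"model": "m", "technique": "Zero-Shot", "generated": "Once upon a time", "tokens": []}]
rows = normalize_outputs(gpt_like, "gpt-4") + normalize_outputs(together_like, "llama-3.1-8b")
for row in rows:
    print(row["model"], row["technique"], row["n_words"])
```

In the notebook, the rows could then be combined with `pd.DataFrame(normalize_outputs(gpt4_outputs, "gpt-4") + normalize_outputs(llama_outputs, "llama-3.1-8b"))`.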

##### Local attempt with llama.cpp (llama-cpp-python)

In [7]:
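# Local model via llama-cpp-python; logits_all=True is required to compute logprobs
llm = Llama(
    model_path="/Users/lorispalmarin/projects/llama.cpp/models/llama3.2_1B.gguf",
    logits_all=True,
    n_ctx=2048  # 512 tokens is too small for a long system prompt plus a full story
)

# Separate list, so the TogetherAI results above are not overwritten
llama_local_outputs = []

for entry in system_prompts:
    technique = entry["Technique"]
    system_prompt = entry["System"]

    full_prompt = f"{system_prompt}\n\n{user_prompt}"
    output = query_llama(full_prompt)

    llama_local_outputs.append({
        "Technique": technique,
        "GeneratedText": output
    })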

llama_model_load_from_file_impl: using device Metal (Apple M1) - 5455 MiB free
llama_model_loader: loaded meta data with 27 key-value pairs and 146 tensors from /Users/lorispalmarin/projects/llama.cpp/models/llama3.2_1B.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = llama3.2_1B_hf
llama_model_loader: - kv   3:                         general.size_label str              = 1.2B
llama_model_loader: - kv   4:                          llama.block_count u32              = 16
llama_model_loader: - kv   5:                       llama.context_length u32              = 8192
llama_model_loader: - kv   6:                     llama.em

KeyboardInterrupt: 

In [None]:
llama_outputs

#### Generation + XAI: Mistral 7B Instruct

In [12]:
client = Together(api_key=together_key)

# Model selection
model_name = "mistralai/Mistral-7B-Instruct-v0.1"

# Test prompt on a single technique (here the last one in system_prompts)
system_prompt = system_prompts[-1]["System"]

prompt = {
    "model": model_name,
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ],
    "max_tokens": 200,
    "temperature": 0.8,
    "logprobs": 5,  # request logprobs (top 5 alternatives per token)
    "echo": True    # echo the prompt back in the response
}

# Run the generation
resp = client.chat.completions.create(**prompt)

# Extract the output
choice = resp.choices[0]
generated = choice.message.content
log = choice.logprobs  # holds tokens, token_logprobs, top_logprobs

result = {
    "model": model_name,
    "generated": generated,
    "tokens": log.tokens,
    "token_logprobs": log.token_logprobs,
    "top_logprobs": log.top_logprobs
}

# Print each token with its logprob, then save
print("\n".join([
    f"{tok}: {lp:.2f}" for tok, lp in zip(result["tokens"], result["token_logprobs"])
]))

with open("outputs/together_mistral_test.json", "w") as f:
    json.dump(result, f, indent=2)

 Once: -0.40
 upon: -0.00
 a: -0.00
 time: -0.00
,: -0.06
 in: -0.08
 a: -0.06
 world: -2.38
 of: -2.33
 magic: -0.14
 and: -0.01
 wonder: -0.29
,: -0.00
 there: -0.14
 was: -0.50
 a: -0.00
 kingdom: -0.21
 ruled: -0.14
 by: -0.00
 a: -0.03
 fair: -1.74
 and: -0.18
 just: -0.01
 queen: -0.29
.: -0.12
 However: -0.59
,: -0.00
 a: -2.70
 group: -2.14
 of: -0.00
 powerful: -1.23
 crim: -1.95
inals: 0.00
 had: -0.05
 taken: -0.38
 over: -0.05
 the: -0.02
 kingdom: -0.56
,: -0.11
 spreading: -2.47
 fear: -0.52
 and: -0.00
 chaos: -0.20
 wherever: -0.27
 they: -0.00
 went: -0.00
.: -0.00
 The: -0.35
 queen: -0.78
,: -1.35
 desperate: -0.16
 to: -0.14
 restore: -0.22
 peace: -0.05
 and: -1.39
 order: -0.14
,: -0.04
 turned: -1.85
 to: -0.00
 the: -0.26
 one: -2.62
 person: -0.42
 who: -0.09
 had: -2.72
 the: -0.01
 power: -0.01
 to: -0.00
 defeat: -0.37
 the: -0.04
 evil: -5.12
 crim: -1.05
inals: 0.00
:: -0.36
 a: -0.03
 young: -0.21
 dragon: -5.28
 with: -0.97
 a: -0.92
 heart: -0.10
 of: -
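Most tokens above have logprobs close to 0 (the model was nearly certain), while a few stand out, such as ` evil` (-5.12) and ` dragon` (-5.28). Ranking tokens by logprob is a quick way to locate where the sampled story departed most from the model's preferred continuation, which may be a useful entry point for the qualitative analysis. A sketch with a hypothetical helper, on a handful of values copied from the printout:

```python
import math


def least_likely_tokens(tokens, token_logprobs, k=5):
    """Return the k sampled tokens with the lowest logprob, as (token, probability)."""
    pairs = [(t, lp) for t, lp in zip(tokens, token_logprobs) if lp is not None]
    pairs.sort(key=lambda p: p[1])  # most negative logprob (least likely) first
    return [(t, round(math.exp(lp), 3)) for t, lp in pairs[:k]]


# A few (token, logprob) pairs copied from the printout above.
sample_tokens = [" Once", " world", " fair", " However", " a", " evil", " dragon"]
sample_logprobs = [-0.40, -2.38, -1.74, -0.59, -2.70, -5.12, -5.28]
print(least_likely_tokens(sample_tokens, sample_logprobs, k=3))
# [(' dragon', 0.005), (' evil', 0.006), (' a', 0.067)]
```

On the full run, `least_likely_tokens(result["tokens"], result["token_logprobs"])` would rank every generated token.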

In [None]:
# OpenAI client, used by get_completion below
client = OpenAI(api_key=openai_key)

# GPT-2: a small open model whose attentions and hidden states are accessible
model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    output_attentions=True,
    output_hidden_states=True
)
model.eval()
if torch.backends.mps.is_available():
    model = model.to("mps")

def query_gpt2(prompt, max_new_tokens=100):
    """Sample a continuation; return (raw generate output, decoded text)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,  # without it, temperature/top_k/top_p are ignored (greedy decoding)
            temperature=0.8,
            top_k=50,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
            output_attentions=True,
            output_hidden_states=True,
            return_dict_in_generate=True,
            output_scores=True
        )
    decoded = tokenizer.decode(output.sequences[0], skip_special_tokens=True)
    return output, decoded
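`query_gpt2` requests `output_scores=True`, but those scores are not yet probabilities: they are one score vector over the vocabulary per generation step, and turning them into the logprob of each generated token requires a log-softmax. The pure-Python sketch below shows the computation on a toy 3-token vocabulary (helper names are hypothetical):

```python
import math


def log_softmax(scores):
    """Log-softmax of one score vector; -inf entries (filtered tokens) stay -inf."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def chosen_token_logprobs(step_scores, chosen_ids):
    """Logprob of the generated token at each step.

    step_scores[t]: score vector over the vocabulary at generation step t
    chosen_ids[t]:  id of the token actually generated at step t
    """
    return [log_softmax(scores)[tok] for scores, tok in zip(step_scores, chosen_ids)]


# Toy vocabulary; at the second step token 1 was removed by top-k/top-p filtering.
toy_scores = [[2.0, 1.0, 0.0], [0.0, float("-inf"), 0.0]]
print(chosen_token_logprobs(toy_scores, [0, 2]))
```

On a real run, the inputs would be `step_scores = [s[0].tolist() for s in output.scores]` and `chosen_ids = output.sequences[0, -len(output.scores):].tolist()`; transformers also provides `model.compute_transition_scores(output.sequences, output.scores, normalize_logits=True)` for the same purpose. Note that `output.scores` holds *processed* scores (after temperature and top-k/top-p), so the result describes the sampling distribution rather than the raw model distribution.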

In [None]:
query_gpt2("Hello")

In [None]:
# SHAP explanation of GPT-2 text generation (reloads and overwrites `tokenizer` and `model`)
tokenizer = AutoTokenizer.from_pretrained("gpt2", use_fast=True)
model = AutoModelForCausalLM.from_pretrained("gpt2")
# set model decoder to true
model.config.is_decoder = True
# set text-generation params under task_specific_params
model.config.task_specific_params["text-generation"] = {
    "do_sample": True,
    "max_length": 50,
    "temperature": 0.7,
    "top_k": 50,
    "no_repeat_ngram_size": 2,
}
s = ["Yesterday I met a very nice girl. She works as a"]
explainer = shap.Explainer(model, tokenizer)
shap_values = explainer(s)
shap.plots.text(shap_values)  # visualize per-token attributions
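SHAP estimates each input token's contribution by masking tokens (and combinations of tokens) and measuring how the model's output changes. Its simplest relative is leave-one-out occlusion, a different and cruder technique that removes one token at a time and ignores interactions between tokens. The sketch below only illustrates that idea; `leave_one_out` and the scoring function are hypothetical, and the toy scorer stands in for a real model call.

```python
def leave_one_out(tokens, score_fn):
    """Occlusion attribution: drop each token in turn and record the score drop.

    score_fn maps a token list to a number, e.g. the model's log-probability
    of a fixed target continuation given those tokens.
    """
    base = score_fn(tokens)
    return [(tok, base - score_fn(tokens[:i] + tokens[i + 1:])) for i, tok in enumerate(tokens)]


# Hypothetical toy scorer standing in for a model call: it only "cares"
# whether the two genre words are present.
def toy_score(tokens):
    return 1.0 * ("crime" in tokens) + 0.5 * ("fantasy" in tokens)


prompt_tokens = "mix the two genres crime + fantasy".split()
for tok, delta in leave_one_out(prompt_tokens, toy_score):
    print(f"{tok:>8}: {delta:+.2f}")
```

With a real model, `score_fn` could be the log-probability GPT-2 assigns to a fixed continuation given the reduced prompt; SHAP is more thorough because it also evaluates combinations of removed tokens.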

In [None]:
def get_completion(
    messages: list[dict[str, str]],
    model: str = "gpt-4",
    max_tokens=500,
    temperature=0,
    stop=None,
    seed=123,
    tools=None,
    logprobs=None,  # if True, return the log probability of each output token in the message content
    top_logprobs=None,  # number of most likely alternatives per token (requires logprobs=True)
):
    """Call the Chat Completions API and return the full response object."""
    params = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stop": stop,
        "seed": seed,
        "logprobs": logprobs,
        "top_logprobs": top_logprobs,
    }
    if tools:
        params["tools"] = tools

    completion = client.chat.completions.create(**params)
    return completion

In [None]:
output = get_completion(
        [{"role": "user", "content": "Let your imagination guide you. Write a prompt that invites a language model to weave a whimsical and mysterious fairy tale where unexpected genres collide. The story it evokes should feel enchanted and surprising, a place where darkness dances with light, or fear meets laughter. Let the emotional tone of the prompt carry the creative intent."}],
        model="gpt-4o",
        logprobs=True,
        top_logprobs=5
    )
print(output.choices[0].message.content)
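The call above requests `logprobs=True` and `top_logprobs=5` but only prints the text. In the OpenAI Python SDK, the per-token values live under `choices[0].logprobs.content`, where each item has `token`, `logprob` and a `top_logprobs` list of alternatives. A sketch of a hypothetical `token_table` helper that flattens them, tested here on an offline stand-in with made-up values rather than a real API response:

```python
import math
from types import SimpleNamespace


def token_table(completion, top=3):
    """Flatten choices[0].logprobs.content into (token, prob, alternatives) rows."""
    rows = []
    for item in completion.choices[0].logprobs.content:
        alts = [(alt.token, math.exp(alt.logprob)) for alt in item.top_logprobs[:top]]
        rows.append((item.token, math.exp(item.logprob), alts))
    return rows


# Offline stand-in mimicking the response shape (made-up values, no API call).
def _tok(token, logprob, alts=()):
    return SimpleNamespace(token=token, logprob=logprob,
                           top_logprobs=[SimpleNamespace(token=t, logprob=lp) for t, lp in alts])


fake = SimpleNamespace(choices=[SimpleNamespace(logprobs=SimpleNamespace(content=[
    _tok("Once", math.log(0.9), [("Once", math.log(0.9)), ("In", math.log(0.05))]),
    _tok(" upon", math.log(0.99), [(" upon", math.log(0.99))]),
]))])

for token, prob, alts in token_table(fake):
    print(f"{token!r}: {prob:.2f}  alternatives={[(t, round(p, 2)) for t, p in alts]}")
```

In the notebook, `token_table(output)` would apply the same flattening to the GPT-4o response.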

---

## 4. Qualitative Analysis

---

## 5. Quantitative Analysis

---

## 6. Conclusion