<a href="https://colab.research.google.com/github/samratkar/samratkar.github.io/blob/main/LLM_Prod_Prompt_Engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Prompt Engineering: Basic and Advanced</h1>
<i>Methods for improving the output through prompt engineering.</i>

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **run** the following code block to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---


In [None]:
%%capture
# Quote each requirement so the shell does not treat ">=" as an output redirect
!pip install -q "langchain>=0.1.17" "openai>=1.13.3" "langchain_openai>=0.1.6" "transformers>=4.40.1" "datasets>=2.18.0" "accelerate>=0.27.2" "sentence-transformers>=2.5.1" "duckduckgo-search>=5.2.2"
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

## Part 1: Loading models and playing with inference parameters

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "Qwen/Qwen2.5-1.5B-Instruct"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                 # picks GPU if available
    torch_dtype=(
        torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported()
        else "auto"
    ),
)

In [None]:
# Your chat
messages = [{"role": "user", "content": "Create a funny joke about chickens."}]

# Use the model’s chat template to build the prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    do_sample=False,           # set True + temperature for sampling
    max_new_tokens=120,
    eos_token_id=tokenizer.eos_token_id,
)


Device set to use cuda:0
The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


In [None]:
out = pipe(prompt)
print(out[0]["generated_text"])



Why did the chicken cross the road? To get to the other side of the internet!


### 🔥 Qwen2.5 Text Generation Playground in Google Colab

This notebook lets you experiment with:

- **Temperature** – Controls randomness.
- **Top P** – Nucleus sampling for diversity.
- **Max Length** – Maximum tokens to generate.
- **Stop Sequences** – Stop generation at specific strings.
- **Frequency Penalty** – Reduce repeated words.
- **Presence Penalty** – Encourage new words/concepts.
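To make the first two knobs concrete, here is a minimal sketch of what temperature and top-p do to a next-token distribution. It is a toy illustration in plain Python, not the code path that `transformers` uses; the names `softmax`, `top_p_filter`, and `toy_logits` are made up for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Hypothetical scores for a 4-token vocabulary
toy_logits = [2.0, 1.0, 0.5, -1.0]

sharp = softmax(toy_logits, temperature=0.5)   # low temperature: mass concentrates on the top token
flat = softmax(toy_logits, temperature=2.0)    # high temperature: distribution flattens
nucleus = top_p_filter(softmax(toy_logits), top_p=0.8)

print("T=0.5:", [round(p, 3) for p in sharp])
print("T=2.0:", [round(p, 3) for p in flat])
print("top-p=0.8 keeps token ids:", sorted(nucleus))
```

Lowering the temperature sharpens the distribution, while top-p removes the low-probability tail before sampling.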

In [None]:
# ===== Qwen2.5 Playground with Sliders (Colab-ready) =====
!pip install -q transformers accelerate ipywidgets

import torch, numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessor
import ipywidgets as w
from IPython.display import display

model_id = "Qwen/Qwen2.5-1.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=(torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else "auto"),
)
# Safety: ensure a pad token
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

# ---- Custom penalties (OpenAI-style approximation) ----
class FreqPresencePenalty(LogitsProcessor):
    """
    Applies frequency_penalty (proportional to count) and presence_penalty (flat if token seen)
    to the logits. Works with the running input_ids (prompt + generated).
    """
    def __init__(self, freq_penalty: float = 0.0, pres_penalty: float = 0.0):
        self.freq_penalty = float(freq_penalty)
        self.pres_penalty = float(pres_penalty)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        if (self.freq_penalty == 0.0) and (self.pres_penalty == 0.0):
            return scores
        # input_ids: [batch, seq_len]; scores: [batch, vocab]
        with torch.no_grad():
            for b in range(input_ids.size(0)):
                ids = input_ids[b].tolist()
                if not ids:
                    continue
                uniq, counts = np.unique(ids, return_counts=True)
                penalties = self.freq_penalty * counts.astype(np.float32)
                penalties += self.pres_penalty * (counts > 0).astype(np.float32)
                # subtract penalties from logits of seen tokens
                idx = torch.tensor(uniq, device=scores.device, dtype=torch.long)
                pen = torch.tensor(penalties, device=scores.device, dtype=scores.dtype)
                scores[b, idx] -= pen
        return scores

# ---- Widgets ----
prompt = w.Textarea(
    value="Write a tiny sci-fi poem about a robot who loves coffee.",
    description="Prompt",
    layout=w.Layout(width="100%", height="100px"),
)

temperature = w.FloatSlider(value=0.7, min=0.0, max=2.0, step=0.05, description="Temperature")
top_p = w.FloatSlider(value=0.9, min=0.0, max=1.0, step=0.01, description="Top-p")
max_new_tokens = w.IntSlider(value=120, min=1, max=1024, step=1, description="Max tokens")

freq_pen = w.FloatSlider(value=0.0, min=0.0, max=2.0, step=0.05, description="Freq penalty")
pres_pen = w.FloatSlider(value=0.0, min=0.0, max=2.0, step=0.05, description="Presence pen.")

stop_sequences = w.Text(
    value="END",
    description="Stop seqs",
    placeholder="Comma-separated, e.g. ###,END",
    layout=w.Layout(width="100%"),
)

system_prefix = w.Text(
    value="You are a helpful assistant.",
    description="System",
    layout=w.Layout(width="100%"),
)

btn = w.Button(description="Generate", button_style="primary")
out = w.Output()

# ---- Generation handler ----
def generate_clicked(_):
    out.clear_output()
    with out:
        try:
            # Build chat template
            messages = [
                {"role": "system", "content": system_prefix.value.strip()},
                {"role": "user", "content": prompt.value.strip()},
            ]
            input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

            # Sampling flags
            do_sample = temperature.value > 0.0
            gen_kwargs = {
                "max_new_tokens": int(max_new_tokens.value),
                "do_sample": do_sample,
                # top_p only applies when sampling, so drop it for greedy decoding
                "top_p": float(top_p.value) if do_sample else None,
                "temperature": float(max(temperature.value, 1e-6)) if do_sample else None,
                "eos_token_id": tokenizer.eos_token_id,
                "pad_token_id": tokenizer.pad_token_id,
            }

            # Penalties
            processors = []
            if (freq_pen.value > 0.0) or (pres_pen.value > 0.0):
                processors.append(FreqPresencePenalty(freq_pen.value, pres_pen.value))

            output_ids = model.generate(
                **inputs,
                logits_processor=processors if processors else None,
                **{k:v for k,v in gen_kwargs.items() if v is not None},
            )

            # Decode only the newly generated portion
            gen_only = output_ids[0, inputs["input_ids"].shape[1]:]
            text = tokenizer.decode(gen_only, skip_special_tokens=True)

            # Apply stop sequences post-hoc
            stops = [s.strip() for s in stop_sequences.value.split(",") if s.strip()]
            if stops:
                cut = len(text)
                for s in stops:
                    idx = text.find(s)
                    if idx != -1:
                        cut = min(cut, idx)
                text = text[:cut]

            print(text.strip())

        except Exception as e:
            print("Error:", e)

btn.on_click(generate_clicked)

# ---- Layout ----
controls_left = w.VBox([temperature, top_p, max_new_tokens])
controls_right = w.VBox([freq_pen, pres_pen, stop_sequences])
row1 = w.HBox([controls_left, controls_right])

display(system_prefix, prompt, row1, btn, out)

print("Tip: Generally tweak either Temperature or Top-p (not both). "
      "Use penalties to reduce repetition; add stop sequences like 'END' or '###' to cut off output.")



Tip: Generally tweak either Temperature or Top-p (not both). Use penalties to reduce repetition; add stop sequences like 'END' or '###' to cut off output.
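The penalty and stop-sequence logic used in the playground can be checked on toy values without loading a model. The sketch below is a simplified, list-based restatement under the same assumptions as the playground (penalties are subtracted from the logits of every token seen so far; stop sequences are applied after generation). The helper names `apply_penalties` and `truncate_at_stop` are illustrative only.

```python
from collections import Counter

def apply_penalties(logits, seen_token_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Subtract count * frequency_penalty plus a flat presence_penalty for each seen token."""
    counts = Counter(seen_token_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= frequency_penalty * count + presence_penalty
    return adjusted

def truncate_at_stop(text, stop_sequences):
    """Cut the text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Toy vocabulary of 3 tokens; token 0 was seen twice, token 2 once, token 1 never
toy_logits = [1.0, 1.0, 1.0]
history = [0, 0, 2]
adjusted = apply_penalties(toy_logits, history, frequency_penalty=0.5, presence_penalty=0.2)
print([round(x, 2) for x in adjusted])

print(repr(truncate_at_stop("Beep boop. END of poem ###", ["###", "END"])))
```

The token that appeared twice is penalized most, the unseen token is untouched, and the text is cut at whichever stop sequence appears first.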


## Part 2: Advanced Prompt Engineering

## Complex Prompt

In [None]:
# Text to summarize which we stole from https://jalammar.github.io/illustrated-transformer/ ;)
text = """In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. It is in fact Google Cloud’s recommendation to use The Transformer as a reference model to use their Cloud TPU offering. So let’s try to break the model apart and look at how it functions.
The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as a part of the Tensor2Tensor package. Harvard’s NLP group created a guide annotating the paper with PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make it easier to understand to people without in-depth knowledge of the subject matter.
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.
Popping open that Optimus Prime goodness, we see an encoding component, a decoding component, and connections between them.
The encoding component is a stack of encoders (the paper stacks six of them on top of each other – there’s nothing magical about the number six, one can definitely experiment with other arrangements). The decoding component is a stack of decoders of the same number.
The encoders are all identical in structure (yet they do not share weights). Each one is broken down into two sub-layers:
The encoder’s inputs first flow through a self-attention layer – a layer that helps the encoder look at other words in the input sentence as it encodes a specific word. We’ll look closer at self-attention later in the post.
The outputs of the self-attention layer are fed to a feed-forward neural network. The exact same feed-forward network is independently applied to each position.
The decoder has both those layers, but between them is an attention layer that helps the decoder focus on relevant parts of the input sentence (similar what attention does in seq2seq models).
Now that we’ve seen the major components of the model, let’s start to look at the various vectors/tensors and how they flow between these components to turn the input of a trained model into an output.
As is the case in NLP applications in general, we begin by turning each input word into a vector using an embedding algorithm.
Each word is embedded into a vector of size 512. We'll represent those vectors with these simple boxes.
The embedding only happens in the bottom-most encoder. The abstraction that is common to all the encoders is that they receive a list of vectors each of the size 512 – In the bottom encoder that would be the word embeddings, but in other encoders, it would be the output of the encoder that’s directly below. The size of this list is hyperparameter we can set – basically it would be the length of the longest sentence in our training dataset.
After embedding the words in our input sequence, each of them flows through each of the two layers of the encoder.
Here we begin to see one key property of the Transformer, which is that the word in each position flows through its own path in the encoder. There are dependencies between these paths in the self-attention layer. The feed-forward layer does not have those dependencies, however, and thus the various paths can be executed in parallel while flowing through the feed-forward layer.
Next, we’ll switch up the example to a shorter sentence and we’ll look at what happens in each sub-layer of the encoder.
Now We’re Encoding!
As we’ve mentioned already, an encoder receives a list of vectors as input. It processes this list by passing these vectors into a ‘self-attention’ layer, then into a feed-forward neural network, then sends out the output upwards to the next encoder.
"""

# Prompt components
persona = "You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\n"
instruction = "Summarize the key findings of the paper provided. \n"
context = "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\n"
data_format = "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\n"
audience = "The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\n"
tone = "The tone should be professional and clear.\n"
#text = "MY TEXT TO SUMMARIZE"  # Replace with your own text to summarize
data = f"Text to summarize: {text}. "

# The full prompt - remove and add pieces to view its impact on the generated output
# Note: `context` is defined above but not included below; add it back to see its effect
query = persona + instruction + data_format + audience + tone + data
#query =  data
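One way to make the "remove and add pieces" experiment less error-prone is to keep the components in a dictionary and select them by name. This is a small sketch, not part of the original notebook; `build_prompt` and the shortened component strings are hypothetical.

```python
def build_prompt(components, include):
    """Concatenate the named components in the order listed in `include`."""
    return "".join(components[name] for name in include)

# Shortened stand-ins for the prompt pieces defined above
components = {
    "persona": "You are an expert in Large Language models.\n",
    "instruction": "Summarize the key findings of the paper provided.\n",
    "context": "Your summary should extract the most crucial points.\n",
    "data": "Text to summarize: <TEXT>\n",
}

full = build_prompt(components, ["persona", "instruction", "context", "data"])
minimal = build_prompt(components, ["instruction", "data"])

print(full)
print("---")
print(minimal)
```

Swapping the `include` list makes it explicit which pieces were present for a given run, which helps when comparing outputs.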

In [None]:
messages = [
    {"role": "user", "content": query}
]
print(tokenizer.apply_chat_template(messages, tokenize=False))

<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.
Summarize the key findings of the paper provided. 
Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.
The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.
The tone should be professional and clear.
Text to summarize: In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in

In [None]:
# Generate the output
outputs = pipe(messages)
print(outputs[0]["generated_text"])

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


**Method Summary:**
- **Model Overview:** The Transformer is a neural architecture used for natural language processing tasks, particularly effective in machine translation.
- **Components:** 
  - **Encoding Layer:** Consists of multiple identical encoders stacked together. Each encoder contains two sub-layers: a self-attention mechanism and a feed-forward neural network.
  - **Decoding Layer:** Similar to the encoding layer, containing multiple identical decoders. Between the decoders lies an attention mechanism.
- **Self-Attention Mechanism:** Used within the encoder to analyze context across words in the input sentence.



## In-Context Learning: Providing Examples

In [None]:
# Use a single example of using the made-up word in a sentence
one_shot_prompt = [
    {
        "role": "user",
        "content": "A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:"
    },
    {
        "role": "assistant",
        "content": "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home."
    },
    {
        "role": "user",
        "content": "To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:"
    }
]
print(tokenizer.apply_chat_template(one_shot_prompt, tokenize=False))

<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:<|im_end|>
<|im_start|>assistant
I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.<|im_end|>
<|im_start|>user
To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:<|im_end|>



In [None]:
# Generate the output
outputs = pipe(one_shot_prompt)
print(outputs[0]["generated_text"])



The orc warrior screegged his axe at the dragon's head before charging towards it. The sound of the screeg echoed through the forest as the warriors fought for their lives.
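The same pattern extends from one example to several. The sketch below builds the alternating user/assistant message list from (question, answer) pairs; `build_few_shot_messages` is a hypothetical helper, and the resulting list is in the chat-message format used in the cells above.

```python
def build_few_shot_messages(examples, query):
    """Turn (user_text, assistant_text) pairs plus a final query into chat messages."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages

examples = [
    (
        "A 'Gigamuru' is a type of Japanese musical instrument. "
        "An example of a sentence that uses the word Gigamuru is:",
        "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.",
    ),
]
query = (
    "To 'screeg' something is to swing a sword at it. "
    "An example of a sentence that uses the word screeg is:"
)

messages = build_few_shot_messages(examples, query)
for m in messages:
    print(m["role"], "->", m["content"][:40])
```

Adding more pairs to `examples` turns the one-shot prompt into a few-shot prompt without changing the rest of the code.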


## Chain Prompting: Breaking up the Problem


In [None]:
# Create name and slogan for a product
product_prompt = [
    {"role": "user", "content": "Create a name and slogan for a chatbot that leverages LLMs."}
]
outputs = pipe(product_prompt)
product_description = outputs[0]["generated_text"]
print(product_description)



Name: "AI Companion"

Slogan: "Empowering Conversations with AI"


In [None]:
# Based on a name and slogan for a product, generate a sales pitch
sales_prompt = [
    {"role": "user", "content": f"Generate a very short sales pitch for the following product: '{product_description}'"}
]
outputs = pipe(sales_prompt)
sales_pitch = outputs[0]["generated_text"]
print(sales_pitch)



"Unlock smarter conversations with our AI Companion! Empower your interactions with intelligent insights and personalized assistance."

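The two cells above can be generalized into a small loop in which each step's output is substituted into the next step's prompt. This is a sketch under assumptions: `run_chain` is a hypothetical helper, and `fake_generate` is a stand-in for a real model call such as the `pipe(...)` used above, so the example runs without a GPU.

```python
def run_chain(generate, steps, initial_input=""):
    """Run prompt templates in order, feeding each output into the next via {previous}."""
    result = initial_input
    history = []
    for template in steps:
        prompt = template.format(previous=result)
        result = generate(prompt)
        history.append((prompt, result))
    return result, history

def fake_generate(prompt):
    # Hypothetical stand-in for a model call; it only echoes the prompt
    return f"<response to: {prompt}>"

steps = [
    "Create a name and slogan for a chatbot that leverages LLMs.",
    "Generate a very short sales pitch for the following product: '{previous}'",
]

final, history = run_chain(fake_generate, steps)
for prompt, response in history:
    print("PROMPT:  ", prompt)
    print("RESPONSE:", response)
```

Keeping the intermediate `history` makes it easier to see which step in the chain produced a weak result.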

# **Reasoning with Generative Models**


## Chain-of-Thought: Think Before Answering


In [None]:
# Answering without explicit reasoning
standard_prompt = [
    {"role": "user", "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"},
    {"role": "assistant", "content": "11"},
    {"role": "user", "content": "The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"}
]

# Run generative model
outputs = pipe(standard_prompt)
print(outputs[0]["generated_text"])



17


In [None]:
# Answering with chain-of-thought
cot_prompt = [
    {"role": "user", "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"},
    {"role": "assistant", "content": "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11."},
    {"role": "user", "content": "The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"}
]

# Generate the output
outputs = pipe(cot_prompt)
print(outputs[0]["generated_text"])



To find out how many apples the cafeteria has now, we need to follow these steps:

1. Start with the initial number of apples: 25.
2. Subtract the number of apples used for lunch: 25 - 20 = 5.
3. Add the number of apples bought later: 5 + 6 = 11.

So, the cafeteria now has 11 apples.


## Zero-shot Chain-of-Thought


In [None]:
# Zero-shot Chain-of-Thought
zeroshot_cot_prompt = [
    {"role": "user", "content": "The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step-by-step."}
]

# Generate the output
outputs = pipe(zeroshot_cot_prompt)
print(outputs[0]["generated_text"])



Sure! Let's break this down step-by-step:

1. Initially, the cafeteria had 25 apples.
2. They used 20 apples to make lunch. So we subtract these from the initial amount:
   \( 25 - 20 = 5 \) apples remaining.
3. Then, they bought 6 more apples. We add these to the remaining amount:
   \( 5 + 6 = 11 \) apples.

So, after using some for lunch and buying more, the cafeteria has 11 apples left.
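When comparing prompting styles, it helps to pull the final number out of the free-text answer and check it against the arithmetic (25 - 20 + 6 = 11). The sketch below uses a simple "last number in the text" heuristic, which is an assumption that fits these short answers but will not hold for every output; `extract_final_number` is a hypothetical helper.

```python
import re

def extract_final_number(text):
    """Return the last integer or decimal in the text as a float, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None

expected = 25 - 20 + 6  # apples left after lunch plus the new purchase

# Outputs copied from the cells above
standard_output = "17"
cot_output = (
    "1. Start with the initial number of apples: 25.\n"
    "2. Subtract the number of apples used for lunch: 25 - 20 = 5.\n"
    "3. Add the number of apples bought later: 5 + 6 = 11.\n\n"
    "So, the cafeteria now has 11 apples."
)

for label, output in [("standard", standard_output), ("chain-of-thought", cot_output)]:
    answer = extract_final_number(output)
    print(label, "->", answer, "correct" if answer == expected else "incorrect")
```

On these transcripts the direct answer (17) is wrong, while the step-by-step answers reach 11.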


## Tree-of-Thought: Exploring Intermediate Steps


In [None]:
# Zero-shot Tree-of-Thought
zeroshot_tot_prompt = [
    {"role": "user", "content": "Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is 'The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?' Make sure to discuss the results."}
]

In [None]:
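# Rebuild the pipeline with a larger token budget (620 new tokens) so the
# multi-expert discussion is less likely to be cut off, then generate the output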
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    do_sample=False,           # set True + temperature for sampling
    max_new_tokens=620,
    eos_token_id=tokenizer.eos_token_id,
)


outputs = pipe(zeroshot_tot_prompt)
print(outputs[0]["generated_text"])



Sure! Let's break down the problem step-by-step:

### Step 1: Initial Count
- **Expert 1:** The cafeteria starts with 25 apples.
- **Expert 2:** They use 20 apples for lunch.
- **Expert 3:** After using 20 apples, there are \(25 - 20 = 5\) apples left.

### Step 2: Remaining Apples
- **Expert 1:** There are now 5 apples remaining.
- **Expert 2:** They buy 6 more apples.
- **Expert 3:** Adding these new apples to the remaining ones gives us \(5 + 6 = 11\) apples in total.

### Conclusion:
- **Expert 1:** The cafeteria has 5 apples after making lunch.
- **Expert 2:** After buying more apples, they have 11 apples.
- **Expert 3:** The final count is 11 apples.

This approach ensures that each expert follows a logical sequence of steps to arrive at the correct answer. If anyone realizes an error or inconsistency, they can stop and re-evaluate their reasoning.


In [None]:
zeroshot_tot_prompt = [
    {
        "role": "user",
        "content": (
            "Imagine three different experts are working together to solve this problem. "
            "The goal is to combine the numbers 4, 9, 10, and 13 using arithmetic operations (+, -, *, /) to get exactly 24. "
            "All experts will write down one step of their reasoning at a time and share it with the group. "
            "If any expert realizes their reasoning is incorrect at any point, they will stop and leave. "
            "The group will continue step by step until they either reach the solution or determine it is impossible. "
            "Make sure the experts discuss and evaluate each result before proceeding to the next step."
        )
    }
]


In [None]:
# Generate the output
outputs = pipe(zeroshot_tot_prompt)
print(outputs[0]["generated_text"])



Let's start with the initial set of numbers: 4, 9, 10, and 13.

**Step 1:** One expert suggests combining two of these numbers in a way that might lead us closer to 24. They propose:
\[ 13 + (9 - 4) = 13 + 5 = 18 \]
This doesn't quite work because we need to get to 24.

**Step 2:** Another expert tries another combination:
\[ 10 * (13 - 9) = 10 * 4 = 40 \]
Again, this isn't close enough to 24.

**Step 3:** A third expert proposes:
\[ 13 * (10 - 4) = 13 * 6 = 78 \]
Still not there yet.

**Step 4:** The fourth expert suggests:
\[ 13 * (10 - 4) = 13 * 6 = 78 \]
This still doesn't work.

**Step 5:** The fifth expert then suggests:
\[ 13 * (10 - 4) = 13 * 6 = 78 \]
This approach also fails.

At this point, let's review our progress:

- We've tried various combinations but none have led us to 24.
- Each attempt has resulted in values significantly higher than 24.

Given that all previous attempts have failed, it seems unlikely that any combination of the given numbers can be used to achieve 
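The output above is cut off by the token limit while drifting toward "impossible", but the puzzle does have a solution, for example (10 - 4) * (13 - 9) = 24. A brute-force search is a handy way to verify such claims independently of the model. The sketch below is not part of the original notebook; `solve_24` and `_search` are hypothetical helpers that use exact fractions to avoid rounding issues with division.

```python
from fractions import Fraction
from itertools import combinations

def _search(items, target):
    """Recursively combine two values at a time until one value is left."""
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
            (a * b, f"({ea} * {eb})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None

def solve_24(numbers, target=24):
    """Return one expression that uses every number once and equals target, or None."""
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))

solution = solve_24([4, 9, 10, 13])
print("Found:", solution)
```

A check like this separates "the model did not find an answer" from "no answer exists", which matters when judging the quality of a reasoning prompt.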

# **Output Verification**

## Providing Examples

In [None]:
# Zero-shot learning: Providing no examples
zeroshot_prompt = [
    {"role": "user", "content": "Create a character profile for an RPG game in JSON format."}
]

# Generate the output
outputs = pipe(zeroshot_prompt)
print(outputs[0]["generated_text"])



```json
{
  "character": {
    "name": "Evelyn",
    "race": "Human",
    "class": "Warrior",
    "alignment": "Neutral Good",
    "background": "Outlander",
    "stats": {
      "Strength": 18,
      "Dexterity": 16,
      "Constitution": 14,
      "Intelligence": 12,
      "Wisdom": 10,
      "Charisma": 8
    },
    "abilities": [
      "Defensive Maneuvers",
      "Combat Expertise",
      "Leadership"
    ],
    "equipment": {
      "primaryWeapon": "Greatsword",
      "secondaryWeapon": "Longbow",
      "armor": "Leather Armor",
      "rangedWeapons": ["Shortbow", "Crossbow"],
      "accessories": ["Amulet of Fortitude"]
    },
    "backstory": "Born into a family of outlaws, Evelyn grew up with a deep sense of justice and honor. After witnessing the brutal treatment of her people, she left her home to become a warrior, dedicated to protecting those who cannot protect themselves."
  }
}
```


In [None]:
# One-shot learning: Providing an example of the output structure
one_shot_template = """Create a short character profile for an RPG game. Make sure to only use this format:

{
  "description": "A SHORT DESCRIPTION",
  "name": "THE CHARACTER'S NAME",
  "armor": "ONE PIECE OF ARMOR",
  "weapon": "ONE OR MORE WEAPONS"
}
"""
one_shot_prompt = [
    {"role": "user", "content": one_shot_template}
]

# Generate the output
outputs = pipe(one_shot_prompt)
print(outputs[0]["generated_text"])



```json
{
  "description": "A Short Description",
  "name": "The Shadow Weaver",
  "armor": "Leather Armor and a Cloak of Shadows",
  "weapon": "Longsword, Staff, and a Bow with Arrows"
}
```
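Providing a format in the prompt does not guarantee the model follows it, so it is worth checking the output programmatically. The sketch below strips the Markdown fence, parses the JSON, and compares the keys against the template; it also flags values that merely echo a placeholder, as happened with `"description"` in the output above. The helper names and the `TEMPLATE` dictionary are assumptions made for this example.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so the example nests cleanly in Markdown

TEMPLATE = {
    "description": "A SHORT DESCRIPTION",
    "name": "THE CHARACTER'S NAME",
    "armor": "ONE PIECE OF ARMOR",
    "weapon": "ONE OR MORE WEAPONS",
}

def parse_json_output(raw):
    """Strip an optional fenced json wrapper and parse the remaining text as JSON."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, raw, flags=re.DOTALL)
    payload = match.group(1) if match else raw
    return json.loads(payload)

def validate_profile(profile):
    """Return a list of problems; an empty list means the profile matches the template."""
    if not isinstance(profile, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing key: {key}" for key in TEMPLATE if key not in profile]
    problems += [f"unexpected key: {key}" for key in profile if key not in TEMPLATE]
    for key, placeholder in TEMPLATE.items():
        value = profile.get(key)
        if isinstance(value, str) and value.strip().upper() == placeholder:
            problems.append(f"placeholder echoed for: {key}")
    return problems

# Model output copied from the cell above
raw_output = FENCE + """json
{
  "description": "A Short Description",
  "name": "The Shadow Weaver",
  "armor": "Leather Armor and a Cloak of Shadows",
  "weapon": "Longsword, Staff, and a Bow with Arrows"
}
""" + FENCE

profile = parse_json_output(raw_output)
problems = validate_profile(profile)
print(problems)
```

If `json.loads` raises or the problem list is non-empty, a common follow-up is to re-prompt the model with the error message.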
