# Implementing a new model prompt with debug output

## The new prompt format we want to use

As an example, we will implement the [Orca Mini 3B](https://huggingface.co/pankajmathur/orca_mini_3b) prompt.

The prompt looks like this:

```Python
prompt = "### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"
```

Here `system` is the system prompt and `instruction` is the user's instruction.
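As a quick check of the template, the placeholders can be filled in with plain `str.format`. The values below are illustrative (they reuse the cat example from the cells further down), and nothing here is Guidance-specific:

```python
# Orca Mini prompt template, as given above
prompt = "### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

# Illustrative values only
rendered = prompt.format(
    system="You are a cat expert.",
    instruction="What are the smallest cats?",
)
print(rendered)
```

The rendered string ends with the `### Response:` header, which is where the model is expected to continue.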

In [2]:
from guidance import gen, system, user, assistant

# Custom prompt implementation
from guidance.models.transformers._transformers import Transformers, TransformersChat

class Orca(Transformers):
    pass

class OrcaChat(TransformersChat, Orca):
    def get_role_start(self, role_name, **kwargs):
        if role_name == "system":
            return "### System:\n"
        elif role_name == "user":
            if str(self).endswith("\n\n### User:\n"):
                return ""  # the end of the system block already emitted the "### User:" header
            else:
                return "### User:\n"  # no preceding system block, so open the user block here
        else:
            return " "

    def get_role_end(self, role_name=None):
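        if role_name == "system":
            # Closing the system block also writes the header of the user block
            return "\n\n### User:\n"
        elif role_name == "user":
            # Closing the user block writes the header the model responds after
            return "\n\n### Response:\n"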
        else:
            return " "
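To see how the start and end markers combine, the string logic can be exercised without loading a model. The sketch below is a stand-in, not the class itself: `role_start`, `role_end` and `render` are hypothetical helpers, `text_so_far` plays the part of `str(self)`, and only the system and user roles are modelled. It assumes that a user block with no preceding system block should open with its own `### User:` header. How Guidance actually invokes these methods is not modelled.

```python
TEMPLATE = "### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

def role_start(role_name, text_so_far):
    """Hypothetical stand-in for get_role_start (system and user only)."""
    if role_name == "system":
        return "### System:\n"
    if role_name == "user":
        # The system block's end marker already wrote the user header.
        if text_so_far.endswith("\n\n### User:\n"):
            return ""
        return "### User:\n"  # assumption: no system block came before
    raise ValueError(f"role not modelled in this sketch: {role_name}")

def role_end(role_name):
    """Hypothetical stand-in for get_role_end (system and user only)."""
    if role_name == "system":
        return "\n\n### User:\n"
    if role_name == "user":
        return "\n\n### Response:\n"
    raise ValueError(f"role not modelled in this sketch: {role_name}")

def render(blocks):
    """Concatenate start marker, content and end marker for each (role, content) pair."""
    text = ""
    for role_name, content in blocks:
        text += role_start(role_name, text)
        text += content
        text += role_end(role_name)
    return text

rendered = render([
    ("system", "You are a cat expert."),
    ("user", "What are the smallest cats?"),
])
expected = TEMPLATE.format(
    system="You are a cat expert.",
    instruction="What are the smallest cats?",
)
print(rendered == expected)
```

Comparing against the template in this way is a cheap check to run while iterating on the markers, before spending time on model loading.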


## Loading the new OrcaChat model

In [3]:
import torch

orca = OrcaChat('pankajmathur/orca_mini_3b', torch_dtype=torch.float16, device_map='auto')
# orca = OrcaChat('gpt2', device_map='auto') # Can use a small mock model while iterating on the prompt


### With full debug output enabled

In [21]:
with system(debug=True):
    lm = orca + "You are a cat expert."

with user(debug=True):
    lm += "What are the smallest cats?"

with assistant(debug=True):
    lm += gen("answer", stop=".", max_tokens=20)

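In [1]:
# Optional IPython magics: reload edited modules automatically (run once, at the start of the session)
%load_ext autoreload
%autoreload 2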

For comparison, the same conversation without debug output:

In [22]:
with system():
    lm = orca + "You are a cat expert."

with user():
    lm += "What are the smallest cats?"

with assistant():
    lm += gen("answer", stop=".", max_tokens=20)

### With granular debug output enabled

Here we enable debug output for one block at a time (system, user, or assistant) to see which part of the prompt Guidance generates from each block.

Only System prompt debug:

In [23]:
with system(debug=True):
    lm = orca + "You are a cat expert."

with user():
    lm += "What are the smallest cats?"

with assistant():
    lm += gen("answer", stop=".", max_tokens=20)

Only User prompt debug:

In [24]:
with system():
    lm = orca + "You are a cat expert."

with user(debug=True):
    lm += "What are the smallest cats?"

with assistant():
    lm += gen("answer", stop=".", max_tokens=20)

Only Assistant prompt debug:

In [25]:
with system():
    lm = orca + "You are a cat expert."

with user():
    lm += "What are the smallest cats?"

with assistant(debug=True):
    lm += gen("answer", stop=".", max_tokens=20)