# Modular Prompts

Sometimes prompting LLMs feels more like an art than a science. Let's explore using dspy to generate and optimize prompts.

In [None]:
!pip install -U dspy
!pip install litellm
!pip install ctransformers


dspy can use a range of LLM providers (e.g. OpenAI, Gemini), and they are easy to set up.
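For a hosted provider no custom code is needed. A minimal configuration sketch, assuming an OpenAI account with the key exported as `OPENAI_API_KEY`; the model name is only an example:

```python
import dspy

# Assumes OPENAI_API_KEY is exported; litellm (which dspy uses underneath) reads it from the environment.
# "openai/gpt-4o-mini" is an example; other providers use the same "provider/model" form.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
```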

For this demo, we'll use the same local model we used previously, which means we have to write some custom code for it (a litellm `CustomLLM` handler):

In [None]:
#@title HuggingFace connection stuff
import litellm
from litellm import CustomLLM, completion, get_llm_provider
from litellm.types.utils import Choices, Message, ModelResponse
import dspy
from ctransformers import AutoModelForCausalLM, AutoConfig
from transformers import AutoTokenizer, pipeline

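class LocalHf(CustomLLM):
  """litellm handler that runs a local GGML Llama-2 model through a transformers pipeline."""

  def __init__(self):
    self.pipe = None  # built lazily on the first call, so the model only loads when needed
    super().__init__()

  def completion(self, *args, **kwargs) -> litellm.ModelResponse:
    if self.pipe is None:
      # kwargs["model"] is the model name with the "local-hf/" provider prefix removed
      config = AutoConfig.from_pretrained(kwargs["model"])
      # Explicitly set the max_seq_len
      config.max_seq_len = 4096
      config.max_answer_len= 1024
      # hf=True wraps the ctransformers model so it can be used in a transformers pipeline;
      # gpu_layers is the number of layers offloaded to the GPU
      model = AutoModelForCausalLM.from_pretrained(kwargs["model"], hf=True, gpu_layers=50, config=config)
      # Use the tokenizer from the matching HF-format Llama-2 chat repo
      tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
      self.pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=2048)
      # Custom chat template: turns dspy's list of chat messages into Llama-2 style [INST] markup
      self.pipe.tokenizer.chat_template = """{%- for message in messages %}
  {%- if message['role'] == 'user' %}
      {{- bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }}
  {%- elif message['role'] == 'system' %}
      {{- '<<SYS>>\\n' + message['content'].strip() + '\\n<</SYS>>\\n\\n' }}
  {%- elif message['role'] == 'assistant' %}
      {{- '[ASST] '  + message['content'] + ' [/ASST]' + eos_token }}
  {%- endif %}
{%- endfor %}"""

    # Given chat messages, the pipeline returns the conversation with the new
    # assistant message appended; [-1] picks out that reply
    response = self.pipe(kwargs["messages"], max_length=1024)[0]["generated_text"][-1]
    # Wrap the reply in the response object litellm (and therefore dspy) expects
    return litellm.ModelResponse(
        choices=[Choices(message=Message(content=response["content"], role=response["role"]))],
        model=kwargs["model"],
    )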

local_hf = LocalHf()
litellm.custom_provider_map = [
        {"provider": "local-hf", "custom_handler": local_hf}
]
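To see what string the model actually receives, here is a plain-Python re-implementation of the chat template above. It is only an illustration (the real rendering is done by Jinja inside the tokenizer), and it assumes the usual Llama-2 special tokens `<s>` and `</s>` for `bos_token` and `eos_token`; `render_chat` is a hypothetical helper.

```python
# Assumed Llama-2 special tokens (bos_token / eos_token in the Jinja template).
BOS, EOS = "<s>", "</s>"

def render_chat(messages):
    """Mirror the custom Jinja chat template used by LocalHf, message by message."""
    out = []
    for m in messages:
        if m["role"] == "user":
            out.append(BOS + "[INST] " + m["content"].strip() + " [/INST]")
        elif m["role"] == "system":
            out.append("<<SYS>>\n" + m["content"].strip() + "\n<</SYS>>\n\n")
        elif m["role"] == "assistant":
            out.append("[ASST] " + m["content"] + " [/ASST]" + EOS)
        # Any other role is silently dropped, as in the template's if/elif chain.
    return "".join(out)

prompt = render_chat([
    {"role": "system", "content": "Classify the sex of the animal."},
    {"role": "user", "content": "The owner said the dog was ill."},
])
print(prompt)
```

With this template the system block lands before the `<s>[INST]` marker, rather than inside the first `[INST]` block as in Meta's reference Llama-2 chat format, which is worth keeping in mind if the model ignores its instructions.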

In [None]:
dspy.configure(lm=dspy.LM('local-hf/TheBloke/Llama-2-7B-Chat-GGML'))

# Defining the Task

Now that dspy is set up with an LLM, we can start defining our task.

First we define our signature. This specifies the input and output fields of our task:

In [None]:
from typing import Literal

class Sex(dspy.Signature):
    """Classify the sex of the animal (not the owner)."""

    sentence: str = dspy.InputField(desc="Description of a pet visiting the vet.")
    sex: Literal['Male', 'Female', 'Unknown'] = dspy.OutputField(desc="The sex of the animal (if specified)")
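The `Literal[...]` annotation restricts the output field to three labels. As a rough illustration of what that constraint means (not dspy's own parsing code), here is how you might normalise a raw model reply against the allowed values using only the standard library; `normalise_sex` is a hypothetical helper.

```python
from typing import Literal, get_args

SexLabel = Literal['Male', 'Female', 'Unknown']
ALLOWED = get_args(SexLabel)  # ('Male', 'Female', 'Unknown')

def normalise_sex(raw: str) -> str:
    """Map a free-text reply onto one of the allowed labels, ignoring case and a trailing full stop."""
    cleaned = raw.strip().strip(".").lower()
    for label in ALLOWED:
        if cleaned == label.lower():
            return label
    raise ValueError(f"{raw!r} is not one of {ALLOWED}")

print(normalise_sex(" female. "))  # Female
```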

Let's test it out by giving our signature to the most basic dspy module, `Predict`:

In [None]:
classify = dspy.Predict(Sex)
classify(sentence="The owner said the dog was ill.")

We can print out the calls made to the language model. See how dspy converts our `Signature` and `Module` into an LLM prompt:

In [None]:
dspy.inspect_history(n=1)
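Roughly speaking, the prompt is assembled from the signature's docstring, its field names and their `desc` strings. The sketch below is a simplified, hypothetical re-creation in plain Python: the `[[ ## field ## ]]` markers are modelled on what dspy's chat adapter prints, but the exact wording varies between dspy versions, so treat `inspect_history` as the ground truth. `Field` and `render_prompt` are illustrative names, not dspy APIs.

```python
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    desc: str

def render_prompt(instructions, inputs, outputs, values):
    """Build a system/user message pair from a signature-like description (illustrative only)."""
    lines = ["Your input fields are:"]
    lines += [f"{i}. `{f.name}`: {f.desc}" for i, f in enumerate(inputs, 1)]
    lines.append("Your output fields are:")
    lines += [f"{i}. `{f.name}`: {f.desc}" for i, f in enumerate(outputs, 1)]
    lines.append(f"Your objective is: {instructions}")
    system = "\n".join(lines)
    # Each input value is wrapped in a named marker so fields can be told apart (and parsed back out).
    user = "\n\n".join(f"[[ ## {f.name} ## ]]\n{values[f.name]}" for f in inputs)
    return [{"role": "system", "content": system}, {"role": "user", "content": user}]

messages = render_prompt(
    "Classify the sex of the animal (not the owner).",
    [Field("sentence", "Description of a pet visiting the vet.")],
    [Field("sex", "The sex of the animal (if specified)")],
    {"sentence": "The owner said the dog was ill."},
)
print(messages[0]["content"])
print(messages[1]["content"])
```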

We can easily swap out modules and signatures. Try swapping `dspy.Predict` for `dspy.ChainOfThought` and see how the prompt changes.

# Optimizing
Let's define some labeled examples; dspy will optimize the prompt based on these.

In [None]:
examples = [
  {"sentence": "A Poodle with skin allergies came in for her monthly injection.", "sex": "Female"},
  {"sentence": "A turtle was brought in with a cracked shell.", "sex": "Unknown"},
  {"sentence": "A Siamese kitten visited for her first wellness exam.", "sex": "Female"},
  {"sentence": "A Labrador was brought in for his routine wellness bloodwork.", "sex": "Male"},
  {"sentence": "A kitten was treated for fleas and intestinal parasites.", "sex": "Unknown"},
  {"sentence": "The fluffy tabby cat named Luna was scheduled for a spay surgery.", "sex": "Female"},
  {"sentence": "An orange-striped tomcat named Simba was neutered this morning.", "sex": "Male"},
  {"sentence": "The black cat with the name Midnight underwent a neuter procedure today.", "sex": "Male"}
]

examples = [dspy.Example(e).with_inputs("sentence") for e in examples]
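In this notebook the optimizer is compiled and evaluated on the same eight examples, which flatters the score. A minimal sketch of a reproducible hold-out split in plain Python; `split_examples` is a hypothetical helper, and with dspy you would pass the two resulting lists as `trainset` and `devset`.

```python
import random

def split_examples(items, dev_fraction=0.25, seed=0):
    """Shuffle a copy of items with a fixed seed and split it into (train, dev)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, round(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]

items = list(range(8))  # stand-ins for the eight labeled examples
train, dev = split_examples(items)
print(len(train), len(dev))  # 6 2
```

Fixing the seed keeps the split identical between runs, so score changes come from the prompt rather than from a different dev set.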

We can now evaluate our `classify` module against these examples:

In [None]:
from dspy.evaluate import Evaluate

def exact_match(example, pred, trace=None):
  return example.sex == pred.sex

evaluator = Evaluate(devset=examples, num_threads=1, display_progress=True, display_table=5)
evaluator(classify, metric=exact_match)
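Conceptually, `Evaluate` runs the program on every example in `devset`, applies the metric to each prediction, and reports the average as a percentage. Below is a simplified stand-in (not dspy's implementation, which adds threading, progress display and error handling), using `SimpleNamespace` objects and a hypothetical `always_unknown` program in place of real dspy examples and predictions. It also shows a more forgiving metric that ignores case and surrounding whitespace.

```python
from types import SimpleNamespace

def exact_match(example, pred, trace=None):
    return example.sex == pred.sex

def relaxed_match(example, pred, trace=None):
    """Like exact_match, but tolerant of 'male' vs 'Male' or stray spaces."""
    return str(pred.sex).strip().lower() == example.sex.lower()

def evaluate(program, devset, metric):
    """Return the percentage of examples for which metric(example, prediction) is truthy."""
    scores = [metric(ex, program(sentence=ex.sentence)) for ex in devset]
    return 100.0 * sum(scores) / len(scores)

devset = [
    SimpleNamespace(sentence="A turtle was brought in with a cracked shell.", sex="Unknown"),
    SimpleNamespace(sentence="A Labrador was brought in for his routine wellness bloodwork.", sex="Male"),
]

def always_unknown(sentence):
    # Hypothetical baseline "program" that ignores its input.
    return SimpleNamespace(sex="unknown ")

print(evaluate(always_unknown, devset, exact_match))    # 0.0
print(evaluate(always_unknown, devset, relaxed_match))  # 50.0
```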

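Hmm, not that great. We can use an optimizer to refine the prompt and then evaluate it again. Note that here the optimizer and the evaluation use the same examples, so the new score will be optimistic.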

In [None]:
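optimizer = dspy.LabeledFewShot(k=3)  # add 3 of the labeled examples to the prompt as demos
# Alternatives to try (uncomment one):
# BootstrapFewShot runs the program on the trainset and keeps demos whose outputs pass the metric
#optimizer = dspy.BootstrapFewShot(metric=exact_match, max_labeled_demos=3, max_bootstrapped_demos=1)
# MIPROv2 also proposes new instructions; pass requires_permission_to_run=False to compile() to skip its confirmation prompt
#optimizer = dspy.MIPROv2(metric=exact_match, auto="light")

optimized = optimizer.compile(classify, trainset=examples)
evaluator(optimized, metric=exact_match)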
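`LabeledFewShot(k=3)` attaches up to three of the labeled training examples to the predictor as demonstrations, so they appear in the prompt as worked input/output pairs (by default it samples them with a fixed random seed). A rough plain-Python illustration of that idea; the `Sentence:`/`Sex:` layout is an assumption for readability, not dspy's actual format, and `build_few_shot_prompt` is hypothetical.

```python
import random

def build_few_shot_prompt(demos, query, k=3, seed=0):
    """Prepend k randomly sampled labeled demos to a new query (illustrative format)."""
    rng = random.Random(seed)
    chosen = rng.sample(demos, min(k, len(demos)))
    blocks = [f"Sentence: {d['sentence']}\nSex: {d['sex']}" for d in chosen]
    blocks.append(f"Sentence: {query}\nSex:")  # the model is expected to fill in this label
    return "\n\n".join(blocks)

demos = [
    {"sentence": "A turtle was brought in with a cracked shell.", "sex": "Unknown"},
    {"sentence": "A Labrador was brought in for his routine wellness bloodwork.", "sex": "Male"},
    {"sentence": "A Siamese kitten visited for her first wellness exam.", "sex": "Female"},
    {"sentence": "A kitten was treated for fleas and intestinal parasites.", "sex": "Unknown"},
]
print(build_few_shot_prompt(demos, "The owner said the dog was ill."))
```

Because any demo drawn from the evaluation set is answered correctly "for free", this is also why evaluating on a separate dev set gives a fairer picture.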

In [None]:
dspy.inspect_history(n=1)