# Chain of Thought

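Chain-of-thought (CoT) prompting asks the model to write out intermediate reasoning steps before it gives a final answer. This relatively recent technique can improve an LLM's reasoning ability, at the cost of generating some extra tokens.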

First, let's make sure our libraries are up to date:

In [10]:
!pip install -U transformers
!pip install git+https://github.com/guidance-ai/guidance

Collecting git+https://github.com/guidance-ai/guidance
  Cloning https://github.com/guidance-ai/guidance to /tmp/pip-req-build-rqfn3rwq
  Running command git clone --filter=blob:none --quiet https://github.com/guidance-ai/guidance /tmp/pip-req-build-rqfn3rwq
  Resolved https://github.com/guidance-ai/guidance to commit 2f114670904119be93abe240c6a6855f8afdb192
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


We can now load a chat model using Hugging Face `transformers`, as before:

In [11]:
from transformers import pipeline

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
pipe = pipeline("text-generation", model_name, device_map="auto")

Here's a complex medical question for the model to solve:


In [12]:
MAIN_PROMPT = """\
A 12-month-old girl is brought in by her mother to the pediatrician for the
first time since her 6-month checkup. The mother states that her daughter
had been doing fine, but the parents are now concerned that their daughter
is still not able to stand up or speak. On exam, the patient has a temperature of 98.5°F (36.9°C), pulse is 96/min, respirations are 20/min, and blood
pressure is 100/80 mmHg. The child appears to have difficulty supporting
herself while sitting. The patient has no other abnormal physical findings.
She plays by herself and is making babbling noises but does not respond
to her own name. She appears to have some purposeless motions. A previous clinic note documents typical development at her 6-month visit and
mentioned that the patient was sitting unsupported at that time. Which of
the following is the most likely diagnosis?

A) Language disorder
B) Rett syndrome
C) Fragile X syndrome
D) Trisomy 21
"""

Let's first ask the model the question directly, with no extra prompting:

In [13]:
messages = [
     {"role": "system", "content": "You are a smart and intelligent medical expert."},
     {"role": "user", "content": MAIN_PROMPT},
]

output = pipe(messages)
output[0]["generated_text"][-1]["content"]

"The most likely diagnosis based on the given information is Fragile X syndrome. Fragile X syndrome is a genetic disorder that affects the development of the brain and can cause intellectual disability, autism, and other developmental delays. The patient's symptoms, including difficulty supporting herself while sitting, babbling noises, and purposeless motions, all suggest that she may have Fragile X syndrome. The previous clinic note from her 6-month visit also mentions that she was sitting unsupported at that time, which further supports the diagnosis."

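The correct answer is B (Rett syndrome), so the model got it wrong. The key clue is developmental regression in a girl: she was sitting unsupported at 6 months but now has difficulty supporting herself while sitting, and she shows purposeless movements. This is a hard problem, so let's turn it into a zero-shot chain-of-thought prompt by asking the model to show its working.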
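Zero-shot CoT needs no worked examples: it only appends a reasoning cue to the user's message. A small reusable helper for that is sketched below, assuming messages use the same `{"role": ..., "content": ...}` dicts as above; the name `add_cot_cue` is hypothetical, and the default cue simply mirrors the one used in this notebook.

```python
from typing import Dict, List

Message = Dict[str, str]

# Default cue copied from this notebook's zero-shot CoT prompt
COT_CUE = "Let’s use step by step inductive reasoning, given the medical nature of the question"


def add_cot_cue(messages: List[Message], cue: str = COT_CUE) -> List[Message]:
    """Return a copy of `messages` with a zero-shot CoT cue appended to the last user turn."""
    # Copy each dict so the caller's list is left untouched
    new_messages = [dict(m) for m in messages]
    # Walk backwards to find the most recent user message
    for m in reversed(new_messages):
        if m["role"] == "user":
            m["content"] = m["content"] + "\n" + cue
            return new_messages
    raise ValueError("no user message to attach the CoT cue to")
```

With it, the messages in the next cell could be produced as `add_cot_cue(messages)`; the cell itself just does the string concatenation inline.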

In [14]:
messages = [
     {"role": "system", "content": "You are a smart and intelligent medical expert."},
     {"role": "user", "content": MAIN_PROMPT + "\nLet’s use step by step inductive reasoning, given the medical nature of the question"},
]

output = pipe(messages)
print(output[0]["generated_text"][-1]["content"])

Based on the given information, the most likely diagnosis is Rett syndrome.

1. Language disorder: The patient has difficulty supporting herself while sitting, which is a common symptom of Rett syndrome.

2. Rett syndrome: The patient's previous clinic note documents typical development at her 6-month visit, which is consistent with the symptoms of Rett syndrome.

3. Fragile X syndrome: The patient's mother states that her daughter had been doing fine, but the parents are now concerned that their daughter is still not able to stand up or speak. This is consistent with the symptoms of fragile X syndrome, which can cause difficulty with speech and language development.

4. Trisomy 21: The patient's temperature is 98.5°F (36.9°C), which is consistent with the symptoms of trisomy 21, which can cause developmental delays and intellectual disability.

5. Other: The patient has no other abnormal physical findings, which is not consistent with any of the other diagnoses.

In conclusion, Rett s

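The model now lands on the correct answer, Rett syndrome, although the step-by-step reasoning itself is mostly unsound (for example, it links a normal temperature to trisomy 21), so treat a single correct answer from a 1.1B-parameter model with caution. The response is also cut off because the pipeline stops at its default generation length; passing `max_new_tokens` to `pipe` allows longer answers. Would the model do even better with some few-shot CoT examples? Try it.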
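Few-shot CoT places one or more worked examples, each showing its reasoning before the answer, ahead of the real question. One way to set that up with the chat format used here is sketched below. `build_few_shot_cot` is a hypothetical helper, and the single exemplar is the well-known tennis-ball arithmetic problem from the chain-of-thought literature, included only as a placeholder; for this task you would write worked medical multiple-choice questions instead.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def build_few_shot_cot(
    system: str,
    examples: List[Tuple[str, str]],
    question: str,
) -> List[Message]:
    """Build a chat where each (question, worked_answer) pair becomes a prior user/assistant turn."""
    messages: List[Message] = [{"role": "system", "content": system}]
    for example_question, worked_answer in examples:
        # The assistant turn shows the reasoning before the final answer,
        # which is the pattern we want the model to imitate
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": worked_answer})
    # The real question goes last, so the model's next turn answers it
    messages.append({"role": "user", "content": question})
    return messages


# Placeholder exemplar (non-medical); replace with worked medical questions
EXAMPLES = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]
```

In the notebook you could then run something like `pipe(build_few_shot_cot("You are a smart and intelligent medical expert.", EXAMPLES, MAIN_PROMPT), max_new_tokens=512)`.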

# Guiding the model

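Free-text answers are hard to grade automatically, so we can combine chain of thought with the guidance library, which constrains what the model is allowed to generate, to extract the final answer as a single letter.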
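To see why constraining the output helps, consider the naive alternative of parsing the answer out of free text. The sketch below (the helper `first_mentioned_option` is hypothetical, not part of any library) returns whichever option text appears earliest in a response, using the four options copied from `MAIN_PROMPT`.

```python
import re
from typing import Dict, Optional

# The four options from MAIN_PROMPT, keyed by letter
OPTIONS: Dict[str, str] = {
    "A": "Language disorder",
    "B": "Rett syndrome",
    "C": "Fragile X syndrome",
    "D": "Trisomy 21",
}


def first_mentioned_option(text: str, options: Dict[str, str]) -> Optional[str]:
    """Return the letter of the option whose text appears earliest in `text`, or None."""
    best_letter, best_pos = None, len(text) + 1
    for letter, label in options.items():
        # Case-insensitive literal search for the option text
        match = re.search(re.escape(label), text, flags=re.IGNORECASE)
        if match and match.start() < best_pos:
            best_letter, best_pos = letter, match.start()
    return best_letter
```

On the two responses above it returns "C" and "B" respectively, but the heuristic breaks as soon as the model abbreviates an option, hedges, or discusses a wrong option before stating its conclusion.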

In [15]:
import guidance  # needed for the @guidance decorator used below
from guidance import models, select, gen

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
lm = models.Transformers(model_name)

… {% endfor %} was unable to be loaded directly into guidance.
Defaulting to the ChatML format which may not be optimal for the selected model.
For best results, create and pass in a `guidance.ChatTemplate` subclass for your model.


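Because guidance could not load TinyLlama's chat template and fell back to ChatML, the cell below builds the prompt with the tokenizer's own template (`pipe.tokenizer.apply_chat_template`), lets the model reason for up to 200 tokens with `gen`, and then uses `select` to force the final answer to be exactly one of A, B, C or D: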

In [16]:
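@guidance(stateless=False)
def cot(lm):
    messages = [
        {"role": "system", "content": "You are a smart and intelligent medical expert."},
        {"role": "user", "content": MAIN_PROMPT},
    ]
    # Render TinyLlama's own chat template as text, ending with the "<|assistant|>"
    # generation prompt, instead of relying on guidance's ChatML fallback
    prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Strip the end-of-sequence markers the template appends after each message
    prompt = prompt.replace("</s>", "")

    # Open the assistant turn with the CoT cue, then let the model reason freely
    lm += prompt + \
        "\nLet’s use step by step inductive reasoning, given the medical nature of the question. " + \
        gen(name="reasoning", max_tokens=200)
    return lm

lm += cot()

# Ask for a final verdict and constrain it to exactly one of the four letters
lm += """\n<|user|> Therefore, among A-D, which answer is most likely?
<|assistant|> """ + select(["A", "B", "C", "D"], name="answer")
lm["answer"]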
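Conceptually, `select` stops the model from wandering off into more prose: only the allowed continuations are considered, and the best-scoring one wins. The sketch below illustrates that idea under simplifying assumptions (every option is a single token and decoding is greedy); it is not how guidance is implemented, since guidance enforces the constraint token by token through its grammar engine. `constrained_choice` is a hypothetical name, and the scores in the usage line are invented purely for illustration.

```python
import math
from typing import Dict, List


def constrained_choice(scores: Dict[str, float], allowed: List[str]) -> str:
    """Greedy pick among `allowed` continuations, ignoring every other candidate.

    `scores` maps candidate continuations to log-probabilities; an allowed
    option with no score is treated as impossible (-inf).
    """
    if not allowed:
        raise ValueError("need at least one allowed option")
    return max(allowed, key=lambda option: scores.get(option, -math.inf))


# Toy log-probabilities, invented for illustration only
toy_scores = {"The": -0.5, "B": -1.2, "A": -2.0, "C": -2.3, "D": -3.1}
print(constrained_choice(toy_scores, ["A", "B", "C", "D"]))  # prints B
```

Even though the unconstrained favourite here is "The" (the start of yet another sentence), restricting the choice to the four letters guarantees an answer that can be graded directly.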

# Exercises

1. Experiment with different methods for improving prompts. The [Prompt Engineering Guide](https://www.promptingguide.ai) is a good place to start.