
# 🧠 Chain of Thought (CoT) Reasoning with LLaMA-2-13B-Chat 💬

In this tutorial, we explore how the **LLaMA-2 chat** models (7B and 13B, from both **Meta** and **NousResearch**) handle reasoning through **Chain of Thought (CoT)** prompting. 🦙✨

You'll learn how to apply **in-context learning** using:
- 🚀 Zero-shot prompting
- 🎯 One-shot prompting
- 🧩 Few-shot prompting

We walk through several examples to observe how well the model can reason through problems step by step, especially when encouraged with **CoT-style** prompts. The goal is to understand the model's **emergent reasoning ability** and how shot-based examples impact its performance.

📌 Whether you're building smarter agents or exploring LLM reasoning, this notebook will give you practical insights into **prompt engineering** and **LLM behavior under various learning conditions**.

In [None]:
#%pip install -U torch==2.0.1 \
#  transformers==4.33.0 \
#  sentencepiece==0.1.99 \
#  accelerate==0.22.0 # needed for low_cpu_mem_usage parameter

In [None]:
import torch
from transformers import LlamaTokenizer
from transformers import LlamaForCausalLM

model_checkpoint = "NousResearch/Llama-2-7b-chat-hf"
#LATER WE WILL TRY WITH BIGGER MODEL "NousResearch/Llama-2-13b-chat-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_checkpoint)

model = LlamaForCausalLM.from_pretrained(model_checkpoint, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)

model = model.eval() #inference mode

## 🚨🚨🚨 Side Section Begins 🚨🚨🚨
### Using Llama-2-13b-chat-hf from Meta Llama on Hugging Face 🤗
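  1. 🐱‍💻 **Log in to the Hugging Face Hub.** The `notebook_login()` function prompts for an access token 🔑 and gives you access to the Hub's resources. The `meta-llama` checkpoints are gated, so you also need to request access on the model page before you can download them.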


- 🖥️ **Go to Hugging Face Website**

-  🔑 **Log In to Your Account**

-  👤 **Navigate to Your Settings** by clicking your profile icon and selecting **Settings**

-  🔐 **Generate a New Token** under **Access Tokens**. A **read** token is enough to download models; choose **write** only if you also plan to push to the Hub.

-  📄 **Copy Your Token**

-  🔄 **Paste the Token** when `notebook_login()` prompts for it


In [None]:
from huggingface_hub import notebook_login
notebook_login()

  2. 🔁 Using `meta-llama/Llama-2-13b-chat-hf` 🦙 Instead of NousResearch

- Load the tokenizer
- Load the Meta model
- Continue with the remaining code in the notebook




In [None]:
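# Load model from Meta Llama (requires the Hub login above and approved access)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_checkpoint = "meta-llama/Llama-2-13b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# bfloat16 halves memory compared with the default float32 weights
model = AutoModelForCausalLM.from_pretrained(model_checkpoint, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)
model = model.eval()  # inference mode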


## 🚨🚨🚨 End of Side Section 🚨🚨🚨


### 🧩 Preparing Messages for LLaMA Prompt Format

This utility function transforms a list of message histories into properly formatted input prompts for LLaMA-style chat models, following the instruction formatting used in many fine-tuned Hugging Face models.

### 🔍 What the Code Does:
- Defines a `Message` structure with roles (`system`, `user`, `assistant`) and content.
- Prepares messages with system instructions using the plain-text markers `<<SYS>>...<</SYS>>` and `[INST]...[/INST]`.
- Verifies correct message ordering:
  - A `system` message (optional, must be first)
  - Followed by alternating `user` and `assistant` messages
  - Ending with a `user` message
- Builds input strings by interleaving user and assistant turns, wrapped in `[INST]` tags, and adds `bos_token` and `eos_token` as required by the tokenizer.
- Ensures the format is compatible with models expecting instruction-style inputs (like LLaMA-2 chat variants).

🛠️ This function is adapted from [llama-cpp-chat-completion-wrapper](https://github.com/viniciusarruda/llama-cpp-chat-completion-wrapper/blob/1c9e29b70b1aaa7133d3c7d7b59a92d840e92e6d/llama_cpp_chat_completion_wrapper.py)
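To see exactly what string this layout produces, here is a minimal standalone sketch of the same Llama-2 chat format. It reuses the `[INST]`/`<<SYS>>` markers from the cell below and assumes the Llama-2 tokenizer's BOS/EOS strings are `<s>` and `</s>`; `format_llama2_chat` is a hypothetical helper written for illustration, not part of `transformers`.

```python
from typing import Dict, List

# Markers match the formatting cell in this notebook; BOS/EOS assume the Llama-2 tokenizer's "<s>" and "</s>".
BEGIN_INST, END_INST = "[INST] ", " [/INST] "
BEGIN_SYS, END_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS, EOS = "<s>", "</s>"


def format_llama2_chat(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: render one conversation as a single Llama-2 chat prompt string."""
    # An optional leading system message is folded into the first user turn.
    if len(messages) >= 2 and messages[0]["role"] == "system":
        merged = BEGIN_SYS + messages[0]["content"] + END_SYS + messages[1]["content"]
        messages = [{"role": messages[1]["role"], "content": merged}] + messages[2:]

    # The remaining turns must alternate user/assistant and end on a user turn.
    roles = [m["role"] for m in messages]
    expected = ["user" if i % 2 == 0 else "assistant" for i in range(len(roles))]
    if not roles or roles != expected or roles[-1] != "user":
        raise ValueError("expected alternating user/assistant turns ending with a user turn")

    # Each completed exchange becomes "<s>[INST] question [/INST] answer</s>".
    prompt = ""
    for user, assistant in zip(messages[::2], messages[1::2]):
        prompt += BOS + BEGIN_INST + user["content"].strip() + END_INST + assistant["content"].strip() + EOS

    # The last user turn stays open so the model writes the next assistant reply.
    return prompt + BOS + BEGIN_INST + messages[-1]["content"].strip() + END_INST


conversation = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "And 3 + 3?"},
]
print(repr(format_llama2_chat(conversation)))
# '<s>[INST] <<SYS>>\n\n<</SYS>>\n\nWhat is 2 + 2? [/INST] 4</s><s>[INST] And 3 + 3? [/INST] '
```

Note how an empty system message still produces the `<<SYS>>` block; this is what the prompts in this notebook look like, since they all use an empty system message.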



In [None]:
# based on https://github.com/viniciusarruda/llama-cpp-chat-completion-wrapper/blob/1c9e29b70b1aaa7133d3c7d7b59a92d840e92e6d/llama_cpp_chat_completion_wrapper.py

from typing import List
from typing import Literal
from typing import TypedDict

from transformers import PreTrainedTokenizer

Role = Literal["system", "user", "assistant"]

class Message(TypedDict):
    role: Role
    content: str

MessageList = List[Message]

BEGIN_INST, END_INST = "[INST] ", " [/INST] "
BEGIN_SYS, END_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def convert_list_of_message_lists_to_input_prompt(list_of_message_lists: List[MessageList], tokenizer: PreTrainedTokenizer) -> List[str]:
    """Convert each conversation (a list of messages) into one Llama-2 chat prompt string."""
    input_prompts: List[str] = []
    for message_list in list_of_message_lists:
        # Fold the optional system prompt into the first user message, wrapped in <<SYS>> markers
        if message_list[0]["role"] == "system":
            content = "".join([BEGIN_SYS, message_list[0]["content"], END_SYS, message_list[1]["content"]])
            message_list = [{"role": message_list[1]["role"], "content": content}] + message_list[2:]

        if not (
            all([msg["role"] == "user" for msg in message_list[::2]])
            and all([msg["role"] == "assistant" for msg in message_list[1::2]])
        ):
            raise ValueError(
                "Format must be in this order: 'system', 'user', 'assistant' roles.\nAfter that, you can alternate between user and assistant multiple times"
            )

        # Each completed (user, assistant) pair becomes "<s>[INST] user [/INST] answer</s>" for Llama-2
        eos = tokenizer.eos_token
        bos = tokenizer.bos_token
        input_prompt = "".join(
            [
                "".join([bos, BEGIN_INST, (prompt["content"]).strip(), END_INST, (answer["content"]).strip(), eos])
                for prompt, answer in zip(message_list[::2], message_list[1::2])
            ]
        )

        if message_list[-1]["role"] != "user":
            raise ValueError(f"Last message must be from user role. Instead, you sent from {message_list[-1]['role']} role")

        # Leave the final user turn open so the model generates the next assistant reply
        input_prompt += "".join([bos, BEGIN_INST, (message_list[-1]["content"]).strip(), END_INST])

        input_prompts.append(input_prompt)

    return input_prompts

### 🧠 Zero-Shot Chain of Thought Reasoning

In zero-shot learning, the model is given a new question without any examples beforehand. It must reason through the solution purely from the current prompt.
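A widely used zero-shot CoT variant (Kojima et al., 2022) appends a reasoning trigger such as "Let's think step by step." to the prompt. The zero-shot prompt in this notebook does not include one; if you want to compare, the sketch below shows one way to add it. `add_cot_trigger` is a hypothetical helper that mirrors this notebook's QUESTION/ANSWER layout.

```python
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."  # trigger phrase from zero-shot CoT work


def add_cot_trigger(question: str, trigger: str = ZERO_SHOT_COT_TRIGGER) -> str:
    """Hypothetical helper: zero-shot CoT user prompt in this notebook's QUESTION/ANSWER layout."""
    # The trigger sits right after the ANSWER header, so the model's reply starts with reasoning.
    return f"**QUESTION :**\n{question.strip()}\n**ANSWER :** {trigger}\n"


factory_question = (
    "A factory produces both widgets and gadgets. For every 5 widgets, it uses 3 units of material A. "
    "For every 4 gadgets, it uses 6 units of material A.\n"
    "On a particular day, the factory produces 20 widgets and 16 gadgets.\n"
    "If the factory only had 50 units of material A at the start, "
    "how many units of material A will it have left after production?"
)
print(add_cot_trigger(factory_question))
```

The returned string could be assigned to `user_message["content"]` to compare the model's output with and without the trigger.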

**Prompt Example:**

**QUESTION :**
A factory produces both widgets and gadgets. For every 5 widgets, it uses 3 units of material A. For every 4 gadgets, it uses 6 units of material A.
On a particular day, the factory produces 20 widgets and 16 gadgets.
If the factory only had 50 units of material A at the start, how many units of material A will it have left after production?

**ANSWER :**  
 <br />
 <br />


**What should we expect as an answer?**

In [None]:
system_message = Message()
system_message["role"] = "system"
system_message["content"] = "" # empty system prompt: no persona or instructions for now
print(system_message)

user_message = Message()
user_message["role"] = "user"
user_message["content"] = """**QUESTION :**
A factory produces both widgets and gadgets. For every 5 widgets, it uses 3 units of material A. For every 4 gadgets, it uses 6 units of material A.
On a particular day, the factory produces 20 widgets and 16 gadgets.
If the factory only had 50 units of material A at the start, how many units of material A will it have left after production?
**ANSWER :**
"""

list_of_messages = list()
list_of_messages.append(system_message)
list_of_messages.append(user_message)

list_of_message_lists = list()
list_of_message_lists.append(list_of_messages)

prompt = convert_list_of_message_lists_to_input_prompt(list_of_message_lists, tokenizer)
print(prompt)

In [None]:
from transformers import pipeline
from transformers import GenerationConfig

tokenized_prompt = tokenizer(prompt)

print(f'prompt is {len(tokenized_prompt["input_ids"][0])} tokens')
generation_config = GenerationConfig(max_new_tokens=2000)

# Greedy decoding by default; optional sampling settings: do_sample=True, temperature, top_p, top_k
pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, generation_config=generation_config)
response = pipeline(prompt, return_full_text=False)
print(response[0][0]["generated_text"])

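One reason to check the prompt length is that prompt tokens plus `max_new_tokens` must fit in the model's context window, which is 4,096 tokens for Llama-2. A quick sketch of that check; `fits_in_context` and `max_new_tokens_allowed` are hypothetical helpers, and the token counts in the example calls are illustrative, not measured from this notebook.

```python
LLAMA2_CONTEXT_WINDOW = 4096  # Llama-2 models use a 4,096-token context


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_window: int = LLAMA2_CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the requested generation length fits in the window."""
    return prompt_tokens + max_new_tokens <= context_window


def max_new_tokens_allowed(prompt_tokens: int, context_window: int = LLAMA2_CONTEXT_WINDOW) -> int:
    """Largest max_new_tokens value that still fits (0 if the prompt alone is too long)."""
    return max(context_window - prompt_tokens, 0)


# With max_new_tokens=2000, any prompt of up to 2,096 tokens fits.
print(fits_in_context(150, 2000))    # True
print(fits_in_context(2500, 2000))   # False
print(max_new_tokens_allowed(2500))  # 1596
```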
#### Is the model's answer correct?!
**QUESTION :**
A factory produces both widgets and gadgets. For every 5 widgets, it uses 3 units of material A. For every 4 gadgets, it uses 6 units of material A.
On a particular day, the factory produces 20 widgets and 16 gadgets.
If the factory only had 50 units of material A at the start, how many units of material A will it have left after production?
**ANSWER :**

For widgets: 5 widgets → 3 units of material A.
So, 20 widgets → (20 ÷ 5) × 3 = 4 × 3 = 12 units.

For gadgets: 4 gadgets → 6 units of material A.
So, 16 gadgets → (16 ÷ 4) × 6 = 4 × 6 = 24 units.

Total used = 12 + 24 = 36 units.
Starting with 50 units: 50 − 36 = 14 units left.
**The answer is 14.**
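The same arithmetic can be checked in a few lines. This sketch uses exact fractions so the per-batch rates never round; `material_used` is a hypothetical helper.

```python
from fractions import Fraction


def material_used(units_made: int, batch_size: int, material_per_batch: int) -> Fraction:
    """Hypothetical helper: material consumed at material_per_batch units per batch_size items."""
    return Fraction(units_made, batch_size) * material_per_batch


widgets = material_used(20, 5, 3)    # (20 / 5) * 3 = 12
gadgets = material_used(16, 4, 6)    # (16 / 4) * 6 = 24
remaining = 50 - (widgets + gadgets)
print(widgets, gadgets, remaining)   # 12 24 14
```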

<br />

### 🧠 One-Shot Chain of Thought Reasoning

In one-shot learning, the model is provided with a **single example** before being asked to solve a similar problem. This helps it understand the expected reasoning style.

**Prompt Example:**

**QUESTION :**
A tank fills in 5 hours with 2 identical pipes working together. How long would it take only one pipe to fill the same tank?

**ANSWER :**

If 2 pipes fill the tank in 5 hours, then 1 pipe would take 2 times longer.
So, 5 × 2 = 10 hours.
**The answer is 10.**

**QUESTION :**
A factory produces both widgets and gadgets. For every 5 widgets, it uses 3 units of material A. For every 4 gadgets, it uses 6 units of material A.
On a particular day, the factory produces 20 widgets and 16 gadgets.
If the factory only had 50 units of material A at the start, how many units of material A will it have left after production?

**ANSWER :**

<br />
<br />
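The example's reasoning ("1 pipe would take 2 times longer") can be checked with work rates: two pipes fill one tank in 5 hours, so each pipe fills 1/10 of the tank per hour. A small sketch, assuming identical pipes with constant flow; `time_for_pipes` is a hypothetical helper.

```python
from fractions import Fraction


def time_for_pipes(known_pipes: int, known_hours: Fraction, target_pipes: int) -> Fraction:
    """Hypothetical helper: hours for target_pipes to fill the tank (identical, constant-flow pipes)."""
    rate_per_pipe = Fraction(1) / (known_pipes * known_hours)  # tanks per hour for a single pipe
    return 1 / (rate_per_pipe * target_pipes)


print(time_for_pipes(2, Fraction(5), 1))  # 10
print(time_for_pipes(2, Fraction(5), 4))  # 5/2
```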


In [None]:
system_message = Message()
system_message["role"] = "system"
system_message["content"] = ""
print(system_message)

user_message = Message()
user_message["role"] = "user"
user_message["content"] = """
**QUESTION :**
A tank fills in 5 hours with 2 identical pipes working together. How long would it take only one pipe to fill the same tank?

**ANSWER :**

If 2 pipes fill the tank in 5 hours, then 1 pipe would take 2 times longer.
So, 5 × 2 = 10 hours.
**The answer is 10.**

**QUESTION :**
A factory produces both widgets and gadgets. For every 5 widgets, it uses 3 units of material A. For every 4 gadgets, it uses 6 units of material A.
On a particular day, the factory produces 20 widgets and 16 gadgets.
If the factory only had 50 units of material A at the start, how many units of material A will it have left after production?
**ANSWER :**
"""

list_of_messages = list()
list_of_messages.append(system_message)
list_of_messages.append(user_message)

list_of_message_lists = list()
list_of_message_lists.append(list_of_messages)

prompt = convert_list_of_message_lists_to_input_prompt(list_of_message_lists, tokenizer)
print(prompt)

In [None]:
#from transformers import pipeline
#from transformers import GenerationConfig

tokenized_prompt = tokenizer(prompt)

print(f'prompt is {len(tokenized_prompt["input_ids"][0])} tokens')

#generation_config = GenerationConfig(max_new_tokens=2000)
#pipeline = pipeline("text-generation", model=model,  tokenizer=tokenizer, generation_config=generation_config,)

response = pipeline(prompt, return_full_text=False)
print(response[0][0]["generated_text"])

#### 😮 WOW! Compare this answer with the zero-shot output above.

### 🧠 Few-Shot Chain of Thought Reasoning

In few-shot learning, the model is given multiple examples that guide it to follow a specific reasoning process before solving a new problem.
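Instead of pasting examples by hand, the QUESTION/ANSWER layout can be assembled from a list of (question, worked answer) pairs, which makes it easy to swap, reorder, or add shots. A minimal sketch; `build_shot_prompt` is a hypothetical helper that mirrors the layout of this notebook's prompts (whitespace may differ slightly).

```python
from typing import List, Tuple


def build_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Hypothetical helper: join worked examples and a new question in the QUESTION/ANSWER layout."""
    blocks = []
    for example_question, worked_answer in examples:
        # Each shot shows a question followed by its step-by-step answer.
        blocks.append(f"**QUESTION :**\n{example_question.strip()}\n\n**ANSWER :**\n{worked_answer.strip()}\n")
    # The new question ends with an empty ANSWER header for the model to complete.
    blocks.append(f"**QUESTION :**\n{question.strip()}\n**ANSWER :**\n")
    return "\n".join(blocks)


shots = [
    ("A tank fills in 5 hours with 2 identical pipes working together. "
     "How long would it take only one pipe to fill the same tank?",
     "If 2 pipes fill the tank in 5 hours, then 1 pipe would take 2 times longer.\n"
     "So, 5 × 2 = 10 hours.\n**The answer is 10.**"),
]
prompt_text = build_shot_prompt(
    shots,
    "A bakery sells cupcakes in boxes of 6. If a customer buys 8 boxes, "
    "and then eats 6 cupcakes, how many cupcakes does the customer have left?",
)
print(prompt_text)
```

With an empty `examples` list the same helper produces a zero-shot prompt, with one pair a one-shot prompt, and so on; the result can be assigned to `user_message["content"]`.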

**Prompt Example:**

**QUESTION :**
A tank fills in 5 hours with 2 identical pipes working together. How long would it take only one pipe to fill the same tank?

**ANSWER :**
If 2 pipes fill the tank in 5 hours, then 1 pipe would take 2 times longer.
So, 5 × 2 = 10 hours.
**The answer is 10.**

**QUESTION :**
A car travels 300 km using 25 liters of fuel. How much fuel would be needed to travel 480 km at the same rate?

**ANSWER :**
300 km requires 25 liters.
So, 1 km requires 25 ÷ 300 = 0.0833 liters.
Then, 480 × 0.0833 ≈ 40 liters.
**The answer is 40.**

**QUESTION :**
A factory produces both widgets and gadgets. For every 5 widgets, it uses 3 units of material A. For every 4 gadgets, it uses 6 units of material A.
On a particular day, the factory produces 20 widgets and 16 gadgets.
If the factory only had 50 units of material A at the start, how many units of material A will it have left after production?

**ANSWER :**

<br />
 <br />
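The car example rounds the per-kilometre rate to 0.0833 liters, which strictly gives 39.984 rather than 40. Keeping the ratio exact shows the answer is exactly 40 liters:

```python
from fractions import Fraction

liters_per_km = Fraction(25, 300)  # exactly 1/12 liter per km
print(liters_per_km)               # 1/12
print(480 * liters_per_km)         # 40
print(round(480 * 0.0833, 3))      # 39.984, using the rounded rate from the prompt
```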


In [None]:
system_message = Message()
system_message["role"] = "system"
system_message["content"] = ""
print(system_message)

user_message = Message()
user_message["role"] = "user"
user_message["content"] = """
**QUESTION :**
A tank fills in 5 hours with 2 identical pipes working together. How long would it take only one pipe to fill the same tank?

**ANSWER :**

If 2 pipes fill the tank in 5 hours, then 1 pipe would take 2 times longer.
So, 5 × 2 = 10 hours.
**The answer is 10.**

**QUESTION :**
A car travels 300 km using 25 liters of fuel. How much fuel would be needed to travel 480 km at the same rate?

**ANSWER :**
300 km requires 25 liters.
So, 1 km requires 25 ÷ 300 = 0.0833 liters.
Then, 480 × 0.0833 ≈ 40 liters.
**The answer is 40.**

**QUESTION :**
A factory produces both widgets and gadgets. For every 5 widgets, it uses 3 units of material A. For every 4 gadgets, it uses 6 units of material A.
On a particular day, the factory produces 20 widgets and 16 gadgets.
If the factory only had 50 units of material A at the start, how many units of material A will it have left after production?
**ANSWER :**
"""

list_of_messages = list()
list_of_messages.append(system_message)
list_of_messages.append(user_message)

list_of_message_lists = list()
list_of_message_lists.append(list_of_messages)

prompt = convert_list_of_message_lists_to_input_prompt(list_of_message_lists, tokenizer)
print(prompt)

In [None]:
#from transformers import pipeline
#from transformers import GenerationConfig

tokenized_prompt = tokenizer(prompt)

print(f'prompt is {len(tokenized_prompt["input_ids"][0])} tokens')
#generation_config = GenerationConfig(max_new_tokens=2000)

#pipeline = pipeline("text-generation", model=model,  tokenizer=tokenizer, generation_config=generation_config)
response = pipeline(prompt, return_full_text=False)
print(response[0][0]["generated_text"])

## 🧠🧠🧠 Now, Let's Try a Bigger Model 🧠🧠🧠
### Let's try: **NousResearch/Llama-2-13b-chat-hf**
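Before loading the 13B checkpoint, it helps to know roughly how much memory its weights need: parameter count × bytes per parameter. This sketch uses nominal sizes of 7 and 13 billion parameters (the real counts differ slightly) and ignores activations and the KV cache, so treat the numbers as lower bounds.

```python
# Bytes per parameter for common dtypes
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Rough memory for the weights alone (no activations, no KV cache), in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Nominal sizes; the real parameter counts differ slightly.
for name, params in [("Llama-2-7b", 7e9), ("Llama-2-13b", 13e9)]:
    print(f"{name}: ~{weight_memory_gib(params):.1f} GiB in bfloat16, "
          f"~{weight_memory_gib(params, 'float32'):.1f} GiB in float32")
# Llama-2-7b: ~13.0 GiB in bfloat16, ~26.1 GiB in float32
# Llama-2-13b: ~24.2 GiB in bfloat16, ~48.4 GiB in float32
```

This is why the NousResearch cells load the weights in `torch.bfloat16`. Note that the 7B model is still referenced by `model` and by the earlier `pipeline` object until they are reassigned, so both sets of weights may briefly coexist in memory.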


In [None]:
model_checkpoint = "NousResearch/Llama-2-13b-chat-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_checkpoint)

model = LlamaForCausalLM.from_pretrained(model_checkpoint, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)

model = model.eval() #inference mode



### Zero-Shot Learning

In [None]:
#Build the Prompt - Zero shot
system_message = Message()
system_message["role"] = "system"
system_message["content"] = ""

user_message = Message()
user_message["role"] = "user"
user_message["content"] = """
**QUESTION :**
A bakery sells cupcakes in boxes of 6. If a customer buys 8 boxes, and then eats 6 cupcakes, how many cupcakes does the customer have left?
**ANSWER :**
"""

list_of_messages = list()
list_of_messages.append(system_message)
list_of_messages.append(user_message)

list_of_message_lists = list()
list_of_message_lists.append(list_of_messages)

prompt = convert_list_of_message_lists_to_input_prompt(list_of_message_lists, tokenizer)
print(prompt)

In [None]:
from transformers import pipeline
from transformers import GenerationConfig

tokenized_prompt = tokenizer(prompt)

print(f'prompt is {len(tokenized_prompt["input_ids"][0])} tokens')
generation_config = GenerationConfig(max_new_tokens=2000)

pipeline = pipeline("text-generation", model=model,  tokenizer=tokenizer, generation_config=generation_config)
response = pipeline(prompt, return_full_text=False)
print(response[0][0]["generated_text"])

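For reference when judging the 13B model's output, the expected answer to the bakery question works out as follows:

```python
boxes, cupcakes_per_box, eaten = 8, 6, 6
bought = boxes * cupcakes_per_box  # 8 × 6 = 48 cupcakes
left = bought - eaten              # 48 - 6 = 42 cupcakes
print(left)                        # 42
```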
### 🧠 Now it's your turn:
**Remember: Use CoT + examples to boost the model's output**

- Try few-shot prompting (how many examples do you think your model needs in the prompt?)
- Swap examples, or add more examples
- Does the bigger model talk more than required?
- Do you need to specify a role in the system message?
- Do you need to tune other generation settings? (`do_sample`, `temperature`, `top_p`, `top_k`)
- Do the complexity of the examples and their relationship to the task's domain make a difference?
- Does the similarity between the model's training data and your task play a role here?
- Does it work with other tasks?

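If you experiment with `do_sample`, `temperature`, `top_k`, and `top_p`, it helps to see what they do to the next-token distribution. Below is a simplified pure-Python sketch over a toy four-token vocabulary, applying the settings in the order temperature, then top-k, then top-p. It is not the `transformers` implementation, and the toy scores are illustrative, not real model output.

```python
import math
from typing import Dict


def reshape_distribution(logits: Dict[str, float], temperature: float = 1.0,
                         top_k: int = 0, top_p: float = 1.0) -> Dict[str, float]:
    """Simplified sketch: temperature, then top-k, then top-p filtering over a toy vocabulary."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 when sampling")
    # Temperature divides the scores before softmax: < 1 sharpens, > 1 flattens.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}  # subtract max for stability
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)

    # Top-k keeps only the k highest-scoring tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p keeps the smallest prefix whose (renormalized) cumulative probability reaches top_p.
    total = sum(w for _, w in ranked)
    kept, cumulative = [], 0.0
    for tok, w in ranked:
        kept.append((tok, w))
        cumulative += w / total
        if cumulative >= top_p:
            break

    # Renormalize what survived so the probabilities sum to 1.
    kept_total = sum(w for _, w in kept)
    return {tok: w / kept_total for tok, w in kept}


def greedy_pick(logits: Dict[str, float]) -> str:
    """With do_sample=False, decoding simply takes the highest-scoring token."""
    return max(logits, key=logits.get)


toy_logits = {"14": 3.0, "12": 1.5, "36": 1.0, "24": 0.5}  # illustrative scores only
print(greedy_pick(toy_logits))
print(reshape_distribution(toy_logits))                   # plain softmax
print(reshape_distribution(toy_logits, temperature=0.5))  # sharper: more mass on "14"
print(reshape_distribution(toy_logits, top_k=2))          # only the two best candidates remain
print(reshape_distribution(toy_logits, top_p=0.8))        # smallest set covering at least 80%
```

With greedy decoding (the default in this notebook) the sampling settings have no effect, which is why `do_sample=True` has to be set before `temperature`, `top_k`, or `top_p` change anything.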
### Let us know what you think in the comments!

