## Testing Chain-of-Thought Reasoning Without Prompting with Mistral-7B-Instruct

In this short notebook, we use a 4-bit quantized Mistral-7B-Instruct to explore the findings of Google DeepMind's paper, ["Chain-of-Thought Reasoning Without Prompting"](https://arxiv.org/abs/2402.10200).

- The authors propose CoT-decoding, a method that branches on the top-𝑘 candidate tokens at the first decoding step, continues each branch, and keeps the most reliable path, judged by how confident the model is in its output.
- "CoT-decoding offers an alternative way to elicit reasoning capabilities from pre-trained LLMs without explicit prompting."
- "Our study reveals that pre-trained language models inherently possess reasoning capabilities, as evidenced by their generation of CoT reasoning paths when examining alternative top tokens during decoding, rather than relying on greedy decoding."
- The authors focus on pre-trained models rather than instruction-tuned models, but they also report some experiments with Mistral-7B-Instruct.
- An implementation of CoT-decoding and a few experiments with it are shown below.
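
As a rough illustration of the scoring idea, the sketch below computes a path's confidence as the average gap between the top-1 and top-2 token probabilities at each decoding step. The function name `path_confidence` and the toy distributions are hypothetical and chosen only for illustration; the paper averages this gap over the answer tokens only, whereas this sketch averages over every step it is given.

```python
def path_confidence(step_probs):
    """Average gap between the top-1 and top-2 token probabilities.

    step_probs: one probability distribution (list of floats) per decoded token.
    """
    gaps = []
    for probs in step_probs:
        # Take the two largest probabilities at this step
        top1, top2 = sorted(probs, reverse=True)[:2]
        gaps.append(top1 - top2)
    return sum(gaps) / len(gaps)


# Toy distributions over a 3-token vocabulary (illustrative values, not model output)
confident_path = [[0.90, 0.05, 0.05], [0.80, 0.15, 0.05]]
uncertain_path = [[0.40, 0.35, 0.25], [0.50, 0.45, 0.05]]

print(path_confidence(confident_path))
print(path_confidence(uncertain_path))
```

A path whose tokens are each chosen by a wide margin gets a score near 1, while a path where the model hesitates between two candidates gets a score near 0.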

In [1]:
# Load necessary packages
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
from tqdm import tqdm

In [2]:
# Configure the model to use 4-bit quantization
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the model and tokenizer, using Mistral-7B-Instruct-v0.2
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", quantization_config=config)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")


In [3]:
# Get the top k candidate tokens for the next position, with their probabilities
def get_topk_tokens(model, inputs, num_branches=10):

    # Generate logits for the next token after the prompt
    with torch.no_grad():
        outputs = model(**inputs, return_dict=True)
        next_token_logits = outputs.logits[:, -1, :]

    # Apply softmax to convert logits to probabilities
    probabilities = torch.softmax(next_token_logits, dim=-1)

    # Get the top k tokens and their probabilities
    topk_values, topk_indices = torch.topk(probabilities, num_branches)

    return topk_values, topk_indices


# Greedily complete one branch and log the difference in probabilities between the top two tokens
def generate_response(model, tokenizer, inputs, max_length=500):

    # Store the top-1 minus top-2 probability gap for each generated token
    response_probs = []

    # Generate at most max_length new tokens
    for _ in range(max_length):

        # Get the two most likely next tokens and their probabilities
        topk_values, topk_indices = get_topk_tokens(model, inputs, num_branches=2)

        # Get the difference in probabilities between the top two tokens
        prob_diff = topk_values[:, 0] - topk_values[:, 1]
        response_probs.append(prob_diff.item())

        # Stop if the most likely token is the end of sequence token
        next_token = topk_indices[:, 0]
        if next_token.item() == tokenizer.eos_token_id:
            break

        # Add the token to the input for the next iteration
        inputs['input_ids'] = torch.cat([inputs['input_ids'], next_token.unsqueeze(-1)], dim=1)
        # Keep the attention mask the same length as the input ids (batch of one, no padding)
        inputs['attention_mask'] = torch.ones_like(inputs['input_ids'])

    return inputs['input_ids'], response_probs

# Generate all branching responses
def generate_branching_responses(model, tokenizer, prompt, num_branches=10, max_length=500):

    # First we tokenize the prompt and move it to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Get our initial top k tokens
    _, topk_indices = get_topk_tokens(model, inputs, num_branches)

    # Create a list to store our responses and each response's score
    responses = []
    response_probs = []
    for k in tqdm(range(num_branches)):

        # Start a new branch from the prompt plus the kth most likely token
        branch_ids = torch.cat([inputs['input_ids'], topk_indices[:, k].unsqueeze(-1)], dim=1)
        branch_inputs = {'input_ids': branch_ids, 'attention_mask': torch.ones_like(branch_ids)}

        # Generate a response and log the difference in probabilities between the top two tokens
        response, probs = generate_response(model, tokenizer, branch_inputs, max_length)

        # Append the decoded response to our list
        responses.append(tokenizer.batch_decode(response))

        # Score the branch by its average probability gap.
        # Note: this averages over every generated token, whereas the paper
        # averages over the answer tokens only.
        response_probs.append(sum(probs) / len(probs))

    return responses, response_probs
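
Once every branch has a score, CoT-decoding keeps the path the model is most confident in. The sketch below shows one way that selection step could look for lists shaped like the `responses` and `response_probs` returned above. The helper `pick_most_confident` and the toy data are hypothetical and are not part of the notebook. Because the scores here are averaged over all generated tokens rather than the answer tokens, the highest-scoring branch is not guaranteed to be the correct one.

```python
def pick_most_confident(responses, scores):
    """Return (index, response, score) for the branch with the highest score."""
    best_k = max(range(len(scores)), key=lambda k: scores[k])
    return best_k, responses[best_k], scores[best_k]


# Placeholder data, not model output: each response is a one-element list,
# mirroring what tokenizer.batch_decode returns for a batch of one
toy_responses = [["branch 0 text"], ["branch 1 text"], ["branch 2 text"]]
toy_scores = [0.42, 0.87, 0.63]

best_k, best_response, best_score = pick_most_confident(toy_responses, toy_scores)
print(best_k, best_response[0], best_score)
```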

In [5]:
# Use the instruction format for Mistral-7B instruct
prompt = """[INST] Was Nicolas Cage born in an even or odd year? [/INST]"""

# Generate branching responses
responses, response_probs = generate_branching_responses(model, tokenizer, prompt, num_branches=10, max_length=250)

# Print responses and scores
print('Prompt:', prompt)
for k, response, prob in zip(range(len(responses)), responses, response_probs):
    print(f'\nResponse k={k}:\n\n', response[0].split('[/INST]')[1])
    print('\nScore:', prob)

100%|██████████| 10/10 [02:06<00:00, 12.61s/it]

Prompt: [INST] Was Nicolas Cage born in an even or odd year? [/INST]

Response k=0:

  Nicolas Cage was born on January 7, 1964. The year 1964 is an even number since it is a multiple of 2. Therefore, Nicolas Cage was born in an even year.

Score: 0.9388242900371552

Response k=1:

  Yes, Nicolas Cage was born in an odd year. He was born on January 7, 1964, which is an odd year.

Score: 0.9637579719225565

Response k=2:

  yes, Nicolas Cage was born in an odd year. He was born on January 7, 1964, which is an odd year.

Score: 0.9686339398225149

Response k=3:

 yes, Nicolas Cage was born in an odd year. He was born on January 7, 1964, which is an odd year.

Score: 0.9651062542741949

Response k=4:

 Yes, Nicolas Cage was born in an odd year. He was born on January 7, 1964, which is an odd year.

Score: 0.9605302341056593

Response k=5:

  Nicholas Cage was born on January 7, 1964. The year 1964 is an even number since it is a multiple of 2. Therefore, Nicholas Cage was born in an even 




In [8]:
# Use the instruction format for Mistral-7B instruct
prompt = """[INST] Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends
every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in
dollars does she make every day at the farmers' market? [/INST]"""

# Generate branching responses
responses, response_probs = generate_branching_responses(model, tokenizer, prompt, num_branches=10, max_length=500)

# Print responses and scores
print('Prompt:', prompt)
for k, response, prob in zip(range(len(responses)), responses, response_probs):
    print(f'\nResponse k={k}:\n\n', response[0].split('[/INST]')[1])
    print('\nScore:', prob)

100%|██████████| 10/10 [11:16<00:00, 67.63s/it]

Prompt: [INST] Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends
every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in
dollars does she make every day at the farmers' market? [/INST]

Response k=0:

  Let's first find out how many eggs Janet has left after consuming some for breakfast and baking muffins.

1. Janet consumes 3 eggs for breakfast, so she has:
   Eggs left = Ducks laying eggs per day - Eggs for breakfast
   = 16 eggs/day - 3 eggs/day
   = 13 eggs/day

2. She uses 4 eggs for baking muffins, so she has:
   Eggs left = Eggs left after breakfast - Eggs for baking muffins
   = 13 eggs/day - 4 eggs/day
   = 9 eggs/day

Now, let's calculate how much money she makes by selling these eggs at the farmers' market.

3. She sells each egg for $2, so her daily earnings are:
   Daily earnings = Eggs left * Price per egg
   = 9 eggs/day * $2/egg
   = $18/day

So, Janet 




In [6]:
# Use the instruction format for Mistral-7B instruct
prompt = """[INST] I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total? [/INST]"""

# Generate branching responses
responses, response_probs = generate_branching_responses(model, tokenizer, prompt, num_branches=10, max_length=250)

# Print responses and scores
print('Prompt:', prompt)
for k, response, prob in zip(range(len(responses)), responses, response_probs):
    print(f'\nResponse k={k}:\n\n', response[0].split('[/INST]')[1])
    print('\nScore:', prob)

100%|██████████| 10/10 [04:26<00:00, 26.60s/it]

Prompt: [INST] I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total? [/INST]

Response k=0:

  You have 3 apples, and your dad has 2 apples more than that, so your dad has 3 apples + 2 = 5 apples. Therefore, you both have a total of 3 apples (yours) + 5 apples (your dad's) = 8 apples.

Score: 0.9008669594059819

Response k=1:

  you have 3 apples, and your dad has 3 apples + 2 = 5 apples, so in total you have 3 apples + 5 apples = 8 apples.

Score: 0.90053237026388

Response k=2:

 You have 3 apples, and your dad has 2 apples more than that, so your dad has 3 apples + 2 = <box>5 apples</box>. Therefore, you both have a total of 3 apples + 5 apples = <box>8 apples</box>.

Score: 0.8931514896563629

Response k=3:

  To find out how many apples you and your dad have in total, you can follow these steps:

1. You have 3 apples.
2. Your dad has 2 more apples than you, so he has 3 + 2 = 5 apples.
3. To find the total number of apples, add the number of apples


