# Self-Consistency and Chain-of-Thought Prompting in LLMs

In this notebook, I use the [Mixtral-8x7B-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) model to explore chain-of-thought and self-consistency prompting, learning both techniques by applying them to a range of examples.
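
As a rough illustration of the first technique, here is a minimal sketch of how a chain-of-thought prompt could be assembled. The helper name `build_cot_prompt`, the `Q:`/`A:` layout, and the trigger phrase are assumptions made for this example, not something taken from the Mixtral tooling.

```python
from typing import List, Optional, Tuple

# A commonly used zero-shot cue; the exact wording is an assumption for this sketch.
COT_TRIGGER = "Let's think step by step."


def build_cot_prompt(question: str,
                     examples: Optional[List[Tuple[str, str]]] = None) -> str:
    """Build a chain-of-thought prompt.

    `examples` holds optional (question, worked reasoning) pairs. With no
    examples this is zero-shot CoT; with examples it becomes few-shot CoT.
    """
    parts = []
    for ex_question, ex_reasoning in examples or []:
        parts.append(f"Q: {ex_question}\nA: {ex_reasoning}")
    # End with the trigger so the model continues with its own reasoning.
    parts.append(f"Q: {question}\nA: {COT_TRIGGER}")
    return "\n\n".join(parts)


print(build_cot_prompt("A box holds 4 apples. How many apples are in 3 boxes?"))
```

The returned string can be passed as the user message to any chat model; the worked examples only need to show the style of reasoning you want back.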

This LLM can run on Google Colab thanks to **Denis Mazur** and **Artyom Eliseev**, who quantized the original model in mixed precision and implemented an MoE-specific offloading strategy. `[The quantization, build_model, and expert MoE files were forked from their GitHub repo.]`

Read their [tech report](https://arxiv.org/abs/2312.17238). I plan to summarize what I learn from it in this notebook over the coming days.


##### *Edited: 01-09-2024.*

## Install and import libraries

In [None]:
from IPython.display import HTML, display, Markdown

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In [None]:
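import os

import numpy
from IPython.display import clear_output

# fix triton in colab
# `!export ...` runs in a throwaway subshell, so the variable is gone as soon as
# the command returns. Setting os.environ keeps it for later `!` commands and child processes.
os.environ["LC_ALL"] = "en_US.UTF-8"
os.environ["LD_LIBRARY_PATH"] = "/usr/lib64-nvidia"
os.environ["LIBRARY_PATH"] = "/usr/local/cuda/lib64/stubs"
!ldconfig /usr/lib64-nvidia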

# Clone without embedding a personal access token in the URL (use a credential helper if the repo is private).
!git clone https://github.com/emmanuelrajapandian/Advanced-Prompt-Engineering-LLMs.git --quiet
!cd Advanced-Prompt-Engineering-LLMs && pip install -q -r requirements.txt
!huggingface-cli download lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-demo --quiet --local-dir Mixtral-8x7B-Instruct-v0.1-offloading-demo

clear_output()

In [None]:
import sys

sys.path.append("Advanced-Prompt-Engineering-LLMs")
import torch
from torch.nn import functional as F
from hqq.core.quantize import BaseQuantizeConfig
from huggingface_hub import snapshot_download
from IPython.display import clear_output
from tqdm.auto import trange
from transformers import AutoConfig, AutoTokenizer
from transformers.utils import logging as hf_logging

from source.build_model import OffloadConfig, QuantConfig, build_model


## Initialize model

In [None]:
model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
quantized_model_name = "lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-demo"
state_path = "Mixtral-8x7B-Instruct-v0.1-offloading-demo"

config = AutoConfig.from_pretrained(quantized_model_name)
device = torch.device("cuda:0")

offload_per_layer = 4
num_experts = config.num_local_experts

offload_config = OffloadConfig(
    main_size=config.num_hidden_layers * (num_experts - offload_per_layer),
    offload_size=config.num_hidden_layers * offload_per_layer,
    buffer_size=4,
    offload_per_layer=offload_per_layer,
)


attn_config = BaseQuantizeConfig(
    nbits=4,
    group_size=64,
    quant_zero=True,
    quant_scale=True,
)

attn_config["scale_quant_params"]["group_size"] = 256

ffn_config = BaseQuantizeConfig(
    nbits=2,
    group_size=16,
    quant_zero=True,
    quant_scale=True,
)
quant_config = QuantConfig(ffn_config=ffn_config, attn_config=attn_config)


# Build the model from the OffloadConfig and QuantConfig objects defined above.
model = build_model(
    device=device,
    quant_config=quant_config,
    offload_config=offload_config,
    state_path=state_path,
)

clear_output()
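
To make the `OffloadConfig` arithmetic concrete, here is a small standalone sketch. It assumes (my reading of the parameter names, not something confirmed here) that `main_size` counts experts kept on the GPU and `offload_size` counts experts held in host memory. The values 32 layers and 8 experts per layer are the published Mixtral-8x7B configuration; in the notebook they come from `config`. The helper name `expert_split` is hypothetical.

```python
def expert_split(num_layers: int, experts_per_layer: int, offload_per_layer: int) -> dict:
    """Mirror the main_size / offload_size formulas used in the cell above."""
    on_gpu_per_layer = experts_per_layer - offload_per_layer
    return {
        "main_size": num_layers * on_gpu_per_layer,      # assumed: experts resident on the GPU
        "offload_size": num_layers * offload_per_layer,  # assumed: experts kept in host memory
    }


# Mixtral-8x7B has 32 hidden layers with 8 local experts each.
split = expert_split(num_layers=32, experts_per_layer=8, offload_per_layer=4)
print(split)
```

With `offload_per_layer = 4`, half of the 256 experts fall on each side (128 and 128). Raising `offload_per_layer` shifts experts out of `main_size` and into `offload_size`.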

## Run the model

In [None]:
from transformers import TextStreamer

tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
past_key_values = None
sequence = None

clear_output()

In [None]:
from typing import Optional


def generate_outputs(prompt: str, temperature: float, max_tokens: int = 128,
                     stop_sequences: Optional[list] = None, n: int = 1) -> list:
    """
    Run inference with the Mixtral model and return `n` sampled generations.

    Each generation is streamed to stdout as it is produced.
    `stop_sequences` is accepted but not used yet.
    """
    user_entry = dict(role="user", content=prompt)
    input_ids = tokenizer.apply_chat_template([user_entry], return_tensors="pt").to(device)

    # `past_key_values` is the module-level variable set above (None for a fresh conversation).
    if past_key_values is None:
        attention_mask = torch.ones_like(input_ids)
    else:
        seq_len = input_ids.size(1) + past_key_values[0][0][0].size(1)
        attention_mask = torch.ones([1, seq_len - 1], dtype=torch.int, device=device)

    outputs = []
    for _ in range(n):
        # One sampled completion per iteration, using temperature and top-p sampling.
        outputs.append(model.generate(input_ids=input_ids,
                                      attention_mask=attention_mask,
                                      past_key_values=past_key_values,
                                      streamer=streamer,
                                      do_sample=True,
                                      temperature=temperature,
                                      top_p=0.9,
                                      max_new_tokens=max_tokens,
                                      pad_token_id=tokenizer.eos_token_id,
                                      return_dict_in_generate=True,
                                      output_hidden_states=True,))
    return outputs
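
The `n` argument of `generate_outputs` draws several samples for the same prompt, which is the raw material for self-consistency. Below is a minimal sketch of the aggregation step. It assumes the sampled generations have already been decoded to plain strings and that the final answer is the last number in each completion. Both helper names are hypothetical, and the "last number" rule is a simplification that only suits numeric questions.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number mentioned in a completion, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def self_consistent_answer(completions: List[str]) -> Optional[str]:
    """Majority vote over the final answers of several sampled reasoning paths."""
    answers = [a for a in map(extract_final_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Three hand-written completions standing in for sampled model outputs.
samples = [
    "There are 3 boxes of 4 apples, so 3 * 4 = 12.",
    "4 + 4 + 4 = 12 apples in total.",
    "3 + 4 = 7.",
]
print(self_consistent_answer(samples))  # prints 12
```

Two of the three reasoning paths end in 12, so the vote returns 12 even though one path went wrong. That is the core idea: individual samples may err, but the most frequent final answer tends to be more reliable.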

In [None]:
TEMP = 0.9
output = generate_outputs("What is the pet of Phineas and Ferb called and what does it do?", TEMP, max_tokens=128)

Phineas Flynn and Ferb Fletcher's pet in the animated show "Phineas and Ferb" is a platypus named Perry. However, Perry is not just an ordinary pet, as he leads a double life: when the boys are not using him for a backyard adventure, Perry removes his pet collar and becomes Agent P, a secret agent fighting evil.

Perry, the pet platypus, does not have any supernatural powers, but he is very intelligent, agile, and skilled in martial arts. He uses gadgets and tools created by Phineas
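
The `temperature` and `top_p` settings control how varied these samples are. The sketch below shows the textbook definitions on a toy distribution. The token logits are made up for illustration, and this is not a copy of how `transformers` implements sampling internally.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits divided by the temperature."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most probable tokens until their cumulative mass reaches top_p, then renormalize."""
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Made-up next-token logits, purely for illustration.
toy_logits = {"Perry": 3.0, "Agent": 1.5, "platypus": 1.0, "dog": -2.0}
probs = apply_temperature(toy_logits, temperature=0.9)
print(top_p_filter(probs, top_p=0.9))
```

Lower temperatures concentrate probability on the top token, while higher ones flatten the distribution. Self-consistency relies on some of that diversity so that the sampled reasoning paths actually differ.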


In [None]:
TEMP = 0.9
output = generate_outputs("What are the commands in Mac to push, pull, merge repos in github from the local terminal?",
                          TEMP, max_tokens=256)

To perform Git operations on a Mac, you will first need to navigate to the local directory that contains the Git repository using the `cd` command. Once you are in the directory that contains the Git repository, you can use the following commands:

* To pull changes from the remote repository, use the command:
```
git pull origin <branch-name>
```
* To push your local changes to the remote repository, use the command:
```
git push origin <branch-name>
```
* To merge changes from one branch onto another, use the command:
```
git merge branch-name
```
Here, `origin` refers to the remote repository where the Git operations are being performed, and `<branch-name>` is the name of the branch that you want to merge.

You may also need to use the `git add` and `git commit` commands to add and commit changes to your local repository before pushing them to the remote repository.

To learn more about these and other Git commands, you can refer to the official Git documentation.
