# Tree-of-thoughts prompting in LLMs

In this notebook, I use the [Mixtral-8x7B-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) model to explore and apply the tree-of-thoughts prompting technique, learning the method in practice by working through several examples.

This LLM can run on Google Colab thanks to **Denis Mazur** and **Artyom Eliseev**, who quantized the original model in mixed precision and implemented a MoE-specific offloading strategy. `[The quantization, build_model, and expert MoE files were forked from their GitHub repo.]`

Read their excellent [tech report](https://arxiv.org/abs/2312.17238). I plan to summarize what I learn from the report in this notebook over the coming days.


##### *Edited: 01-09-2024.*
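
As a rough orientation before the setup cells, tree-of-thoughts prompting can be pictured as a small breadth-first search: at each step several candidate "thoughts" are proposed, an evaluator scores them, and only the best few are kept for the next step. The sketch below is a minimal illustration under that reading, not the code used later in this notebook. `propose` and `score` are hypothetical stand-ins for LLM calls, replaced here by toy functions so the block runs on its own.

```python
from typing import Callable, List


def tree_of_thoughts_bfs(
    root: str,
    propose: Callable[[str], List[str]],
    score: Callable[[str], float],
    depth: int = 2,
    beam_width: int = 2,
) -> List[str]:
    """Keep the `beam_width` best partial solutions at each of `depth` levels."""
    frontier = [root]
    for _ in range(depth):
        # Expand every kept state into its candidate continuations.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Rank the candidates and prune to the beam width.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return frontier


# Toy stand-ins (hypothetical): a real run would call the LLM in both functions.
def toy_propose(state: str) -> List[str]:
    return [state + ch for ch in "ab"]


def toy_score(state: str) -> float:
    return state.count("a")


best = tree_of_thoughts_bfs("", toy_propose, toy_score, depth=3, beam_width=2)
print(best)  # ['aaa', 'aab']
```

In an LLM setting, `propose` would sample several continuations at a non-zero temperature and `score` would ask the model to rate or vote on them.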

## Install and import libraries

In [1]:
from IPython.display import HTML, display, Markdown

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

HTML('''<script>
code_show_err=false;
function code_toggle_err() {
 if (code_show_err){
 $('div.output_stderr').hide();
 } else {
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
}
$( document ).ready(code_toggle_err);
</script>''')

In [2]:
import numpy
from IPython.display import clear_output

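# Fix triton in Colab. Each `!` command runs in its own subshell, so `!export`
# would not persist; set the variables on the kernel process instead.
import os
os.environ["LC_ALL"] = "en_US.UTF-8"
os.environ["LD_LIBRARY_PATH"] = "/usr/lib64-nvidia"
os.environ["LIBRARY_PATH"] = "/usr/local/cuda/lib64/stubs"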
!ldconfig /usr/lib64-nvidia

!git clone https://github.com/emmanuelrajapandian/Advanced-Prompt-Engineering-LLMs.git --quiet
!cd Advanced-Prompt-Engineering-LLMs && pip install -q -r requirements.txt
!huggingface-cli download lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-demo --quiet --local-dir Mixtral-8x7B-Instruct-v0.1-offloading-demo

clear_output()

In [3]:
import sys

sys.path.append("Advanced-Prompt-Engineering-LLMs")
import torch
from torch.nn import functional as F
from hqq.core.quantize import BaseQuantizeConfig
from langchain.prompts import PromptTemplate
from huggingface_hub import snapshot_download
from IPython.display import clear_output
from tqdm.auto import trange
from transformers import AutoConfig, AutoTokenizer
from transformers.utils import logging as hf_logging

from source.build_model import OffloadConfig, QuantConfig, build_model

hqq_aten package not installed. HQQBackend.ATEN backend will not work unless you install the hqq_aten lib in hqq/kernels.


The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

## Initialize model

In [4]:
model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
quantized_model_name = "lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-demo"
state_path = "Mixtral-8x7B-Instruct-v0.1-offloading-demo"

config = AutoConfig.from_pretrained(quantized_model_name)
device = torch.device("cuda:0")

offload_per_layer = 4
num_experts = config.num_local_experts

offload_config = OffloadConfig(
    main_size=config.num_hidden_layers * (num_experts - offload_per_layer),
    offload_size=config.num_hidden_layers * offload_per_layer,
    buffer_size=4,
    offload_per_layer=offload_per_layer,
)


attn_config = BaseQuantizeConfig(
    nbits=4,
    group_size=64,
    quant_zero=True,
    quant_scale=True,
)

attn_config["scale_quant_params"]["group_size"] = 256

ffn_config = BaseQuantizeConfig(
    nbits=2,
    group_size=16,
    quant_zero=True,
    quant_scale=True,
)
quant_config = QuantConfig(ffn_config=ffn_config, attn_config=attn_config)


# Build the model from the offloading and quantization settings defined above
# (OffloadConfig and the two BaseQuantizeConfig instances wrapped in QuantConfig).
model = build_model(
    device=device,
    quant_config=quant_config,
    offload_config=offload_config,
    state_path=state_path,
)

clear_output()
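
To make the `OffloadConfig` arithmetic concrete, here is a small worked example. It assumes the published Mixtral-8x7B configuration values (32 hidden layers, 8 local experts per layer). The notebook reads these from `config`, so the hard-coded numbers and the helper name `split_experts` are illustrative assumptions.

```python
def split_experts(num_hidden_layers: int, num_experts: int, offload_per_layer: int):
    """Return (main_size, offload_size) using the same formulas as the cell above."""
    main_size = num_hidden_layers * (num_experts - offload_per_layer)
    offload_size = num_hidden_layers * offload_per_layer
    return main_size, offload_size


# Assumed Mixtral-8x7B values: 32 layers, 8 experts per layer, 4 offloaded per layer.
main_size, offload_size = split_experts(32, 8, 4)
print(main_size, offload_size)  # 128 128
```

With `offload_per_layer = 4`, half of the experts in each layer count toward `offload_size` and the other half toward `main_size`. Raising `offload_per_layer` shifts experts from the first group to the second.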

## Run the model

In [5]:
from transformers import TextStreamer

tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
past_key_values = None
sequence = None

clear_output()

In [41]:
from typing import Optional

def generate_outputs(prompt: str, temperature: float, max_tokens: int = 1000,
                     do_sample: bool = True, stop_sequences: Optional[list] = None,
                     n: int = 1) -> list:
    """
    Run inference with the Mixtral model and return `n` raw `generate` outputs.

    Note: `stop_sequences` is currently unused.
    """
    # Format the prompt as a single user turn using the model's chat template.
    user_entry = dict(role="user", content=prompt)
    input_ids = tokenizer.apply_chat_template([user_entry], return_tensors="pt").to(device)

    outputs = []
    for _ in range(n):  # one independent generation per requested output
        outputs.append(model.generate(input_ids=input_ids,
                                      streamer=streamer,
                                      do_sample=do_sample,
                                      temperature=temperature,
                                      top_p=0.9,
                                      max_new_tokens=max_tokens,
                                      pad_token_id=tokenizer.eos_token_id,
                                      return_dict_in_generate=True,
                                      output_hidden_states=False,))
    return outputs
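
`generate_outputs` returns the raw objects produced by `model.generate`, not strings. With `return_dict_in_generate=True`, each object carries a `sequences` tensor which, for decoder-only models, holds the prompt tokens followed by the newly generated tokens. Below is a minimal sketch of turning one such object into text. `decode_new_tokens` and the stub tokenizer are hypothetical helpers added for illustration so that the block runs without the model.

```python
from types import SimpleNamespace


def decode_new_tokens(output, prompt_len: int, tokenizer) -> str:
    """Drop the first `prompt_len` token ids and decode the remainder."""
    new_ids = output.sequences[0][prompt_len:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)


class StubTokenizer:
    """Toy stand-in for a Hugging Face tokenizer (assumption, demo only)."""

    vocab = {0: "<s>", 1: "Hello", 2: "world", 3: "again"}

    def decode(self, ids, skip_special_tokens: bool = True) -> str:
        words = [self.vocab[int(i)] for i in ids]
        if skip_special_tokens:
            words = [w for w in words if w != "<s>"]
        return " ".join(words)


# Fake generate output: the first two ids play the role of the prompt.
fake_output = SimpleNamespace(sequences=[[0, 1, 2, 3]])
text = decode_new_tokens(fake_output, prompt_len=2, tokenizer=StubTokenizer())
print(text)  # world again
```

In the notebook, the prompt length would be `input_ids.shape[1]` inside `generate_outputs`, and `tokenizer` would be the real `AutoTokenizer` instance.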

In [None]:
!pip install -q igraph
!pip install -q plotly

In [88]:
from totclasscode import TextTaskToT

In [97]:
title = "Impact of Artificial Intelligence (AI) on Job Market"
start = "AI will affect almost 40 percent of jobs around the world, replacing some and complementing others."
mid = '''" more opportunities to leverage AI benefits"'''
end = '''"The AI era is upon us, and it is still within our power to ensure it brings prosperity for all."'''

input_data = (title, start, mid, end)

In [98]:
TEMP = 0.75
output = TextTaskToT(input_data, temperature=TEMP)

In [99]:
prompt = output.wrap_prompt("standard")
print(prompt)



Human: Write a passage of 4 paragraphs.

The first line must contain only this title:
Impact of Artificial Intelligence (AI) on Job Market

The first paragraph must start with sentence:
AI will affect almost 40 percent of jobs around the world, replacing some and complementing others.

The second paragram must contain the words:
" more opportunities to leverage AI benefits"

The last paragraph must end with sentence:
"The AI era is upon us, and it is still within our power to ensure it brings prosperity for all."

Assistant:


The prompt above is the standard-prompting version of the task: an article about AI and jobs, built from a template predefined in the accompanying Python file.
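
The printed prompt suggests a simple fill-in-the-blanks template. The sketch below rebuilds an equivalent prompt with plain `str.format`. It is an assumption about the template's shape inferred from the printed output, not the code in `totclasscode`, and `STANDARD_TEMPLATE` and `build_standard_prompt` are hypothetical names. The sketch spells "paragraph" correctly where the printed prompt has "paragram".

```python
# Hypothetical template reconstructed from the printed prompt.
STANDARD_TEMPLATE = (
    "Human: Write a passage of 4 paragraphs.\n\n"
    "The first line must contain only this title:\n{title}\n\n"
    "The first paragraph must start with sentence:\n{start}\n\n"
    "The second paragraph must contain the words:\n{mid}\n\n"
    "The last paragraph must end with sentence:\n{end}\n\n"
    "Assistant:"
)


def build_standard_prompt(title: str, start: str, mid: str, end: str) -> str:
    """Fill the four constraints into the template."""
    return STANDARD_TEMPLATE.format(title=title, start=start, mid=mid, end=end)


example_prompt = build_standard_prompt(
    title="Impact of Artificial Intelligence (AI) on Job Market",
    start="AI will affect almost 40 percent of jobs around the world, "
          "replacing some and complementing others.",
    mid='" more opportunities to leverage AI benefits"',
    end='"The AI era is upon us, and it is still within our power '
        'to ensure it brings prosperity for all."',
)
print(example_prompt)
```

Keeping the constraints as named fields makes it easy to reuse the same four inputs with a different template, for example one that first asks for a writing plan.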

In [100]:
result = output.make_passages(method="standard")

Impact of Artificial Intelligence (AI) on Job Market

AI will affect almost 40 percent of jobs around the world, replacing some and complementing others. The advent of AI in the job market is a significant development, one that is bound to bring about substantial changes in the way we perceive and approach work. It's estimated that around 3,500 major corporate jobs could be automated in the United Kingdom alone, according to a recent study. This statistic underscores the immediate and potential impact of AI on employment.

However, it's important to note that AI won't solely be a force of reduction in the job market. Instead, it will provide more opportunities to leverage AI benefits. The integration of AI in various industries will lead to the creation of new jobs that require a unique blend of human and machine capabilities. These new roles will be geared towards managing, maintaining, and operating AI technologies, necessitating a significant shift in the skill sets and competencies