<a href="https://colab.research.google.com/github/polyexplorer/open-llm/blob/main/ZephyrWrapper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dependencies


In [6]:
! pip install colabcode

Collecting colabcode
  Downloading colabcode-0.3.0-py3-none-any.whl (5.0 kB)
Collecting pyngrok>=5.0.0 (from colabcode)
  Downloading pyngrok-7.0.0.tar.gz (718 kB)
     718.7/718.7 kB 5.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting nest-asyncio==1.4.3 (from colabcode)
  Downloading nest_asyncio-1.4.3-py3-none-any.whl (5.3 kB)
Collecting uvicorn==0.13.1 (from colabcode)
  Downloading uvicorn-0.13.1-py3-none-any.whl (45 kB)
     45.5/45.5 kB 5.9 MB/s eta 0:00:00
Collecting jupyterlab==3.0.7 (from colabcode)
  Downloading jupyterlab-3.0.7-py3-none-any.whl (8.3 MB)
     8.3/8.3 MB 20.1 MB/s eta 0:00:00
Collecting jupyterlab-server~=2.0 (from jupyterlab==3.0.7->colabcode)
  Downloading jupyterlab_server-

In [2]:
#Dependencies
! pip install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
! pip install optimum
! pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
! pip install langchain
! pip install langchainhub
! pip install duckduckgo-search
! pip install colabcode

Collecting git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
  Cloning https://github.com/huggingface/transformers.git (to revision 72958fcd3c98a7afdc61f953aa58c544ebda2f79) to /tmp/pip-req-build-vcx9jxw1
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-vcx9jxw1
  Running command git rev-parse -q --verify 'sha^72958fcd3c98a7afdc61f953aa58c544ebda2f79'
  Running command git fetch -q https://github.com/huggingface/transformers.git 72958fcd3c98a7afdc61f953aa58c544ebda2f79
  Running command git checkout -q 72958fcd3c98a7afdc61f953aa58c544ebda2f79
  Resolved https://github.com/huggingface/transformers.git to commit 72958fcd3c98a7afdc61f953aa58c544ebda2f79
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting huggingface-hub<1.0,>=0.16.4 (from transfor
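
Before loading the model, it can help to confirm that the installs above actually landed in the running kernel. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the `PACKAGES` list are illustrative assumptions, with the distribution names copied from the pip commands in this notebook.

```python
from importlib import metadata

# Distribution names copied from the pip commands above (assumed to match
# the names the packages register under).
PACKAGES = [
    "transformers",
    "optimum",
    "auto-gptq",
    "langchain",
    "langchainhub",
    "duckduckgo-search",
    "colabcode",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package shows as not installed right after a pip run in Colab, restarting the runtime is usually the first thing to try.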

In [1]:
# LLM Wrapper
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, pipeline, TextStreamer
import torch
from typing import Any, List, Mapping, Optional
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from pydantic import BaseModel, Field

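class ModelParams(BaseModel):
    # Hugging Face repo id, chat template with {instruction} and {prompt}
    # placeholders, and the repo branch to load.
    model_id: str
    prompt_format: str
    revision: str = Field(default='main')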


class ZephyrLLM:
    def __init__(self, model_params: ModelParams):
        # Refresh CUDA Memory
        torch.cuda.empty_cache()
        self.prompt_format = model_params.prompt_format
        self.model_id = model_params.model_id
        self.revision = model_params.revision
        self.model,self.tokenizer = self.get_model()
        streamer = TextStreamer(self.tokenizer, skip_prompt=True, skip_special_tokens=True)
        self.pipe = pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.1,
            top_k=40,
            top_p=0.95,
            repetition_penalty=1.15,
            streamer=streamer,
        )


    def format_prompt(self, prompt, instruction=None):
        # Fill the chat template; fall back to a generic system instruction.
        if not instruction:
            instruction = 'You are a good AI assistant. Answer the question as accurately as possible.'
        formatted_prompt = self.prompt_format.format(prompt=prompt, instruction=instruction)
        print("Formatted Prompt:", formatted_prompt)
        return formatted_prompt

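    def generate_instruction(
        self,
        prompt: str,
        instruction: Optional[str] = None,
        llm_template=None,
    ):
        # ask() forwards instruction=None, so apply the default here rather
        # than in the signature; otherwise the prompt would contain "None".
        if not instruction:
            instruction = 'Think carefully and answer the given question as truthfully as possible'
        instruction_format = (
            f"### Instruction: {instruction}\n\n"
            f"### Input:\nQuestion: {prompt}\n\n"
            "### Response:\n"
        )
        # predict() already applies the chat template through format_prompt,
        # so wrap here only when a template is passed explicitly. Defaulting
        # to format_prompt would format the prompt twice.
        if llm_template:
            return llm_template(instruction_format)
        return instruction_format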


    def get_model(self):
        model_id = self.model_id
        # To use a different branch, change revision
        # For example: revision="main"
        quantization_config_loading = GPTQConfig(bits=4, use_exllama = False)
        model = AutoModelForCausalLM.from_pretrained(model_id,
                                                  quantization_config=quantization_config_loading,
                                                  device_map="cuda",
                                                  trust_remote_code=True,
                                                  revision=self.revision)

        tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
        return model, tokenizer

    def _predict(self, prompt, instruction = None):
        formatted_prompt = self.format_prompt(prompt = prompt,instruction=instruction)
        # print(f"Formatted Prompt: \n {formatted_prompt}")
        return self.pipe(formatted_prompt)[0]['generated_text']

    def predict(self,prompt, instruction = None):
        return self._predict(prompt,instruction).split(r'<|assistant|>')[-1].strip()

    def ask(self,question,instruction = None):
        formatted_prompt = self.generate_instruction(prompt=question,instruction=instruction)
        return self.predict(formatted_prompt)
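

By default the `text-generation` pipeline returns the prompt together with the completion in `generated_text`, which is why `predict` splits on the `<|assistant|>` tag and keeps the last piece. The sketch below isolates that step so it can be tried without a GPU. The helper name `extract_answer` and the sample string are made up for illustration and are not part of the notebook.

```python
# Hypothetical helper mirroring the split used in ZephyrLLM.predict.
ASSISTANT_TAG = "<|assistant|>"


def extract_answer(generated_text: str) -> str:
    """Return the text after the last assistant tag, with whitespace stripped."""
    return generated_text.split(ASSISTANT_TAG)[-1].strip()


# Made-up pipeline output: the full prompt followed by the completion.
sample = (
    "<|system|>\nYou are a good AI assistant.\n</s>\n"
    "<|user|>\nWhat is the capital of France?</s>\n"
    "<|assistant|>\nParis.\n"
)

print(extract_answer(sample))
```

One caveat of this approach: if the tag is missing from the text, the split returns the whole string unchanged, so the caller gets the prompt back along with the answer.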


In [2]:
model_params = ModelParams(model_id = "TheBloke/zephyr-7B-alpha-GPTQ",
revision = "gptq-4bit-32g-actorder_True",
prompt_format = """<|system|>
{instruction}
</s>
<|user|>
{prompt}</s>
<|assistant|>
"""
)
model = ZephyrLLM(model_params)

You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute and has already quantized weights. However, loading attributes (e.g. disable_exllama, use_cuda_fp16, max_input_length) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
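
The `prompt_format` string above is an ordinary `str.format` template with two placeholders, `{instruction}` and `{prompt}`. As a rough sketch of what `format_prompt` produces, the template can be rendered on its own, without loading the model. The names `ZEPHYR_TEMPLATE`, `DEFAULT_INSTRUCTION`, and `render_prompt` are assumptions introduced here for illustration.

```python
# Same template text as the prompt_format passed to ModelParams above.
ZEPHYR_TEMPLATE = (
    "<|system|>\n{instruction}\n</s>\n"
    "<|user|>\n{prompt}</s>\n"
    "<|assistant|>\n"
)

# Default system instruction used by ZephyrLLM.format_prompt.
DEFAULT_INSTRUCTION = (
    "You are a good AI assistant. "
    "Answer the question as accurately as possible."
)


def render_prompt(prompt, instruction=None, template=ZEPHYR_TEMPLATE):
    """Fill the chat template, falling back to the default instruction."""
    return template.format(
        prompt=prompt,
        instruction=instruction or DEFAULT_INSTRUCTION,
    )


print(render_prompt("What are some different ideas for a date night in 20s?"))
```

Because the template ends with the `<|assistant|>` tag and a newline, the model's completion starts directly after it, which is what lets the answer be recovered by splitting on that tag.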


In [3]:
answer = model.predict("What are some different ideas for a date night in 20s?")



Formatted Prompt: <|system|>
You are a good AI assistant. Answer the question as accurately as possible.
</s>
<|user|>
What are some different ideas for a date night in 20s?</s>
<|assistant|>

1. Go to an art exhibit or gallery opening and discuss your favorite pieces of artwork over drinks at a nearby bar.
2. Take a cooking class together, learn new skills, and enjoy a romantic dinner you've prepared.
3. Attend a live music concert or show, dance the night away, and grab dessert afterward.
4. Rent bikes and explore a scenic trail or park, followed by a picnic lunch or dinner.
5. Try a new escape room challenge or puzzle game, work together to solve it, and celebrate with cocktails.
6. Visit a local brewery or winery for a tasting tour and pairing experience.
7. Watch a movie under the stars at an outdoor cinema or drive-in theater.
8. Have a spa day at home, complete with massages, facials, and relaxing baths.
9. Play board games or video games at a retro arcade or game cafe.
10. Go o

In [4]:
print(answer)

1. Go to an art exhibit or gallery opening and discuss your favorite pieces of artwork over drinks at a nearby bar.
2. Take a cooking class together, learn new skills, and enjoy a romantic dinner you've prepared.
3. Attend a live music concert or show, dance the night away, and grab dessert afterward.
4. Rent bikes and explore a scenic trail or park, followed by a picnic lunch or dinner.
5. Try a new escape room challenge or puzzle game, work together to solve it, and celebrate with cocktails.
6. Visit a local brewery or winery for a tasting tour and pairing experience.
7. Watch a movie under the stars at an outdoor cinema or drive-in theater.
8. Have a spa day at home, complete with massages, facials, and relaxing baths.
9. Play board games or video games at a retro arcade or game cafe.
10. Go on a hike or nature walk, pack a lunch, and enjoy the scenery.


In [5]:
answer = model.predict(prompt = "I have a chord progression: B A G#min G. What key/modes do they belong in ? What scale would be ideal for a guitar solo ?", instruction = 'You are a good songwriter. Answer the question your profound musical knowledge.')

Formatted Prompt: <|system|>
You are a good songwriter. Answer the question your profound musical knowledge.
</s>
<|user|>
I have a chord progression: B A G#min G. What key/modes do they belong in ? What scale would be ideal for a guitar solo ?</s>
<|assistant|>

The chords you provided (B, A, G#m, and G) can be interpreted as being in either the key of C major or the relative minor key of F major. In C major, this would be the V-IV-vi-V progression, while in F major it would be vi-iv-iii-IV. To determine which interpretation is correct, look at the context of the music to see if there's any indication that one key is more likely than the other.

For a guitar solo, an ideal scale would depend on the chosen key. If we assume the progression is in the key of C major, then the ideal scales for a guitar solo would be the C major scale (C D E F G A), the D major scale (D E F# G A), or the G major scale (G A B C D). These scales contain all the notes found in the chords used in the progressi