# Install Libraries

In [1]:
!pip install -q -U bitsandbytes transformers accelerate

# Load Fine-Tuned Model

In [2]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

finetuned_model_checkpoint = "iamsubrata/Llama-2-7b-chat-hf-sharded-bf16-fine-tuned"

bitsandbytes_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(finetuned_model_checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    finetuned_model_checkpoint,
    quantization_config=bitsandbytes_config,
    device_map={"": 0},  # place the whole model on GPU 0
)

* **iamsubrata/Llama-2-7b-chat-hf-sharded-bf16-fine-tuned**: **Llama-2-7b-chat-hf-sharded-bf16** (a chat model optimized for dialogue use cases) fine-tuned on the `English Quotes Dataset`.

* **BitsAndBytesConfig** configures **model quantization**, a technique that reduces the memory and compute requirements of an LLM by storing its weights in lower-precision data types.
    * **load_in_4bit**: Loads the model weights in 4-bit precision, which cuts weight storage by roughly **75%** compared to **16-bit precision** (not counting the small overhead of the quantization constants).

    * **bnb_4bit_use_double_quant**: Enables **double quantization**, which also quantizes the `quantization constants` (the per-block scaling factors produced by the first quantization step), saving roughly 0.4 bits per parameter on top of the 4-bit weights.

    * **bnb_4bit_quant_type**: The quantization data type, here **nf4**, which stands for **4-bit NormalFloat**, a 4-bit data type introduced with QLoRA whose quantization levels are spaced to suit normally distributed weights.

    * **bnb_4bit_compute_dtype**: The data type used for computation, here **torch.bfloat16**, a `16-bit floating-point` format that keeps the same 8-bit exponent as float32 and therefore covers a much wider dynamic range than float16, at the cost of fewer mantissa bits. The 4-bit weights are dequantized to this type for matrix multiplications.

* A **causal language model** is a type of generative model that **predicts the next token in a sequence given the previous tokens**.
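
The memory savings from the quantization settings above can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope estimate, not a measurement: the parameter count (about 6.74 billion for Llama 2 7B), the block sizes (64 weights per block, 256 constants per second-level block), and the constant widths (32-bit, quantized to 8-bit) are assumptions taken from the defaults described in the QLoRA paper, and the helper names `bits_per_param` and `model_size_gb` are hypothetical.

```python
def bits_per_param(weight_bits=4, block_size=64, const_bits=32,
                   double_quant=False, dq_block_size=256, dq_const_bits=8):
    """Approximate storage cost per parameter, in bits, for blockwise quantization."""
    if not double_quant:
        # One full-precision scaling constant is stored per block of weights.
        overhead = const_bits / block_size
    else:
        # The constants are themselves quantized to dq_const_bits, and one
        # full-precision second-level constant is stored per dq_block_size of them.
        overhead = dq_const_bits / block_size + const_bits / (block_size * dq_block_size)
    return weight_bits + overhead


def model_size_gb(n_params, bits):
    """Convert a per-parameter bit cost into a total size in gigabytes (1e9 bytes)."""
    return n_params * bits / 8 / 1e9


n_params = 6.74e9  # assumed approximate parameter count of Llama 2 7B

size_bf16 = model_size_gb(n_params, 16)
size_nf4 = model_size_gb(n_params, bits_per_param())
size_nf4_dq = model_size_gb(n_params, bits_per_param(double_quant=True))

print(f"bf16 weights:            {size_bf16:.2f} GB")
print(f"4-bit weights:           {size_nf4:.2f} GB")
print(f"4-bit + double quant:    {size_nf4_dq:.2f} GB")
```

Under these assumptions, double quantization lowers the per-parameter overhead of the scaling constants from 0.5 bits to about 0.127 bits. Actual GPU memory use will be higher, since some layers stay in higher precision and activations and the KV cache also take space.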

# Inference

* The **TextStreamer** class streams the generated text: output is printed to standard output as tokens are generated, instead of waiting for the whole response to be finished.
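
The idea of streaming can be illustrated without loading a model. The sketch below is a toy, not how `TextStreamer` is implemented: the lookup table stands in for a language model, the names `toy_generate` and `stream_tokens` are hypothetical, and each token here depends only on the previous token, whereas a real causal language model conditions on the whole preceding sequence.

```python
import sys


def toy_generate(prompt_tokens, next_token_table, max_new_tokens=10, eos="</s>"):
    """Yield one new token at a time, chosen from the previous token only."""
    current = prompt_tokens[-1]
    for _ in range(max_new_tokens):
        # Look up the next token; unknown tokens end the sequence.
        current = next_token_table.get(current, eos)
        if current == eos:
            return
        yield current


def stream_tokens(token_iter, out=sys.stdout):
    """Print each token as soon as it is produced and return all of them."""
    pieces = []
    for tok in token_iter:
        out.write(tok + " ")
        out.flush()  # make the token visible immediately
        pieces.append(tok)
    out.write("\n")
    return pieces


# Hypothetical "model": a fixed next-token table.
table = {"the": "universe", "universe": "is", "is": "vast"}

generated = stream_tokens(toy_generate(["the"], table))
```

Because `toy_generate` is a generator, the consumer sees each token before the next one is computed, which is the same property that makes streamed output feel responsive for long generations.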

In [3]:
from transformers import TextStreamer

# Stream a chat response for one user prompt (plain text generation, no function calling).
# Relies on the global `tokenizer` loaded above.
def stream(user_prompt, model):
    device = "cuda:0"  # the model was loaded onto GPU 0 via device_map={"": 0}
    system_prompt = 'You are a helpful assistant that provides accurate and concise responses'

    # Markers of the Llama 2 chat prompt format
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

    prompt = f"{B_INST} {B_SYS}{system_prompt.strip()}{E_SYS}{user_prompt.strip()} {E_INST}\n\n"

    inputs = tokenizer([prompt], return_tensors="pt").to(device)

    streamer = TextStreamer(tokenizer)

    # generate() still returns the output token IDs; the streamer additionally prints the text to stdout.
    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=500)
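
The prompt construction inside `stream` can be checked in isolation, without a GPU or a loaded model. The sketch below pulls the same f-string into a standalone helper; the name `build_llama2_prompt` is hypothetical and not part of the notebook or of any library.

```python
# Markers of the Llama 2 chat prompt format, as used in stream() above.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_llama2_prompt(user_prompt, system_prompt):
    """Wrap a system prompt and a user prompt in the Llama 2 chat markers."""
    return f"{B_INST} {B_SYS}{system_prompt.strip()}{E_SYS}{user_prompt.strip()} {E_INST}\n\n"


prompt = build_llama2_prompt(
    "  Make me Laugh. ",
    "You are a helpful assistant that provides accurate and concise responses",
)
print(prompt)
```

The printed text matches the prompt echoed at the top of each output cell below, except for the leading `<s>`, which is the beginning-of-sequence token that the tokenizer adds on its own.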

In [12]:
stream('Can you tell me quotes related to universe?.',model)

<s> [INST] <<SYS>>
You are a helpful assistant that provides accurate and concise responses
<</SYS>>

Can you tell me quotes related to universe?. [/INST]

Of course! Here are some quotes related to the universe:

1. "The universe is a pretty big place. If it's just us, seems like an awful waste of space." - Carl Sagan
2. "The universe is not only stranger than we think, it is stranger than we can think." - Werner Heisenberg
3. "The universe is not a fixed, unchanging thing, but a constantly evolving, dynamic system." - Brian Greene
4. "The universe is a vast and intricate tapestry, woven from threads of matter, energy, and information." - Michio Kaku
5. "The universe is not only a vast and complex system, but it is also a system of relationships, interdependencies, and interactions." - Stephen Hawking
6. "The universe is a mirror of the human soul, reflecting our deepest desires, fears, and hopes." - Unknown
7. "The universe is a vast and mysterious place, full of wonders and secrets 

In [13]:
stream('Make me Laugh.',model)

<s> [INST] <<SYS>>
You are a helpful assistant that provides accurate and concise responses
<</SYS>>

Make me Laugh. [/INST]

Why did the scarecrow win an award?

Because he was outstanding in his field! 😂🌽</s>


In [14]:
stream('Tell me some thoughtful quotes with deep meaning.',model)

<s> [INST] <<SYS>>
You are a helpful assistant that provides accurate and concise responses
<</SYS>>

Tell me some thoughtful quotes with deep meaning. [/INST]

Certainly! Here are some thoughtful quotes with deep meaning that you might find inspiring or helpful:

1. "The future belongs to those who believe in the beauty of their dreams." - Eleanor Roosevelt

This quote emphasizes the importance of having faith in one's aspirations and dreams. It suggests that those who believe in their dreams are more likely to create a brighter future for themselves and those around them.

2. "The greatest glory in living lies not in never falling, but in rising every time we fall." - Nelson Mandela

This quote highlights the importance of resilience and perseverance in the face of adversity. It encourages us to embrace challenges and setbacks as opportunities for growth and learning, rather than giving up or losing hope.

3. "The biggest adventure you can take is to live the life of your dreams." - 