## lmsys/vicuna-13b-v1.3

In this project we experiment with [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3), observing the model's performance and how it responds to different prompts.

Vicuna-13B is an open-source chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. A preliminary evaluation using GPT-4 as a judge found that Vicuna-13B achieves more than 90% of the quality of OpenAI ChatGPT and Google Bard, while outperforming other models such as LLaMA and Stanford Alpaca in more than 90% of cases.

[Learn more](https://lmsys.org/blog/2023-03-30-vicuna/)

### Import Libraries

If `transformers` is not already installed in your environment, run the following cell. It installs `transformers` from the head of pull request #25740, together with `accelerate`:

In [None]:
!pip install git+https://github.com/huggingface/transformers.git@refs/pull/25740/head accelerate
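To see which of these packages are already importable (and so whether the install cell is needed at all), a quick standard-library check can help. `missing_packages` is a hypothetical helper, sketched here with `importlib.util.find_spec`; it only checks that a package can be found, not that the installed `transformers` includes the changes from the pull request above.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level packages from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]

# packages used in this notebook
print(missing_packages(["transformers", "accelerate", "torch"]))
```

An empty list means all three packages can be imported.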

If they are already installed, import the libraries:

In [None]:
import transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

In [None]:
path = "/home/kgupta/LLM/lmsys_vicuna_13b_v1_3"  # local copy of the model checkpoint
tok_path = "/home/kgupta/LLM/lmsys_vicuna_13b_v1_3/tokenizer"  # where the tokenizer will be saved

In [None]:
# load the checkpoint in half precision and let Accelerate place it on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16, device_map="auto")


`AutoTokenizer` is a Hugging Face Transformers class that automatically loads the right tokenizer for a pre-trained model, so you do not have to specify the exact tokenizer class. The cell below loads the tokenizer from the model directory and saves a copy to `tok_path`.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(path)
tokenizer.save_pretrained(tok_path)

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


('/home/kgupta/LLM/lmsys_vicuna_13b_v1_3/tokenizer/tokenizer_config.json',
 '/home/kgupta/LLM/lmsys_vicuna_13b_v1_3/tokenizer/special_tokens_map.json',
 '/home/kgupta/LLM/lmsys_vicuna_13b_v1_3/tokenizer/tokenizer.model',
 '/home/kgupta/LLM/lmsys_vicuna_13b_v1_3/tokenizer/added_tokens.json',
 '/home/kgupta/LLM/lmsys_vicuna_13b_v1_3/tokenizer/tokenizer.json')
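To confirm that the save worked, you can check that the directory contains the files listed in the output above. `missing_tokenizer_files` is a hypothetical helper; its default file list is copied from that output.

```python
from pathlib import Path

# file names taken from the save_pretrained() output above
EXPECTED_TOKENIZER_FILES = (
    "tokenizer_config.json",
    "special_tokens_map.json",
    "tokenizer.model",
    "added_tokens.json",
    "tokenizer.json",
)

def missing_tokenizer_files(directory, expected=EXPECTED_TOKENIZER_FILES):
    """Return the expected tokenizer files that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in expected if not (folder / name).is_file()]
```

After the cell above has run, `missing_tokenizer_files(tok_path)` should return an empty list.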

### Create Pipeline

- `transformers.pipeline()`:

    A Hugging Face Transformers function that builds a ready-made pipeline for a given task, here `"text-generation"`. The pipeline wraps tokenization, the model call, and decoding, so raw text goes in and generated text comes out without any custom code.

- `tokenizer=tokenizer`:

    Here, you're passing the tokenizer object to the pipeline. This tells the pipeline how to tokenize the input text before passing it to the model.
    
- `model=model`:

    The pre-trained model used for text generation. Here it is the Vicuna model loaded above; any causal language model from the Hugging Face Hub (e.g. GPT-2 or LLaMA) could be used instead.
    
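- `torch_dtype=torch.float16`:

    Use `float16` (half precision) instead of the default `float32`. Half precision halves the memory needed for the weights and usually speeds up computation on GPUs.

- `device_map="auto"`:

    Lets Accelerate decide where to place the model's layers: it fills the available GPU(s) first and offloads the remainder to CPU memory (or disk) if the model does not fit.

Note that `torch_dtype` and `device_map` only take effect when the pipeline loads the model itself, i.e. when `model` is a model name or path. For a model object that is already loaded, as here, these options have to be passed to `AutoModelForCausalLM.from_pretrained` instead.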
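The memory saving from half precision is easy to estimate: the weights take roughly (number of parameters) × (bytes per parameter). The sketch below assumes about 13 billion parameters for Vicuna-13B and counts the weights only, ignoring activations and the key/value cache used during generation.

```python
# bytes used to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gb(n_params, dtype):
    """Approximate memory (GB, 1 GB = 1e9 bytes) needed just for the model weights."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

n_params = 13e9  # assumption: ~13B parameters for Vicuna-13B
for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(n_params, dtype):.0f} GB")
```

So the weights alone need around 52 GB in `float32` but around 26 GB in `float16`.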


In [None]:
pipeline = transformers.pipeline(
    "text-generation",
    tokenizer=tokenizer,
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)


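Check whether a GPU is available. This only tells you that CUDA can be used; `model.device` shows where the model's weights were actually placed.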

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':
    print('GPU is available')
else:
    print('Using CPU')

GPU is available


### Prompt - 1

In [None]:
system = '''Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.

'''

prompt = """\
### Human:Add a+ b
### Assistant:\
"""

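The `### Human:` / `### Assistant:` format used here resembles the one from the original Vicuna v0 release. Vicuna v1.1 and later versions are usually prompted with FastChat's `vicuna_v1.1` conversation template, so it may be worth comparing the two. The helper below is a sketch of that format, assuming FastChat's default system message and its `USER:` / `ASSISTANT:` role names; `vicuna_v1_prompt` is a hypothetical name, and FastChat's conversation templates should be checked before relying on it.

```python
# assumption: default system message of FastChat's vicuna_v1.1 template
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def vicuna_v1_prompt(user_message, system=VICUNA_SYSTEM):
    """Build a single-turn prompt in the Vicuna v1.1-style format."""
    # the prompt ends with "ASSISTANT:" so the model continues with its reply
    return f"{system} USER: {user_message.strip()} ASSISTANT:"

prompt_v1 = vicuna_v1_prompt("Add a+ b")
print(prompt_v1)
```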
Generate text with your preferred sampling parameters.

Parameters:
- `prompt`: The input prompt or starting text for text generation.

- `do_sample=True`: Samples the next token from the model's probability distribution instead of always taking the most likely token (greedy decoding), which introduces randomness into the generated text.

- `top_k=10`: Restricts sampling to the `k` most probable next tokens. Lower values make the generation more focused; higher values allow more variety.

- `temperature=0.5`: Rescales the logits before sampling. Values above 1 flatten the distribution and make the output more random, while values below 1 sharpen it and make the output more deterministic.

- `top_p=0.95`: Nucleus sampling. Only the smallest set of most probable tokens whose cumulative probability reaches `top_p` is kept for sampling.

- `eos_token_id=tokenizer.eos_token_id`: Specifies the end-of-sequence token ID. Generation stops once the model produces this token.

- `num_return_sequences=1`: Determines how many different sequences will be generated. In this case, it's set to 1.

- `max_length=1024`: Sets the maximum length of the sequence in tokens. This limit includes the prompt tokens, not only the newly generated ones.

- `add_special_tokens=False`: Controls whether the tokenizer adds special tokens (for LLaMA-based models such as Vicuna, the beginning-of-sequence token `<s>`) when encoding the prompt. It does not affect the generated output.

- `repetition_penalty=1.1`: Penalizes tokens that already appear in the prompt or in the text generated so far. Values greater than 1 discourage repetition; 1.0 means no penalty.
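To make the interaction between these parameters concrete, here is a minimal pure-Python sketch of how one next token could be chosen from a list of logits: repetition penalty first, then temperature, top-k and top-p. It mirrors the idea behind the corresponding Transformers logits processors but is not their implementation; all function names are hypothetical and the logits are toy values.

```python
import math
import random

def apply_repetition_penalty(logits, previous_ids, penalty):
    """Push down the logits of tokens that were already seen (penalty > 1)."""
    out = list(logits)
    for i in set(previous_ids):
        # dividing a positive logit or multiplying a negative one both lower it
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # temperature must be > 0; below 1 sharpens, above 1 flattens the distribution
    probs = softmax([x / temperature for x in logits])
    # token ids ordered from most to least likely
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    mass = sum(probs[i] for i in ranked)
    # nucleus step: smallest prefix whose renormalized mass reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

def sample_next_token(logits, previous_ids, rng, repetition_penalty=1.0, **filters):
    """Pick the next token id in a simplified version of do_sample=True."""
    logits = apply_repetition_penalty(logits, previous_ids, repetition_penalty)
    dist = filter_distribution(logits, **filters)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]  # toy logits for a 4-token vocabulary
print(filter_distribution(logits, temperature=0.5, top_k=10, top_p=0.95))
print(sample_next_token(logits, previous_ids=[0], rng=rng, repetition_penalty=1.1,
                        temperature=0.5, top_k=10, top_p=0.95))
```

With `temperature=0.5` the distribution becomes sharp enough that `top_p=0.95` keeps only the two most likely tokens of this toy vocabulary.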


In [None]:
sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    temperature=0.5,
    top_p=0.95,
    eos_token_id=tokenizer.eos_token_id,
    num_return_sequences=1,
    max_length=1024,
    add_special_tokens=False,
    repetition_penalty=1.1,
)
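`max_length` counts the prompt tokens as well as the generated ones, so the number of tokens the model can actually produce is `max_length` minus the prompt length. The sketch below does that arithmetic; the prompt length of 12 tokens is a hypothetical value (in practice it can be measured roughly with `len(tokenizer(prompt).input_ids)`), and the 2048-token context window is an assumption based on Vicuna v1.3 being built on LLaMA. To bound only the generated part, the generation argument `max_new_tokens` can be passed instead of `max_length`.

```python
def new_token_budget(prompt_tokens, max_length):
    """Tokens left for generation when `max_length` includes the prompt."""
    return max(0, max_length - prompt_tokens)

def exceeds_context(max_length, context_window=2048):
    """True if the requested total length is longer than the model's context window."""
    # assumption: 2048 tokens, the context length of LLaMA-1-based models like Vicuna v1.3
    return max_length > context_window

print(new_token_budget(prompt_tokens=12, max_length=1024))  # 1012
print(exceeds_context(4096))  # True
```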

In [None]:
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Result: ### Human:Add a+ b 
### Assistant: Sure, here's the calculation for adding a and b using the + operator:
```makefile
a = 10
b = 5
result = a + b
print(result) # Output: 15
```
I hope that helps! Let me know if you have any other questions.
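The pipeline's `generated_text` contains the prompt followed by the reply, as in the output above. A small post-processing step can keep only the assistant's answer. `extract_reply` is a hypothetical helper that assumes the `### Human:` / `### Assistant:` markers used in these prompts; alternatively, the text-generation pipeline accepts `return_full_text=False` to leave the prompt out of `generated_text`.

```python
def extract_reply(generated_text, assistant_marker="### Assistant:", user_marker="### Human:"):
    """Return only the assistant's reply from a '### Human: ... ### Assistant: ...' text."""
    _, found, rest = generated_text.partition(assistant_marker)
    if not found:
        return generated_text.strip()
    # the model sometimes keeps going and invents the next user turn; drop it
    return rest.split(user_marker, 1)[0].strip()

example = "### Human:Add a+ b \n### Assistant: Sure, here's the calculation.\n### Human: thanks"
print(extract_reply(example))  # Sure, here's the calculation.
```

In the notebook this would be applied as `extract_reply(seq['generated_text'])` inside the printing loop.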


### Prompt - 2

In [None]:
system = f'''Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.

'''

prompt = f"""\
### Human: {system}
### Assistant:\
"""

In [None]:
sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    temperature=0.5,
    top_p=0.95,
    eos_token_id=tokenizer.eos_token_id,
    num_return_sequences=1,
    max_length=4096,  # note: longer than the 2048-token context Vicuna v1.3 (LLaMA-based) was trained with
    add_special_tokens=False,
    repetition_penalty=1.1,
)
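The two generation cells repeat the same sampling settings and differ only in `max_length`. One way to keep experiments consistent is to build the keyword arguments in one place. This is a sketch: `generation_kwargs` is a hypothetical helper, and its defaults are simply the values used in this notebook.

```python
def generation_kwargs(eos_token_id, max_length=1024, **overrides):
    """Sampling settings shared by the prompts in this notebook; overrides win."""
    kwargs = {
        "do_sample": True,
        "top_k": 10,
        "temperature": 0.5,
        "top_p": 0.95,
        "eos_token_id": eos_token_id,
        "num_return_sequences": 1,
        "max_length": max_length,
        "add_special_tokens": False,
        "repetition_penalty": 1.1,
    }
    kwargs.update(overrides)  # e.g. try a different temperature for one prompt
    return kwargs
```

Usage in the notebook: `sequences = pipeline(prompt, **generation_kwargs(tokenizer.eos_token_id, max_length=4096))`.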

In [None]:
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Result: ### Human: Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.


### Assistant: Aloha! I recently had the opportunity to explore the beautiful islands of Hawaii and I'm excited to share my experience with you all.

My journey began on the island of Oahu, where I was immediately struck by the rich culture and history of the area. One of the first things I did was visit the Pearl Harbor Memorial, which was both moving and educational. It's important to remember the events that led up to the United States' entry into World War II, and this memorial does an excellent job of honoring those who lost their lives.

Next, I headed over to the Polynesian Cultural Center, which was one of the highlights of my trip. This interactive museum allowed me to learn about the different Polynesian cultures that make up Hawaii, as well as other Pacific Island nations. I got to participate in traditional dances, try my hand 