<a href="https://colab.research.google.com/github/rida-manzoor/GenAI/blob/main/4_1_OpensourceLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Open-Source LLMs
Open-source large language models (LLMs) are language models whose source code is publicly available for viewing, modification, and distribution. They are developed collaboratively by communities of researchers, developers, and enthusiasts who contribute to their improvement and accessibility.

Key characteristics of open-source LLMs include:

1. **Transparency:** The source code of open-source LLMs is openly accessible, allowing researchers and developers to study the internal workings of the model, understand its algorithms, and identify potential biases or limitations.

2. **Collaboration:** Open-source LLMs encourage collaboration and contributions from a diverse community of users. This collaborative effort often leads to faster innovation, bug fixes, and improvements in model performance.

3. **Customization:** Users of open-source LLMs have the flexibility to customize and fine-tune the model according to their specific needs and use cases. This includes modifying model architectures, training on domain-specific data, and integrating with other tools and frameworks.

4. **Accessibility:** Because the source code is freely available, open-source LLMs broaden access to advanced natural language processing (NLP) capabilities. This enables a wider range of applications for researchers, businesses, and society as a whole.

Examples of popular open-source LLMs include:

- **GPT-2:** Developed by OpenAI, GPT-2 is an open-source language model known for its text generation capabilities across various domains.

- **BERT (Bidirectional Encoder Representations from Transformers):** Developed by Google, BERT is an open-source LLM that has achieved state-of-the-art performance in various NLP tasks such as question answering and text classification.

- **Llama 2:** Developed by Meta AI and available in 7B, 13B, and 70B parameter versions.


In [None]:
!pip install transformers



In [None]:
!pip install accelerate


# Access Llama 2
1. Request access from Meta: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
2. Once Meta has granted access, submit the access form on the Hugging Face model page: https://huggingface.co/meta-llama/Llama-2-7b-hf


# Get a Hugging Face Token
1. Go to https://huggingface.co/settings/tokens
2. Click **New token**, generate a token, and copy it.
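A token is a credential, so it is safer to keep it out of the notebook itself. The sketch below reads it from an environment variable instead. The helper name `get_hf_token` is hypothetical, and it assumes the token has been stored under `HF_TOKEN` (the same name the Colab secrets message uses).

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in an environment variable.

    `get_hf_token` is an illustrative helper, not part of huggingface_hub.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "create a token at https://huggingface.co/settings/tokens first."
        )
    return token


# Intended usage (not executed here, requires huggingface_hub and a real token):
# from huggingface_hub import login
# login(token=get_hf_token())
```

Reading the token at runtime means the notebook can be shared or committed without exposing the credential.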

In [None]:
from huggingface_hub import login

# Prompt for the token interactively; never hardcode a real token in a notebook
login()

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
from transformers import AutoTokenizer
import transformers
import torch

# Specify the model to use. This is the base (pretrained) checkpoint;
# the chat-tuned variant is "meta-llama/Llama-2-7b-chat-hf".
model_name = "meta-llama/Llama-2-7b-hf"

# Load the pretrained tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained(model_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

In [None]:
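# Build a text-generation pipeline: load the weights in half precision (float16)
# and let accelerate place them on the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    torch_dtype=torch.float16,
    device_map='auto',
)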

config.json:   0%|          | 0.00/609 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]
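The two shards in the download log (9.98 GB and 3.50 GB, about 13.5 GB in total) are consistent with storing each weight in float16, which takes 2 bytes per parameter. A rough back-of-the-envelope check follows; it assumes the nominal figure of 7 billion parameters, which is only an approximation of the real count, and the function name is illustrative.

```python
def estimate_weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the model weights, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


NOMINAL_PARAMS = 7_000_000_000  # assumption: "7B" taken at face value

fp16_gb = estimate_weight_memory_gb(NOMINAL_PARAMS, 2)  # float16: 2 bytes per weight
fp32_gb = estimate_weight_memory_gb(NOMINAL_PARAMS, 4)  # float32: 4 bytes per weight

print(f"float16: ~{fp16_gb:.0f} GB, float32: ~{fp32_gb:.0f} GB")
```

This covers the weights only. Activations and the generation cache need additional memory, so loading in float16 roughly halves the footprint relative to float32 but does not bound total usage.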

In [None]:
# Define a function that accepts a prompt and returns the model's response
def get_response(prompt):
    sequences = pipeline(
        prompt,
        do_sample=True,  # sample from the token distribution instead of greedy decoding
        return_full_text=False,  # return only the generated text, not the prompt
        top_k=10,  # sample only from the 10 most likely tokens at each step
        num_return_sequences=1,  # return a single generated sequence
        eos_token_id=tokenizer.eos_token_id,  # stop when the end-of-sequence token is generated
        max_length=1000,  # upper bound on total length in tokens (prompt plus generated text)
    )
    return sequences[0]['generated_text']
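To make the `top_k` argument concrete, here is a minimal pure-Python sketch of top-k sampling over a list of raw scores (logits). It illustrates the idea only and is not how the transformers library implements it; the function name and the example scores are made up.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Sample one token index, considering only the k highest-scoring tokens."""
    # Keep the indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits (subtract the max for numerical stability)
    max_logit = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - max_logit) for i in top]
    # Draw one index in proportion to its softmax weight
    return rng.choices(top, weights=weights, k=1)[0]


example_logits = [0.1, 5.0, 2.0, -1.0, 4.0]  # hypothetical scores for a 5-token vocabulary
print(top_k_sample(example_logits, k=2))  # always index 1 or 4, the two highest scores
```

With `k=1` this reduces to greedy decoding. Larger values of `k` allow more varied output at the cost of occasionally picking less likely tokens.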

## Prompt Template
Llama 2 chat models expect prompts in the following format:
```
<s>[INST] <<SYS>>
{{system prompt}}
<</SYS>>

{{user message}} [/INST] {{model answer}} </s>
```

In [None]:
prompt = '''<s>[INST] <<SYS>>
You are a helpful assistant that does not use any harmful, unethical language.
<</SYS>>

I really like Pond's products. Suggest some good sunscreen products.
[/INST]
'''
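A prompt in this format can also be assembled programmatically, which avoids stray indentation inside the triple-quoted string. The helper below is a hypothetical sketch (the name `build_llama2_prompt` is not from any library) and covers only a single-turn prompt in the template described in this section.

```python
def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Format a single-turn prompt using the [INST] / <<SYS>> template."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )


example_prompt = build_llama2_prompt(
    "You are a helpful assistant that does not use any harmful, unethical language.",
    "Suggest some good sunscreen products.",
)
print(example_prompt)
```

The result can be passed to `get_response` in place of a hand-written string.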

In [None]:
print(get_response(prompt))


    """
    def __init__(self, name, text, inst_text):
        self.name = name
        self.text = text
        self.inst_text = inst_text

    def __str__(self):
        return self.text

    def __repr__(self):
        return self.text

    def __eq__(self, other):
        return self.text == other.text


class Dialogue(object):
    """
    A dialogue is a set of sentences in a conversation.
    """
    def __init__(self, name, text):
        self.name = name
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return self.text

    def __eq__(self, other):
        return self.text == other.text


class DialogueSet(object):
    """
    A dialogue set is a set of dialogues.
    """
    def __init__(self, name, dialogues):
        self.name = name
        self.dialogues = dialogues

    def __str__(self):
        return self.name

    def __repr__(self):
        return self.name

    def __eq__(self, other):
        return self.na