# **Large Language Models Example**

In [1]:
!pip install groq

In [9]:
from groq import Groq

class PromptTemplates:

    @staticmethod
    def get_CV_summarization_template(cv_text,job_role="Research Scientist"):

        return f"""Consider the CV_text below:

        ---- CV Text ----

        {cv_text}

        Consider also the job role below:

        --- Job Role ----

        {job_role}

        Use the cv_text and the job role provided and respond with a summary of the CV that helps a recruiter make a decision
        Make sure to format your response in a readable format,
        and only provide the answer without any extra information
        """

class LLM:

    def __init__(self, api="GROQ", groq_model="mixtral-8x7b-32768", temperature=0.0, api_key=None):

        self.api = api
        self.groq_model = groq_model
        self.temperature = temperature
        self.prompt = None
        # Never hard-code secrets: with api_key=None the Groq client
        # reads the key from the GROQ_API_KEY environment variable
        self.groq_client = Groq(api_key=api_key)

    def set_prompt(self, cv_text, prompt=None, job_role="Research Scientist"):

        if prompt is None:
            self.prompt = PromptTemplates.get_CV_summarization_template(cv_text, job_role)
            return

        self.prompt = prompt

    def respond_to_prompt(self):
        prompt = self.prompt

        if self.api == 'GROQ':
            client = self.groq_client

            # Create chat completion with configured model and temperature
            chat_completion = client.chat.completions.create(
                messages=[
                    {
                        "role": "user",
                        "content": prompt,
                    }
                ],
                temperature=self.temperature,
                model=self.groq_model,
            )

            llm_response = str(chat_completion.choices[0].message.content)
            return llm_response

        raise ValueError(f"Unsupported API: {self.api!r}")

In [10]:
# Sample CV text
cv_text = """
Experienced data scientist with expertise in machine learning, natural language processing, and AI applications.
Proficient in Python, TensorFlow, and cloud computing. Published in top-tier AI journals and conferences.
Skilled in data analysis, statistical modeling, and developing end-to-end AI pipelines.
"""

resume_llm = LLM()
resume_llm.set_prompt(cv_text)
resume_llm.respond_to_prompt()

'Summary of CV:\n\nThe CV belongs to an experienced data scientist with expertise in machine learning, natural language processing, and AI applications. They are proficient in Python, TensorFlow, and cloud computing, and have published in top-tier AI journals and conferences. Their skills include data analysis, statistical modeling, and developing end-to-end AI pipelines. These qualifications make them a strong candidate for the Research Scientist position, where they can utilize their expertise in AI and machine learning to conduct research and develop innovative solutions.'
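The `LLM` class sends a single `"user"` message to `client.chat.completions.create`. The sketch below, which runs offline, shows how that `messages` list is shaped and how an optional `"system"` message could be prepended to steer the summary's tone. The helper name `build_messages` and the system-prompt text are hypothetical, not part of the Groq SDK.

```python
def build_messages(user_prompt, system_prompt=None):
    """Build the chat-completions `messages` list (hypothetical helper).

    Each message is a dict with a "role" ("system", "user" or "assistant")
    and a "content" string, the same shape used in respond_to_prompt.
    """
    messages = []
    if system_prompt is not None:
        # An optional system message applies to the whole conversation
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


msgs = build_messages(
    "Summarize this CV for a recruiter: ...",
    system_prompt="You are a concise technical recruiter.",  # hypothetical wording
)
print([m["role"] for m in msgs])  # ['system', 'user']
```

The resulting list could be passed as `messages=msgs` to the same `create` call used in `respond_to_prompt`.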

# **Instruction Tuning Example**

[Notebook Link](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Mistral/Supervised_fine_tuning_(SFT)_of_an_LLM_using_Hugging_Face_tooling.ipynb#scrollTo=IQ1sMda27Zj6)

## Required hardware

The notebook is designed to run on any NVIDIA GPU with the [Ampere architecture](https://en.wikipedia.org/wiki/Ampere_(microarchitecture)) or later and at least 24 GB of GPU memory (VRAM). This includes:

* NVIDIA RTX 3090, 4090
* NVIDIA A100, H100, H200

and so on. Personally, I'm running the notebook on an RTX 4090 with 24 GB of VRAM.

Ampere or later is required because we're going to use the [bfloat16 (bf16) format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format), which is not supported on older architectures such as Turing.

However, with a few tweaks the model can be trained in float16 (fp16) instead, which is supported by older GPUs such as:

* NVIDIA RTX 2080
* NVIDIA Tesla T4
* NVIDIA V100
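The bf16-versus-fp16 choice comes down to the GPU's CUDA compute capability: Ampere and later report a major version of 8 or higher. The sketch below encodes that rule as a pure-Python function. It assumes the capability tuple comes from something like `torch.cuda.get_device_capability()` and that the flags feed the `bf16`/`fp16` options of Hugging Face `TrainingArguments`. Whether the linked notebook wires them up this way is an assumption. It also estimates weight memory to show why 24 GB is a practical floor. The 7-billion-parameter figure is only an example.

```python
# Bytes needed to store one parameter in each floating-point format
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def pick_half_precision(compute_capability):
    """Choose bf16 on Ampere (compute capability 8.x) or later, else fp16.

    compute_capability is a (major, minor) tuple, e.g. (7, 0) for a V100,
    (7, 5) for a T4 or RTX 2080, (8, 6) for an RTX 3090, (8, 9) for an RTX 4090.
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"


def mixed_precision_flags(compute_capability):
    """Return bf16/fp16 booleans in the style of TrainingArguments (assumed usage)."""
    dtype = pick_half_precision(compute_capability)
    return {"bf16": dtype == "bfloat16", "fp16": dtype == "float16"}


def weight_memory_gib(num_params, dtype):
    """Memory for the weights alone: no activations, gradients or optimizer state."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


print(mixed_precision_flags((8, 9)))  # {'bf16': True, 'fp16': False}
print(mixed_precision_flags((7, 5)))  # {'bf16': False, 'fp16': True}
print(round(weight_memory_gib(7e9, "bfloat16"), 1))  # 13.0
```

Even before gradients and optimizer state, a 7B model in half precision needs about 13 GiB just for its weights. That figure is why memory-saving techniques such as LoRA matter on a 24 GB card.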

# **Fine-tune Gemma models in Keras using LoRA**

[Notebook link](https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lora_tuning.ipynb#scrollTo=SDEExiAk4fLb)

## Overview

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.

Large Language Models (LLMs) like Gemma have been shown to be effective at a variety of NLP tasks. An LLM is first pre-trained on a large corpus of text in a self-supervised fashion. Pre-training helps LLMs learn general-purpose knowledge, such as statistical relationships between words. An LLM can then be fine-tuned with domain-specific data to perform downstream tasks (such as sentiment analysis).

LLMs are extremely large, with parameter counts on the order of billions. Full fine-tuning, which updates every parameter in the model, is unnecessary for most applications because typical fine-tuning datasets are much smaller than pre-training datasets.

[Low Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) is a fine-tuning technique that greatly reduces the number of trainable parameters for downstream tasks by freezing the weights of the model and inserting a smaller number of new weights into the model. This makes training with LoRA much faster and more memory-efficient, and produces smaller model weights (a few hundred MB), all while maintaining the quality of the model outputs.
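To make the parameter savings concrete, here is a minimal NumPy sketch of a single LoRA-adapted linear layer, following the formulation in the LoRA paper. The frozen weight `W` is left untouched, and the trainable update is the product `B @ A` of rank `r`, scaled by `alpha / r`. This is not the KerasNLP implementation. The class name `LoRALinear`, the layer width of 2048, and `rank=4` are hypothetical choices for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W  # frozen pre-trained weight, never updated
        # As in the LoRA paper, A starts random and B starts at zero,
        # so the adapted layer initially behaves exactly like the original
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # h = W x + (alpha / r) * B (A x), without materializing B @ A
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def merged_weight(self):
        # After training, the update can be folded into W for inference
        return self.W + self.scale * (self.B @ self.A)

    def trainable_params(self):
        # Only A and B are trained: r * (d_in + d_out) values
        return self.A.size + self.B.size


d = 2048  # hypothetical layer width
rng = np.random.default_rng(1)
layer = LoRALinear(rng.normal(size=(d, d)), rank=4)
x = rng.normal(size=d)

print(np.allclose(layer(x), layer.W @ x))  # True: B is zero at initialization
print(layer.W.size, layer.trainable_params())  # 4194304 16384
```

For this square layer, the adapter trains about 0.4% of the parameters of the full weight matrix. Because the update can be merged into `W` after training, inference costs the same as for the original model.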

This tutorial walks you through using KerasNLP to perform LoRA fine-tuning on a Gemma 2B model using the [Databricks Dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k). This dataset contains 15,000 high-quality human-generated prompt/response pairs specifically designed for fine-tuning LLMs.
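Before fine-tuning, each Dolly record has to be flattened into a single training string. Dolly records carry `instruction`, `context`, `response` and `category` fields. The sketch below formats records with an `Instruction:` / `Response:` template in the spirit of the linked tutorial. The exact template string, the option to skip records that have a `context` passage, and the sample records are all assumptions for illustration.

```python
TEMPLATE = "Instruction:\n{instruction}\n\nResponse:\n{response}"  # assumed format


def format_dolly(records, skip_context=True):
    """Turn Dolly-style records into single training strings.

    Each record is a dict with "instruction", "context", "response" and
    "category" keys, matching the columns of databricks-dolly-15k.
    """
    examples = []
    for rec in records:
        # Optionally drop examples that rely on a reference passage
        if skip_context and rec.get("context"):
            continue
        examples.append(
            TEMPLATE.format(instruction=rec["instruction"], response=rec["response"])
        )
    return examples


# Hypothetical records, not taken from the real dataset
sample = [
    {"instruction": "Name a primary color.", "context": "",
     "response": "Red.", "category": "open_qa"},
    {"instruction": "Summarize the passage.", "context": "Some passage...",
     "response": "A summary.", "category": "summarization"},
]
data = format_dolly(sample)
print(len(data))  # 1
print(data[0])
```

With the Hugging Face `datasets` library installed, the real records could be loaded with `load_dataset("databricks/databricks-dolly-15k", split="train")` and passed through the same function.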