<a href="https://colab.research.google.com/github/saifulrijal-ds/llm-zoomcamp-2024/blob/main/02-open-source/colab/phi-3-mini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -U transformers accelerate bitsandbytes flash_attn



In [1]:
!nvidia-smi

Sun Jul  7 05:47:36 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   69C    P8              12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py

--2024-07-07 05:47:41--  https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3832 (3.7K) [text/plain]
Saving to: ‘minsearch.py.3’


2024-07-07 05:47:41 (71.4 MB/s) - ‘minsearch.py.3’ saved [3832/3832]



In [3]:
import requests
import minsearch

In [4]:
docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
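docs_response = requests.get(docs_url)
docs_response.raise_for_status()  # stop here if the download failed
documents_raw = docs_response.json()

# Flatten the per-course lists into one list, tagging each document with its course.
documents = []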

for course in documents_raw:
  course_name = course['course']

  for doc in course['documents']:
    doc['course'] = course_name
    documents.append(doc)

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.Index at 0x7af6b808e110>

In [5]:
def search(query):
  # Weight matches in the question field more heavily than matches in the section name.
  boost = {'question': 3.0, 'section': 0.5}

  results = index.search(
      query=query,
      filter_dict={'course': 'data-engineering-zoomcamp'},
      boost_dict=boost,
      num_results=5
  )

  return results

In [6]:
def build_prompt(query, search_result):
  prompt_template = """
QUESTION: {question}

CONTEXT:
{context}
""".strip()

  context = ""

  for doc in search_result:
    context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"

  prompt = prompt_template.format(question=query, context=context).strip()
  return prompt
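
To see what `build_prompt` hands to the model, it can help to render it on a made-up search result. The sketch below repeats the function so that it runs on its own; the single entry in `sample_results` is hypothetical and is not taken from the FAQ data.

```python
def build_prompt(query, search_result):
    prompt_template = """
QUESTION: {question}

CONTEXT:
{context}
""".strip()

    context = ""
    for doc in search_result:
        context += (
            f"section: {doc['section']}\n"
            f"question: {doc['question']}\n"
            f"answer: {doc['text']}\n\n"
        )

    return prompt_template.format(question=query, context=context).strip()


# Hypothetical document with the same keys as the FAQ entries.
sample_results = [
    {
        "section": "General",
        "question": "Can I join late?",
        "text": "Yes.",
        "course": "data-engineering-zoomcamp",
    }
]

prompt = build_prompt("Can I still join?", sample_results)
print(prompt)
```

Only the `section`, `question` and `text` fields reach the prompt; the `course` field is used for filtering in `search` and is not rendered.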

In [7]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [8]:
torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map='cuda',
    torch_dtype='auto',
    trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct"
)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [9]:
!nvidia-smi

Sun Jul  7 05:49:39 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   67C    P0              30W /  70W |   7393MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
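
The 7393MiB reported above is close to what the weights alone should occupy. A back-of-the-envelope check, assuming about 3.8 billion parameters (the figure given on the Phi-3-mini model card, not something measured in this notebook) and 2 bytes per parameter (a 16-bit dtype, which is an assumption about what `torch_dtype='auto'` selects):

```python
# Assumptions (not from the notebook): ~3.8e9 parameters, 2 bytes each
# (float16 or bfloat16).
N_PARAMS = 3.8e9
BYTES_PER_PARAM = 2
MIB = 1024 ** 2

weights_mib = N_PARAMS * BYTES_PER_PARAM / MIB

# Value read from the nvidia-smi output after loading the model.
reported_mib = 7393

overhead_mib = reported_mib - weights_mib

print(f"estimated weights: {weights_mib:.0f} MiB")
print(f"reported usage:    {reported_mib} MiB")
print(f"difference:        {overhead_mib:.0f} MiB")
```

The small remainder is plausibly the CUDA context and allocator overhead, though this estimate cannot confirm that.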
                                                                    

In [10]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

In [11]:
system_prompt = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.
"""

In [12]:
def llm(prompt):
  messages = [
      {"role": "system", "content": system_prompt},
      {"role": "user", "content": prompt}
  ]

  # With do_sample=False decoding is greedy and temperature is not used,
  # so it is omitted here.
  generation_args = {
      "max_new_tokens": 256,
      "return_full_text": False,
      "do_sample": False
  }

  outputs = pipe(messages, **generation_args)
  return outputs[0]['generated_text'].strip()

In [13]:
def rag(query):
  search_result = search(query)
  prompt = build_prompt(query, search_result)
  answer = llm(prompt)
  return answer
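
Because `rag` only chains three functions, the wiring can be checked without a GPU by swapping in stand-ins. In the sketch below, `fake_search`, `simple_prompt` and `fake_llm` are hypothetical placeholders and not part of minsearch or transformers. This `rag` also takes its steps as parameters, which differs from the notebook version that calls the global functions.

```python
def fake_search(query):
    # Stand-in for the minsearch lookup: returns one canned FAQ entry.
    return [{"section": "General", "question": "Can I join late?", "text": "Yes."}]


def simple_prompt(query, search_result):
    # Simplified prompt builder with the same QUESTION/CONTEXT layout.
    context = "\n\n".join(
        f"section: {d['section']}\nquestion: {d['question']}\nanswer: {d['text']}"
        for d in search_result
    )
    return f"QUESTION: {query}\n\nCONTEXT:\n{context}"


def fake_llm(prompt):
    # Stand-in for the model call: echoes the last line of the prompt.
    return prompt.splitlines()[-1]


def rag(query, search=fake_search, build_prompt=simple_prompt, llm=fake_llm):
    search_result = search(query)                 # 1. retrieve
    prompt = build_prompt(query, search_result)   # 2. assemble the prompt
    return llm(prompt)                            # 3. generate


answer = rag("Can I still join?")
print(answer)
```

Passing the steps in as arguments is only a convenience for testing; the real `search`, `build_prompt` and `llm` can be supplied the same way.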

In [14]:
rag("I just discovered the course. Can I still join it?")

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.


"You can still join the course even if you discover it after the start date. You're eligible to submit the homeworks, but remember to meet the deadlines for the final projects. The course will start on January 15th, 2024 at 5:00 PM. Before the course starts, you can install and set up all the dependencies and requirements, and familiarize yourself with the prerequisites and syllabus. You can also contribute to the course by starring the repo, sharing it with friends, or creating a PR to improve the text or structure of the repository."

In [20]:
!nvidia-smi

Sun Jul  7 05:34:09 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0              32W /  70W |   8067MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    