### Install the required libraries and packages



In [1]:
%pip install -U -qq transformers accelerate bitsandbytes

Note: you may need to restart the kernel to use updated packages.


#### Download minsearch.py from the minsearch repo

In [2]:
!rm -f minsearch.py
!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py

--2024-07-07 17:03:37--  https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3832 (3.7K) [text/plain]
Saving to: 'minsearch.py'


2024-07-07 17:03:37 (43.3 MB/s) - 'minsearch.py' saved [3832/3832]



#### Get the JSON file from the course repo and build a TF-IDF index

In [3]:
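import requests
import minsearch

# Fetch the course FAQ documents (one entry per course).
docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
docs_response.raise_for_status()  # fail early if the download did not succeed
documents_raw = docs_response.json()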

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.Index at 0x78b0193c2320>
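
To make the TF-IDF step less of a black box, here is a minimal standard-library sketch of the idea: each text becomes a vector of term weights, and a query is ranked against the documents by cosine similarity. The toy texts, the whitespace tokenizer, and the smoothed IDF formula are assumptions chosen for illustration; this is not minsearch's own code.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def vectorize(tokens, idf):
    # Term frequency times IDF; terms unseen during fitting are dropped.
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_tfidf(texts):
    # Returns the IDF table and one sparse TF-IDF vector per text.
    tokenized = [tokenize(t) for t in texts]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: rarer terms get larger weights.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return idf, [vectorize(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, not taken from documents.json.
toy_texts = [
    "can I still join the course after the start date",
    "how do I install docker on windows",
    "what is the course start date",
]
idf, vectors = build_tfidf(toy_texts)
query_vec = vectorize(tokenize("join the course late"), idf)
scores = [cosine(query_vec, v) for v in vectors]
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The Docker text shares no terms with the query, so its score is zero, while the first text wins because it matches the rarer term "join" in addition to the common ones.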

#### Search the index by cosine similarity

In [4]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )
    return results
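
The `boost` and `filter_dict` arguments are easier to reason about with a small sketch. Assuming the index scores each text field separately, multiplies each field's score by its boost (defaulting to 1.0), sums the results, and then drops documents that fail the keyword filter, the ranking could look like the following. The function name `combine_scores` and all the numbers are hypothetical.

```python
def combine_scores(field_scores, boost, keep_mask, num_results):
    """Weight per-field scores, sum them, filter, and return the top documents.

    field_scores: {field name: [score per document]}
    boost:        {field name: weight}; missing fields default to 1.0
    keep_mask:    [bool per document], True if the keyword filter matches
    """
    n_docs = len(keep_mask)
    totals = [0.0] * n_docs
    for field, scores in field_scores.items():
        weight = boost.get(field, 1.0)
        for i, score in enumerate(scores):
            totals[i] += weight * score
    # Keep only documents that pass the filter, best total first.
    candidates = [i for i in range(n_docs) if keep_mask[i]]
    candidates.sort(key=lambda i: totals[i], reverse=True)
    return candidates[:num_results], totals


# Hypothetical similarity scores for three documents.
field_scores = {
    "question": [0.10, 0.60, 0.30],
    "text": [0.50, 0.20, 0.40],
    "section": [0.90, 0.00, 0.10],
}
boost = {"question": 3.0, "section": 0.5}
keep_mask = [True, True, False]  # the third document belongs to another course

top, totals = combine_scores(field_scores, boost, keep_mask, num_results=2)
print(top, [round(t, 2) for t in totals])
```

With these numbers the second document ranks first because its strong `question` match is tripled, and the third document is excluded by the filter even though its total beats the first document's.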

In [6]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)  # fix the seed so sampled generations are repeatable



<torch._C.Generator at 0x78b018363ab0>

### Get Phi-3-mini-128k-instruct from Hugging Face

In [7]:
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct", 
    device_map="cuda", 
    torch_dtype="auto", 
    trust_remote_code=True, 
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### Create pipeline

In [8]:
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

#### Build prompt

In [173]:
def build_prompt(query, search_results):
    # Start the chat with a system message.
    prompt_template = [
        {"role": "system", "content": "You are a helpful AI assistant."},
    ]

    # Add each retrieved FAQ entry as an earlier user/assistant exchange.
    for doc in search_results:
        prompt_template.append({"role": "user", "content": doc["question"]})
        prompt_template.append({"role": "assistant", "content": doc["text"]})

    # The actual question goes last so the model answers it.
    prompt_template.append({"role": "user", "content": query})
    return prompt_template

def llm(prompt, temp):
    config = {
        "max_length": 600,          # cap on total tokens (prompt + generated)
        "temperature": temp,
        "do_sample": True,          # sample, so that temperature has an effect
        "return_full_text": False,  # return only the newly generated text
        # Optional settings, left disabled:
        # "num_return_sequences": 1,
        # "num_beams": 1,
        # "top_p": 0.95,
        # "top_k": 50,
        # "eos_token_id": tokenizer.eos_token_id,
        # "pad_token_id": tokenizer.eos_token_id,
    }
    response = generator(prompt, **config)
    response_final = response[0]["generated_text"].strip()
    return response_final
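
To see what the model actually receives, the sketch below runs a compact copy of `build_prompt` on two made-up search results and prints the resulting chat messages. The FAQ entries in `fake_results` are hypothetical placeholders, not records from documents.json.

```python
def build_prompt(query, search_results):
    # Same logic as the notebook function, repeated so this cell runs on its own.
    messages = [{"role": "system", "content": "You are a helpful AI assistant."}]
    for doc in search_results:
        messages.append({"role": "user", "content": doc["question"]})
        messages.append({"role": "assistant", "content": doc["text"]})
    messages.append({"role": "user", "content": query})
    return messages


# Hypothetical stand-ins for what search() returns.
fake_results = [
    {"question": "When does the course start?", "text": "Placeholder answer A."},
    {"question": "Can I join late?", "text": "Placeholder answer B."},
]

messages = build_prompt("I just discovered the course. Can I still enroll?", fake_results)
for message in messages:
    print(f"{message['role']:>9}: {message['content']}")
```

Each retrieved entry becomes a user/assistant pair, so the model sees the FAQ as earlier turns of the conversation and the new question as the final user turn. One design consequence worth noting: nothing in the prompt tells the model to answer only from the retrieved entries.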

#### Retrieve results, build a prompt from them, and generate an answer with the LLM

In [174]:
def rag(query,temp=0.1):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt,temp)
    return answer

#### The following result is based on the JSON document data

#### Temperature 0.1

In [177]:
%%time
rag("I just discovered the course. Can I still enroll it?")

CPU times: user 3.44 s, sys: 1.92 ms, total: 3.44 s
Wall time: 3.44 s


'Yes, you can still enroll in the course. The course will start on 15th Jan 2024 at 17h00. You can register before the course starts using this link.'

#### Let's try using different temperature values and examine the differences in the results.

In [78]:
import numpy as np
from IPython.display import display, Markdown

In [176]:
%%time
for temp in np.round(np.linspace(0.2,1.0,9),1):
    display(Markdown(f"### Temperature: {temp}"))
    result=rag('I just discovered the course. Can I still enroll it?',temp)
    display(Markdown(f"Result: {result}"))
    print("-"*100)

### Temperature: 0.2

Result: Yes, you can still enroll in the course. The course will start on 15th Jan 2024 at 17h00. You can register before the course starts using this link.

----------------------------------------------------------------------------------------------------


### Temperature: 0.3

Result: Yes, you can still enroll in the course. The course will start on 15th Jan 2024 at 17h00. Register before the course starts using this link.

----------------------------------------------------------------------------------------------------


### Temperature: 0.4

Result: Yes, you can still enroll in the course. The enrollment period is open until the course start date.
You can register using the provided link.
Don't forget to join the course Telegram channel with announcements and register in DataTalks.Club's Slack and join the channel.

----------------------------------------------------------------------------------------------------


### Temperature: 0.5

Result: Yes, you can enroll in the course. However, you will need to register before the course starts. You can do this using the provided link.

----------------------------------------------------------------------------------------------------


### Temperature: 0.6

Result: Yes, you can still register. The course will start on the 15th of January, 2024. Don't miss out on this amazing learning opportunity!

----------------------------------------------------------------------------------------------------


### Temperature: 0.7

Result: Yes, you can. Just go to www.google.com/dataTalks and there you can enroll. I'd recommend you to enroll soon as spots get limited. The course will start soon.

----------------------------------------------------------------------------------------------------


### Temperature: 0.8

Result: Of course! You can register now.

----------------------------------------------------------------------------------------------------


### Temperature: 0.9

Result: Yes, you can still sign up for the course 🙂. It's already open for enrollment. Don't miss the opportunity to gain knowledge and skills.
Remember to register before the course starts, as you won't be guaranteed to be in the list after the registration deadline. There will be a drop-out period after the course starts, during which no one can enroll.
I hope this helps! If you have any other questions, feel free to ask.

----------------------------------------------------------------------------------------------------


### Temperature: 1.0

Result: Yes, you can still sign up for the course! Remember, the deadline to sign up is tomorrow, 10 am EST! I hope it's not too late for you.

----------------------------------------------------------------------------------------------------
CPU times: user 32 s, sys: 16.4 ms, total: 32 s
Wall time: 32 s
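
The drift in the answers above, from near-verbatim FAQ text at low temperature to invented links and deadlines at high temperature, is consistent with how temperature is usually applied: the logits are divided by the temperature before the softmax, so low values concentrate probability on the top token and higher values spread it out. The sketch below illustrates this with three hypothetical logits; it is a generic illustration, not the model's internal code.

```python
import math


def softmax_with_temperature(logits, temperature):
    # Divide the logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 1.0]  # hypothetical scores for three candidate tokens

for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

p_low = softmax_with_temperature(logits, 0.1)
p_high = softmax_with_temperature(logits, 1.0)
```

At temperature 0.1 almost all of the probability mass sits on the top token, which is why the low-temperature runs repeat nearly the same sentence. At 1.0 the less likely tokens are sampled often enough for the answer to wander away from the retrieved text.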
