In [2]:
!rm -f minsearch.py
!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py

--2024-06-13 13:53:24--  https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3832 (3.7K) [text/plain]
Saving to: 'minsearch.py'

     0K ...                                                   100%  579K=0.006s

2024-06-13 13:53:24 (579 KB/s) - 'minsearch.py' saved [3832/3832]



In [1]:
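import requests
import minsearch

# Download the course FAQ documents (JSON grouped by course)
docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
docs_response.raise_for_status()  # fail early on a bad HTTP response
documents_raw = docs_response.json()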

# Flatten the per-course lists into one list, tagging each document with its course
documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

# text_fields are searched by content; keyword_fields are used for exact-match filtering
index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.Index at 0x7f3a98fc3d60>
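
The flattening step in the cell above can be checked on a small input before downloading anything. This is a minimal sketch: the course names and FAQ entries are made up, and the structure (a list of courses, each holding a `documents` list) is assumed to match what the loop above expects from documents.json. Unlike the loop above, it copies each entry instead of modifying it in place.

```python
# Hypothetical sample mimicking the assumed shape of documents.json
sample_raw = [
    {
        "course": "course-a",
        "documents": [
            {"text": "Yes, you can.", "section": "General", "question": "Can I join late?"},
            {"text": "Use Python 3.", "section": "Setup", "question": "Which Python?"},
        ],
    },
    {
        "course": "course-b",
        "documents": [
            {"text": "See the syllabus.", "section": "General", "question": "Where is the syllabus?"},
        ],
    },
]

flat = []
for course in sample_raw:
    for doc in course["documents"]:
        # Copy each entry and tag it with its course so it can be filtered later
        flat.append({**doc, "course": course["course"]})

print(len(flat))
print([d["course"] for d in flat])
```

Every flattened entry carries a `course` key, which is what makes a per-course filter possible at search time.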

In [2]:
def search(query):
    # Weight matches in the question field 3x and in the section field 0.5x
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

    return results
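
To build intuition for what `filter_dict` and `boost_dict` do in the call above, here is a toy stand-in that scores documents by counting shared words. It is a sketch for illustration only and is not how minsearch is implemented; `toy_search` and the sample documents are hypothetical names and data.

```python
def toy_search(docs, query, filter_dict, boost_dict, num_results=5):
    """Toy boosted, filtered search (illustration only, NOT minsearch's code)."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in docs:
        # Keyword filter: skip documents whose fields do not match exactly
        if any(doc.get(field) != value for field, value in filter_dict.items()):
            continue
        score = 0.0
        for field in ("question", "text", "section"):
            # Count query terms present in this field, weighted by its boost
            overlap = len(query_terms & set(doc[field].lower().split()))
            score += boost_dict.get(field, 1.0) * overlap
        scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


# Made-up documents for the demonstration
docs = [
    {"course": "c1", "section": "General", "question": "can I join late", "text": "yes you can"},
    {"course": "c1", "section": "Setup", "question": "which python version", "text": "you can join the chat for help"},
    {"course": "c2", "section": "General", "question": "can I join late", "text": "no"},
]

results = toy_search(docs, "can I join", {"course": "c1"}, {"question": 3.0, "section": 0.5})
for doc in results:
    print(doc["course"], "|", doc["question"])
```

The document from course `c2` is dropped by the filter, and the entry whose question matches the query ranks first because question matches count three times as much as text matches.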

In [3]:
from openai import OpenAI

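# Point the OpenAI client at a local Ollama server, which exposes an
# OpenAI-compatible API; the api_key is required by the client but not used
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)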

In [4]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    # Concatenate the retrieved FAQ entries into one context block
    context = ""

    for doc in search_results:
        context += f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"

    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

def llm(prompt):
    # Send the prompt as a single user message to the phi3 model
    response = client.chat.completions.create(
        model='phi3',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content
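
It can help to look at the exact text the model receives before calling it. The sketch below renders the prompt for one made-up search result; `preview_prompt` and `fake_results` are hypothetical names, and the template is a condensed copy of the one above so that the block runs on its own, without the index or a running model.

```python
# Condensed copy of the notebook's template so this block runs on its own
PROMPT_TEMPLATE = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()


def preview_prompt(query, search_results):
    # Join the retrieved entries in the same section/question/answer layout
    context = "".join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
        for doc in search_results
    )
    return PROMPT_TEMPLATE.format(question=query, context=context).strip()


# Hypothetical search result, made up for illustration
fake_results = [
    {"section": "General", "question": "Can I join late?", "text": "Yes, registration stays open."},
]

prompt = preview_prompt("Can I still join?", fake_results)
print(prompt)
```

Printing the prompt this way makes it easy to spot an empty CONTEXT, which would leave the model with nothing to ground its answer in.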

In [5]:
def rag(query):
    # Retrieve relevant FAQ entries, build the prompt from them, then generate
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer
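
The three steps in `rag` can be exercised without the index or a running model by passing each stage in as a function. This is a sketch of one possible variation, not part of the notebook's pipeline: `rag_with` and the three stub functions are hypothetical names introduced here.

```python
def rag_with(query, search_fn, prompt_fn, llm_fn):
    """Same three steps as rag(), with each stage passed in as a function."""
    search_results = search_fn(query)
    prompt = prompt_fn(query, search_results)
    return llm_fn(prompt)


# Hypothetical stubs standing in for the index, the template and the model
def fake_search(query):
    return [{"section": "General", "question": "Can I join late?", "text": "Yes."}]


def fake_prompt(query, results):
    context = "\n".join(r["text"] for r in results)
    return f"QUESTION: {query}\nCONTEXT: {context}"


def fake_llm(prompt):
    # Report the prompt length instead of calling a model
    return f"(stub answer for a {len(prompt)}-character prompt)"


answer = rag_with("Can I still join?", fake_search, fake_prompt, fake_llm)
print(answer)
```

With the real `search`, `build_prompt` and `llm` passed in instead of the stubs, `rag_with` would perform the same steps as `rag`.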

In [6]:
llm('write that this is a test')

" This is a test.\n\n\nAs requested, I've written the sentence as instructed without including any additional information or context not present in the original instruction provided to me. The simplicity of writing aligns with delivering concise content directly following the given command for clarity and straightforwardness."

In [7]:
print(_)

 This is a test.


As requested, I've written the sentence as instructed without including any additional information or context not present in the original instruction provided to me. The simplicity of writing aligns with delivering concise content directly following the given command for clarity and straightforwardness.


In [8]:
llm('Can I join the course')

" Yes, you can enroll in or join a course if it is open for registration. However, specific steps may vary depending on whether we're talking about online courses available through platforms like Coursera and edX or local/university-affiliated programs where such services are offered directly to the public without additional fees outside of tuition charges in many cases. Here’s a general guide that you can follow:\n\n1. Identify your course interest by visiting legitimate educational websites like Coursera, edX, Udemy, or direct university programs if applicable. Look for courses with good ratings and positive reviews to ensure quality education. If the course is at a University directly offering it through their portal without extra cost except tuition fees where they might be asking you to pay annual dues instead of per-course charges (e.g., HarvardX).\n2. Visit your chosen educational platform's website and locate courses related to your field or interest area for instance, a course

In [9]:
rag("I just discovered the course. Can I still join it?")

' As per your question, based on our FAQs here are some pointers related to joining and contributing to the course but there is no information regarding enrolling after the start date in this particular context: \n\n1) In order to join a live event (like an office hour or lecture), you must register before it starts. The registration link will be provided closer to your question day on January 15th, starting at 5pm sharp GMT/UTC+0800(Asian Standard Time). You can follow course updates and announcements in the Telegram channel for this information.\n2) If you miss a live event or want access to additional materials after it finishes (like homeworks), all resources will be kept available afterwards, so that one could learn at their own pace even after class ends. One is advised not to delay submitting final projects until the very last minute. \n3) To contribute back into course development: star and share this repository with friends if useful; create a PR for any perceived text or stru