# Calling LLM 

In [1]:
from openai import OpenAI
client = OpenAI()

In [2]:
response = client.chat.completions.create(
    model = 'gpt-4o',
    messages = [{"role": "user", "content": "Is it too late to join the course?"}]
)

In [3]:
response.choices[0].message.content

"It's difficult to provide a specific answer without more context, as it depends on various factors such as the institution, the type of course, the enrollment deadlines, and the specific policies in place. Here are a few steps you can take:\n\n1. **Check the Enrollment Deadline:** Review the course information on the institution's website or contact the course administrator to find out whether the enrollment period has ended.\n\n2. **Contact the Instructor or Administrator:** If the deadline has passed, reach out to the course instructor or the administrative office to inquire if any exceptions can be made for late enrollment.\n\n3. **Review the Course Requirements:** Verify if you meet all the prerequisites and other requirements for joining the course.\n\n4. **Consider Alternative Options:** If enrollment is not possible this term, ask about future offerings of the course or explore similar courses that might still be open.\n\nWould you like guidance specific to a particular type of

# Query local search engine

In [4]:
import sys
sys.path.insert(1, '../src')

In [5]:
import minsearch

In [6]:
import json

In [7]:
with open('../data/documents.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

In [8]:
documents = []

for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)

In [9]:
documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp'}
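
The flattening loop above can also be written so that it does not modify the loaded data in place. A minimal sketch, assuming the same nesting as `documents.json` (a list of courses, each with a `documents` list); the `flatten` helper and the sample records are made up for illustration:

```python
def flatten(docs_raw):
    """Return one flat list of FAQ entries, each tagged with its course."""
    return [
        {**doc, 'course': course_dict['course']}  # copy the entry, then add the course name
        for course_dict in docs_raw
        for doc in course_dict['documents']
    ]


# Hypothetical sample data with the same shape as documents.json
sample_raw = [
    {'course': 'course-a', 'documents': [
        {'text': 't1', 'section': 's1', 'question': 'q1'},
        {'text': 't2', 'section': 's1', 'question': 'q2'},
    ]},
    {'course': 'course-b', 'documents': [
        {'text': 't3', 'section': 's2', 'question': 'q3'},
    ]},
]

sample_documents = flatten(sample_raw)
print(len(sample_documents))
print(sample_documents[0])
```

Because each entry is copied, the dictionaries inside `sample_raw` stay unchanged, which can help when the raw data is reused later in the notebook.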

In [10]:
index = minsearch.Index(
    text_fields = ['question', 'text', 'section'],
    keyword_fields = ['course']
)

In [11]:
index.fit(documents)

<minsearch.Index at 0x14dbdaa10>

In [12]:
q = 'the course has already started, can I still enroll?'

In [13]:
boost = {'question': 3.0, 'section': 0.5}  # weight 'question' matches 3x and 'section' matches 0.5x

index.search(
    query=q,
    boost_dict=boost,
    num_results=3
)

[{'text': 'Yes, you can. You won’t be able to submit some of the homeworks, but you can still take part in the course.\nIn order to get a certificate, you need to submit 2 out of 3 course projects and review 3 peers’ Projects by the deadline. It means that if you join the course at the end of November and manage to work on two projects, you will still be eligible for a certificate.',
  'section': 'General course-related questions',
  'question': 'The course has already started. Can I still join it?',
  'course': 'machine-learning-zoomcamp'},
 {'text': "Yes, even if you don't register, you're still eligible to submit the homeworks.\nBe aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.",
  'section': 'General course-related questions',
  'question': 'Course - Can I still join the course after the start date?',
  'course': 'data-engineering-zoomcamp'},
 {'text': 'Yes, we will keep all the materials after the cour
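
To make the effect of field boosts and of a keyword filter on `course` more concrete, here is a toy scorer. It is not how minsearch works internally (this sketch only counts shared words); `toy_search` and the sample records are hypothetical stand-ins.

```python
def toy_search(docs, query, boost=None, filters=None, num_results=3):
    """Rank docs by boosted word overlap with the query (toy stand-in, not minsearch)."""
    boost = boost or {}
    filters = filters or {}
    query_words = set(query.lower().split())
    scored = []
    for doc in docs:
        # Keyword fields must match exactly, otherwise the doc is skipped
        if any(doc.get(field) != value for field, value in filters.items()):
            continue
        score = 0.0
        for field in ('question', 'text', 'section'):
            overlap = query_words & set(doc.get(field, '').lower().split())
            # Fields without an explicit boost count with weight 1.0 in this sketch
            score += boost.get(field, 1.0) * len(overlap)
        scored.append((score, doc))
    # Highest score first; list.sort is stable, so ties keep input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


toy_docs = [
    {'question': 'how to install docker', 'text': 'can I still enroll late', 'section': 'setup', 'course': 'c1'},
    {'question': 'can I still enroll', 'text': 'yes', 'section': 'general', 'course': 'c1'},
    {'question': 'can I still enroll', 'text': 'yes', 'section': 'general', 'course': 'c2'},
]

top = toy_search(toy_docs, 'can I still enroll', boost={'question': 3.0}, filters={'course': 'c1'})
print([d['question'] for d in top])
```

With the boost, the entry whose question matches the query ranks ahead of the one that only matches in its text; without it, the two `c1` entries tie.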

# Generating Answers with Llama3 locally

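Without any FAQ context, the first call above returned a generic answer. Next, the FAQ entries go into the prompt and a local Llama 3 model generates the answer.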

In [14]:
prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database. Use only the facts from the CONTEXT when answering the question. If the CONTEXT does not contain the answer, output NONE.

QUESTION: {question}

CONTEXT: {context}""".strip()

In [15]:
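context = ""

# Note: this loop adds every FAQ entry in `documents` to the context,
# not only the top search results for `q`
for doc in documents:
    context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"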

In [16]:
print(context)

section: General course-related questions
question: Course - When will the course start?
answer: The purpose of this document is to capture frequently asked technical questions
The exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1
Subscribe to course public Google Calendar (it works from Desktop only).
Register before the course starts using this link.
Join the course Telegram channel with announcements.
Don’t forget to register in DataTalks.Club's Slack and join the channel.

section: General course-related questions
question: Course - What are the prerequisites for this course?
answer: GitHub - DataTalksClub data-engineering-zoomcamp#prerequisites

section: General course-related questions
question: Course - Can I still join the course after the start date?
answer: Yes, even if you don't register, you're still eligible to submit the homeworks.
Be aware, however, that there will be deadlines for turning in the

In [17]:
prompt = prompt_template.format(question=q, context=context)

In [18]:
client = OpenAI(
    base_url = 'http://localhost:11434/v1',
    api_key='ollama', # required, but unused
)

model_to_use="llama3"

response = client.chat.completions.create(
  model=model_to_use, 
  messages=[
    {
        "role": "user", 
        "content": prompt
    },
  ]
)
print(response.choices[0].message.content)

I see you're having some issues with LocalStack and AWS CLI!

You may have encountered errors like:

* "Unable to locate credentials" after running `localstack` with Kinesis.
* "<botocore.awsrequest.AWSRequest object at 0x7fbaf2666280>" after executing an AWS CLI command.
* "The unspecified location constraint is incompatible for the region specific endpoint this request was sent to" while creating a bucket with LocalStack.

To resolve these issues, you can try the following:

1. For the "Unable to locate credentials" error:
	* Add environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in your `docker-compose.yaml` file.
	* Alternatively, run `aws --endpoint-url http://localhost:4566 configure` and provide random values for these keys.
2. For the "<botocore.awsrequest.AWSRequest object at 0x7fbaf2666280>" error:
	* Add environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in your `docker-compose.yaml` file.
3. For the "The unspecified location constraint 

# Modularize code

In [19]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5} 
    
    results = index.search(
        query=query,
        filter_dict = {'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=3
    )
    return results

In [20]:
def build_prompt(query, search_results):
    prompt_template = """
    You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database. Use only the facts from the CONTEXT when answering the question. If the CONTEXT does not contain the answer, output NONE.
    
    QUESTION: {question}
    
    CONTEXT: {context}""".strip()

    context = ""

    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"

    prompt = prompt_template.format(question=query, context=context)

    return prompt
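
One detail about `build_prompt`: because the template is written inside the function body, every template line after the first keeps its four-space indentation, and `.strip()` only trims the two ends of the string. A possible variant uses `textwrap.dedent` from the standard library; the names `PROMPT_TEMPLATE` and `build_prompt_dedented` exist only in this sketch.

```python
import textwrap

# Dedent once, before formatting, so the inserted context is left untouched
PROMPT_TEMPLATE = textwrap.dedent("""
    You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database. Use only the facts from the CONTEXT when answering the question. If the CONTEXT does not contain the answer, output NONE.

    QUESTION: {question}

    CONTEXT: {context}
""").strip()


def build_prompt_dedented(query, search_results):
    """Same idea as build_prompt, but the template lines carry no indentation."""
    context = "\n\n".join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    )
    return PROMPT_TEMPLATE.format(question=query, context=context)


# Hypothetical search result, shaped like the FAQ entries above
example_results = [
    {'section': 'General', 'question': 'Can I still join?', 'text': 'Yes, you can.'},
]
example_prompt = build_prompt_dedented('can I still enroll?', example_results)
print(example_prompt)
```

Whether the stray indentation changes the model's answers is not something this notebook measures; the variant just keeps the prompt text tidy.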

In [21]:
def llm(prompt):
    client = OpenAI(
        base_url='http://localhost:11434/v1',
        api_key='ollama',  # required, but unused
    )

    model_to_use = "llama3"

    response = client.chat.completions.create(
        model=model_to_use,
        messages=[
            {
                "role": "user",
                "content": prompt
            },
        ]
    )
    return response.choices[0].message.content

In [25]:
query = 'how do I run Kafka?'
results = search(query)
prompt = build_prompt(query, results)
print(llm(prompt))

To run Kafka, you should use the Java command with the class name of the producer/consumer/kstreams/etc. For example:

java -cp build/libs/<jar_name>-1.0-SNAPSHOT.jar:out src/main/java/org/example/JsonProducer.java

Please note that this is based on the given context and might not be applicable to all Kafka usage scenarios.
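
The three calls in the last cell can be wrapped into one function. A minimal sketch: `rag` is a name introduced here, and the search, prompt-building and LLM steps are passed in as arguments so that the example runs on its own with stub functions. In the notebook, the `search`, `build_prompt` and `llm` functions defined above would be passed instead.

```python
def rag(query, search_fn, build_prompt_fn, llm_fn):
    """Retrieve context for the query, build a prompt, and ask the model."""
    results = search_fn(query)
    prompt = build_prompt_fn(query, results)
    return llm_fn(prompt)


# Stub steps (hypothetical) so the sketch runs without an index or a model server
def fake_search(query):
    return [{'section': 's', 'question': 'q', 'text': 'Use the FAQ.'}]


def fake_build_prompt(query, results):
    context = " ".join(doc['text'] for doc in results)
    return f"QUESTION: {query}\nCONTEXT: {context}"


def fake_llm(prompt):
    return f"echo: {prompt}"


answer = rag('how do I run Kafka?', fake_search, fake_build_prompt, fake_llm)
print(answer)
```

Passing the steps as arguments also makes it easy to swap the local model for a hosted one without touching the retrieval code.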
