In [1]:
import minsearch

In [2]:
import json

In [3]:
with open('documents.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

In [4]:
documents = []

for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)

In [5]:
documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp'}

In [6]:
index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

Keyword fields are used for exact-match filtering, similar to a WHERE clause in SQL:

SELECT * WHERE course = 'data-engineering-zoomcamp';
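
As a rough illustration of the SQL-style filter above, a keyword field can be thought of as an exact-match test applied to every document. The sketch below is plain Python with a hypothetical `filter_by_keywords` helper and made-up sample records; it is not how minsearch implements filtering internally.

```python
def filter_by_keywords(docs, filter_dict):
    """Keep only documents whose fields exactly match every key/value in filter_dict."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in filter_dict.items())
    ]


# Made-up sample records, for illustration only
sample_docs = [
    {"question": "When will the course start?", "course": "data-engineering-zoomcamp"},
    {"question": "How do I debug a docker container?", "course": "machine-learning-zoomcamp"},
]

matches = filter_by_keywords(sample_docs, {"course": "data-engineering-zoomcamp"})
print(len(matches))
```

Unlike the text fields, no tokenization or scoring is involved here: a document either matches the value exactly or is dropped.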

In [7]:
q = 'the course has already started, can I still enroll?'

In [8]:
index.fit(documents)

<minsearch.Index at 0x7eff09753610>

In [32]:
import os
from groq import Groq
from dotenv import load_dotenv


# Load the environment variables from the .env file
load_dotenv()

# Access the environment variables
GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')
GROQ_API_KEY = os.getenv('GROQ_API_KEY')

client = Groq(
    api_key=GROQ_API_KEY
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama3-70b-8192",
)

print(chat_completion.choices[0].message.content)

Fast language models are crucial in today's natural language processing (NLP) landscape, and their importance cannot be overstated. Here are some reasons why:

1. **Real-time Applications**: Fast language models enable real-time applications such as chatbots, voice assistants, and live sentiment analysis. They can process and respond to user input quickly, providing a seamless user experience.
2. **Low Latency**: Fast language models reduce latency, which is critical in applications where timely responses are essential, such as:
	* Conversational AI: Fast responses ensure that conversations feel natural and engaging.
	* Sentiment analysis: Rapid analysis enables swift decision-making in customer service, social media monitoring, or brand reputation management.
	* Language translation: Fast translation facilitates real-time communication across languages.
3. **Scalability**: Fast language models can handle large volumes of text data, making them scalable for applications that require pr

In [12]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=10
    )

    return results
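
To give a feel for what the `boost` weights in `search` do, here is a toy sketch. It assumes the final score is a weighted sum of per-field relevance scores, with fields missing from the boost dictionary weighted 1.0. The `combine_scores` helper and the per-field scores are made up for illustration and are not taken from minsearch.

```python
def combine_scores(field_scores, boost):
    """Weighted sum of per-field scores; fields without a boost default to 1.0."""
    return sum(score * boost.get(field, 1.0) for field, score in field_scores.items())


boost = {'question': 3.0, 'section': 0.5}

# Hypothetical per-field relevance scores for a single document
field_scores = {'question': 0.2, 'text': 0.4, 'section': 0.1}

total = combine_scores(field_scores, boost)
print(total)
```

With these weights, a match in `question` counts three times as much as the same match in `text`, while a match in `section` counts half as much.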

In [13]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    context = ""
    
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt
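
One possible alternative to the string concatenation loop in `build_prompt` is to build the entries as a list and join them. The sketch below uses a hypothetical `format_context` helper and made-up search results. It produces the same entries but without the trailing blank lines, which `build_prompt` strips anyway.

```python
def format_context(search_results):
    """Render each hit as a section/question/answer block, separated by blank lines."""
    entries = [
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    ]
    return "\n\n".join(entries)


# Made-up search results, for illustration only
sample_results = [
    {"section": "General", "question": "Can I still join?", "text": "Yes."},
    {"section": "Docker", "question": "How do I list containers?", "text": "Use docker ps."},
]

context = format_context(sample_results)
print(context)
```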

In [33]:
def llm(prompt):
    # Send the prompt as a single user message to the Llama 3 model via the Groq client
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="llama3-70b-8192",
    )

    return chat_completion.choices[0].message.content

In [34]:
query = 'how do I run kafka?'

def rag(query):
    # Retrieve relevant FAQ entries, build the prompt, then ask the LLM
    search_results = search(query)
    # print(search_results)
    prompt = build_prompt(query, search_results)
    # print(prompt)
    answer = llm(prompt)
    return answer

In [35]:
rag(query)

'According to the context, to run Kafka, you need to:\n\n* For Java Kafka: Run `java -cp build/libs/<jar_name>-1.0-SNAPSHOT.jar:out src/main/java/org/example/JsonProducer.java` in the project directory.\n* For Python Kafka: Create a virtual environment, run `requirements.txt`, and then run the Python files in that environment.\n\nMake sure you have the necessary dependencies installed and Kafka broker docker container is running.'

In [36]:
rag('the course has already started, can I still enroll?')

'According to the context, the answer to the question "Can I still enroll in the course even though it has already started?" is:\n\nYES. Even if you don\'t register, you\'re still eligible to submit the homeworks. Be aware, however, that there will be deadlines for turning in the final projects. So don\'t leave everything for the last minute.'

In [1]:
from elasticsearch import Elasticsearch

In [2]:
es_client = Elasticsearch('http://localhost:9200') 

In [3]:
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"} 
        }
    }
}

index_name = "course-questions"

es_client.indices.create(index=index_name, body=index_settings)

ConnectionError: Connection error caused by: ConnectionError(Connection error caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x7f6b0cbca380>: Failed to establish a new connection: [Errno 111] Connection refused))
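
A `Connection refused` error like the one above usually means nothing is listening on the target host and port. As a minimal sketch, the standard library can check this before any index is created. The `is_port_open` helper is hypothetical, and `localhost:9200` is simply the address used in this notebook.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts
        return False


print(is_port_open("localhost", 9200))
```

If this prints `False`, the Elasticsearch server is probably not running yet (or is listening on a different port). The client's own `es_client.ping()` is another way to test reachability.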

In [25]:
from tqdm.auto import tqdm

In [26]:
for doc in tqdm(documents):
    es_client.index(index=index_name, document=doc)

100%|██████████| 948/948 [00:05<00:00, 182.83it/s]
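
Indexing one document per request, as in the loop above, is fine for about a thousand documents. For larger collections, a common alternative is the client's bulk helper. The sketch below only prepares the action dictionaries, so it runs without a cluster. `make_bulk_actions` is a hypothetical helper and the sample documents are made up.

```python
def make_bulk_actions(docs, index_name):
    """Yield one action per document, with the target index and the document body."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}


# Made-up sample documents, for illustration only
sample_docs = [
    {"question": "Can I still join?", "text": "Yes.", "course": "data-engineering-zoomcamp"},
    {"question": "How do I list containers?", "text": "Use docker ps.", "course": "machine-learning-zoomcamp"},
]

actions = list(make_bulk_actions(sample_docs, "course-questions"))
print(len(actions))
```

With a reachable cluster, these actions could be passed to `elasticsearch.helpers.bulk(es_client, actions)`, which sends them in batches instead of one request per document.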


In [27]:
query = 'I just discovered the course. Can I still join it?'

In [70]:
def elastic_search(query):
    search_query = {
        "size": 3,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^4", "text"],
                        "type": "best_fields"
                    }
                },
                "filter": {
                    "term": {
                        "course": "machine-learning-zoomcamp"
                    }
                }
            }
        }
    }

    response = es_client.search(index=index_name, body=search_query)
    
    result_docs = []
    
    for hit in response['hits']['hits']:
        result_docs.append(hit['_source'])
        # print(hit['_score'])
        # print(hit["_source"])
    
    return result_docs
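
The course name is hard-coded in `elastic_search`, so searching another course means editing the function. A small sketch of one way around that is to build the query body in a separate function that takes the course as a parameter. `build_search_query` is a hypothetical helper; the query structure is the one used above.

```python
def build_search_query(query, course, size=3):
    """Build the search body: match on question/text, filter by exact course name."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        # question^4 weights matches in the question field 4x
                        "fields": ["question^4", "text"],
                        "type": "best_fields",
                    }
                },
                "filter": {"term": {"course": course}},
            }
        },
    }


search_query = build_search_query(
    "How do I execute a command in a running docker container?",
    "machine-learning-zoomcamp",
)
print(search_query["query"]["bool"]["filter"])
```

The resulting dictionary can be passed to `es_client.search` in the same way as the `search_query` inside `elastic_search`.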

In [50]:
def rag(query):
    search_results = elastic_search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

In [38]:
rag(query)

"Based on the provided context, there is no direct answer on how to run Kafka. However, in the context of running a Python producer, the solution involves creating a virtual environment, installing the required packages, and then running the Python file.\n\nHere's the relevant command from the context:\n\n`python -m venv env`\n`source env/bin/activate`\n`pip install -r ../requirements.txt`\n\nAfter setting up the virtual environment, you can run the Python file (e.g., `producer.py`) in that environment."

In [51]:
query = 'How do I execute a command in a running docker container?'

In [52]:
rag(query)

84.050095
{'text': 'Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.\ndocker run -it --entrypoint bash <image>\nIf the container is already running, execute a command in the specific container:\ndocker ps (find the container-id)\ndocker exec -it <container-id> bash\n(Marcos MJD)', 'section': '5. Deploying Machine Learning Models', 'question': 'How do I debug a docker container?', 'course': 'machine-learning-zoomcamp'}
51.04628
{'text': "You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:\nTo copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:\ndocker cp /path/to/local/file_or_directory container_id:/path/in/container\nHrithik Kumar Advani", 'section': '5. Deploying Machine Learning Models', 'question': 'How do I copy files from my local machine to docker containe

'Based on the context, to execute a command in a running Docker container, you can use the following command:\n\n`docker exec -it <container-id> bash`\n\nFirst, find the container ID using `docker ps`, and then replace `<container-id>` with the actual ID. This will open a bash shell in the running container, allowing you to execute commands.'

In [77]:
def build_prompt_homework(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
    """.strip()

    context = ""
    context_template = """
Q: {question}
A: {text}
""".strip()
    
    for doc in search_results:
        # Render each hit as a Q/A pair followed by a blank line
        context = context + context_template.format(question=doc['question'], text=doc['text']) + "\n\n"

    prompt = prompt_template.format(question=query, context=context).strip()
    print("The length of the prompt", len(prompt))
    print(prompt)

    return prompt

In [78]:
def rag(query):
    search_results = elastic_search(query)
    prompt = build_prompt_homework(query, search_results)
    answer = llm(prompt)
    return answer

In [79]:
rag(query)

The length of the prompt 1462
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: How do I execute a command in a running docker container?

CONTEXT:
Q: How do I debug a docker container?
A: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

Q: How do I copy files from my local machine to docker container?
A: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:
docker cp /path/to/local/f

'According to the FAQ database, to execute a command in a running Docker container, you can use the following command:\n\n`docker exec -it <container-id> bash`\n\nWhere `<container-id>` is the ID of the running container, which can be found using `docker ps`.'

In [80]:
import tiktoken

In [81]:
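# Tokenizer for gpt-4o; the Llama 3 model called in llm() uses a different tokenizer,
# so the counts below are only approximate for it
encoding = tiktoken.encoding_for_model("gpt-4o")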

In [83]:

def rag(query):
    search_results = elastic_search(query)
    prompt = build_prompt_homework(query, search_results)
    print("The number of tokens in the prompt:", len(encoding.encode(prompt)))
    answer = llm(prompt)
    return answer

In [84]:
rag(query)

The length of the prompt 1462
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: How do I execute a command in a running docker container?

CONTEXT:
Q: How do I debug a docker container?
A: Launch the container image in interactive mode and overriding the entrypoint, so that it starts a bash command.
docker run -it --entrypoint bash <image>
If the container is already running, execute a command in the specific container:
docker ps (find the container-id)
docker exec -it <container-id> bash
(Marcos MJD)

Q: How do I copy files from my local machine to docker container?
A: You can copy files from your local machine into a Docker container using the docker cp command. Here's how to do it:
To copy a file or directory from your local machine into a running Docker container, you can use the `docker cp command`. The basic syntax is as follows:
docker cp /path/to/local/f

'To execute a command in a running Docker container, you can use the `docker exec` command. First, find the container ID using `docker ps`, and then execute a command in the container using:\n\n`docker exec -it <container-id> bash`'