In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from openai import OpenAI

In [3]:
client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))
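
`os.environ.get` returns `None` when the variable is unset, so a missing key only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper introduced here for illustration, not part of the notebook or of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name)
    if not value:
        # Treat both "unset" and "empty string" as missing.
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file"
        )
    return value
```

With this in place, the client could be created as `OpenAI(api_key=require_env('OPENAI_API_KEY'))`, assuming the key is stored under that name.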

In [15]:
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{
        "role": "user",
        "content": "give me the roadmap of an ML/AI engineer by 2025 from beginner to hero"
    }]
)

In [16]:
print(response.choices[0].message.content)

Becoming an ML/AI engineer by 2025 requires a structured roadmap that evolves with the rapidly changing landscape of technology and industry. Here's a comprehensive roadmap to guide you from beginner to expert level.

### Phase 1: Foundation Building (0-6 months)

#### 1. **Mathematics and Statistics**
   - Linear Algebra: Matrices, Vectors, Eigenvalues
   - Calculus: Derivatives, Integrals, Optimization
   - Probability and Statistics: Distributions, Bayes' Theorem, Hypothesis Testing

#### 2. **Programming Skills**
   - Learn Python: Focus on libraries like NumPy, Pandas, and Matplotlib.
   - Version Control: Git and GitHub basics.

#### 3. **Basic Data Handling**
   - Data Wrangling: Cleaning and preparing data for analysis.
   - Introduction to SQL: Basic queries and database operations.

#### 4. **Introduction to Machine Learning**
   - Understand the basics of Supervised vs. Unsupervised learning.
   - Simple algorithms: Linear Regression, K-Nearest Neighbors, Decision Trees.


**Generating Answer**

In [10]:
import requests
import minsearch

docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
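docs_response = requests.get(docs_url, timeout=30)
docs_response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page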
documents_raw = docs_response.json()

# Flatten the per-course structure into one list, tagging each document with its course.
documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

documents[0]

{'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.",
 'section': 'General course-related questions',
 'question': 'Course - When will the course start?',
 'course': 'data-engineering-zoomcamp'}
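
The flattening step above can be tried without a network call. The sketch below uses a made-up stand-in for `documents.json` (the course names and texts are invented) and, unlike the loop in the notebook, copies each document instead of mutating it in place.

```python
# Toy stand-in for documents.json: a list of courses, each with its own documents.
# The course names and texts here are invented for illustration.
documents_raw = [
    {"course": "course-a", "documents": [
        {"text": "Answer 1", "section": "General", "question": "Q1"},
        {"text": "Answer 2", "section": "General", "question": "Q2"},
    ]},
    {"course": "course-b", "documents": [
        {"text": "Answer 3", "section": "Setup", "question": "Q3"},
    ]},
]

# Copy each document and tag it with its course, leaving the input untouched.
documents = [
    {**doc, "course": course["course"]}
    for course in documents_raw
    for doc in course["documents"]
]

print(len(documents))          # 3
print(documents[2]["course"])  # course-b
```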

In [12]:
index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.minsearch.Index at 0x7f331d156480>
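
To build intuition for what text fields, keyword filters, and boosts do, here is a toy ranking function in plain Python. It is a deliberate simplification and not how minsearch scores documents internally; `toy_search` and the sample documents are hypothetical.

```python
def toy_search(docs, query, text_fields, filter_dict=None, boost_dict=None, num_results=5):
    """Rank docs by a boosted count of query words found in each text field."""
    filter_dict = filter_dict or {}
    boost_dict = boost_dict or {}
    query_words = set(query.lower().split())

    scored = []
    for doc in docs:
        # Keyword fields act as exact-match filters.
        if any(doc.get(key) != value for key, value in filter_dict.items()):
            continue
        score = 0.0
        for field in text_fields:
            words = set(doc.get(field, "").lower().split())
            # Fields without an explicit boost get weight 1.0 in this toy version.
            score += boost_dict.get(field, 1.0) * len(query_words & words)
        if score > 0:
            scored.append((score, doc))

    # Sort on the score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


docs = [
    {"question": "Can I still enroll?", "text": "Yes, you can join late.",
     "section": "General", "course": "course-a"},
    {"question": "How do I install Docker?", "text": "You can still enroll in Docker Hub.",
     "section": "Setup", "course": "course-a"},
    {"question": "Can I still enroll?", "text": "Yes.",
     "section": "General", "course": "course-b"},
]

results = toy_search(
    docs,
    query="can I still enroll",
    text_fields=["question", "text", "section"],
    filter_dict={"course": "course-a"},
    boost_dict={"question": 3.0, "section": 0.5},
)
print([d["question"] for d in results])
# ['Can I still enroll?', 'How do I install Docker?']
```

The `course-b` document is dropped by the filter, and the first document wins because its matches fall in the boosted `question` field.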

In [13]:
q = 'the course has already started, can I still enroll?'

In [5]:
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{
        "role": "user",
        "content": q
    }]
)

In [6]:
print(response.choices[0].message.content)

Whether you can still enroll in a course that has already started depends on the institution's policies and the specific course. Many colleges and universities have deadlines for enrollment, but some may allow late registration under certain circumstances. It's best to contact the admissions office or the course instructor directly to inquire about your options. They will provide you with the most accurate and relevant information.


In [19]:
def llm(prompt):
    """Send a single-turn user prompt to the model and return the reply text."""
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content


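def search(query):
    """Return the top 5 FAQ entries for the data-engineering-zoomcamp course."""
    # Weight matches in the question field 3x and in the section field 0.5x;
    # fields not listed keep minsearch's default weight.
    boost = {'question': 3.0, 'section': 0.5}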

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

    return results


def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    # Render each retrieved FAQ entry as a block and separate blocks with blank lines.
    context = "\n\n".join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    )

    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt


def rag(query):
    """Retrieve FAQ context for the query, build the prompt, and ask the model."""
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

In [21]:
response = rag(q)

print(response)

Yes, you can still enroll in the course after the start date. Even if you don't register, you're still eligible to submit the homework. However, keep in mind that there will be deadlines for turning in the final projects, so it's best not to leave everything until the last minute.
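
The pipeline can also be exercised offline, which makes it easy to inspect the exact prompt the model receives. The sketch below is an assumption-laden variant of the functions above: `rag` takes the search and LLM functions as arguments, and `fake_search`, `fake_llm`, and `fake_docs` are hypothetical stubs that call neither minsearch nor the OpenAI API.

```python
PROMPT_TEMPLATE = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()


def build_prompt(query, search_results):
    # One block per retrieved document, separated by blank lines.
    context = "\n\n".join(
        f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}"
        for doc in search_results
    )
    return PROMPT_TEMPLATE.format(question=query, context=context)


def rag(query, search_fn, llm_fn):
    # search_fn and llm_fn are injected so they can be replaced with stubs.
    return llm_fn(build_prompt(query, search_fn(query)))


# Hypothetical stubs: a fixed search result and an "LLM" that reports the prompt size.
fake_docs = [{"section": "General", "question": "Can I enroll late?", "text": "Yes."}]


def fake_search(query):
    return fake_docs


def fake_llm(prompt):
    return f"(stub answer, prompt had {len(prompt.splitlines())} lines)"


prompt = build_prompt("can I still enroll?", fake_docs)
print(prompt)
print(rag("can I still enroll?", fake_search, fake_llm))
```

Passing the real `search` and `llm` functions in place of the stubs should give the same behavior as the `rag` function defined earlier.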


In [22]:
q = 'how do I run kafka?'
response = rag(q)

print(response)

To run Kafka, you can execute the following command in the terminal from your project directory:

```bash
java -cp build/libs/<jar_name>-1.0-SNAPSHOT.jar:out src/main/java/org/example/JsonProducer.java
```

Make sure to replace `<jar_name>` with the actual name of your jar file.
