## Q1. Getting the embeddings model
First, we will get the embeddings model ``multi-qa-distilbert-cos-v1`` from the Sentence Transformers library.

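```python
from sentence_transformers import SentenceTransformer

# model_name must be defined before it is passed to SentenceTransformer
model_name = "multi-qa-distilbert-cos-v1"
embedding_model = SentenceTransformer(model_name)
```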
Create the embedding for this user question:

```python
user_question = "I just discovered the course. Can I still join it?"
```

What's the first value of the resulting vector?

- -0.24

- -0.04

- 0.07

- 0.27

In [2]:
from sentence_transformers import SentenceTransformer

model_name = "multi-qa-distilbert-cos-v1"
embedding_model = SentenceTransformer(model_name)

In [3]:
user_question = "I just discovered the course. Can I still join it?"

In [18]:
v = embedding_model.encode(user_question)

In [19]:
v[0]

0.078222655

## Q2. Creating the embeddings
Now for each document, we will create an embedding for both question and answer fields.

We want to put all of them into a single matrix X:

- Create a list ``embeddings``
- Iterate over each document
- Build ``qa_text = f'{question} {text}'``
- Compute the embedding for ``qa_text`` and append it to ``embeddings``
- At the end, let ``X = np.array(embeddings)`` (``import numpy as np``)

What's the shape of X? (``X.shape``). Include the parentheses.

In [6]:
import requests 

base_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main'
relative_url = '03-vector-search/eval/documents-with-ids.json'
docs_url = f'{base_url}/{relative_url}?raw=1'
docs_response = requests.get(docs_url)
documents = docs_response.json()

In [7]:
# Keep only the documents that belong to the Machine Learning Zoomcamp course
ml_qa = [doc for doc in documents if doc['course'] == "machine-learning-zoomcamp"]

In [8]:
len(ml_qa)

375

In [9]:
ml_qa[0]

{'text': 'Machine Learning Zoomcamp FAQ\nThe purpose of this document is to capture frequently asked technical questions.\nWe did this for our data engineering course and it worked quite well. Check this document for inspiration on how to structure your questions and answers:\nData Engineering Zoomcamp FAQ\nIn the course GitHub repository there’s a link. Here it is: https://airtable.com/shryxwLd0COOEaqXo\nwork',
 'section': 'General course-related questions',
 'question': 'How do I sign up?',
 'course': 'machine-learning-zoomcamp',
 'id': '0227b872'}

In [10]:
embeddings = []
for qa in ml_qa:
    question = qa['question']
    text = qa['text']
    qa_text = f'{question} {text}'
    embeddings.append(embedding_model.encode(qa_text))

In [11]:
import numpy as np

X = np.array(embeddings)

In [23]:
X.shape

(375, 768)

In [20]:
scores = X.dot(v)

In [23]:
scores.max()

0.6506573
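
The ``X.dot(v)`` step above scores every document against the question with a dot product. The following is a minimal sketch of why that works as a similarity measure. It assumes the embeddings are unit-length, which the ``cos`` in the model name suggests. The arrays ``X_toy`` and ``v_toy`` are hypothetical random stand-ins, not real embeddings, so the sketch runs without downloading the model.

```python
import numpy as np

# Hypothetical stand-ins for the document matrix X and the query vector v
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 4))   # 5 "documents", 4 dimensions
v_toy = rng.normal(size=4)

# Scale every row and the query to unit length (assumed property of the model output)
X_toy = X_toy / np.linalg.norm(X_toy, axis=1, keepdims=True)
v_toy = v_toy / np.linalg.norm(v_toy)

# Dot product of each document row with the query: one score per document
dot_scores = X_toy.dot(v_toy)

# Cosine similarity computed explicitly, for comparison
cos_scores = (X_toy @ v_toy) / (np.linalg.norm(X_toy, axis=1) * np.linalg.norm(v_toy))

# For unit vectors the two scores coincide
print(np.allclose(dot_scores, cos_scores))

# Index of the best-matching document and its score
best_idx = int(np.argmax(dot_scores))
print(best_idx, dot_scores[best_idx])
```

With real embeddings, the same ``np.argmax`` call on ``scores`` would give the index into ``ml_qa`` of the highest-scoring document, and ``scores.max()`` is that document's score.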