## Question 1. dlt Version

Let's install dlt with Qdrant support, along with the Qdrant client and its FastEmbed extra.

```bash
pip install -q "dlt[qdrant]" "qdrant-client[fastembed]"
```

What's the version of dlt that you installed?

In [1]:
import dlt
print(dlt.__version__)

1.13.0
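The same check can be made without importing the package, which helps when the import itself fails. This is a minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, not part of dlt.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if dlt is installed, otherwise None.
print(installed_version("dlt"))
```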


## Question 2. dlt pipeline

Now let's create a pipeline.

How many rows were inserted into the zoomcamp_data collection?

In [2]:
import dlt
import requests

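@dlt.resource
def zoomcamp_data():
    # Download the documents, which are grouped by course.
    docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
    docs_response = requests.get(docs_url)
    docs_response.raise_for_status()  # fail early on a bad HTTP response
    documents_raw = docs_response.json()

    # Flatten: yield one record per document, tagged with its course name.
    for course in documents_raw:
        course_name = course['course']
        for doc in course['documents']:
            doc['course'] = course_name
            yield doc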
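
The flattening step in the resource can be tried offline. Below is a minimal sketch on a made-up sample with the same nested shape (a list of courses, each with a `documents` list). `flatten_courses` and the sample field names `question` and `text` are assumptions for illustration. Unlike the resource above, it copies each document instead of modifying it in place.

```python
from typing import Any, Dict, Iterator, List


def flatten_courses(documents_raw: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one record per document, tagged with its parent course name."""
    for course in documents_raw:
        for doc in course["documents"]:
            # Build a new dict so the input structure is left unmodified.
            yield {**doc, "course": course["course"]}


# Made-up sample data (hypothetical values, not taken from documents.json).
sample = [
    {
        "course": "course-a",
        "documents": [
            {"question": "q1", "text": "t1"},
            {"question": "q2", "text": "t2"},
        ],
    },
    {"course": "course-b", "documents": [{"question": "q3", "text": "t3"}]},
]

rows = list(flatten_courses(sample))
print(len(rows))
print(rows[0])
```

Each yielded record becomes one row in the destination, so the row count equals the total number of documents across all courses.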

In [3]:
from dlt.destinations import qdrant

# Store the Qdrant data locally, in the db.qdrant folder.
qdrant_destination = qdrant(qd_path="db.qdrant")

pipeline = dlt.pipeline(
    pipeline_name="zoomcamp_pipeline",
    destination=qdrant_destination,
    dataset_name="zoomcamp_tagged_data"
)

# Run the pipeline, then print the trace with per-step details.
load_info = pipeline.run(zoomcamp_data())
print(pipeline.last_trace)

Run started at 2025-07-11 13:28:41.229847+00:00 and COMPLETED in 2 minutes and 53.65 seconds with 4 steps.
Step extract COMPLETED in 0.88 seconds.

Load package 1752240534.0695724 is EXTRACTED and NOT YET LOADED to the destination and contains no failed jobs

Step normalize COMPLETED in 0.14 seconds.
Normalized data for the following tables:
- zoomcamp_data: 948 row(s)
- _dlt_pipeline_state: 1 row(s)

Load package 1752240534.0695724 is NORMALIZED and NOT YET LOADED to the destination and contains no failed jobs

Step load COMPLETED in 2 minutes and 39.80 seconds.
Pipeline zoomcamp_pipeline load step completed in 2 minutes and 39.76 seconds
1 load package(s) were loaded to destination qdrant and into dataset zoomcamp_tagged_data
The qdrant destination used e:\LLM\Workshop_dlt\db.qdrant location to store data
Load package 1752240534.0695724 is LOADED and contains no failed jobs

Step run COMPLETED in 2 minutes and 53.64 seconds.
Pipeline zoomcamp_pipeline load step completed in 2 minutes
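
The answer is in the normalize step of the trace: 948 rows for `zoomcamp_data`. To pull the counts out of the printed trace instead of reading them by eye, here is a minimal sketch that parses the `- table: N row(s)` lines with a regular expression. `parse_row_counts` is a hypothetical helper, and the pattern assumes the trace keeps the line format shown above.

```python
import re
from typing import Dict

# Matches lines such as "- zoomcamp_data: 948 row(s)".
ROW_LINE = re.compile(r"^- (?P<table>\S+): (?P<rows>\d+) row\(s\)$")


def parse_row_counts(trace_text: str) -> Dict[str, int]:
    """Map each table name in the trace text to its normalized row count."""
    counts: Dict[str, int] = {}
    for line in trace_text.splitlines():
        match = ROW_LINE.match(line.strip())
        if match:
            counts[match.group("table")] = int(match.group("rows"))
    return counts


# Excerpt copied from the trace output above.
trace_text = """\
Normalized data for the following tables:
- zoomcamp_data: 948 row(s)
- _dlt_pipeline_state: 1 row(s)
"""

print(parse_row_counts(trace_text))
# {'zoomcamp_data': 948, '_dlt_pipeline_state': 1}
```

With a live pipeline, `str(pipeline.last_trace)` could be passed in place of the excerpt. dlt also appears to expose these counts directly on the trace object (`last_normalize_info.row_counts`), which is worth checking against the dlt documentation for the installed version.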

## Question 3. Embeddings

When inserting the data, an embedding model was used. Which one?
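
The local Qdrant store records each collection's configuration in `db.qdrant/meta.json`, and the vector name stored there identifies the embedding model. Below is a minimal sketch for listing only those names, assuming the `collections -> <collection> -> vectors` layout that the file shows when printed. `vector_names` is a hypothetical helper, and `sample_meta` is a trimmed stand-in for the real file.

```python
from typing import Any, Dict, List


def vector_names(meta: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each collection name to the sorted names of its configured vectors."""
    return {
        name: sorted((config.get("vectors") or {}).keys())
        for name, config in meta.get("collections", {}).items()
    }


# Trimmed stand-in for the contents of db.qdrant/meta.json.
sample_meta = {
    "collections": {
        "zoomcamp_tagged_data": {
            "vectors": {"fast-bge-small-en": {"size": 384, "distance": "Cosine"}}
        }
    }
}

print(vector_names(sample_meta))
# {'zoomcamp_tagged_data': ['fast-bge-small-en']}
```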

In [4]:
import json

with open("db.qdrant/meta.json", "r") as f:
    meta = json.load(f)

print(meta)

{'collections': {'zoomcamp_tagged_data': {'vectors': {'fast-bge-small-en': {'size': 384, 'distance': 'Cosine', 'hnsw_config': None, 'quantization_config': None, 'on_disk': None, 'datatype': None, 'multivector_config': None}}, 'shard_number': None, 'sharding_method': None, 'replication_factor': None, 'write_consistency_factor': None, 'on_disk_payload': None, 'hnsw_config': None, 'wal_config': None, 'optimizers_config': None, 'init_from': None, 'quantization_config': None, 'sparse_vectors': None, 'strict_mode_config': None}, 'zoomcamp_tagged_data__dlt_version': {'vectors': {'fast-bge-small-en': {'size': 384, 'distance': 'Cosine', 'hnsw_config': None, 'quantization_config': None, 'on_disk': None, 'datatype': None, 'multivector_config': None}}, 'shard_number': None, 'sharding_method': None, 'replication_factor': None, 'write_consistency_factor': None, 'on_disk_payload': None, 'hnsw_config': None, 'wal_config': None, 'optimizers_config': None, 'init_from': None, 'quantization_config': None,