## Setting Up Qdrant: A Step-by-Step Guide 

Hey there! Welcome to the exciting world of vector databases and Retrieval Augmented Generation (RAG). In this guide, we'll walk you through setting up Qdrant, a powerful vector database, so you can start building amazing context-aware applications. 

Whether you prefer the convenience of the cloud or the control of a local setup, we've got you covered. Let's dive in!

### ☁️  Qdrant Cloud: The Easy Route

Qdrant Cloud offers a managed service, taking the hassle out of infrastructure management. Here's how to get started:

1. **Sign Up:** Head over to [Qdrant Cloud](https://qdrant.tech) and create your free account. The free tier is perfect for development and testing. 

2. **Create a Cluster:**
   - Go to the "Clusters" section in the sidebar.
   - Click "Create" and name your cluster (e.g., "practical-rag").
   - Hit "Create Cluster" and wait for it to spin up.

3. **Get API Key and URL:**
   - In the "Clusters" section, click the right arrow (>) next to your cluster name.
   - Find and copy the "Cluster URL" and click the red "Get API Key" button to copy your API key. Keep these safe!

4. **Set Environment Variables:**  (Optional but recommended)
   - Open your terminal and run:

   ```bash
   export QDRANT_API_KEY="<your-api-key>"
   export QDRANT_URL="<your-qdrant-cloud-url>"
   ```

##### 📝 Note

When you set environment variables like `QDRANT_API_KEY` and `QDRANT_URL` using the `export` command, they only stick around for your current terminal session. As soon as you close the terminal window or start a new session, those variables are gone like a magician's rabbit!

If you want your environment variables to be available every time you open a terminal, you have two options:

*   **`.bashrc` or `.zshrc`:** Add the `export` commands to your shell's configuration file (`.bashrc` for Bash, `.zshrc` for Zsh). This loads the variables automatically whenever you start a new terminal session.

*   **`.env` file:** Store your variables in a `.env` file and use a library like `python-dotenv` to load them into your Python environment. This is a common practice for keeping sensitive information like API keys out of your code. The command below appends the values you exported earlier to `.env`:


   ```bash
   echo "QDRANT_API_KEY=$QDRANT_API_KEY" >> .env && echo "QDRANT_URL=$QDRANT_URL" >> .env
   ```
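Once the values live in your shell or in `.env`, it can help to fail fast when one is missing rather than discovering it on the first request. Here is a minimal sketch using only the standard library; `require_env` is a hypothetical helper name, not part of `python-dotenv` or `qdrant-client`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, raising if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("QDRANT_URL")` after the variables have been loaded either returns the URL or raises a clear error that names the missing variable.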

### 💻  Local Setup: For the Hands-On Developer

If you prefer a local setup, follow these steps:

1. **Install Docker:** Ensure you have Docker installed and running on your system. 

2. **Download Qdrant Image:** Open your terminal and run:

   ```bash
   docker pull qdrant/qdrant
   ```

3. **Run Qdrant:**

   ```bash
   docker run -p 6333:6333 -p 6334:6334 \
       -v $(pwd)/qdrant_storage:/qdrant/storage:z \
       qdrant/qdrant
   ```

   This command starts Qdrant and makes it accessible at `localhost:6333` (REST API and Web UI) and `localhost:6334` (gRPC API). Data will be stored in the `./qdrant_storage` directory.
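To confirm the container is reachable before wiring up any client code, you can hit the REST endpoint that lists collections. This is a rough sketch using only the standard library; `qdrant_is_up` is a hypothetical helper, and the default URL assumes the port mapping from the `docker run` command above:

```python
import json
import urllib.error
import urllib.request


def qdrant_is_up(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/collections answers with status "ok"."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, bad URL, or a non-JSON reply: treat as "not up".
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

If `qdrant_is_up()` returns `True`, pointing the Python client at the local instance should be a matter of `QdrantClient(url="http://localhost:6333")`, assuming the container runs with its default settings and no API key configured.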

### 🛠️  Development Environment Setup

1. **Create a Virtual Environment:** (Recommended)

   ```bash
   conda create -n p_rag python=3.10
   conda activate p_rag  # Activate the environment
   ```

2. **Install Dependencies:**

   ```bash
   pip install python-dotenv==1.0.1 qdrant-client==1.9.0 openai==1.23.6 transformers==4.40.1 sentence-transformers==2.7.0 datasets==2.19.0
   ```

These libraries will enable you to interact with Qdrant, work with embeddings, and handle data. 

```python
import os
from dotenv import load_dotenv

from qdrant_client import QdrantClient

load_dotenv('.env')
```

```text
True
```

```python
client = QdrantClient(
    url=os.getenv('QDRANT_URL'),
    api_key=os.getenv('QDRANT_API_KEY')
)
```

Run the following code and notice that the list of collections is empty, as it should be, because you haven't created one yet.

```python
client.get_collections()
```

```text
CollectionsResponse(collections=[])
```

## 🧠 Collection Creation: Key Points to Remember  

*   **Points are the Building Blocks:** Collections in Qdrant are all about storing and managing *points*. Each point is like a little package containing two main things:

    *  🔢 **Vector:** This is your data transformed into an array of numbers (like a secret code for your text, image, or whatever you're storing). 

    * 📝 **Payload:** Think of this as a sticky note attached to your vector, holding extra info about your data (metadata). 

*  🧲 **Similarity is Key:** All points in a collection must have vectors of the same size and use the same distance metric (like cosine similarity). This lets you find similar points quickly. 

*   **Decisions, Decisions:** Before creating a collection, you need to figure out:

    * 🐱‍💻  **Data Type:** What kind of data are you working with? Text, images, cat videos? 

    *   **Vector Size & Distance:** This depends on your embedding model and how you want to measure similarity.

    *   **Payload Content:** What extra info do you want to store with each point?

*  📚 **Text Time!**  For the next few projects, we'll be focusing on text data, using OpenAI's `text-embedding-3-large` model (which has a vector size of 3072) and cosine similarity. 
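To make the "same size, same metric" rule concrete, here is a small standard-library sketch of cosine similarity. The vectors are made-up 3-dimensional toys, not real embeddings, and `cosine_similarity` is an illustrative helper rather than anything Qdrant exposes:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError(f"Vector sizes differ: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]  # same direction, different length
orthogonal = [0.0, 3.0, 0.0]      # unrelated direction

print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
```

Mismatched lengths raise immediately, which mirrors why every vector in a collection has to match the configured `size`.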

```python
from qdrant_client.http import models

collection_config = models.VectorParams(
    size=3072,
    distance=models.Distance.COSINE
)

client.create_collection(
    collection_name="p_rag_series_1",
    vectors_config=collection_config
)
```

```text
True
```

And now you can verify that the collection has been created by running `client.get_collections()` or through the UI:


```python
client.get_collections()
```

```text
CollectionsResponse(collections=[CollectionDescription(name='p_rag_series_1')])
```

Let's go ahead and delete this collection, just to demonstrate how.

```python
client.delete_collection(collection_name="p_rag_series_1")
```

```text
True
```

Now, go ahead and close the client down.

```python
client.close()
```

We're about to dive into some cool ways to use vectors and vector search with Qdrant. Since the book and this series have multimodality as a theme, it's worth calling out that Qdrant can handle different types of data and manage multiple users. Let's take a quick look!

### 🗃️ 1. Named Vectors: Organizing Your Data 

*   Imagine you have a collection that's like a big box where you can store different kinds of things. Named vectors are like dividers in that box, helping you keep things organized.

*  🖼️ 📝 You can have different named vectors for different types of data, like one for images (`image_vector`) and another for text (`text_vector`).

*   Each named vector can have its own size (dimensionality) and distance metric, depending on the type of data it holds.

*   This is super useful when you're working with diverse data and need to compare similar items within each category.

### 🤝 2. Multitenancy: Sharing is Caring (Sometimes) 

*   Think of multitenancy as having multiple users sharing the same Qdrant instance, like roommates sharing an apartment. 

*   You can either have everyone share the same collection (like a shared kitchen) or give each user their own separate collection (like private bedrooms).

*   Sharing a collection is efficient but requires careful management to ensure data privacy. You'll need to add user IDs to the payload and filter data accordingly.

*   Separate collections offer better isolation but can be less efficient and harder to manage as the number of users grows.

**In this series, we'll keep things simple and focus on using a single collection.**  But it's good to know that Qdrant has options for more complex scenarios!