# Vector DB Connection
**Connect to a Qdrant vector store with the Python `qdrant_client`**

**Note:** Before starting this tutorial, please make sure you have read the *installation.md* file for Qdrant installation.

Make sure you have installed `qdrant_client`. To install it:
```bash
pip install qdrant-client
```

**Let's create a Qdrant client and connect to our vector DB**

In [1]:
import qdrant_client

client = qdrant_client.QdrantClient(host="localhost", port=6333)
client

<qdrant_client.qdrant_client.QdrantClient at 0x10594c8b0>

### We have successfully created the client.
Now let's create the embedding model we are going to use to represent documents.

In [4]:
from sentence_transformers import SentenceTransformer

MODEL_NAME_OR_PATH = "sentence-transformers/all-mpnet-base-v2" # you may use any model you like
model = SentenceTransformer(MODEL_NAME_OR_PATH)
model

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

### We have a Qdrant client and a model to extract embeddings.
Let's create a collection in which we will store our data.
- You may give any name to *collection_name*.
- You may set the vector size directly to your model's output dimension. The `get_sentence_embedding_dimension()` call used here only works for models loaded with `SentenceTransformer`.
- You may choose DOT or EUCLID as the distance metric if you like. (I suggest COSINE for sentence-transformers models.)

In [7]:
from qdrant_client.http.models import VectorParams, Distance

COLLECTION_NAME = "example_collection"

client.create_collection(
    collection_name=COLLECTION_NAME,
    vectors_config=VectorParams(
        size=model.get_sentence_embedding_dimension(),
        distance=Distance.COSINE,
    )
)

client.get_collections()

CollectionsResponse(collections=[CollectionDescription(name='example_collection')])
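
The choice between COSINE, DOT, and EUCLID matters less than it may seem for this model: the printed architecture ends with a `Normalize()` layer, so its embeddings have unit length. The sketch below uses two hypothetical 2-d vectors (not real model outputs) and plain-Python helper functions, written only for illustration, to show that for unit vectors the dot product equals the cosine similarity, and the squared Euclidean distance is a simple function of it.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclid(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# Hypothetical toy vectors, scaled to unit length like the model's output.
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

cos_ab = cosine(a, b)
dot_ab = dot(a, b)
euc_ab = euclid(a, b)

# For unit vectors: dot == cosine, and euclid**2 == 2 - 2 * cosine.
print(f"cosine={cos_ab:.4f} dot={dot_ab:.4f} euclid^2={euc_ab ** 2:.4f}")
```

Because all three quantities are tied together for unit vectors, they order neighbours the same way here; with a model that does not normalize its output, the metrics can disagree.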

### The collection was created successfully.
Now let's read the data we want to index into the collection.
The data I am going to use consists of book records (title, date, and author) and was generated by ChatGPT.

In [8]:
import json

DATA_PATH = './data.json'
with open(DATA_PATH, 'r') as f:
    data = json.load(f)

# print first item of the data
data[0]

{'title': 'Data Structures and Algorithms',
 'date': '2023-08-02',
 'author': 'Emily Johnson'}
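
For readers who do not have the original `data.json`, the following is a minimal sketch of the shape the loading code appears to expect. It assumes the file is a JSON array of objects with `title`, `date`, and `author` keys, as the first item suggests. The temporary file location is hypothetical, and the only record used is the one printed above.

```python
import json
import os
import tempfile

# The only record known from the tutorial output; the real file holds more.
sample = [
    {
        "title": "Data Structures and Algorithms",
        "date": "2023-08-02",
        "author": "Emily Johnson",
    },
]

# Write the sample to a temporary file (hypothetical location) and read it
# back the same way the tutorial reads ./data.json.
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

with open(path, "r", encoding="utf-8") as f:
    data = json.load(f)

# The tutorial embeds the 'title' field, so every document should have one.
missing = [i for i, doc in enumerate(data) if "title" not in doc]
print(data[0]["title"], "| documents without a title:", missing)
```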

To index the data into the collection, we need to take several steps:
1. Extract an embedding for the text of every document
2. Create a `PointStruct` for every document
3. Upsert the points into the collection

In [9]:
from qdrant_client.http.models import PointStruct

points = []
for idx, doc in enumerate(data):
    text = doc['title']
    vector = model.encode(text).tolist() # encode text to vector
    
    points.append(PointStruct(
        id=idx,
        vector=vector,
        payload=doc # you may store any data in payload
    )) # add point to the list

# upsert points to the collection. 
client.upsert(
    collection_name=COLLECTION_NAME,
    points=points,
    wait=True # If you don't want to wait for the operation to complete, set wait=False
)    

UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
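
The cell above sends all points in a single `upsert` call, which is fine for a small dataset. For larger ones it may be preferable to send the points in smaller batches. The sketch below shows only the chunking logic; `batched` is a hypothetical helper written for this example (not part of `qdrant_client`), and a list of integers stands in for the `points` list.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch

# Stand-in for the `points` list built in the tutorial.
fake_points = list(range(5))
batches = list(batched(fake_points, batch_size=2))
print(batches)  # [[0, 1], [2, 3], [4]]
```

Each batch could then be passed as `points=batch` to `client.upsert(...)` in a loop, with the other arguments unchanged.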

### The collection is ready
Now let's search for a document in the collection.

In [16]:
QUERY = "AI"
query_vector = model.encode(QUERY).tolist()

search_result = client.search(
    collection_name=COLLECTION_NAME,
    query_vector=query_vector,
    limit=2  # return the top 2 results
)

for item in search_result:
    print(f"{item.id} - {item.score} - {item.payload}")

1 - 0.50849575 - {'author': 'William Smith', 'date': '2023-08-01', 'title': 'Artificial Intelligence Trends'}
13 - 0.45802432 - {'author': 'Michael Brown', 'date': '2023-08-03', 'title': 'Machine Learning Basics'}


### Even though the words **Artificial Intelligence** and **Machine Learning** are not in the query, semantic search managed to retrieve relevant results.
This is because embeddings represent the meaning of a sentence: sentences that are semantically similar lie closer to each other in the vector space than unrelated sentences.
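
To make the idea of "closer" concrete, here is a brute-force sketch of scoring and ranking by cosine similarity. The 3-d vectors are hypothetical values chosen for illustration, not real embeddings of these titles, and `top_k` is a helper written for this example. A vector database such as Qdrant typically relies on an index instead of scanning every vector, so this only illustrates the scoring, not how the search is actually executed.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k):
    """Return the k (title, score) pairs with the highest cosine similarity."""
    scored = [(title, cosine(query, vec)) for title, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors: the first two point in a similar direction
# to the query, the third does not.
toy_vectors = {
    "Artificial Intelligence Trends": [0.9, 0.1, 0.0],
    "Machine Learning Basics": [0.8, 0.3, 0.1],
    "Data Structures and Algorithms": [0.1, 0.9, 0.2],
}
toy_query = [1.0, 0.0, 0.0]  # stands in for the embedding of "AI"

results = top_k(toy_query, toy_vectors, k=2)
for title, score in results:
    print(f"{score:.3f} - {title}")
```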