# Hybrid Search
In this example we show how to run a hybrid query that combines the vector database Milvus with the relational database Postgres. A hybrid query lets you search on several parameters at once, which is useful when you need to narrow down your results. In the future, Milvus 2.0 will allow this type of search without a secondary relational database.


## Data

This example uses randomly generated data, since the main goal is to demonstrate the flow of a hybrid search.

## Requirements

| Python Packages | Docker Servers |
| --------------- | -------------- |
| pymilvus        | Milvus-1.1.0   |
| numpy           | Postgres       |
| psycopg2        |                |
| faker           |                |

We have included a `requirements.txt` file in order to easily satisfy the required packages. 

## Up and Running


### Installing Packages
Install the required Python packages. If you are on macOS and receive an error while installing psycopg2, first install PostgreSQL with Homebrew.

In [1]:
# ! brew install postgresql
! pip install -r requirements.txt



### Starting Milvus Server

This demo uses Milvus 1.1.0. Refer to the [Install Milvus](https://milvus.io/docs/v1.1.0/install_milvus.md) guide to learn how to use this Docker container. For this example we won't be mapping any local volumes. The command below maps the container's ports 19530 and 19121 to host ports 19532 and 19122.

In [5]:
! docker run --name milvus_cpu_1.1.0 -d \
-p 19532:19530 \
-p 19122:19121 \
milvusdb/milvus:1.1.0-cpu-d050721-5e559c

d293a53200d9df14781accfcb063f191af1979906916f286a6af218ea8c6a6da


### Starting Postgres Server
For now, Milvus doesn't support storing multiple attributes for the data. Because of this we have to use another service to store these attributes and search through them, in this case PostgreSQL. 

In [3]:
! docker run --name postgres -d  -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres

48a21935e8e67cdae93c6754de8de9baffa537a3f817f94479b84b0ebb179649


### Confirm Running Servers

In [6]:
! docker logs milvus_cpu_1.1.0


    __  _________ _   ____  ______    
   /  |/  /  _/ /| | / / / / / __/    
  / /|_/ // // /_| |/ / /_/ /\ \    
 /_/  /_/___/____/___/\____/___/     

Welcome to use Milvus!
Milvus Release version: v1.1.0, built at 2021-05-06 14:50.43, with OpenBLAS library.
You are using Milvus CPU edition
Last commit id: 5e559cd7918297bcdb55985b80567cb6278074dd

Loading configuration from: /var/lib/milvus/conf/server_config.yaml
WARNNING: You are using SQLite as the meta data management, which can't be used in production. Please change it to MySQL!
Supported CPU instruction sets: avx2, sse4_2
FAISS hook AVX2
Milvus server started successfully!


In [7]:
! docker logs postgres

********************************************************************************
         anyone with access to the Postgres port to access your database without
         a password, even if POSTGRES_PASSWORD is set. See PostgreSQL
         documentation about "trust":
         https://www.postgresql.org/docs/current/auth-trust.html
         In Docker's default configuration, this is effectively any other
         container on the same system.

         It is not recommended to use POSTGRES_HOST_AUTH_METHOD=trust. Replace
         it with "-e POSTGRES_PASSWORD=password" instead to set a password in
         "docker run".
********************************************************************************
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configurat

## Code Overview


### Connecting to Servers
We start by connecting to the servers. Both run as Docker containers on localhost: Postgres on its default port 5432, and Milvus on host port 19532, which the `docker run` command above maps to the container's default port 19530.

In [8]:
# Connecting to Milvus and Postgres

import milvus
import psycopg2

milv = milvus.Milvus(host='localhost', port='19532')
conn = psycopg2.connect(host='localhost', port='5432', user='postgres', password='postgres')
cursor = conn.cursor()
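
If either connection above fails, it can help to confirm that the mapped host ports accept TCP connections at all. The helper below is a minimal sketch using only the standard library. `port_open` is a hypothetical name introduced here, and an open port only shows that something is listening, not that the server behind it is healthy.

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors
        return False

# Host ports used in this notebook: 19532 for Milvus, 5432 for Postgres
for name, port in [("Milvus", 19532), ("Postgres", 5432)]:
    state = "reachable" if port_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```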

### Creating the Collection 

The next step is to create the collection in Milvus in order to store the vectors. We need to specify the parameters `collection_name`, `dimension`, `index_file_size`, and `metric_type` when creating it. In this case we are storing 128-dimensional vectors and using the Euclidean distance. Our data segments are also set to the default 1024MB. 

In this case we are also deleting the collection so that we have a fresh start each time this notebook is loaded up. 

In [10]:
collection_name = 'hybrid_search'
VEC_DIM = 128

milv.drop_collection(collection_name)

param = {
            'collection_name': collection_name,
            'dimension': VEC_DIM,
            'index_file_size':1024,
            'metric_type':milvus.MetricType.L2
        }
status, ok = milv.has_collection(collection_name)

if not ok:
    status = milv.create_collection(param)
    print(status)

Status(code=0, message='Create collection successfully!')


### Creating the Index
Currently, a collection supports only one index type. Here we use the IVF_SQ8 index. Because it is an IVF index, we must provide the `nlist` parameter, which tells Milvus how many clusters to create in each index file.

In [12]:
index_param = {
    'nlist': 16384
}
status = milv.create_index(collection_name, milvus.IndexType.IVF_SQ8, index_param)
status, index = milv.get_index_info(collection_name)
print(index)

(collection_name='hybrid_search', index_type=<IndexType: IVF_SQ8>, params={'nlist': 16384})
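
To give a feel for what `nlist` controls, here is a toy sketch of the inverted-file (IVF) idea in plain Python. It is not how Milvus implements IVF_SQ8: the centroids are simply sampled from the data rather than trained, no quantization is applied, and the names `build_ivf` and `ivf_search` are made up for illustration. Vectors are bucketed by nearest centroid, and a search scans only the `nprobe` buckets closest to the query.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, nlist, seed=0):
    """Bucket vector indices by their nearest centroid.

    Simplification: centroids are a random sample of the data, whereas a real
    IVF index trains them (typically with k-means).
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, top_k):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in buckets[c]]
    candidates.sort(key=lambda idx: l2(query, vectors[idx]))
    return candidates[:top_k]

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids, buckets = build_ivf(data, nlist=10)
query = [rng.random() for _ in range(8)]

approx = ivf_search(query, data, centroids, buckets, nprobe=2, top_k=5)
exact = ivf_search(query, data, centroids, buckets, nprobe=10, top_k=5)
print("nprobe=2 :", approx)
print("nprobe=10:", exact)
```

Probing every bucket (`nprobe` equal to `nlist`) gives the same answer as a brute-force scan, while a smaller `nprobe` scans fewer vectors and may miss some true neighbors.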


### Creating the Table in Postgres  
PostgreSQL will store each Milvus ID together with its attributes. Here is a description of the attributes:
- `sex`: the person's sex, `male` or `female`.
- `age`: the person's age, from 1 to 99.
- `has_glasses`: whether the person wears glasses, `True` or `False`.

In [13]:
def create_pg_table(conn,cursor,table_name):
    try:
        sql = "CREATE TABLE " + table_name + " (ids bigint, sex char(10), age smallint, has_glasses boolean);"
        cursor.execute(sql)
        conn.commit()
        print("Created postgres table!")
    except Exception as e:
        # Roll back so the connection stays usable after a failed statement
        conn.rollback()
        print("Can't create postgres table:", e)
        

Before creating the table we drop any existing table with the same name, so that each run of this notebook starts clean.

In [14]:
table_name ='hybrid_search'
drop_table = "DROP TABLE IF EXISTS " + table_name

cursor.execute(drop_table)
conn.commit()

create_pg_table(conn, cursor, table_name)

Created postgres table!


### Processing and Storing the Data
For this example we use randomly generated data to simulate a user's situation. We randomly assign a sex, an age, and whether they wear glasses to randomly generated vectors.


#### 1. Generating Embeddings 


In [15]:
import numpy as np

def generate_data(amount):
    embed = np.random.rand(amount, VEC_DIM).astype('float32')
    return embed


#### 2. Storing Data by ID in Postgres
For this example we load the IDs and attributes in chunks. For each chunk, we write a .csv file and then copy that file into the Postgres server.

In [16]:
import random
from faker import Faker
import os
fake = Faker()

def record_txt(ids,fname):
    with open(fname,'w+') as f:
        for i in range(len(ids)):
            sex = random.choice(['female','male'])
            age = random.randint(1,99)
            has_glasses = random.choice(['True','False'])
            line = str(ids[i]) + "|" + sex + "|" + str(age) + "|" + str(has_glasses) + "\n"
            f.write(line)
            
def copy_data_to_pg(conn, cursor,fname ,table_name):
    fname = os.path.join(os.getcwd(),fname)
    try:
        # The file has no header row, so HEADER is omitted; with it, the first record of every chunk would be skipped
        sql = "COPY " + table_name + " FROM STDIN DELIMITER '|' CSV"
        with open(fname, "r") as f:
            cursor.copy_expert(sql, f)
        conn.commit()
        
    except Exception as e:
        conn.rollback()
        print("copy data to postgres failed: ", e)
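
To make the file format concrete, the sketch below writes two made-up records in the same pipe-delimited layout (`ids|sex|age|has_glasses`, one record per line, no header line) and reads them back with the standard `csv` module. The sample values are invented, and an in-memory buffer stands in for the file on disk.

```python
import csv
import io

# Made-up sample records: (ids, sex, age, has_glasses)
rows = [(0, "female", 31, True), (1, "male", 64, False)]

# Write the records pipe-delimited, one per line, with no header line
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
writer.writerows(rows)
print(buffer.getvalue())

# Read the same text back; every field comes back as a string
buffer.seek(0)
parsed = list(csv.reader(buffer, delimiter="|"))
print(parsed)
```

Since `copy_expert` takes a file-like object, a buffer like this could in principle replace the temporary file, though that variant has not been tried in this notebook.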


#### 3. Inserting into Milvus and Postgres
When inserting the data into Milvus and Postgres, we push the vectors by chunks of size `BASE_LEN`. Milvus and Postgres perform better when doing batch inserts.

In [18]:
filen = 't.csv'
VEC_NUM = 10000
BASE_LEN = 1000
count = 0
while count < (VEC_NUM // BASE_LEN):
    vectors = generate_data(BASE_LEN)
    vectors_ids = list(range(count*BASE_LEN,(count+1)*BASE_LEN))
    status, ids = milv.insert(collection_name=collection_name, records=vectors, ids=vectors_ids)
    record_txt(ids,filen)
    copy_data_to_pg(conn, cursor,filen ,table_name)
    count += 1
    print("Insert Step: " + str(count) + "/" + str(VEC_NUM // BASE_LEN))

Insert Step: 1/10
Insert Step: 2/10
Insert Step: 3/10
Insert Step: 4/10
Insert Step: 5/10
Insert Step: 6/10
Insert Step: 7/10
Insert Step: 8/10
Insert Step: 9/10
Insert Step: 10/10
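
The loop above runs `VEC_NUM // BASE_LEN` times, so it covers every vector only when `VEC_NUM` is a multiple of `BASE_LEN`. As a minimal sketch, the ID ranges could instead come from a small generator that also yields a final, shorter batch. `id_batches` is a hypothetical helper, not part of the notebook.

```python
def id_batches(total, batch_size):
    """Yield consecutive id ranges of at most batch_size that cover 0..total-1."""
    for start in range(0, total, batch_size):
        yield range(start, min(start + batch_size, total))

# Same sizes as the notebook: 10000 vectors in chunks of 1000
batches = list(id_batches(10000, 1000))
print(len(batches), batches[0], batches[-1])

# A total that is not a multiple of the batch size still gets a final short batch
print([len(b) for b in id_batches(2500, 1000)])
```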


### Performing Search
Once all the data is loaded, we can perform the searches. We first create a vector to search for and use it to query Milvus for the closest vector IDs. We then combine these IDs with the attribute filters to query the Postgres server. In this example we look for the closest vectors whose records match `sex` and `has_glasses` and whose `age` is at or below the given value.

In [19]:
TOP_K = 10
_param = {'nprobe': 64}

def search_in_milvus(vector, milvus_connection):
    status, results = milvus_connection.search(collection_name = collection_name,query_records=vector, top_k=TOP_K, params=_param)
    
    return results

In [20]:
def search_in_pg(conn,cursor,result_ids,result_distance,sex,age,glasses):
    # Pass the values as query parameters; psycopg2 adapts the Python list to a SQL array
    sql = "select * from " + table_name + " where ids = ANY(%s) and age <= %s and sex = %s and has_glasses = %s;"

    try:
        cursor.execute(sql, (list(result_ids), age, sex, glasses))
        rows=cursor.fetchall()
        return rows
    except Exception as e:
        # Reset the aborted transaction and return no rows so callers can continue
        conn.rollback()
        print("search failed!:", e)
        return []

In this example we are querying 4 random vectors.

In [21]:
query_vec = generate_data(4)
milvus_results = search_in_milvus(query_vec, milv)

After receiving the results from Milvus, we pull out the IDs and distances in order to search the Postgres server. Finally, the values for each query vector are bundled together in `hybrid_results`.

In [22]:
sex = "male"
glasses = "True"
age = 64


hybrid_results = []

for single_query in milvus_results:
    result_ids, result_distances = [], []
    for result_vector in single_query:
        result_ids.append(result_vector.id)
        result_distances.append(result_vector.distance)
        
    sql_results = search_in_pg(conn, cursor, result_ids, result_distances, sex, age, glasses)
    hybrid_results.append((result_ids, result_distances, sql_results))
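
The two-stage flow above can be tried without either server. The sketch below is only a stand-in: a brute-force scan over a Python dict plays the role of Milvus, an in-memory SQLite table plays the role of Postgres, and all names (`vector_search`, `attribute_filter`, the `people` table) are invented for illustration.

```python
import math
import random
import sqlite3

random.seed(0)
DIM, N, TOP_K = 4, 50, 10

# Stand-in for Milvus: id -> vector, searched by brute force
vectors = {i: [random.random() for _ in range(DIM)] for i in range(N)}

# Stand-in for Postgres: an in-memory SQLite table of attributes keyed by the same ids
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (ids INTEGER, sex TEXT, age INTEGER, has_glasses INTEGER)")
db.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(i, random.choice(["female", "male"]), random.randint(1, 99), random.choice([0, 1]))
     for i in range(N)],
)

def vector_search(query, top_k):
    """Step 1: return the top_k (id, distance) pairs closest to the query."""
    scored = [(i, math.dist(query, v)) for i, v in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]

def attribute_filter(ids, sex, max_age, has_glasses):
    """Step 2: keep only the candidate ids whose attributes match."""
    placeholders = ",".join("?" * len(ids))
    sql = (f"SELECT ids FROM people WHERE ids IN ({placeholders}) "
           "AND sex = ? AND age <= ? AND has_glasses = ?")
    return {row[0] for row in db.execute(sql, (*ids, sex, max_age, int(has_glasses)))}

query = [random.random() for _ in range(DIM)]
candidates = vector_search(query, TOP_K)
matching = attribute_filter([i for i, _ in candidates], "male", 64, True)

# Candidates are already ordered by distance, so the filtered list stays sorted
hybrid = [(i, d) for i, d in candidates if i in matching]
print(hybrid)
```

Because the attribute filter runs after the top-k vector search, fewer than `TOP_K` rows can come back for a query.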


### Search Results
Once we have all the results, the only step left is to print them. We order each query's results by distance before printing.

In [23]:
def merge_rows_distance(full_results):
    
    rows = full_results[2]
    distance = full_results[1]
    ids = full_results[0]
    
    new_results = []
    if len(rows)>0:
        for row in rows:
            index_flag = ids.index(row[0])
            temp = [row[0]] + list(row[1:5]) + [distance[index_flag]]
            new_results.append(temp)
            
        # Sort numerically by distance first, then convert every value to a string for printing
        new_results.sort(key=lambda r: r[4])
        new_results = [[str(value) for value in r] for r in new_results]
        columns = ["ids:", "sex:", "age:", "has_glasses:", "distance:"]
        
        new_results.insert(0, columns)
        
        col_width = max(len(word) for row in new_results for word in row) + 2  # padding
        for row in new_results:
            print("".join(word.ljust(col_width) for word in row))
        
    else:
        print("no result")
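
The merge step can also be looked at in isolation. The sketch below uses made-up IDs, distances, and rows in the same shape as the tuples stored in `hybrid_results`. `merge_and_sort` is a hypothetical helper that looks up each row's distance by ID and sorts on the float value, since distances compared as strings would order lexicographically (for example `"101.2"` before `"9.5"`).

```python
def merge_and_sort(ids, distances, rows):
    """Attach each row's distance by id, then sort numerically by distance."""
    distance_by_id = dict(zip(ids, distances))
    merged = [tuple(row) + (distance_by_id[row[0]],) for row in rows]
    return sorted(merged, key=lambda r: r[-1])

# Made-up values: Milvus ids and distances, plus the Postgres rows that passed the filter
ids = [7, 3, 9]
distances = [9.5, 13.6, 101.2]
rows = [(9, "male", 40, True), (7, "male", 21, True)]

for row in merge_and_sort(ids, distances, rows):
    print(row)
```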

In [24]:
for x in range(len(hybrid_results)):
    print("Query: " + str(x))
    merge_rows_distance(hybrid_results[x])
    print("-"*120)

Query: 0
ids:                sex:                age:                has_glasses:        distance:           
4823                male                21                  True                13.637537002563477  
7922                male                20                  True                14.373846054077148  
------------------------------------------------------------------------------------------------------------------------
Query: 1
ids:                sex:                age:                has_glasses:        distance:           
4036                male                37                  True                14.453821182250977  
4106                male                33                  True                14.65188217163086   
1027                male                36                  True                14.950194358825684  
3077                male                42                  True                15.067806243896484  
-----------------------------------------------------