# Hybrid Search
In this example, we run a hybrid query that combines the vector database Milvus with the relational database Postgres.

## Data

The data used in this test comes from the ANN_SIFT1B dataset.

Data Link: <http://corpus-texmex.irisa.fr/>
- Base Data Set: ANN_SIFT1B Base_set
- Query Data Set: ANN_SIFT1B Query_set

## Requirements

| Python Packages | Docker Servers |
| --------------- | -------------- |
| PyMilvus        | Milvus-1.1.0   |
| Psycopg2        | Postgres       |
| Faker           |                |
| Numpy           |                |

For this example, we assume you are familiar with Numpy, Psycopg2, and Faker.

## Up and Running

### Start Milvus Server

This demo uses Milvus 1.1.0. Please refer to the [Install Milvus](https://milvus.io/docs/v1.1.0/install_milvus.md) guide to learn how to use this Docker container. For this example we won't be mapping any local volumes. The command below maps the container ports 19530 and 19121 to the host ports 19532 and 19122.

In [11]:
! docker run -d \
-p 19532:19530 \
-p 19122:19121 \
milvusdb/milvus:1.1.0-cpu-d050721-5e559c

2f8c3d29ad87ebcb2bc14e0044c9c88185e767a3473bf8d5584b8e499e203348


### Install PostgreSQL
Milvus does not currently support storing string data, so we need a relational database to store the attributes associated with each vector. In this example, we use [PostgreSQL](https://www.postgresql.org/).

Install by launching Postgres.app: https://www.postgresql.org/download/

## Install Packages
Install the required Python packages.

In [16]:
!pip install pymilvus==1.1.0
!pip install numpy
!pip install psycopg2
!pip install faker

Collecting pymilvus==1.1.0
  Downloading pymilvus-1.1.0-py3-none-any.whl (56 kB)
     |████████████████████████████████| 56 kB 351 kB/s eta 0:00:01
Installing collected packages: pymilvus
  Attempting uninstall: pymilvus
    Found existing installation: pymilvus 0.3.0
    Uninstalling pymilvus-0.3.0:
      Successfully uninstalled pymilvus-0.3.0
Successfully installed pymilvus-1.1.0
You should consider upgrading via the '/data/workspace/minicoda/bin/python3 -m pip install --upgrade pip' command.
Collecting text-unidecode==1.3
  Using cached text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
Installing collected packages: text-unidecode
  Attempting uninstall: text-unidecode
    Found existing installation: text-unidecode 1.2
    Uninstalling text-u

## Running code
### Connecting to Servers
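We start by connecting to the servers. In this case Milvus runs in a Docker container on localhost and is reached through the mapped host port 19532, while Postgres listens on its default port 5432 at the address passed to `psycopg2.connect`. Adjust the hosts and ports to match your setup.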

In [33]:
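# Connect to Milvus and Postgres

# Import only the names used in this notebook instead of a wildcard import
from milvus import Milvus, MetricType, IndexType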
import psycopg2

milvus = Milvus(host='localhost', port='19532')
conn = psycopg2.connect(host='192.168.1.85', port='5432', user='postgres', password='postgres',database='test')
cursor = conn.cursor()

### Building Collection 

The next step is to create the collection in Milvus, which stores the vectors and retrieves them at search time. We need to specify the parameters `collection_name`, `dimension`, `index_file_size`, and `metric_type` when creating it.

In [97]:
collection_name = 'newu'
VEC_DIM = 128

In [98]:
param = {
            'collection_name': collection_name,
            'dimension': VEC_DIM,
            'index_file_size':1024,
            'metric_type':MetricType.L2
        }
status, ok = milvus.has_collection(collection_name)

if not ok:
    milvus.create_collection(param)

### Create an index
Currently, a collection supports only one index type. Prepare the parameters needed to create the index.

In [99]:
index_param = {
    'nlist': 16384
}
status = milvus.create_index(collection_name, IndexType.IVF_SQ8, index_param)
status, index = milvus.get_index_info(collection_name)
print(index)

(collection_name='newu', index_type=<IndexType: IVF_SQ8>, params={'nlist': 16384})


### Create table in Postgres  
PostgreSQL is used to store the Milvus ID and its corresponding attributes. Here is a description of the attributes:
- `sex`: The gender of the human face: male or female. Stored in the `sex` column.
- `time`: The time associated with the record, stored in the `get_time` column. Queries filter on a time range, e.g. [2021-05-15 00:10:21, 2021-05-16 10:54:12].
- `glasses`: Whether the human face wears glasses: True or False. Stored in the `is_glasses` column.

In [34]:
def create_pg_table(conn,cursor,table_name):
    try:
        sql = "CREATE TABLE " + table_name + " (ids bigint, sex char(10), get_time timestamp, is_glasses boolean);"
        cursor.execute(sql)
        conn.commit()
        print("create postgres table!")
    except Exception as e:
        # Roll back so the connection can be used again after a failed statement
        conn.rollback()
        print("can't create postgres table: ", e)
        

In [30]:
pg_name ='newu'
create_pg_table(conn, cursor, pg_name)

create postgres table!


### Process and Store SIFT1B dataset
#### 1. Generate embeddings


In [35]:
import numpy as np

FILE_PATH = '../sift_data/sift_data/bigann_base.bvecs'
BASE_LEN = 1000
count = 0

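def load_bvecs_data(fname,base_len,idx):
    # Read batch number `idx` (base_len vectors) from a .bvecs file
    begin_num = base_len * idx
    # Map the file as raw bytes instead of loading all of it into memory
    x = np.memmap(fname, dtype='uint8', mode='r')
    # The first 4 bytes hold the vector dimension as an int32
    d = x[:4].view('int32')[0]
    # Each record is a 4-byte header followed by d uint8 components; drop the header
    data = x.reshape(-1, d + 4)[begin_num:(begin_num+base_len), 4:]
    # Scale the uint8 components to floats of roughly unit range
    data = (data + 0.5) / 255
    data = data.tolist()
    return data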
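
The byte layout that `load_bvecs_data` relies on can be checked on a tiny synthetic file. The sketch below assumes the usual `.bvecs` layout (a 4-byte little-endian int32 dimension followed by that many uint8 components, repeated for every vector); `write_bvecs`, `read_bvecs`, and the `demo.bvecs` file are hypothetical names used only for illustration and are not part of the original notebook.

```python
import os
import tempfile

import numpy as np


def write_bvecs(path, vectors):
    """Write a 2-D uint8 array in .bvecs layout (hypothetical helper)."""
    vectors = np.asarray(vectors, dtype=np.uint8)
    n, d = vectors.shape
    # One int32 dimension header per vector, reinterpreted as 4 raw bytes
    header = np.full((n, 1), d, dtype='<i4').view(np.uint8)
    np.hstack([header, vectors]).tofile(path)


def read_bvecs(path, start, count):
    """Read `count` vectors beginning at index `start` (hypothetical helper)."""
    x = np.memmap(path, dtype=np.uint8, mode='r')
    # The dimension is stored in the first 4 bytes of the file
    d = int(x[:4].view('<i4')[0])
    # Every record is d + 4 bytes long; slice off the 4 header bytes
    return np.array(x.reshape(-1, d + 4)[start:start + count, 4:])


# Five 8-dimensional vectors with recognizable values
demo = np.arange(5 * 8, dtype=np.uint8).reshape(5, 8)
path = os.path.join(tempfile.mkdtemp(), 'demo.bvecs')
write_bvecs(path, demo)

batch = read_bvecs(path, start=2, count=2)
print(batch.shape)
```

Because every record has the same length, a batch can be located by plain slicing, which is why the loader can read any batch without scanning the file from the start.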


#### 2. Store data by ID
The generated (or specified) IDs and their corresponding attributes are written to a file and then copied into Postgres.

In [36]:
import random
from faker import Faker
import os
fake = Faker()

def record_txt(ids,fname):
    with open(fname,'w+') as f:
        for i in range(len(ids)):
            sex = random.choice(['female','male'])
            get_time = fake.past_datetime(start_date="-120d", tzinfo=None)
            is_glasses = random.choice(['True','False'])
            line = str(ids[i]) + "|" + sex + "|'" + str(get_time) + "'|" + str(is_glasses) + "\n"
            f.write(line)
            
def copy_data_to_pg(conn, cursor,fname ,table_name):
    fname = os.path.join(os.getcwd(),fname)
    sql = "copy " +  table_name  + " from '" + fname + "' with CSV delimiter '|';"
    print(sql)
    try:
        cursor.execute(sql)
        #cur.copy_expert(sql, open(fname, "r"))
        conn.commit()
        print("insert pg sucessful!")
    except Exception as e:
        conn.rollback()
        print("copy data to postgres failed: ", e)
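
A server-side `COPY ... FROM 'file'` reads the file from the machine where Postgres runs, so it only works when the file is visible to the server. When the notebook and Postgres run on different machines, one alternative is the client-side variant hinted at by the commented-out `copy_expert` line above. The sketch below is an assumption about how that could look, not the notebook's method; `copy_file_to_pg` is a hypothetical name, while `cursor.copy_expert` is the psycopg2 call.

```python
def copy_file_to_pg(conn, cursor, fname, table_name):
    """Stream a local '|'-delimited file to Postgres through the client connection."""
    # FROM STDIN makes the server read the data sent by the client
    sql = "COPY " + table_name + " FROM STDIN WITH CSV DELIMITER '|'"
    try:
        with open(fname, 'r') as f:
            # psycopg2 sends the contents of the open file object to the server
            cursor.copy_expert(sql, f)
        conn.commit()
        return True
    except Exception as e:
        # Roll back so the connection stays usable after a failed COPY
        conn.rollback()
        print("copy data to postgres failed: ", e)
        return False
```

It takes the same arguments as `copy_data_to_pg`, so it could be swapped into the insert loop, for example as `copy_file_to_pg(conn, cursor, filen, pg_name)`.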


#### 3. Insert into Milvus
Insert the generated vectors into Milvus and store the ID and corresponding attributes of each vector in Postgres.

In [37]:
filen = '/data/t.csv'
VEC_NUM = 10000
while count < (VEC_NUM // BASE_LEN):
    vectors = load_bvecs_data(FILE_PATH,BASE_LEN,count)
    vectors_ids = list(range(count*BASE_LEN,(count+1)*BASE_LEN))
    status, ids = milvus.insert(collection_name=collection_name, records=vectors, ids=vectors_ids)
    record_txt(ids,filen)
    copy_data_to_pg(conn, cursor,filen ,pg_name)
    count =count + 1

copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!
copy newu from '/data/t.csv' with CSV delimiter '|';
insert pg sucessful!


### Search in Milvus
After the data has been imported, you can customize the conditions for the query. The function below loads a single query vector from the query data set.

In [38]:
def load_query_list(fname, query_location):
    query_location = int(query_location)
    x = np.memmap(fname, dtype='uint8', mode='r')
    d = x[:4].view('int32')[0]
    data =  x.reshape(-1, d + 4)[query_location:(query_location+1), 4:]
    data = (data + 0.5) / 255
    query_vec = data.tolist()
    return query_vec

### Search the similarity query
After loading the data, we can run a similarity search. First, the query vector is searched in Milvus, which returns the IDs of the most similar vectors and their distances. The corresponding attributes are then looked up in Postgres based on the returned IDs.
Finally, the query results are displayed together with their IDs and distances.

In [62]:
TOP_K = 10
DISTANCE_THRESHOLD = 1
collection_name = 'newu'
def search_in_milvus(vector,milvus):
    output_ids = []
    output_distance = []
    _param = {'nprobe': 64}
    status, results = milvus.search(collection_name = collection_name,query_records=vector, top_k=TOP_K, params=_param)
    #print(status, results[0][0].id)
    for result in results:
        # Use the number of hits actually returned, which may be fewer than TOP_K
        for i in range(len(result)):
            # if result[i].distance < DISTANCE_THRESHOLD:
            output_ids.append(result[i].id)
            output_distance.append(result[i].distance)
    return output_ids,output_distance

In [60]:
def search_in_pg(conn,cursor,result_ids,result_distance,sex,time,glasses):
    # Nothing to look up if Milvus returned no candidates
    if not result_ids:
        return []
    # Pass the values as query parameters; psycopg2 adapts the tuple to an IN list
    sql = "select * from " + pg_name + " where ids in %s and sex=%s and get_time between %s and %s and is_glasses=%s;"

    try:
        cursor.execute(sql, (tuple(result_ids), sex, time[0], time[1], str(glasses)))
        rows=cursor.fetchall()
        return rows
    except Exception as e:
        # Roll back so the connection stays usable, and return an empty result
        conn.rollback()
        print("search failed: ", e)
        return []

In [59]:
def merge_rows_distance(rows,ids,distance):
    # An empty list (or None after a failed query) means there is nothing to show
    if rows:
        new_results = []
        for row in rows:
            # Look up the Milvus distance that belongs to this row's ID
            index_flag = ids.index(row[0])
            temp = [row[0]] + list(row[1:4]) + [distance[index_flag]]
            new_results.append(temp)
        # Sort by distance, smallest (most similar) first
        new_results.sort(key=lambda r: r[4])
        print("\nids                 sex         time              glasses  distance")
        for new_result in new_results:
            print( new_result[0], "\t", new_result[1], new_result[2], "\t", new_result[3],"\t", new_result[4])
    else:
        print("no result")
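
The search, filter, and merge steps in this section can also be tried without either server. The sketch below is only a stand-in under stated assumptions: a brute-force NumPy distance computation replaces the Milvus search, an in-memory `sqlite3` table replaces Postgres, and the random vectors, the `attrs` table, and the names `vector_top_k`, `filter_by_attributes`, and `hybrid_search` are hypothetical.

```python
import sqlite3

import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((100, 8))      # stand-in for the vectors stored in Milvus
ids = list(range(100))

db = sqlite3.connect(':memory:')    # stand-in for the Postgres table
db.execute("CREATE TABLE attrs (ids INTEGER, sex TEXT, is_glasses INTEGER)")
db.executemany(
    "INSERT INTO attrs VALUES (?, ?, ?)",
    [(i, 'male' if i % 2 else 'female', int(i % 3 == 0)) for i in ids],
)


def vector_top_k(query, k):
    """Brute-force Euclidean search standing in for the Milvus query."""
    dist = np.linalg.norm(vectors - query, axis=1)
    order = np.argsort(dist)[:k]
    return [ids[int(i)] for i in order], [float(dist[i]) for i in order]


def filter_by_attributes(candidate_ids, sex, glasses):
    """Keep only the candidate IDs whose attributes match."""
    if not candidate_ids:
        return []
    placeholders = ','.join('?' * len(candidate_ids))
    sql = ("SELECT ids, sex, is_glasses FROM attrs "
           "WHERE ids IN (" + placeholders + ") AND sex = ? AND is_glasses = ?")
    return db.execute(sql, [*candidate_ids, sex, int(glasses)]).fetchall()


def hybrid_search(query, k, sex, glasses):
    """Vector search first, attribute filter second, then order by distance."""
    top_ids, top_dist = vector_top_k(query, k)
    distance_of = dict(zip(top_ids, top_dist))
    rows = filter_by_attributes(top_ids, sex, glasses)
    return sorted(((*row, distance_of[row[0]]) for row in rows),
                  key=lambda r: r[-1])


results = hybrid_search(vectors[7], k=10, sex='male', glasses=False)
for row in results:
    print(row)
```

Because the attribute filter runs after the vector search, the final result can contain fewer than `k` rows, which is also why the notebook's query returns fewer than `TOP_K` rows.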

### Search Results
Given the attribute values below, query for similar vectors whose attributes match them.

In [53]:
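sex ="male"
glasses = "True"
num ="10"
temp = "[2021-03-23 07:49:42 , 2021-03-26 01:00:13 ]"
# Strip the brackets and split on the comma to get the start and end of the time range
# (fixed-position slicing cut off the last character of the end time)
time_insert = [t.strip() for t in temp.strip("[]").split(",")]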

In [63]:
QUERY_PATH = '../sift_data/sift_data/bigann_query.bvecs'
query_location = num
query_vec = load_query_list(QUERY_PATH,query_location)
result_ids, result_distance = search_in_milvus(query_vec,milvus)
rows = search_in_pg(conn,cursor,result_ids, result_distance, sex,time_insert,glasses)
merge_rows_distance(rows,result_ids,result_distance)


ids                 sex         time              glasses  distance
9395 	 male       2021-03-24 07:49:42 	 True 	 2.0576701164245605
9425 	 male       2021-03-25 01:00:13 	 True 	 2.07944655418396
