# Recall service in recommendation system

In this example, we implement the recall function of a recommendation system: we import movie vectors into Milvus, extract user features with a PaddlePaddle model, and then query Milvus and Redis. Next, let's go through the code together.

## Data
In this project, we use [MovieLens 1M](https://grouplens.org/datasets/movielens/1m/). This dataset contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users. 
We use the following files:
- movies.dat: Contains movie information, one movie per line in the format `MovieID::Title::Genres`.
  - Titles are identical to titles provided by the IMDB (including year of release).
  - Genres are pipe-separated.
  - Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries.
  - Movies are mostly entered by hand, so errors and inconsistencies may exist.
- movie_vectors.txt: Contains movie vectors that can be imported into Milvus.
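A single `movies.dat` line splits on `::` into its three fields, and the genre field splits further on `|`; `process_movie` below does the same when loading Redis. The snippet is a minimal sketch of that parsing; the helper name `parse_movie_line` is hypothetical and not part of the project code.

```python
def parse_movie_line(line):
    """Split a movies.dat line (MovieID::Title::Genres) into its fields."""
    movie_id, title, genres = line.strip().split("::")
    # Genres are pipe-separated, e.g. "Animation|Children's|Comedy"
    return {"movie_id": movie_id, "title": title, "genre": genres.split("|")}


print(parse_movie_line("1::Toy Story (1995)::Animation|Children's|Comedy"))
# {'movie_id': '1', 'title': 'Toy Story (1995)', 'genre': ['Animation', "Children's", 'Comedy']}
```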

## Requirements

1. Python 3.6/3.7
2. [Milvus 1.1.0](https://milvus.io/docs/install_milvus.md)
3. Redis
4. Python packages listed in requirements.txt
5. Models
6. Dataset

### Install required python packages

!pip install -r requirements.txt

### Start Milvus Server
#### 1. Download Configuration Files

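# Each "!" command runs in its own shell, so "!cd" would not persist; download straight into the conf directory
!mkdir -p /home/$USER/milvus/conf
!wget -P /home/$USER/milvus/conf https://raw.githubusercontent.com/milvus-io/milvus/v1.1.0/core/conf/demo/server_config.yaml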

#### 2. Start the Docker container

!docker run -d --name milvus_cpu_1.1.0 \
-p 19530:19530 \
-p 19121:19121 \
-v /home/$USER/milvus/db:/var/lib/milvus/db \
-v /home/$USER/milvus/conf:/var/lib/milvus/conf \
-v /home/$USER/milvus/logs:/var/lib/milvus/logs \
-v /home/$USER/milvus/wal:/var/lib/milvus/wal \
milvusdb/milvus:1.1.0-cpu-d050721-5e559c

#### 3. Confirm Docker status

!docker logs milvus_cpu_1.1.0

### Start Redis Server

!docker run -d -p 6379:6379 redis

### Download pretrained models
This model is used to transform input user information into vectors.

!wget https://paddlerec.bj.bcebos.com/aistudio/user_vector.tar.gz --no-check-certificate
!mkdir -p movie_recommender/user_vector_model
!tar xf user_vector.tar.gz -C movie_recommender/user_vector_model/
!rm user_vector.tar.gz

### Download dataset

# Download movie information
!wget -P movie_recommender https://paddlerec.bj.bcebos.com/aistudio/movies.dat --no-check-certificate
# Download movie vectors
!wget -P movie_recommender https://paddlerec.bj.bcebos.com/aistudio/movie_vectors.txt --no-check-certificate

## Code Overview
### Import movie vectors to Milvus
#### 1. Connect to Milvus and Redis

from milvus import Milvus, MetricType
import redis

# Ports as mapped in the docker run commands above
milvus = Milvus(host='127.0.0.1', port=19530)
r = redis.StrictRedis(host='127.0.0.1', port=6379)

#### 2. Load movie information into Redis

import json
import codecs

def process_movie(lines, redis_cli):
    """Store each movie's ID, title and genres in Redis as a JSON string.

    Example line: 1::Toy Story (1995)::Animation|Children's|Comedy
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tmp = line.split("::")
        movie_id, title, genre_group = tmp[0], tmp[1], tmp[2]
        genre = genre_group.strip().split("|")  # genres are pipe-separated
        movie_info = {"movie_id": movie_id,
                      "title": title,
                      "genre": genre}
        # Key format: "<movie_id>##movie_info"
        redis_cli.set("{}##movie_info".format(movie_id), json.dumps(movie_info))

with codecs.open("movie_recommender/movies.dat", "r", encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()
process_movie(lines, r)
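To see what ends up in Redis without a running server, the sketch below swaps in a tiny in-memory stand-in, `FakeRedis` (hypothetical), that mimics only the `set`/`get` calls used here. It illustrates the `<movie_id>##movie_info` key scheme and the fact that redis-py returns values as bytes by default, which is why the lookup step later calls `.decode('utf-8')`. The helper `store_movie` is also hypothetical.

```python
import json


class FakeRedis:
    """Hypothetical in-memory stand-in for redis.StrictRedis (set/get only)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # redis-py encodes str values as UTF-8 and returns bytes on get by default
        self._store[key] = value.encode("utf-8")
        return True

    def get(self, key):
        return self._store.get(key)  # None for a missing key, like Redis


def store_movie(redis_cli, movie_id, title, genres):
    """Write one movie using the same key scheme as process_movie."""
    info = {"movie_id": movie_id, "title": title, "genre": genres}
    redis_cli.set("{}##movie_info".format(movie_id), json.dumps(info))


cli = FakeRedis()
store_movie(cli, "1", "Toy Story (1995)", ["Animation", "Children's", "Comedy"])
raw = cli.get("1##movie_info")
print(json.loads(raw.decode("utf-8"))["title"])  # Toy Story (1995)
```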

#### 3. Create the target collection and partition

TABLE_NAME = 'demo_films'
PARTITION_NAME = 'Movie'

param = {'collection_name': TABLE_NAME,
         'dimension': 32,             # must match the length of each movie vector
         'index_file_size': 2048,     # segment file size in MB
         'metric_type': MetricType.L2  # Euclidean distance
        }

milvus.create_collection(param)
milvus.create_partition(TABLE_NAME, PARTITION_NAME)

#### 4. Get embeddings and IDs
The vectors in `movie_vectors.txt` were obtained from the `user_vector_model` downloaded above, so we can read the IDs and vectors directly from the file.

def get_vectors():
    """Read movie IDs and embeddings from movie_vectors.txt (one `<id>:[v1,v2,...]` per line)."""
    ids, embeddings = [], []
    with codecs.open("movie_recommender/movie_vectors.txt", "r", encoding='utf-8', errors='ignore') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            movie_id, vector = line.split(":", 1)
            ids.append(int(movie_id))
            # Drop the enclosing brackets, then parse the comma-separated floats
            embeddings.append([float(x) for x in vector[1:-1].split(",")])
    return ids, embeddings

ids, embeddings = get_vectors()
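Before inserting, it can be worth checking that the parsed data matches the collection settings: one ID per vector, no duplicate IDs, and every vector 32-dimensional to match `dimension` above. The helper below, `check_vectors`, is a hypothetical sketch of such a sanity check; the project itself does not include it.

```python
def check_vectors(ids, embeddings, dim=32):
    """Return a list of problems found in (ids, embeddings); empty means OK."""
    problems = []
    if len(ids) != len(embeddings):
        problems.append("got {} ids but {} vectors".format(len(ids), len(embeddings)))
    if len(set(ids)) != len(ids):
        problems.append("duplicate ids found")
    # IDs whose vector length differs from the collection dimension
    bad = [i for i, emb in zip(ids, embeddings) if len(emb) != dim]
    if bad:
        problems.append("ids {} do not have {}-dimensional vectors".format(bad, dim))
    return problems


print(check_vectors([1, 2], [[0.0] * 32, [0.0] * 31]))
# ['ids [2] do not have 32-dimensional vectors']
```

With the data read above, `check_vectors(ids, embeddings)` should return an empty list before calling `milvus.insert`.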

#### 5. Import movie vectors into Milvus
Import vectors into the partition **Movie** under the collection **demo_films**.

status, inserted_ids = milvus.insert(collection_name=TABLE_NAME, records=embeddings, ids=ids, partition_tag=PARTITION_NAME)
# Flush so the inserted vectors are persisted and become searchable
milvus.flush([TABLE_NAME])
status

### Recall vectors in Milvus
#### 1. Generate the user embedding
Pass in the user ID, gender, age and occupation of the user to recommend movies for. The **user_vector_model** then generates the corresponding user vector.
Occupation is one of the following codes:
*  0:  "other" or not specified
*  1:  "academic/educator"
*  2:  "artist"
*  3:  "clerical/admin"
*  4:  "college/grad student"
*  5:  "customer service"
*  6:  "doctor/health care"
*  7:  "executive/managerial"
*  8:  "farmer"
*  9:  "homemaker"
*  10:  "K-12 student"
*  11:  "lawyer"
*  12:  "programmer"
*  13:  "retired"
*  14:  "sales/marketing"
*  15:  "scientist"
*  16:  "self-employed"
*  17:  "technician/engineer"
*  18:  "tradesman/craftsman"
*  19:  "unemployed"
*  20:  "writer"

import numpy as np
from paddle_serving_app.local_predict import LocalPredictor


def hash2(a):
    # Bucket a feature string into an integer ID.
    # Note: Python salts str hashing per process unless PYTHONHASHSEED is set,
    # so these IDs are only stable across runs if the hash seed is fixed.
    return hash(a) % 1000000


class RecallServerServicer(object):
    def __init__(self):
        self.uv_client = LocalPredictor()
        self.uv_client.load_model_config("movie_recommender/user_vector_model/serving_server_dir")

    def get_user_vector(self, userid='0', gender='M', age='23', occupation='6'):
        # One value per feature for a single user; LoD offsets [0, 1] mark that one sequence
        dic = {"userid": [hash2(userid)],
               "gender": [hash2(gender)],
               "age": [hash2(age)],
               "occupation": [hash2(occupation)]}
        lod = [0, 1]
        for key in ["userid", "gender", "age", "occupation"]:
            dic[key + ".lod"] = lod
        # Convert every entry to an int64 column vector
        for key in dic:
            dic[key] = np.array(dic[key]).astype(np.int64).reshape(len(dic[key]), 1)
        fetch_map = self.uv_client.predict(feed=dic, fetch=["save_infer_model/scale_0.tmp_1"], batch=True)
        return fetch_map["save_infer_model/scale_0.tmp_1"].tolist()[0]


recall = RecallServerServicer()
# Male user, age 23, occupation 6 ("doctor/health care")
user_vector = recall.get_user_vector(userid='0', gender='M', age='23', occupation='6')

user_vector
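`get_user_vector` builds a feed for a single user. The sketch below shows how the same feed layout could be built for several users at once: each feature becomes an `int64` column of shape `(n, 1)`, and the LoD offsets become `[0, 1, ..., n]` because every user contributes exactly one value per feature. The function `build_user_feed` is hypothetical, and whether the exported model accepts multi-user batches in this form is an assumption to verify against its serving config.

```python
import numpy as np


def hash2(a):
    # Same bucketing as in the cell above
    return hash(a) % 1000000


def build_user_feed(users):
    """users: list of (userid, gender, age, occupation) string tuples."""
    keys = ["userid", "gender", "age", "occupation"]
    lod = list(range(len(users) + 1))  # one value per user: offsets 0, 1, ..., n
    feed = {}
    for col, key in enumerate(keys):
        values = [hash2(u[col]) for u in users]
        feed[key] = np.array(values, dtype=np.int64).reshape(len(values), 1)
        feed[key + ".lod"] = np.array(lod, dtype=np.int64).reshape(len(lod), 1)
    return feed


feed = build_user_feed([("0", "M", "23", "6"), ("1", "F", "35", "12")])
print(feed["age"].shape, feed["age.lod"].ravel().tolist())  # (2, 1) [0, 1, 2]
```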

#### 2. Query in Milvus
Search with the user vector to recall the closest movie vectors from the collection and partition imported earlier.

TOP_K = 20
SEARCH_PARAM = {'nprobe': 20}  # nprobe applies to IVF-type indexes
status, results = milvus.search(collection_name=TABLE_NAME, query_records=[user_vector], top_k=TOP_K, partition_tags=[PARTITION_NAME], params=SEARCH_PARAM)
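No index is built in this example, so the search should amount to comparing the user vector against every movie vector by L2 distance and keeping the `TOP_K` closest. The NumPy sketch below reproduces that ranking locally, which can help sanity-check the Milvus results; `l2_top_k` is a hypothetical helper, the 2-D vectors are toy data, and it ranks by squared L2 distance, which gives the same order as L2 distance.

```python
import numpy as np


def l2_top_k(query, vectors, ids, k):
    """Return (id, squared L2 distance) pairs for the k nearest vectors."""
    vectors = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Squared Euclidean distance from the query to every row
    dists = ((vectors - query) ** 2).sum(axis=1)
    order = np.argsort(dists)[:k]
    return [(ids[i], float(dists[i])) for i in order]


movie_ids = [760, 1350, 1258]
movie_vecs = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]  # toy vectors
print(l2_top_k([0.0, 0.0], movie_vecs, movie_ids, k=2))  # [(760, 0.0), (1258, 1.0)]
```

With the real data, `l2_top_k(user_vector, embeddings, ids, TOP_K)` should list the same IDs as `results[0]`.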

#### 3. Search movie information by IDs

recall_results = []
for x in results[0]:
    recall_results.append(r.get("{}##movie_info".format(x.id)).decode('utf-8'))
recall_results

['{"movie_id": "760", "title": "Stalingrad (1993)", "genre": ["War"]}',
 '{"movie_id": "1350", "title": "Omen, The (1976)", "genre": ["Horror"]}',
 '{"movie_id": "1258", "title": "Shining, The (1980)", "genre": ["Horror"]}',
 '{"movie_id": "632", "title": "Land and Freedom (Tierra y libertad) (1995)", "genre": ["War"]}',
 '{"movie_id": "3007", "title": "American Movie (1999)", "genre": ["Documentary"]}',
 '{"movie_id": "2086", "title": "One Magic Christmas (1985)", "genre": ["Drama", "Fantasy"]}',
 '{"movie_id": "3920", "title": "Faraway, So Close (In Weiter Ferne, So Nah!) (1993)", "genre": ["Drama", "Fantasy"]}',
 '{"movie_id": "1303", "title": "Man Who Would Be King, The (1975)", "genre": ["Adventure"]}',
 '{"movie_id": "1051", "title": "Trees Lounge (1996)", "genre": ["Drama"]}',
 '{"movie_id": "1605", "title": "Excess Baggage (1997)", "genre": ["Adventure", "Romance"]}',
 '{"movie_id": "652", "title": "301, 302 (1995)", "genre": ["Mystery"]}',
 '{"movie_id": "1275", "title": "High
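Each entry in `recall_results` is a JSON string, so it can be decoded and summarized before ranking. As a small illustration, the sketch below counts genres across recalled movies using three entries copied from the output above; `genre_counts` is a hypothetical helper.

```python
import json
from collections import Counter


def genre_counts(json_strings):
    """Count how often each genre appears among the recalled movies."""
    counts = Counter()
    for s in json_strings:
        counts.update(json.loads(s)["genre"])
    return counts


sample = [
    '{"movie_id": "760", "title": "Stalingrad (1993)", "genre": ["War"]}',
    '{"movie_id": "1350", "title": "Omen, The (1976)", "genre": ["Horror"]}',
    '{"movie_id": "1258", "title": "Shining, The (1980)", "genre": ["Horror"]}',
]
print(genre_counts(sample).most_common(1))  # [('Horror', 2)]
```

Calling `genre_counts(recall_results)` gives the same summary for all recalled movies.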

## Start rank service

After the recall service returns candidates, the results can be re-ranked with the **movie_recommender** model, and the movies with high similarity scores recommended to users. To try the full deployable recommendation system, see the [quick start](QUICK_START.md).