<a href="https://colab.research.google.com/github/rajivsam/arangomlFeatureStore/blob/master/examples/feature_store_consumer_application.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Overview
This notebook illustrates how a recommender system application can use the _arangomlFeatureStore_ to make recommendations. The application connects to the feature store to retrieve embeddings. The connection information provided here is representative; replace it with the information specific to your case. If you are running this notebook immediately after the _feature_store_producer_DS.ipynb_ notebook, you can use the connection information obtained by running the last cell in that notebook. Each task involved in making recommendations is described in its own labeled section.

## Install the required packages

In [None]:
!pip install -i https://test.pypi.org/simple/ arangomlFeatureStore
!pip install  pyArango python-arango PyYAML==5.2 numpy scikit-surprise

## Provide the connection information to the Feature Store
__Note: THIS IS REPRESENTATIVE AND PROVIDED FOR ILLUSTRATION. REPLACE WITH INFORMATION VALID FOR YOUR SESSION__ 

In [None]:
connection_info_producer_fs = {'dbName': 'TUTpoaywuvsyv9bogx0e3vdnr',
 'edge_col': 'entity-feature-value',
 'entity_col': 'entity',
 'feature_value_col': 'feature-value',
 'graph_name': 'feature_store_graph',
 'hostname': 'tutorials.arangodb.cloud',
 'password': 'TUTc9fby27ixqcidev68cmxbc',
 'port': 8529,
 'protocol': 'https',
 'replication_factor': 3,
 'username': 'TUTu6kipkgspt01shbmdi3gg9'}


## Set up the arangomlFeatureStore for use in colab

In [None]:
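import sys
import arangomlFeatureStore as p

# __path__ is a list of directories, but sys.path entries must be strings,
# so take the first directory instead of adding the list itself.
pkg_dir = list(p.__path__)[0]
print(f"Feature store at {pkg_dir}")
if pkg_dir not in sys.path:
    sys.path.insert(0, pkg_dir)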


In [None]:
!chmod -R 777 {p.__path__[0]}

In [None]:
from arangomlFeatureStore.feature_store_admin import FeatureStoreAdmin
from arango.database import StandardDatabase

## Connect to the FeatureStore specified by the connection information

In [None]:
fa = FeatureStoreAdmin(conn_config = connection_info_producer_fs)

In [None]:
fs = fa.get_feature_store()

## Retrieve the Item embeddings and User embeddings associated with the tags _NMF-item-embeddings_ and _NMF-user-embeddings_ respectively

In [None]:
item_embs = fs.get_featureset_with_tag('tag', 'NMF-item-embeddings')

In [None]:
user_embs = fs.get_featureset_with_tag('tag', 'NMF-user-embeddings')

## Convert the JSON embeddings to numeric types (unmarshalling from JSON to Python)

In [None]:
import json
item_col_names = []
embedding_rows = []

for row in item_embs:
  item_id_toks = row['_key'].split('-')
  item_id = item_id_toks[1]
  item_col_names.append(item_id)
  embedding_rows.append(json.loads(row['embedding']))
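
The conversion above can be tried without a feature store connection. The following is a minimal sketch on hypothetical rows: the `sample_rows` data and the `unmarshal_embeddings` helper are illustrative names, not part of _arangomlFeatureStore_. It assumes, as the cell above does, that the item ID follows the first `-` in `_key` and that `embedding` holds a JSON-encoded list.

```python
import json

import numpy as np

# Hypothetical rows shaped like the feature store results used above:
# the item ID follows the first '-' in '_key', and 'embedding' is a JSON string.
sample_rows = [
    {"_key": "item-10", "embedding": "[0.1, 0.2, 0.3]"},
    {"_key": "item-42", "embedding": "[0.4, 0.5, 0.6]"},
]


def unmarshal_embeddings(rows):
    """Return (ids, matrix) where the matrix has one row per embedding."""
    ids, vectors = [], []
    for row in rows:
        ids.append(row["_key"].split("-")[1])
        vectors.append(json.loads(row["embedding"]))
    return ids, np.array(vectors, dtype=float)


sample_ids, sample_matrix = unmarshal_embeddings(sample_rows)
print(sample_ids, sample_matrix.shape)
```

Returning the IDs next to the matrix keeps row `i` of the matrix aligned with `sample_ids[i]`, which is the same alignment the notebook relies on between `item_col_names` and `embedding_rows`.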


## Making a recommendation for a user
The following steps are performed to make a recommendation for a user. This section uses an arbitrarily chosen user for illustration: user #24, referred to as user23 because Python uses zero-based array indexing.
1. The ratings for $user23$ are computed from the matrix factorization model as $$ ratings_{user23} = user\ embedding_{user23} \cdot movie\ embedding $$ where $user\ embedding_{user23}$ is a $1\times 5$ vector and $movie\ embedding$ is a $5\times 1638$ matrix, with one column for each of the $1638$ movies. The code refers to movies as items.
2. From the computed ratings, we filter out the movies that the user has already viewed and rated.
3. We then sort the remainder in descending order of the ratings computed by the model.
4. We recommend the top 10 movies of the sorted list.

The segments for each of these steps are shown below.
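
Before working through the notebook cells, the four steps can be sketched end to end on toy data. This is a minimal illustration with NumPy only, under assumed sizes (5 latent factors, 8 items, top 3 instead of top 10) and random embeddings; every `toy_` name is hypothetical and none of the values come from the feature store.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins with assumed sizes: 5 latent factors, 8 items.
toy_user_emb = rng.random((1, 5))       # 1 x 5 user embedding
toy_item_embs_T = rng.random((5, 8))    # 5 x 8 item embedding matrix
toy_rated = np.array([1, 4])            # column positions of items already rated

# Step 1: predicted ratings, one per item (1 x 5 times 5 x 8 gives 1 x 8).
toy_predicted = (toy_user_emb @ toy_item_embs_T).ravel()

# Step 2: drop the items the user has already rated.
toy_candidates = np.setdiff1d(np.arange(toy_predicted.size), toy_rated)

# Step 3: sort the remainder by predicted rating, highest first.
order = np.argsort(toy_predicted[toy_candidates])[::-1]
toy_ranked = toy_candidates[order]

# Step 4: keep the top N (3 here; the notebook serves 10).
toy_top = toy_ranked[:3]
print(toy_top, toy_predicted[toy_top])
```

The notebook itself carries out the same steps with a pandas data frame, which keeps the item IDs attached to the ratings.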

## Compute ratings for user23 from the NMF model

In [None]:
import numpy as np
item_embs_T = np.array(embedding_rows).transpose()

In [None]:
user23_emb = np.array(json.loads(user_embs[23]['embedding']))

In [None]:
user23_emb = user23_emb.reshape((1, user23_emb.shape[0]))

In [None]:
item_embs_T.shape

In [None]:
import pandas as pd
df_user23_nmf = pd.DataFrame(np.matmul(user23_emb, item_embs_T)).transpose()
df_user23_nmf.columns = ["NMF_calc_rating"]
df_user23_nmf["Item_ID"] = item_col_names
df_user23_nmf["actual_rating"] = ['not rated']*df_user23_nmf.shape[0]
df_user23_nmf = df_user23_nmf.set_index('Item_ID')


In [None]:
df_user23_nmf.head()

## Determine the movies that the user has already seen

In [None]:
user23 = fs.find_entity(attrib_name='_key', attrib_value= user_embs[23]['_key'])

## Record the actual ratings: rated movies get their numeric rating, and unrated movies keep the value "not rated" in the actual_rating column of the data frame created above

In [None]:
for k, v in user23[0]['ratings'].items():
  df_user23_nmf.at[k, "actual_rating"] = v
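
One thing to watch for in the loop above: if a key in the ratings dictionary does not match any `Item_ID` in the index, the assignment may add a new row rather than update an existing one. The sketch below shows one way to guard against that on toy data. The frame, the ratings dictionary, and the item ID `"999"` are all hypothetical, and whether real rating keys match the embedding item IDs is an assumption to verify against your data.

```python
import pandas as pd

# Toy frame indexed by item ID, mirroring the layout built above.
toy_df = pd.DataFrame(
    {"NMF_calc_rating": [3.9, 2.1, 4.4]},
    index=pd.Index(["10", "42", "77"], name="Item_ID"),
)
# An explicit object column can hold both the placeholder string and numbers.
toy_df["actual_rating"] = pd.Series(
    ["not rated"] * 3, index=toy_df.index, dtype=object
)

# Hypothetical ratings dictionary; "999" has no embedding row.
toy_ratings = {"42": 4.0, "999": 2.0}

skipped = []
for item_id, rating in toy_ratings.items():
    if item_id in toy_df.index:
        toy_df.at[item_id, "actual_rating"] = rating
    else:
        skipped.append(item_id)  # rated item without an embedding row

print(toy_df)
print("Skipped:", skipped)
```

Collecting the skipped IDs instead of silently ignoring them makes a mismatch between rating keys and item IDs visible.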



In [None]:
df_user23_nmf

## Keep only the "not rated" movies

In [None]:
df_user23_nmf_us = df_user23_nmf.query('actual_rating == "not rated"')

## Sort in descending order of computed ratings and serve the top 10

In [None]:
df_user23_nmf_us.sort_values(by = ["NMF_calc_rating"], ascending=False).head(10)
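
The notebook marks unrated movies with the string "not rated", which makes `actual_rating` a mixed-type column. As an alternative design, not the approach used above, missing ratings could be stored as `NaN` so that the column stays numeric. The following is a minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Toy frame: NaN marks a movie the user has not rated.
nan_df = pd.DataFrame(
    {
        "NMF_calc_rating": [3.9, 2.1, 4.4, 1.5],
        "actual_rating": [np.nan, 4.0, np.nan, np.nan],
    },
    index=pd.Index(["10", "42", "77", "81"], name="Item_ID"),
)

# Keep only the unrated movies, then take the highest predicted ratings.
unrated = nan_df[nan_df["actual_rating"].isna()]
top_unrated = unrated.nlargest(2, "NMF_calc_rating")
print(top_unrated)
```

Here `nlargest` plays the role of `sort_values(..., ascending=False).head(n)` from the cell above.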