<a href="https://colab.research.google.com/github/altosaar/food2vec/blob/master/onboarding_recommendation_ideas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Three ideas for exploring how best to use embeddings learned from a large collection of data (55k users, 15M meals) on a smaller dataset where mechanistic models of blood glucose are used to inform recommendations.

In [None]:
# mount google drive to be able to load model parameters
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import numpy as np
import torch
import pandas as pd
np.random.seed(534343)

In [None]:
# load model parameters
state_dict = torch.load('/content/drive/MyDrive/rankfromsets/best_state_dict', map_location=torch.device('cpu'))

In [None]:
state_dict['model'].keys()

odict_keys(['user_embeddings.weight', 'user_bias.weight', 'attribute_emb_sum.weight', 'attribute_bias_sum.weight'])

In [None]:
user_emb = state_dict['model']['user_embeddings.weight'].numpy()
user_bias = state_dict['model']['user_bias.weight'].numpy()
word_emb = state_dict['model']['attribute_emb_sum.weight'].numpy()
word_bias = state_dict['model']['attribute_bias_sum.weight'].numpy()
num_users, emb_size = user_emb.shape
user_emb.shape, user_bias.shape, word_emb.shape, word_bias.shape

((54996, 128), (54996, 1), (9963, 128), (9963, 1))

In [None]:
df = pd.read_csv('/content/drive/MyDrive/rankfromsets/food_name_vocab_min_count_20.csv', header=None, index_col=0)
id2word = df[1].to_dict()
word2id = {v: k for k, v in id2word.items()}

### idea 1: matching users to user embeddings

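New users can be matched to an existing user embedding by having them choose from a list of representative users, each summarized by their top-scoring food words.

This would reduce the amount of onboarding needed.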

In [None]:
# pick 3 random users as examples
example_users = np.random.choice(num_users, 3)
num_top = 20
for user in example_users:
    # compute their top food words
    word_logits = user_emb[user] @ word_emb.T + user_bias[user] + word_bias.T
    top_words = np.argsort(np.squeeze(word_logits))[::-1]
    user_word_string = '\n'.join([id2word[i] for i in top_words[:num_top]])
    print(f'user {user}:')
    print(f'top preferred words:\n{user_word_string}')
    print('\n')

user 42282:
top preferred words:
myoplex
ro
casein
ripped
mens
tripleberry
starkist
creatine
c
bananna
advobar
farm
complex
carte
gladiator
charlie
spegetti
knockout
grandmas
rtd


user 7314:
top preferred words:
nutrisystem
struesel
francisco
based
tazo
fancy
sweetener
backyard
request
praegers
river
chai
pirate
tom
shake
per
skim
in
pretzels
cheezit


user 28101:
top preferred words:
creamer
chardonnay
graze
moscato
women
aunt
herby
grove
merlot
meunster
reisling
break
blood
scandinavian
chive
svelte
madras
sargento
net
truly




### idea 2: matching existing meals with cosine similarity

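Assuming preference data has been elicited (or any kind of preference signal with associated language, such as the names of meals a person likes), meals or recipes can be recommended by ranking candidates by the cosine similarity between their embeddings and the embedding of the preferred items.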
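
A minimal sketch of this idea, under stated assumptions: a meal is embedded as the average of the embeddings of its in-vocabulary words, and candidates are sorted by cosine similarity to a liked meal. The names `toy_word_emb`, `toy_word2id`, `embed_meal`, and `cosine_similarity`, as well as the meal names, are hypothetical stand-ins; in the notebook the real `word_emb` and `word2id` would be passed in instead. Averaging word vectors is one simple choice and not necessarily how rankfromsets represents a meal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the notebook's `word_emb` and `word2id`.
vocab = ['chicken', 'rice', 'broccoli', 'salmon', 'quinoa',
         'pizza', 'cheese', 'oatmeal', 'banana']
toy_word2id = {w: i for i, w in enumerate(vocab)}
toy_word_emb = rng.normal(size=(len(vocab), 8))


def embed_meal(name, word2id, word_emb):
    """Average the embeddings of the in-vocabulary words in a meal name."""
    ids = [word2id[w] for w in name.lower().split() if w in word2id]
    if not ids:
        # No known words: fall back to a zero vector (similarity will be 0).
        return np.zeros(word_emb.shape[1])
    return word_emb[ids].mean(axis=0)


def cosine_similarity(query, candidates, eps=1e-12):
    """Cosine similarity between one vector and each row of a matrix."""
    norms = np.linalg.norm(query) * np.linalg.norm(candidates, axis=1)
    return candidates @ query / np.maximum(norms, eps)


# Hypothetical elicited preference and candidate meals.
liked_meal = 'chicken rice broccoli'
candidate_meals = ['salmon quinoa broccoli', 'cheese pizza',
                   'oatmeal banana', 'chicken rice']

query_vec = embed_meal(liked_meal, toy_word2id, toy_word_emb)
cand_vecs = np.stack([embed_meal(m, toy_word2id, toy_word_emb)
                      for m in candidate_meals])
similarities = cosine_similarity(query_vec, cand_vecs)

# Most similar candidates first.
ranking = np.argsort(similarities)[::-1]
for i in ranking:
    print(f'{candidate_meals[i]}: {similarities[i]:.3f}')
```

With random toy embeddings the printed order is only illustrative; the similarities become meaningful once the learned `word_emb` is used.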

### idea 3: using rankfromsets to sort by preference 

Assuming the mechanistic model of blood glucose has been used to predict which meals people may want to eat, rankfromsets can be used to rank these meals in order of preference (e.g. using the embedding of the nearest-neighbor user).
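
A rough sketch of how this could look, with several assumptions: the new person already has some vector in the user embedding space (here a perturbed copy of an existing user, purely for illustration), the nearest neighbor is found by cosine similarity, and a meal is scored as the mean of the per-word logits computed the same way as in the idea 1 cell. The mean-of-logits scoring rule, the toy arrays, the meal names, and the helpers `nearest_user` and `score_meal` are all hypothetical; the actual rankfromsets scoring function may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins with the same layout as the notebook's arrays.
vocab = ['chicken', 'rice', 'broccoli', 'salmon', 'quinoa',
         'pizza', 'cheese', 'oatmeal', 'banana']
toy_word2id = {w: i for i, w in enumerate(vocab)}
n_users, dim = 50, 8
toy_user_emb = rng.normal(size=(n_users, dim))
toy_user_bias = rng.normal(size=(n_users, 1))
toy_word_emb = rng.normal(size=(len(vocab), dim))
toy_word_bias = rng.normal(size=(len(vocab), 1))


def nearest_user(new_user_vec, user_emb):
    """Index of the existing user most cosine-similar to `new_user_vec`."""
    norms = np.linalg.norm(user_emb, axis=1) * np.linalg.norm(new_user_vec)
    sims = user_emb @ new_user_vec / np.maximum(norms, 1e-12)
    return int(np.argmax(sims))


def score_meal(user, meal, word2id, user_emb, user_bias, word_emb, word_bias):
    """Assumed scoring rule: mean per-word logit of the meal for this user."""
    ids = [word2id[w] for w in meal.lower().split() if w in word2id]
    if not ids:
        return float('-inf')  # meals with no known words sort last
    logits = word_emb[ids] @ user_emb[user] + user_bias[user, 0] + word_bias[ids, 0]
    return float(logits.mean())


# Hypothetical new person: a noisy copy of existing user 7.
new_user_vec = toy_user_emb[7] + 0.01 * rng.normal(size=dim)
neighbor = nearest_user(new_user_vec, toy_user_emb)

# Hypothetical candidates proposed by the blood glucose model.
candidate_meals = ['salmon quinoa broccoli', 'cheese pizza',
                   'oatmeal banana', 'chicken rice']
scores = [score_meal(neighbor, m, toy_word2id, toy_user_emb, toy_user_bias,
                     toy_word_emb, toy_word_bias) for m in candidate_meals]

# Highest predicted preference first.
ranked = sorted(zip(candidate_meals, scores), key=lambda pair: pair[1], reverse=True)
for meal, score in ranked:
    print(f'{meal}: {score:.3f}')
```

Note that the user bias shifts every meal's score by the same amount for a given user, so it does not change the ordering; it is kept only to mirror the logits in the idea 1 cell.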