# 04-2 Text Based Recommender

In this notebook we build a recommender system from the text description of each product in our inventory. The idea is as follows:
1. Encode each product description as a latent vector and cache all of these encodings in a database.
2. Given a query string, encode the query as a latent vector.
3. Compare the query vector with each product vector via cosine similarity.
4. Return the product(s) with the highest similarity score.
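
The four steps above can be sketched end to end before bringing in BERT. This is only a minimal illustration: `toy_encode` is a hypothetical bag-of-words stand-in for the real encoder, the in-memory `dict` stands in for the database, and the two descriptions are lower-cased copies of rows from `articles.csv`.

```python
import numpy as np

def build_vocab(texts):
    """Map every distinct word in the descriptions to a vector index."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}

def toy_encode(text, vocab):
    """Hypothetical stand-in encoder: word counts over the vocabulary."""
    vec = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:  # words outside the vocabulary are ignored
            vec[vocab[w]] += 1.0
    return vec

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def recommend(query, cache, vocab, k=1):
    """Steps 2-4: encode the query, score every product, return the top k."""
    q = toy_encode(query, vocab)
    scored = [(code, cosine(q, vec)) for code, vec in cache.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

descriptions = {
    108775: "jersey top with narrow shoulder straps",
    957375: "large plastic hair claw",
}
vocab = build_vocab(descriptions.values())
# Step 1: encode each description once and cache the vectors.
cache = {code: toy_encode(desc, vocab) for code, desc in descriptions.items()}

results = recommend("plastic hair claw", cache, vocab, k=1)
```

Swapping `toy_encode` for a learned encoder leaves the cache and ranking logic unchanged, which is the structure the rest of the notebook follows.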

In [1]:
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.nn.functional as F
import torch

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import support_victor_machine as supp

from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords

In [2]:
articles = pd.read_csv('../data/articles.csv')

articles.head()

Unnamed: 0,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
3,110065001,110065,OP T-shirt (Idro),306,Bra,Underwear,1010016,Solid,9,Black,...,Clean Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Microfibre T-shirt bra with underwired, moulde..."
4,110065002,110065,OP T-shirt (Idro),306,Bra,Underwear,1010016,Solid,10,White,...,Clean Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Microfibre T-shirt bra with underwired, moulde..."


In [16]:
product_desc = articles.groupby('product_code', as_index=False)['detail_desc'].value_counts()[['product_code', 'detail_desc']]

product_desc.head()

Unnamed: 0,product_code,detail_desc
0,108775,Jersey top with narrow shoulder straps.
1,110065,"Microfibre T-shirt bra with underwired, moulde..."
2,111565,"Semi shiny nylon stockings with a wide, reinfo..."
3,111586,Tights with built-in support to lift the botto...
4,111593,"Semi shiny tights that shape the tummy, thighs..."


In [18]:
product_desc.to_csv('../data/product_desc.csv',index=False)

In [6]:
article_desc = articles[['article_id','product_code','detail_desc']]

article_desc

Unnamed: 0,article_id,product_code,detail_desc
0,108775015,108775,Jersey top with narrow shoulder straps.
1,108775044,108775,Jersey top with narrow shoulder straps.
2,108775051,108775,Jersey top with narrow shoulder straps.
3,110065001,110065,"Microfibre T-shirt bra with underwired, moulde..."
4,110065002,110065,"Microfibre T-shirt bra with underwired, moulde..."
...,...,...,...
105537,953450001,953450,Socks in a fine-knit cotton blend with a small...
105538,953763001,953763,Loose-fitting sports vest top in ribbed fast-d...
105539,956217002,956217,"Short, A-line dress in jersey with a round nec..."
105540,957375001,957375,Large plastic hair claw.


---

## BERT

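To encode the product descriptions, we'll use Google's BERT model. BERT is a **transformer**, a type of neural network built around a mechanism called **self-attention**. Unlike the original transformer, which pairs a stack of encoders with a stack of decoders, BERT uses only the encoder stack. The gist of self-attention is that the representation of each word is computed by weighing every other word in the sentence, so the model learns which words relate to one another.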

For example, given the sentence "The dress was blue with puffy sleeves.", BERT can learn that the adjective "blue" describes "the dress", while the adjective "puffy" describes the "sleeves".
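
To make the self-attention idea concrete, here is a minimal single-head sketch of scaled dot-product attention in NumPy. It is an illustration under simplifying assumptions, not BERT's actual implementation: the weights are random rather than trained, the sizes are toy values, and real BERT layers use several attention heads plus extra projection and normalization steps.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (n_tokens, d_model). Returns the context vectors and attention weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise token-to-token scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model = 8, 16                    # toy sizes; bert-base uses 768
X = rng.normal(size=(n_tokens, d_model))     # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

context, weights = self_attention(X, Wq, Wk, Wv)
```

Row `i` of `weights` says how strongly token `i` attends to every token in the sentence; in a trained model, the row for "blue" would be expected to put noticeable weight on "dress".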

Let us demonstrate how BERT encodes textual data into vector data.

In [11]:
# pytorch_transformers is the former name of the Hugging Face `transformers` package.
from pytorch_transformers import BertTokenizer, BertModel, BertForMaskedLM

In [68]:
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [69]:
text = '[CLS] Who is Brad Pitt ? [SEP] Brad Pitt is an actor [SEP]'

tokenized_text = bert_tokenizer.tokenize(text)

tokenized_text

['[CLS]',
 'who',
 'is',
 'brad',
 'pitt',
 '?',
 '[SEP]',
 'brad',
 'pitt',
 'is',
 'an',
 'actor',
 '[SEP]']

In [70]:
indexed_tokens = bert_tokenizer.convert_tokens_to_ids(tokenized_text)
segment_ids = [0,0,0,0,0,0,0,1,1,1,1,1,1]

indexed_tokens

[101, 2040, 2003, 8226, 15091, 1029, 102, 8226, 15091, 2003, 2019, 3364, 102]
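
The `segment_ids` list above was written by hand: 0 for every token of the first sentence (including its `[SEP]`) and 1 for the second. As a small sketch, the same list can be derived from the token list; `segment_ids_from_tokens` is a hypothetical helper, not part of the tokenizer's API.

```python
def segment_ids_from_tokens(tokens, sep_token="[SEP]"):
    """Assign 0 up to and including the first [SEP], and 1 to everything after it."""
    ids, current = [], 0
    for tok in tokens:
        ids.append(current)
        if tok == sep_token:
            current = 1  # BERT has only two segment types (0 and 1)
    return ids

tokens = ['[CLS]', 'who', 'is', 'brad', 'pitt', '?', '[SEP]',
          'brad', 'pitt', 'is', 'an', 'actor', '[SEP]']
derived_segment_ids = segment_ids_from_tokens(tokens)
```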

In [24]:
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensor = torch.tensor([segment_ids])

In [25]:
model = BertModel.from_pretrained('bert-base-uncased')

model.eval()

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          

In [26]:
tokens_tensor = tokens_tensor.to('cuda')
segments_tensor = segments_tensor.to('cuda')
model.to('cuda')

with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids = segments_tensor)
    # PyTorch-Transformers models always output tuples.
    # In this case, the first element is the hidden state of the last layer of the Bert Model (i.e. our word embedding).
    encoded_layers = outputs[0]
    
encoded_layers

tensor([[[-0.9093,  0.2959, -0.3539,  ..., -1.1109,  0.7214,  0.3194],
         [-0.6692, -0.3558, -0.0102,  ..., -0.0533,  0.7098, -0.3872],
         [-0.0745, -0.6621,  0.2456,  ..., -0.2663,  0.3234,  0.1851],
         ...,
         [-0.0936,  0.4367, -0.1312,  ..., -0.1786, -0.0175, -0.0775],
         [-0.1089,  0.4316,  0.0312,  ..., -0.3157,  0.0407, -1.3747],
         [ 0.7852,  0.1316, -0.3963,  ...,  0.1040, -0.4144, -0.2295]]],
       device='cuda:0')

In [28]:
encoded_layers[0]

tensor([[-0.9093,  0.2959, -0.3539,  ..., -1.1109,  0.7214,  0.3194],
        [-0.6692, -0.3558, -0.0102,  ..., -0.0533,  0.7098, -0.3872],
        [-0.0745, -0.6621,  0.2456,  ..., -0.2663,  0.3234,  0.1851],
        ...,
        [-0.0936,  0.4367, -0.1312,  ..., -0.1786, -0.0175, -0.0775],
        [-0.1089,  0.4316,  0.0312,  ..., -0.3157,  0.0407, -1.3747],
        [ 0.7852,  0.1316, -0.3963,  ...,  0.1040, -0.4144, -0.2295]],
       device='cuda:0')

In [41]:
encoded_layers.shape

torch.Size([1, 13, 768])

---

Now let's see how to use the idea above to measure similarity between a query string and a product description.

In [48]:
article_desc.loc[0:5]

Unnamed: 0,article_id,product_code,detail_desc
0,108775015,108775,Jersey top with narrow shoulder straps.
1,108775044,108775,Jersey top with narrow shoulder straps.
2,108775051,108775,Jersey top with narrow shoulder straps.
3,110065001,110065,"Microfibre T-shirt bra with underwired, moulde..."
4,110065002,110065,"Microfibre T-shirt bra with underwired, moulde..."
5,110065011,110065,"Microfibre T-shirt bra with underwired, moulde..."


In [58]:
tokenizer = RegexpTokenizer(pattern=r'\w\w+')

In [85]:
stopwords_vocab = stopwords.words('english')

batch_sentences = [article_desc.loc[0, 'detail_desc'].lower(), article_desc.loc[3, 'detail_desc'].lower()]

batch_sentences

['jersey top with narrow shoulder straps.',
 'microfibre t-shirt bra with underwired, moulded, lightly padded cups that shape the bust and provide good support. narrow adjustable shoulder straps and a narrow hook-and-eye fastening at the back. without visible seams for greater comfort.']

So here we have two pieces of text: 1) the product description for `108775` and 2) the product description for `110065`.

We will encode both descriptions into vectors using BERT, then measure their similarity using cosine similarity.

In [86]:
batch_tokens = [tokenizer.tokenize(sent) for sent in batch_sentences]

batch_tokens = [ [token for token in sent if token not in stopwords_vocab] for sent in batch_tokens]

batch_tokens

[['jersey', 'top', 'narrow', 'shoulder', 'straps'],
 ['microfibre',
  'shirt',
  'bra',
  'underwired',
  'moulded',
  'lightly',
  'padded',
  'cups',
  'shape',
  'bust',
  'provide',
  'good',
  'support',
  'narrow',
  'adjustable',
  'shoulder',
  'straps',
  'narrow',
  'hook',
  'eye',
  'fastening',
  'back',
  'without',
  'visible',
  'seams',
  'greater',
  'comfort']]

In [66]:
len(batch_tokens[0]), len(batch_tokens[1])

(5, 27)

In [74]:
desc1 = article_desc.loc[0,'detail_desc']
desc2 = article_desc.loc[3,'detail_desc']

# Note: the raw descriptions are tokenized directly, without [CLS]/[SEP] markers.
desc1_tokens = bert_tokenizer.tokenize(desc1)
desc2_tokens = bert_tokenizer.tokenize(desc2)

index_tokens1 = bert_tokenizer.convert_tokens_to_ids(desc1_tokens)
index_tokens2 = bert_tokenizer.convert_tokens_to_ids(desc2_tokens)

In [76]:
len(index_tokens1), len(index_tokens2)

(7, 54)

In [80]:
segment1 = [0 for i in range(len(index_tokens1))]
segment2 = [0 for i in range(len(index_tokens2))]

tokens1_tensor = torch.tensor([index_tokens1])
segment1_tensor = torch.tensor([segment1])

tokens2_tensor = torch.tensor([index_tokens2])
segment2_tensor = torch.tensor([segment2])


tokens1_tensor = tokens1_tensor.to('cuda')
segment1_tensor = segment1_tensor.to('cuda')

tokens2_tensor = tokens2_tensor.to('cuda')
segment2_tensor = segment2_tensor.to('cuda')

with torch.no_grad():
    outputs1 = model(tokens1_tensor, token_type_ids = segment1_tensor)
    outputs2 = model(tokens2_tensor, token_type_ids = segment2_tensor)
    # PyTorch-Transformers models always output tuples.
    # In this case, the first element is the hidden state of the last layer of the Bert Model (i.e. our word embedding).
    encoded1_layers = outputs1[0]
    encoded2_layers = outputs2[0]
    
encoded1_layers

tensor([[[ 0.2008, -0.8764, -0.1606,  ..., -0.5034,  0.0699,  0.1527],
         [ 0.4071, -0.5673, -0.0292,  ..., -0.5300,  0.0326, -0.0323],
         [ 0.0269, -0.3291, -0.1295,  ..., -0.4558, -0.4106, -0.0549],
         ...,
         [ 0.4381, -0.5147,  0.1010,  ..., -0.3515, -0.3497, -0.2480],
         [ 0.4352, -0.4855,  0.0797,  ..., -0.4644, -0.2847, -0.2938],
         [ 0.2863, -1.3549,  0.0071,  ..., -0.2281,  0.2523, -0.1669]]],
       device='cuda:0')

- Note: because the two product descriptions contain a different number of tokens (7 versus 54), their encodings differ in size, with one 768-dimensional vector per token. We therefore have to reduce both encodings to a uniform size. One possible approach is to take the mean over all token vectors, which yields a single vector for each description.
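
Here each description is encoded on its own, so a plain mean over the token axis is enough. If several descriptions were instead padded to a common length and encoded as one batch (an assumption; this notebook does not do that), the padding positions would have to be left out of the mean. A minimal NumPy sketch of such a masked mean, with a hypothetical `masked_mean_pool` helper and toy values:

```python
import numpy as np

def masked_mean_pool(hidden, mask):
    """hidden: (batch, seq_len, dim); mask: (batch, seq_len), 1 = real token, 0 = padding."""
    mask = mask[:, :, None].astype(hidden.dtype)    # (batch, seq_len, 1)
    summed = (hidden * mask).sum(axis=1)            # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # 2 real tokens + 1 padding position
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # 3 real tokens
])
mask = np.array([[1, 1, 0],
                 [1, 1, 1]])

pooled = masked_mean_pool(hidden, mask)    # one vector per sequence
```

The padding vector `[9.0, 9.0]` does not influence the first pooled vector, which is the mean of its two real tokens only.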

In [81]:
encoded1_mean = torch.mean(encoded1_layers, 1)
encoded2_mean = torch.mean(encoded2_layers, 1)

- Now we have two vectors of the same dimension, so we can compute their cosine similarity.

In [84]:
cos = nn.CosineSimilarity(dim=1, eps=1e-6)

cos(encoded1_mean, encoded2_mean).item()

0.6666138768196106
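
For reference, the score above is the cosine of the angle between the two mean-pooled vectors. Writing $u$ and $v$ for `encoded1_mean` and `encoded2_mean`, each with 768 components:

$$
\operatorname{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}
= \frac{\sum_{i=1}^{768} u_i v_i}{\sqrt{\sum_{i=1}^{768} u_i^2}\;\sqrt{\sum_{i=1}^{768} v_i^2}}
$$

The `eps` argument only guards the denominator against division by zero. The score lies between -1 and 1, and a higher value means the two vectors point in more similar directions.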

---

## Remarks

The methodology above is meant to be a rough demonstration of the possibility of using product descriptions as the basis for a recommender system.

A much better approach is to use **Sentence-BERT** (S-BERT), a BERT-based model built to encode an entire sentence into a single vector. We can then take cosine similarities as before to measure the similarity between product descriptions and return the most similar items.
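
As a rough sketch of what that could look like, the function below uses the `sentence-transformers` package. This is an assumption about tooling rather than the code behind the web application: the package must be installed separately, and the model name `all-MiniLM-L6-v2` is just one commonly used pretrained checkpoint chosen for illustration.

```python
def sbert_similarity(query, descriptions, model_name="all-MiniLM-L6-v2"):
    """Return one cosine-similarity score per description for the given query."""
    # Imported lazily so the function can be defined without the package installed.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(model_name)  # downloads the weights on first use
    desc_emb = model.encode(descriptions, convert_to_tensor=True)  # (n, dim)
    query_emb = model.encode([query], convert_to_tensor=True)      # (1, dim)
    return util.cos_sim(query_emb, desc_emb)[0].tolist()

# Example call (needs the package and a model download):
# scores = sbert_similarity(
#     "strappy jersey top",
#     ["Jersey top with narrow shoulder straps.", "Large plastic hair claw."],
# )
```

Each text already comes back as a single fixed-size vector, so the manual mean-pooling step from the BERT demonstration is no longer needed.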

Finally, this text-based idea of measuring similarity generalizes naturally to a full search engine. Indeed, we implement such a search engine, which compares a query string passed by the user against the product descriptions and returns the recommended products. See the web application for a demonstration of this functionality.