<a href="https://www.kaggle.com/code/mohsinal/example-context-based-recommender-engine?scriptVersionId=122363258" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

### Sentence transformers
Sentence transformers are a popular choice for building recommender systems that rely on text data. These models take raw text and transform it into fixed-length vectors that capture its semantic meaning. The vectors can then be compared to measure how similar two pieces of text are, which is the core operation behind a text-based recommender.

To build a naive context-based recommender system using sentence transformers, follow these general steps:

### Prepare your data: 
Collect and clean the text data that you will be using for your recommender system. This could be a collection of product reviews, news articles, or any other type of text data that you want to use for recommendations.

### Train your sentence transformer:
You can either train your own sentence transformer from scratch or use a pre-trained model. Pre-trained encoders such as BERT, RoBERTa, and DistilBERT are readily available and can be fine-tuned on your dataset to produce embeddings for your text. This notebook takes the simpler route and uses the pre-trained 'all-MiniLM-L6-v2' model as-is, without fine-tuning.

### Generate embeddings:
Use the trained sentence transformer to generate embeddings for each piece of text in your dataset. These embeddings should capture the semantic meaning of the text and be of a fixed length.

### Find similarities:
Use a similarity measure, such as cosine similarity, to find the similarity between embeddings for different pieces of text. This will allow you to identify similar pieces of text that could be recommended to users.
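
As a rough illustration of the similarity step, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the three-dimensional vectors are made up for the example; real sentence embeddings have hundreds of dimensions, and the notebook itself relies on the library's own helper further down.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", for illustration only
shirt = [1.0, 2.0, 0.0]
tee = [2.0, 4.0, 0.0]    # same direction as shirt, different length
boots = [0.0, 0.0, 3.0]  # orthogonal to shirt

print(cosine_similarity(shirt, tee))    # close to 1: same direction
print(cosine_similarity(shirt, boots))  # 0.0: nothing in common
```

Because the measure depends only on direction, two texts can score as near-identical even when their vectors differ in magnitude.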

### Create a recommendation engine:
Based on the similarity measure, create a recommendation engine that recommends similar pieces of text to users based on their input.

Keep in mind that this is a very basic approach and there are many ways to improve the performance of your recommender system, such as using more advanced algorithms or incorporating additional features like user preferences or ratings.

Let's install sentence transformers.

In [None]:
!pip install -U sentence-transformers

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
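# dirname and filename still hold the last values from the os.walk loop above,
# so this reads the last file listed (it assumes the input directory holds a single CSV)
data = pd.read_csv(os.path.join(dirname, filename))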

In [None]:
df=data.copy()

In [None]:
df.head()

# Prepare your data:
Let's clean the data by removing stop words and punctuation. We will use NLTK and spaCy for this.

The next cell imports the libraries used for text processing: spaCy and NLTK's 'word_tokenize'. The 'spacy.load' call loads a pre-trained English pipeline, and its 'Defaults.stop_words' attribute provides a set of common English stopwords. Several punctuation marks are then added to this set so that they are filtered out along with the stopwords.

In [None]:
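import spacy
from nltk.tokenize import word_tokenize
# word_tokenize needs NLTK's tokenizer data; download it via nltk.download if it is missing

sp = spacy.load('en_core_web_sm')
# Set of common English stopwords shipped with spaCy (modified in place below)
all_stopwords = sp.Defaults.stop_words
# Treat these punctuation marks as stopwords so they are filtered out as well
all_stopwords.update({'&', ',', '.', '@', '/', ':', '?'})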

The 'remove_Stopingwords_Punctuation' function removes stopwords and punctuation from a piece of text. It first tokenizes the text with NLTK's 'word_tokenize', then uses a list comprehension to drop every token found in the stopword set (which now includes the punctuation marks added above). The remaining tokens are joined back into a single string with 'join'.

In [None]:
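def remove_Stopingwords_Punctuation(text):
    """Return text with stopwords and the listed punctuation marks removed."""
    text_tokens = word_tokenize(text)
    # spaCy's stopword list is lowercase, so compare case-insensitively
    tokens_without_sw = [word for word in text_tokens if word.lower() not in all_stopwords]
    return " ".join(tokens_without_sw)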
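
To make the filtering idea concrete without any external libraries, here is a small stand-alone sketch. It is not the notebook's function: the tiny `STOPWORDS` set, the regex tokenizer, and the name `strip_stopwords` are assumptions chosen for illustration, and a regex will split text differently from NLTK's tokenizer.

```python
import re

# Hypothetical, tiny stopword set for illustration; spaCy's English list is much larger
STOPWORDS = {"the", "a", "an", "and", "for", "with", "is", "in", "of"}

def strip_stopwords(text, stopwords=STOPWORDS):
    """Drop stopwords and punctuation, keeping the remaining words in order."""
    # \w+ matches runs of letters, digits and underscores, so punctuation never becomes a token
    tokens = re.findall(r"\w+", text)
    # Compare in lowercase so that "The" and "the" are treated alike
    kept = [tok for tok in tokens if tok.lower() not in stopwords]
    return " ".join(kept)

print(strip_stopwords("The classic cotton shirt, with a button-down collar."))
# classic cotton shirt button down collar
```

The output keeps only the content-bearing words, which is what the embedding model will see.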


The next three cells apply the 'remove_Stopingwords_Punctuation' function to the 'ProductName' and 'Description' columns of the DataFrame, then concatenate the cleaned strings with the other relevant columns to form a 'Feature_Set' column.

In [None]:
df["ProductName"] = df["ProductName"].apply(remove_Stopingwords_Punctuation)

In [None]:
df["Description"] = df["Description"].apply(remove_Stopingwords_Punctuation)

In [None]:
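# Join the fields with spaces so the last word of one field does not fuse with the first word of the next
df["Feature_Set"] = df["ProductBrand"] + " " + df["ProductName"] + " " + df["Gender"] + " " + df["Description"] + " " + df["PrimaryColor"]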
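
Two details of this concatenation are easy to miss. Adding string columns in pandas inserts no separator of its own, and a missing value in any one column makes the whole result missing for that row, which is why rows with missing values are dropped a few cells later. The plain-Python sketch below mimics both behaviours; the helper `build_feature_text` and the sample rows are hypothetical, with `None` standing in for a missing value.

```python
def build_feature_text(row, columns):
    """Join the selected fields with single spaces; return None if any field is missing."""
    values = [row.get(col) for col in columns]
    if any(v is None for v in values):
        return None
    return " ".join(str(v).strip() for v in values)

COLUMNS = ["ProductBrand", "ProductName", "Gender", "Description", "PrimaryColor"]

# Hypothetical rows, for illustration only
complete = {
    "ProductBrand": "Acme",
    "ProductName": "Running Shoes",
    "Gender": "Men",
    "Description": "Lightweight mesh upper",
    "PrimaryColor": "Blue",
}
incomplete = dict(complete, PrimaryColor=None)  # colour is missing

print(build_feature_text(complete, COLUMNS))    # Acme Running Shoes Men Lightweight mesh upper Blue
print(build_feature_text(incomplete, COLUMNS))  # None
```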

In [None]:
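# Take an explicit copy so later changes to subset do not trigger pandas' SettingWithCopyWarning
subset = df[["ProductID", "Feature_Set"]].copy()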

In [None]:
subset

In [None]:
# Drop rows whose Feature_Set is missing (any missing source column makes the concatenation missing)
subset = subset.dropna(axis=0)

In [None]:
# Renumber rows 0..n-1 so row positions line up with the embedding matrix built below
subset = subset.reset_index(drop=True)

In [None]:
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')

In [None]:
# Pass a plain list of strings so encoding does not depend on the Series index
sentence_embeddings = model.encode(subset["Feature_Set"].tolist())

In [None]:
sentence_embeddings.shape

In [None]:
subset["Feature_Set"].values

In [None]:
subset

In [None]:
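product_id = 10015921
# Embed the query product's text and compare it with every product embedding
query_text = subset.loc[subset["ProductID"] == product_id, "Feature_Set"].values[0]
cosine_scores = util.cos_sim(model.encode(query_text), sentence_embeddings)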

In [None]:
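score = cosine_scores[0].tolist()
recomendations_cos = []
# Repeatedly take the highest remaining score; the query product itself normally
# comes first, because its similarity to its own text is about 1
for i in range(5):
    maxx = score.index(max(score))
    recomendations_cos.append(subset['ProductID'][maxx])  # relies on the 0..n-1 index
    score[maxx] = -1  # mask this entry so the next pass finds the runner-up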
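
Since a product is always most similar to itself, a recommender will usually want to leave the query item out of its own results. The sketch below shows one way to do that with the standard library's `heapq.nlargest`; the function `top_k_similar` and the sample ids and scores are hypothetical and are not part of the notebook.

```python
import heapq

def top_k_similar(scores, ids, query_id, k=5):
    """Return the k ids with the highest scores, skipping the query item itself."""
    candidates = [
        (score, item_id)
        for score, item_id in zip(scores, ids)
        if item_id != query_id
    ]
    best = heapq.nlargest(k, candidates, key=lambda pair: pair[0])
    return [item_id for _, item_id in best]

# Hypothetical ids and similarity scores, for illustration only
ids = [101, 102, 103, 104, 105]
scores = [1.0, 0.42, 0.87, 0.15, 0.66]  # 101 is the query, so it matches itself perfectly

print(top_k_similar(scores, ids, query_id=101, k=3))  # [103, 105, 102]
```

With the notebook's variables, the equivalent call would pass the score list and the 'ProductID' column as a list, assuming both are in the same row order.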

In [None]:
recomendations_cos

In [None]:
df[df["ProductID"].isin(recomendations_cos)]