<h1 style="color:blue">1mg Substitute/Similar Medicine - Recommendation Engine</h1>

Most popular e-commerce sites support **product recommendation**: when a user searches for a product, a **product recommendation engine** suggests similar products. This article focuses on building a product recommendation engine for e-commerce products in the pharmaceutical domain.

The focus of this project is **e-Pharmacy**: the use case is to offer users a **Substitute** for a medicine, or tablets with a **Similar** chemical composition.

In [12]:
#Basic Libraries
import numpy as np
import pandas as pd

#Visualization Libraries
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

#Text Handling Libraries
import re
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity

# clustering
from sklearn.cluster import KMeans

# NLP utilities (stop words, tokenization)
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer

In [13]:
# Function for removing NonAscii characters
def _removeNonAscii(s):
    return "".join(i for i in s if  ord(i)<128)

# Function for converting into lower case
def make_lower_case(text):
    return text.lower()

# Function for removing stop words
def remove_stop_words(text):
    text = text.split()
    stops = set(stopwords.words("english"))
    text = [w for w in text if w not in stops]
    text = " ".join(text)
    return text

# Function for removing punctuation
def remove_punctuation(text):
    tokenizer = RegexpTokenizer(r'\w+')
    text = tokenizer.tokenize(text)
    text = " ".join(text)
    return text

# Function for removing the html tags
def remove_html(text):
    html_pattern = re.compile('<.*?>')
    return html_pattern.sub(r'', text)

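# Recommend the cheapest products among those most similar to `title`
# (relies on the globals `indices`, `products` and `df` defined in the cells below)
def get_tfid_recommendation(title, sim):
    idx = indices[title]
    # Pair every row position with its similarity to the query, ranked descending
    sim_scores = list(enumerate(sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip position 0 (expected to be the query itself) and keep the next 49
    sim_scores = sim_scores[1:50]
    product_indices = [i[0] for i in sim_scores]
    idx_range = products.iloc[product_indices]

    # Of those candidates, return the 10 cheapest by price per tablet
    recommendation = df[df['id'].isin(idx_range.values)]
    return recommendation[['name','PricePerTablet']].sort_values('PricePerTablet').head(10)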

In [14]:
df = pd.read_csv('1mg.csv')
df = df.drop_duplicates()
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16867 entries, 0 to 16867
Data columns (total 10 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    16867 non-null  object 
 1   name                  16867 non-null  object 
 2   desc                  16862 non-null  object 
 3   activeIngredient      16867 non-null  object 
 7   manufacturer          16867 non-null  object 
 8   url                   16863 non-null  object 
 9   PricePerTablet        27 non-null     float64
dtypes: float64(1), object(9)
memory usage: 1.4+ MB


In [15]:
df_new = df.reset_index()
products = df_new['id']
indices = pd.Series(df_new.index, index=df_new['id'])
df_new[['id','desc']]

Unnamed: 0,id,desc
0,2 Dep 30mg Tablet,2 Dep 30mg Tablet works by increasing the leve...
1,Glibocer M 500mg/0.3mg Tablet,Glibocer M 500mg/0.3mg Tablet belongs to a cat...
2,Vogliplay M 500mg/0.3mg Tablet,Vogliplay M 500mg/0.3mg Tablet belongs to a ca...
3,Prandial M 0.3 Tablet,Prandial M 0.3 Tablet belongs to a category of...
4,Vogloyd M 500mg/0.3mg Tablet,Vogloyd M 500mg/0.3mg Tablet belongs to a cate...
...,...,...
16862,Zyrova C 5 Capsule,Zyrova C 5 Capsule should be taken with or wit...
16863,Zyrova F 5 Tablet,Zyrova F 5 Tablet can be taken with a meal or ...
16864,Zyrtec OD 10mg Tablet,Zyrtec OD 10mg Tablet can be taken with or wit...
16865,Zytolix P Syrup,"Give Zytolix P Syrup to your child orally, eit..."


In [16]:
# Apply the cleaning functions to the description and store the result as cleaned_desc
df_new['desc'] = df_new['desc'].astype(str)

df_new['cleaned_desc'] = df_new['desc'].apply(_removeNonAscii)
df_new['cleaned_desc'] = df_new['cleaned_desc'].apply(make_lower_case)
# Strip HTML tags before punctuation: removing punctuation first deletes the angle brackets
df_new['cleaned_desc'] = df_new['cleaned_desc'].apply(remove_html)
df_new['cleaned_desc'] = df_new['cleaned_desc'].apply(remove_punctuation)
df_new['cleaned_desc'][0]

'2 dep 30mg tablet works by increasing the level of chemical messengers serotonin and noradrenaline in the brain that have a calming effect on the brain and relax the nerves thus treating your illness it may be taken with or without food it is advised to take this medicine at a fixed time each day to maintain a consistent level in the blood if you miss any doses take it as soon as you remember do not skip any doses and finish the full course of treatment even if you feel better this medication mustn t be stopped suddenly as it may worsen your symptoms some common side effects of this medicine include nausea headache and dry mouth it even causes dizziness and sleepiness so do not drive or do anything that requires mental focus until you know how this medicine affects you however these side effects are temporary and usually resolve on their own in some time please consult your doctor if these do not subside or bother you before taking 2 dep 30mg tablet inform your doctor if you have any 
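
The order of the cleaning steps matters: `remove_punctuation` strips the angle brackets that `remove_html` relies on, so tags have to be removed first. The sketch below illustrates this on a made-up sample string. It assumes `re.findall(r'\w+', ...)` as a stand-in for NLTK's `RegexpTokenizer(r'\w+')`, so it runs without NLTK installed.

```python
import re

def remove_html(text):
    # Drop anything that looks like an HTML tag, e.g. <b> or </p>
    return re.sub(r'<.*?>', '', text)

def remove_punctuation(text):
    # Keep only runs of word characters, joined by single spaces
    # (stand-in for RegexpTokenizer(r'\w+') from the notebook)
    return " ".join(re.findall(r'\w+', text))

# Hypothetical description containing markup
raw = "<p>Take <b>one</b> tablet, twice daily.</p>".lower()

html_first = remove_punctuation(remove_html(raw))
punct_first = remove_html(remove_punctuation(raw))

print(html_first)   # tags removed cleanly
print(punct_first)  # tag names survive as stray tokens
```

With punctuation removed first, the tag names (`p`, `b`) stay in the text as ordinary tokens and would end up in the TF-IDF vocabulary.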

In [17]:
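# Unigrams and bigrams, English stop words removed. min_df=1 keeps every term;
# newer scikit-learn releases require an integer min_df to be at least 1.
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')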
tfidf_matrix = tf.fit_transform(df_new['cleaned_desc'])

In [18]:
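# TfidfVectorizer L2-normalizes each row by default, so the plain dot product
# computed by linear_kernel equals the cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)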
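
To make the similarity computation concrete, here is a minimal pure-Python sketch of the same idea on a toy corpus. It is not the scikit-learn implementation: it assumes unigrams only, no stop-word removal, and scikit-learn's default smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization. The three descriptions are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical product descriptions
docs = [
    "voglibose metformin tablet lowers blood sugar",
    "metformin voglibose tablet controls blood sugar",
    "aceclofenac paracetamol tablet relieves pain",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
n_docs = len(docs)

# Document frequency per term, then smoothed IDF: ln((1 + n) / (1 + df)) + 1
df_counts = Counter(w for doc in tokenized for w in set(doc))
idf = {w: math.log((1 + n_docs) / (1 + df_counts[w])) + 1 for w in vocab}

def tfidf_vector(tokens):
    # Raw term counts weighted by IDF, then scaled to unit length
    counts = Counter(tokens)
    vec = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

matrix = [tfidf_vector(doc) for doc in tokenized]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# On unit-length rows the dot product is the cosine similarity
sim = [[dot(a, b) for b in matrix] for a in matrix]

for row in sim:
    print([round(v, 2) for v in row])
```

Each document has similarity 1 with itself, and the two descriptions sharing active ingredients score far higher with each other than with the unrelated one, which only shares the word "tablet".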

In [8]:
searchstr = 'Aclofen Plus Tablet'
res = get_tfid_recommendation(searchstr,cosine_sim)
res.values

array([['Acec P Syrup', nan],
       ['Acecloflam P Tablet', nan],
       ['Aceclogip-Plus Tablet', nan],
       ['Aceloflam Plus Tablet', nan],
       ['Acemove 100mg/500mg Tablet', nan],
       ['Acent P 100mg/500mg Tablet', nan],
       ['Aceta P 100mg/500mg Tablet', nan],
       ['Achnil P Tablet', nan],
       ['Acphen-P Tablet', nan],
       ['Aczen P Tablet', nan]], dtype=object)

In [19]:
searchstr = 'Advog M 0.3 Plus Tablet'
res = get_tfid_recommendation(searchstr,cosine_sim)
res[['name','PricePerTablet']]

Unnamed: 0,name,PricePerTablet
8871,Medfor V 500mg/0.2mg Tablet,4.1
12077,Prandial M 0.2 Tablet,5.6
1118,Apribose M 0.2 Tablet SR,6.3
1119,Apribose M 0.3 Tablet SR,7.2
1,Glibocer M 500mg/0.3mg Tablet,7.9
373,Advog M 0.2 Tablet SR,8.8
10513,Obimet V 0.2 Tablet PR,8.8
3,Prandial M 0.3 Tablet,9.3
2,Vogliplay M 500mg/0.3mg Tablet,9.6
4,Vogloyd M 500mg/0.3mg Tablet,9.87


In [20]:
searchstr = 'Glibocer M 500mg/0.3mg Tablet'
res = get_tfid_recommendation(searchstr,cosine_sim)
res[['name','PricePerTablet']]

Unnamed: 0,name,PricePerTablet
8871,Medfor V 500mg/0.2mg Tablet,4.1
12077,Prandial M 0.2 Tablet,5.6
1118,Apribose M 0.2 Tablet SR,6.3
1119,Apribose M 0.3 Tablet SR,7.2
373,Advog M 0.2 Tablet SR,8.8
10513,Obimet V 0.2 Tablet PR,8.8
3,Prandial M 0.3 Tablet,9.3
2,Vogliplay M 500mg/0.3mg Tablet,9.6
4,Vogloyd M 500mg/0.3mg Tablet,9.87
5,Welvog MF 500mg/0.3mg Tablet,10.1


In [25]:
# Recommendation with a configurable number of candidates
def get_new_recommendation(title, sim, topn):
    idx = indices[title]
    sim_scores = list(enumerate(sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip position 0 (expected to be the query itself); this slice keeps topn - 1 neighbours
    sim_scores = sim_scores[1:topn]
    product_indices = [i[0] for i in sim_scores]
    idx_range = products.iloc[product_indices]

    # Return at most 10 of the candidates, cheapest first (missing prices sort last)
    recommendation = df[df['id'].isin(idx_range.values)]
    return recommendation[['name','PricePerTablet']].sort_values('PricePerTablet').head(10)

In [27]:
searchstr = 'Advog M 0.3 Plus Tablet'
res = get_new_recommendation(searchstr,cosine_sim,10)
res[['name','PricePerTablet']]

Unnamed: 0,name,PricePerTablet
12077,Prandial M 0.2 Tablet,5.6
1118,Apribose M 0.2 Tablet SR,6.3
1119,Apribose M 0.3 Tablet SR,7.2
373,Advog M 0.2 Tablet SR,8.8
375,Advog M 0.3 Tablet SR,10.5
7169,Ibvog-M 0.3 Tablet,12.0
6661,Gluconorm-V 0.3 Tablet SR,14.9
15965,Vogli-M 0.3 Tablet,
16010,Volibo M 0.2 Tablet,
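
The slice `sim_scores[1:topn]` in `get_new_recommendation` keeps `topn - 1` neighbours, which is why a call with `topn=10` returns nine rows. It also assumes the query sits at position 0 after sorting, which may not hold when another product has an identical description. One possible alternative, sketched below with a hypothetical helper `top_n_neighbours` and a made-up similarity row, excludes the query by index and returns exactly `topn` items.

```python
def top_n_neighbours(sim_row, query_idx, topn):
    """Return the indices of the topn most similar items, excluding the query."""
    # Filter the query out by position instead of assuming it ranks first
    scored = [(i, s) for i, s in enumerate(sim_row) if i != query_idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scored[:topn]]

# Hypothetical similarity row for item 0; item 3 has an identical description
sim_row = [1.0, 0.2, 0.9, 1.0, 0.5]

neighbours = top_n_neighbours(sim_row, query_idx=0, topn=3)
print(neighbours)
```

The returned positions could then be passed to `products.iloc[...]` in the same way as `product_indices` in the notebook's function.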
