WHY BUILD RECOMMENDER SYSTEMS
Recommender systems are built to identify the items that a user is most likely to purchase. Almost all e-commerce
websites these days use recommender systems to make product recommendations on their sites. For example, Netflix uses one to
make movie recommendations. If you use Amazon Music, you have probably seen music recommendations that may have helped
you find new music. Companies like Facebook, LinkedIn, and other social media platforms also use recommender systems to
help you connect with new people.

# 1) COLLABORATIVE FILTERING RECOMMENDER SYSTEMS

'''
1)  User-based collaborative filtering:

In this model, products are recommended to a user because they have been liked
by users similar to that user. For example, if Vitthal and Sachin like the same movies and
a new movie comes out that Vitthal likes, then we can recommend that movie to Sachin, because Vitthal and Sachin
seem to have similar tastes.


2)  Item-based collaborative filtering: 
These systems identify similar items based on users’ previous ratings. For example, if users A, B, and C gave a 5-star rating
to books X and Y, then when user D buys book Y they also get a recommendation to purchase book X, because the system
identifies books X and Y as similar based on the ratings of users A, B, and C.

'''
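
The two variants described above can be sketched on a small ratings table. This is a minimal illustration, not part of the original notebook: the ratings, the movie names, and the extra users `user_c` and `user_d` are all made up, and Pearson correlation is assumed as the similarity measure.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are items; NaN means "not rated".
ratings = pd.DataFrame(
    {
        "movie_1": [5, 4, 1, np.nan],
        "movie_2": [4, 5, 2, 1],
        "movie_3": [1, 2, 5, 4],
        "movie_4": [5, np.nan, 1, 2],
    },
    index=["vitthal", "sachin", "user_c", "user_d"],
)

# User-based: correlate users with each other.
# DataFrame.corr works column-wise, so users must be the columns (hence the transpose).
user_similarity = ratings.T.corr(method="pearson")

# Item-based: correlate items with each other using the ratings users gave them.
item_similarity = ratings.corr(method="pearson")

# Most similar user to sachin, excluding sachin himself.
nearest_user = user_similarity["sachin"].drop("sachin").idxmax()

# Items the nearest user has rated that sachin has not rated yet, best first.
unseen = ratings.loc["sachin"].isna() & ratings.loc[nearest_user].notna()
candidates = ratings.loc[nearest_user, unseen].sort_values(ascending=False)

print("Nearest user to sachin:", nearest_user)
print("Candidate recommendations:", list(candidates.index))
```

With only four users the similarities are not statistically meaningful; the point is only to show which axis is correlated in each variant.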

# 2)  POPULARITY BASED SYSTEMS


These systems can be thought of as the most elementary form of collaborative filtering. Items are recommended based on how
popular they are among other buyers or users. For example, a restaurant may be recommended to you because it has been
rated highly or has received the largest number of positive reviews from users. These systems therefore require historical data to
make a suggestion. They are mostly used by news sites such as Forbes and Bloomberg. Note: these systems
cannot make personalized recommendations, as they do not take user information into account.

# 3) CONTENT-BASED SYSTEMS

These recommenders suggest items or products based on the similarity of their features. For example,
if you have given a high rating to a hotel facing the beach, then similar hotels will be recommended to you.
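
As a rough sketch of the hotel example, feature similarity can be measured with cosine similarity between feature vectors. The hotel names, the four features, and their values are assumptions invented for illustration; cosine similarity is one possible choice of measure, not necessarily the one a production system would use.

```python
import numpy as np

# Hypothetical hotel features: [beach_facing, has_pool, city_center, price_level (0-1)]
hotel_names = ["Sea Breeze", "Coral Bay", "Downtown Inn", "Metro Suites"]
hotel_features = np.array(
    [
        [1, 1, 0, 0.8],
        [1, 0, 0, 0.7],
        [0, 0, 1, 0.4],
        [0, 1, 1, 0.6],
    ],
    dtype=float,
)


def cosine_similarity_to(target_index, features):
    """Return the cosine similarity between one row and every row of `features`."""
    norms = np.linalg.norm(features, axis=1)
    return features @ features[target_index] / (norms * norms[target_index])


# Suppose the user rated "Sea Breeze" highly; rank the other hotels by similarity to it.
liked = hotel_names.index("Sea Breeze")
scores = cosine_similarity_to(liked, hotel_features)
ranking = [hotel_names[i] for i in np.argsort(-scores) if i != liked]

print(ranking)
```

In this toy data the other beach-facing hotel ranks first, which matches the intuition in the paragraph above.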

In [16]:
############################ Popularity based example ##############################

import pandas as pd
import numpy as np


# read data 

path1=r'E:\ML\Recomm system\chefmozaccepts.csv'
path2=r'E:\ML\Recomm system\rating_final.csv'
path3=r'E:\ML\Recomm system\geoplaces2.csv'    

df1=pd.read_csv(path1)

df2=pd.read_csv(path2)

#df3=pd.read_csv(path3,'rb')

dcuisine=pd.merge(df1,df2, on ='placeID')

# To generate a recommendation based on counts

# Using groupby to group the restaurants and getting the count by rating
count_by_rating = pd.DataFrame(dcuisine.groupby(['placeID'])['rating'].count())

# Arranging the output in descending order and taking head to get the top 5 most popular restaurants
count_by_rating.sort_values('rating', ascending=False).head(5)


placeID,rating
132862,90
135032,84
135052,75
135057,60
135025,60


Based on the above table of the top 5 restaurants, the system will recommend the restaurant with ID 135032 over
the restaurant with ID 135052, because it has more ratings (84 versus 75).
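
One thing worth checking with count-based popularity is whether the merge itself changes the counts. The sketch below uses made-up stand-in tables, and it assumes, without having verified it against the real files, that the first table can hold several rows per placeID. In that situation a merge repeats every rating once per matching row.

```python
import pandas as pd

# Hypothetical stand-ins for the two files; the column names other than
# placeID and rating are assumptions made for illustration.
accepts = pd.DataFrame(
    {"placeID": [1, 1, 1, 2], "Rpayment": ["cash", "VISA", "MasterCard", "cash"]}
)
ratings = pd.DataFrame(
    {
        "userID": ["U1", "U2", "U1", "U2", "U3"],
        "placeID": [1, 1, 2, 2, 2],
        "rating": [2, 1, 1, 2, 2],
    }
)

# Count ratings directly from the ratings table: one row per (user, place) rating.
direct_counts = ratings.groupby("placeID")["rating"].count()

# Count after the merge: each rating is repeated once per matching row in `accepts`.
merged = pd.merge(accepts, ratings, on="placeID")
merged_counts = merged.groupby("placeID")["rating"].count()

print(pd.DataFrame({"direct": direct_counts, "after_merge": merged_counts}))
```

In this toy data place 1 has two ratings but a merged count of six, so it would overtake place 2, which really has three. If the real tables behave this way, counting on the ratings table alone may give a more faithful popularity ranking.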

In [20]:
######### Correlation-based recommender systems are also called item-based systems ############################################

 
# Note: 'mbcs' is a Windows-only codec (the system ANSI code page)
places_geo = pd.read_csv(path3,
                     sep = ",", encoding= 'mbcs')
 
dcuisine.head()
 
# Checking the places_geo data
places_geo.head()
 
 
# Subsetting data by required columns
places_geo =  places_geo[['placeID', 'name']]
places_geo.head()

,placeID,name
0,134999,Kiku Cuernavaca
1,132825,puesto de tacos
2,135106,El Rincón de San Francisco
3,132667,little pizza Emilio Portes Gil
4,132613,carnitas_mata


Let us check the ratings these places are getting and see how popular they are. Once we have this information,
we will check the summary statistics for the cuisines dataset.

In [24]:

# Average rating by place
average_rating = pd.DataFrame(dcuisine.groupby('placeID')['rating'].mean())
#average_rating.reset_index(level = 0, inplace=True)
average_rating.head()

# We will use the number of ratings to measure how popular these places are
average_rating['rating_count'] = dcuisine.groupby('placeID')['rating'].count()
average_rating.head()

# Generating descriptive statistics
average_rating.describe()


,rating,rating_count
count,114.0,114.0
mean,1.192298,20.149123
std,0.336879,17.654618
min,0.25,3.0
25%,1.0,6.0
50%,1.204167,15.0
75%,1.425595,30.0
max,2.0,90.0


In [25]:
# Let’s now sort the dataset by using sort_values() method to get the most popular place in the dataset.
average_rating.sort_values('rating_count', ascending=False).head()

placeID,rating,rating_count
132862,1.388889,90
135032,1.178571,84
135052,1.28,75
135057,1.266667,60
135025,1.666667,60


The restaurant with placeID 132862 has the highest rating count (90), followed by 135032 (84) and 135052 (75).

For demo purposes, we will see which places can be recommended to a user based on the Pearson correlation
between the ratings given to one restaurant and the ratings the same users gave to other restaurants.

In [27]:



places_geo[places_geo['placeID'] == 135052] # restaurant name is La Cantina Restaurante

# Checking what all cuisines this place serves
dcuisine[dcuisine['placeID'] == 135052] 

# Most of the matrix is sparse, as one person can only review a few places
places_geo_table = pd.pivot_table(data = dcuisine, values='rating', index='userID', columns='placeID')
places_geo_table.head()

# Ratings given to this restaurant (placeID 135052) by other users
la_rating = places_geo_table[135052]
la_rating[la_rating>=0]

# Creating the correlation table 
places_similar_to_la = places_geo_table.corrwith(la_rating)

corr_table_la = pd.DataFrame(places_similar_to_la, columns=['PearsonR'])
corr_table_la.dropna(inplace=True) # dropping NA values from the sparse table
corr_table_la.head()

# Combining with the rating count, as the number of ratings given by other users is required
corr_table_la_summary = corr_table_la.join(average_rating['rating_count'])
corr_table_la_summary[corr_table_la_summary['rating_count']>=10].sort_values('PearsonR', ascending=False).head(10)

placeID,PearsonR,rating_count
135047,1.0,30
135045,1.0,52
132951,1.0,10
135052,1.0,75
135054,1.0,30
135058,1.0,54
132572,1.0,15
132872,1.0,24
135076,1.0,52
132954,1.0,36


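Finally, what we get back here is a list of 10 places that includes restaurant 135052 itself, so 9 other places
are similar to La Cantina Restaurante based on their popularity and correlation. A correlation of exactly 1.0 in every
row can indicate that only a few users rated both places, so these scores should be read with caution.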
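
Because the user-by-place table is sparse, a Pearson correlation may be computed from only a handful of shared raters. The sketch below counts how many users rated both a place and the target place, then filters on that count. The table, the place IDs, and the threshold `min_common` are made up for illustration, and this is one possible safeguard rather than the notebook's method.

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-place rating table; NaN means the user did not rate the place.
table = pd.DataFrame(
    {
        101: [2, 1, 0, 2, np.nan],
        102: [2, 1, 0, 1, 2],
        103: [np.nan, np.nan, 0, 2, np.nan],
        104: [0, 2, np.nan, np.nan, 1],
    },
    index=["U1", "U2", "U3", "U4", "U5"],
)
target = 101

# Pearson correlation of every place with the target place,
# computed only over users who rated both.
pearson = table.corrwith(table[target])

# Number of users who rated both the place and the target place.
rated = table.notna().astype(int)
overlap = rated.T.dot(rated[target])

summary = pd.DataFrame({"PearsonR": pearson, "n_common_users": overlap})

# Keep only places sharing at least `min_common` raters with the target
# (the threshold is an arbitrary choice for this example).
min_common = 3
reliable = summary[(summary["n_common_users"] >= min_common) & (summary.index != target)]

print(summary)
print(reliable)
```

Here place 103 correlates perfectly with the target, but only two users rated both, so it is dropped by the filter.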

In [28]:
################## EXAMPLE CLASSIFICATION BASED RECOMMENDER SYSTEMS #######################################

Classification-based recommenders are powered by machine learning algorithms such as naive Bayes and logistic regression.
These models are capable of making personalized recommendations because they take into account purchase history,
user attributes, and other contextual data.

In [34]:
# loading required libraries
import numpy as np
import pandas as pd

from pandas import Series, DataFrame
from sklearn.linear_model import LogisticRegression

bank_data = pd.read_csv(r'E:\ML\Recomm system\bank.csv')
bank_data.head() # We have 42k observations and 37 variables.

# Separating the independent variables and the target variable
x_vars = bank_data.iloc[:, 18:37].values
y_var = bank_data["y"]

# Building the logistic model
Logmod = LogisticRegression()
Logmod.fit(x_vars, y_var)

# Creating x_var data for new user
new_user = [[0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]]
y_pred = Logmod.predict(new_user)
y_pred # The customer will not buy the product if approached.



array(['no'], dtype=object)
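
A hard yes/no label is often less useful for recommendation than a score that can be ranked. As a minimal sketch on synthetic data (not the bank dataset; the features and the rule that generates the labels are invented), `predict_proba` returns a probability per class that can be used to order customers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 200 customers with 5 binary features each.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5))
# Hypothetical rule: customers with features 0 and 3 set tend to answer "yes".
y = np.where(X[:, 0] + X[:, 3] + rng.normal(0, 0.5, size=200) > 1.2, "yes", "no")

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Score a few new customers and rank them by estimated probability of "yes".
new_customers = np.array([[1, 0, 1, 1, 0], [0, 1, 0, 0, 1], [1, 1, 0, 0, 0]])
yes_column = list(model.classes_).index("yes")
p_yes = model.predict_proba(new_customers)[:, yes_column]
ranking = np.argsort(-p_yes)  # indices into new_customers, most likely buyer first

print(p_yes)
print(ranking)
```

Looking up the column through `model.classes_` avoids assuming which position the "yes" class occupies.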

In [35]:
#########################   CONTENT-BASED RECOMMENDER SYSTEMS  #################################

In this final machine-learning-based recommender system, we will use an unsupervised algorithm known as KNN
(K Nearest Neighbours). The KNN algorithm first memorizes the data and then tells us which items are most similar
to a given item, based on a distance calculation.

In [39]:
import numpy as np
import pandas as pd

import sklearn
from sklearn.neighbors import NearestNeighbors

mtcars = pd.read_csv(r'E:\ML\Recomm system\cars.csv')

# Setting the target features (mpg, disp, hp, wt), similar to a Merc 450SL
t = [16, 250, 160, 3.7]
feature_matrix = mtcars.iloc[:,[1, 3, 4, 6]].values

# Recommendation is made based upon the single most similar car (n_neighbors=1)
knn = NearestNeighbors(n_neighbors=1).fit(feature_matrix)

# Printing the distance to the nearest car and its row index
print(knn.kneighbors([t]))

# Getting the name of the car at the returned index (11)
mtcars.iloc[11:12,[0,1, 3, 4, 6]]

(array([[32.6486891]]), array([[11]], dtype=int64))


,car_names,mpg,disp,hp,wt
11,Merc 450SE,16.4,275.8,180,4.07
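
The distance above is computed on raw values, so features with large ranges (disp, hp) contribute far more than features with small ranges (mpg, wt). One common option, sketched here with made-up rows rather than the real cars.csv, is to standardize the features before fitting. The car names and values are hypothetical, and whether scaling is appropriate depends on how the features should be weighted.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Hypothetical feature rows in the order [mpg, disp, hp, wt]; not the real cars.csv.
car_names = ["car_a", "car_b", "car_c", "car_d"]
features = np.array(
    [
        [16.5, 280.0, 180.0, 4.1],
        [21.0, 160.0, 110.0, 2.6],
        [15.5, 350.0, 260.0, 3.2],
        [30.0, 80.0, 55.0, 1.6],
    ]
)
query = np.array([[16.0, 250.0, 160.0, 3.7]])

# Fit the scaler on the catalogue, then apply the same transform to the query.
scaler = StandardScaler().fit(features)
knn = NearestNeighbors(n_neighbors=2).fit(scaler.transform(features))

# kneighbors returns (distances, indices), each with one row per query point.
distances, indices = knn.kneighbors(scaler.transform(query))
nearest = [car_names[i] for i in indices[0]]

print(nearest)
print(distances[0])
```

The distances are now in units of standard deviations, so they are not comparable to the raw distance printed in the cell above.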
