# Correlation-based analysis with Python

In [11]:
import numpy as np
import pandas as pd

Data: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

By: Blanca Vargas-Govea, Juan Gabriel González-Serna, Rafael Ponce-Medellín. Effects of relevant contextual features in the performance of a restaurant recommender system. In RecSys’11: Workshop on Context Aware Recommender Systems (CARS-2011), Chicago, IL, USA, October 23, 2011.



In [12]:
finalrating = pd.read_csv("finalrating.csv")
cuisine = pd.read_csv("cuisines.csv")
placeslocation = pd.read_csv("places.csv", encoding='ISO-8859-1')

In [13]:
places = placeslocation[['placeID','name']]

In [14]:
ratings = pd.DataFrame(finalrating.groupby('placeID')['rating'].mean())

### Grouping data and counting ratings

In [15]:
ratings['rating_counts'] = finalrating.groupby('placeID')['rating'].count()

In [16]:
top5=ratings.sort_values('rating_counts', ascending=False).head()
top5

placeID,rating,rating_counts
135085,1.333333,36
132825,1.28125,32
135032,1.178571,28
135052,1.28,25
132834,1.0,25


Restaurant 135085 has the most ratings (36), so it is used as the top restaurant in the rest of the analysis.
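The mean rating and the rating count can also be computed in a single `groupby` pass with named aggregation. The sketch below uses a small made-up table (`toy`) in the same long format as `finalrating.csv`; the values are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Toy ratings in long format (made-up values, same columns as finalrating.csv).
toy = pd.DataFrame({
    "userID": ["U1", "U2", "U3", "U1", "U2", "U3"],
    "placeID": [1, 1, 1, 2, 2, 3],
    "rating": [2, 1, 0, 2, 2, 1],
})

# Named aggregation builds both columns from one groupby.
toy_ratings = toy.groupby("placeID")["rating"].agg(
    rating="mean", rating_counts="count"
)

# Rank places by how many ratings they received.
toy_top = toy_ratings.sort_values("rating_counts", ascending=False)
print(toy_top)
```

Sorting by `rating_counts` rather than by mean rating favors places with many ratings, which makes the later correlations less dependent on a handful of users.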

### Top restaurant ratings

In [17]:
places_cross = pd.pivot_table(data=finalrating,values='rating',index='userID',columns='placeID')

In [18]:
toprestaurant = places_cross[135085]
toprestaurant[toprestaurant>=0.0]

userID
U1001    0.0
U1002    1.0
U1007    1.0
U1013    1.0
U1016    2.0
U1027    1.0
U1029    1.0
U1032    1.0
U1033    2.0
U1036    2.0
U1045    2.0
U1046    1.0
U1049    0.0
U1056    2.0
U1059    2.0
U1062    0.0
U1077    2.0
U1081    1.0
U1084    2.0
U1086    2.0
U1089    1.0
U1090    2.0
U1092    0.0
U1098    1.0
U1104    2.0
U1106    2.0
U1108    1.0
U1109    2.0
U1113    1.0
U1116    2.0
U1120    0.0
U1122    2.0
U1132    2.0
U1134    2.0
U1135    0.0
U1137    2.0
Name: 135085, dtype: float64
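To make the shape of `places_cross` concrete, here is a minimal sketch on made-up data (the `toy` frame is an assumption, not part of the dataset). The pivot yields one row per user and one column per place, with `NaN` wherever a user did not rate a place, which is why the cell above filters with `>= 0.0`; `dropna()` has the same effect.

```python
import pandas as pd

# Made-up long-format ratings: each row is one (user, place, rating) triple.
toy = pd.DataFrame({
    "userID": ["U1", "U1", "U2", "U3"],
    "placeID": [10, 20, 10, 20],
    "rating": [2, 1, 0, 2],
})

# Users become rows, places become columns; unrated pairs are NaN.
toy_cross = pd.pivot_table(
    data=toy, values="rating", index="userID", columns="placeID"
)

# Keep only the users who actually rated place 10.
rated_10 = toy_cross[10].dropna()
print(toy_cross)
print(rated_10)
```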

### Similarity and correlation of restaurants to the top restaurant

In [20]:
similartotop = places_cross.corrwith(toprestaurant)
corr_top = pd.DataFrame(similartotop, columns=['PearsonR'])
corr_top.dropna(inplace=True)
corr_top.head()

placeID,PearsonR
132572,-0.428571
132723,0.301511
132754,0.930261
132825,0.700745
132834,0.814823
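`corrwith` computes each coefficient only from the users who rated both the column and the target, because pairs containing `NaN` are left out. The following sketch checks this on a made-up matrix (`toy_cross` and its values are assumptions for illustration) by comparing the result with a manual calculation over the shared users.

```python
import numpy as np
import pandas as pd

# Made-up user x place matrix with missing ratings.
toy_cross = pd.DataFrame(
    {
        "A": [0.0, 1.0, 2.0, 2.0],
        "B": [0.0, 1.0, 2.0, np.nan],
        "C": [2.0, np.nan, 0.0, 1.0],
    },
    index=["U1", "U2", "U3", "U4"],
)
target = toy_cross["A"]

# Pearson correlation of every column with the target column.
r_all = toy_cross.corrwith(target)

# Manual check for B: restrict to users who rated both A and B.
both = toy_cross[["A", "B"]].dropna()
r_manual = np.corrcoef(both["A"], both["B"])[0, 1]

print(r_all)
print(r_manual)
```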


In [23]:
summ = corr_top.join(ratings['rating_counts'])
summ.head()


placeID,PearsonR,rating_counts
132572,-0.428571,15
132723,0.301511,12
132754,0.930261,13
132825,0.700745,32
132834,0.814823,25


### Restaurants with at least 10 ratings

In [26]:
summ[summ['rating_counts']>=10].sort_values('PearsonR', ascending=False).head(10)

placeID,PearsonR,rating_counts
135076,1.0,13
135085,1.0,36
135066,1.0,12
132754,0.930261,13
135045,0.912871,13
135062,0.898933,21
135028,0.892218,15
135042,0.881409,20
135046,0.867722,11
132872,0.840168,12


#### A correlation of 1 means the users who rated both places scored them in perfect linear agreement, which is easy to get when only a few users rated both
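One way to judge whether a correlation of 1 is meaningful is to count how many users rated both places. The sketch below does this on made-up data; `toy_cross`, the column names, and the `min_shared` threshold are all assumptions chosen for illustration, not values from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up matrix: "few" shares only two raters with "top".
toy_cross = pd.DataFrame(
    {
        "top": [0.0, 1.0, 2.0, 2.0, 1.0],
        "few": [np.nan, np.nan, np.nan, 2.0, 1.0],
        "many": [0.0, 2.0, 2.0, 1.0, 1.0],
    },
    index=["U1", "U2", "U3", "U4", "U5"],
)
target = toy_cross["top"]

pearson = toy_cross.corrwith(target)

# For each place, count users who rated it among those who rated the target.
shared = toy_cross.loc[target.notna()].count()

overlap = pd.DataFrame({"PearsonR": pearson, "shared_raters": shared})

# Hypothetical threshold: require at least 3 shared raters.
min_shared = 3
reliable = overlap[overlap["shared_raters"] >= min_shared]
print(overlap)
print(reliable)
```

Here `few` reaches a correlation of 1 from just two shared raters, while `many` has a lower but better supported coefficient. Note that `rating_counts` in the notebook counts all ratings of a place, not the ratings it shares with the top restaurant.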

#### Let's see what types of restaurants are most correlated with the top restaurant

In [32]:
typerestaurantscorrtop = pd.DataFrame([135085,132754,135045,135062,135028,135042,135046], index=np.arange(7), columns=['placeID'])

totalsummary = pd.merge(typerestaurantscorrtop, cuisine, on='placeID')

totalsummary

,placeID,Rcuisine
0,135085,Fast_Food
1,132754,Mexican
2,135028,Mexican
3,135042,Chinese
4,135046,Fast_Food


### Which restaurants are at the top?

Top place:

In [35]:
places[places['placeID']==135085]

,placeID,name
121,135085,Tortas Locas Hipocampo


Another highly correlated restaurant that is also Fast Food:

In [37]:
places[places['placeID']==135046]

,placeID,name
42,135046,Restaurante El Reyecito


#### How similar is El Reyecito to Tortas Locas Hipocampo, given the whole set of restaurant types?

In [38]:
cuisine['Rcuisine'].describe()

count         916
unique         59
top       Mexican
freq          239
Name: Rcuisine, dtype: object

There are 59 different cuisine types; nevertheless, another Fast Food place appears among the restaurants most similar to the top one. In conclusion, El Reyecito is a reasonable recommendation for people who like Tortas Locas Hipocampo.
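The steps of this notebook can be collected into one function. The following is a sketch under stated assumptions, not the notebook's own code: `similar_places` is a hypothetical helper name, its defaults are arbitrary, and the data is made up.

```python
import pandas as pd


def similar_places(ratings_long, target_id, min_ratings=10, top_n=5):
    """Rank places by Pearson correlation with target_id (hypothetical helper)."""
    # User x place matrix, NaN where a user did not rate a place.
    cross = ratings_long.pivot_table(
        values="rating", index="userID", columns="placeID"
    )
    # Total number of ratings per place.
    counts = ratings_long.groupby("placeID")["rating"].count().rename("rating_counts")
    # Correlation of every place with the target, dropping undefined values.
    corr = cross.corrwith(cross[target_id]).rename("PearsonR").dropna()
    summary = pd.concat([corr, counts], axis=1, join="inner")
    # Keep well-rated places and exclude the target itself.
    summary = summary[summary["rating_counts"] >= min_ratings]
    summary = summary.drop(index=target_id, errors="ignore")
    return summary.sort_values("PearsonR", ascending=False).head(top_n)


# Made-up ratings: place 2 mostly agrees with place 1, place 3 mirrors it.
toy = pd.DataFrame({
    "userID": ["U1", "U2", "U3", "U4"] * 3,
    "placeID": [1] * 4 + [2] * 4 + [3] * 4,
    "rating": [0, 1, 2, 2, 0, 1, 2, 1, 2, 1, 0, 0],
})

result = similar_places(toy, target_id=1, min_ratings=3)
print(result)
```

Dropping the target from the result avoids the trivial self-correlation of 1 that appears for 135085 in the table above.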