# Notes

This notebook was originally created and stored on Google Colaboratory. For convenience, the output has been saved as produced from the intended input (stored on Google Drive). It is for display only and is not intended to be run locally.

# Simulating user-side recommendation
This notebook simulates what happens on the user's side: a user 'downloads' the U and V matrices (built from clustering and performing WNMF) and, from a subset of the user's own reviews, finds the most similar cluster. Once the most similar cluster is found, the remaining reviews are tested for error against the cluster's predicted values.

In order to run the simulation, this module will also handle test user selection. 

In [0]:
# Mounting drive
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [0]:
# DONE: read U and V files
# DONE: read Jonathan's user dict and find the top users.
# DONE: from the top users, find their reviews and make a dictionary <User, {set of businesses the user reviewed}>
# DONE: for each user's review set, count how many businesses are in the V matrix
# DONE: for each user with at least 3 businesses in the V matrix, perform cross-validation (separate the set as evenly as possible, find which cluster the subset belongs to, test the rest for prediction, repeat)

## Reading the U and V matrices
We import pandas, NumPy, and SciPy, then assign the main filepaths here. Finally, we use pandas to read the files as dataframes.

In [0]:
import pandas as pd
import numpy as np
import scipy.spatial.distance  # used later for the cosine distance
import scipy.stats

filepath_u = '/content/gdrive/My Drive/Summer Research 2019/Individual Work Folder/Minh/u.csv'
filepath_v = '/content/gdrive/My Drive/Summer Research 2019/Individual Work Folder/Minh/v.csv'

df_u = pd.read_csv(filepath_u, index_col=0)
df_v = pd.read_csv(filepath_v, index_col=0)

df_v.head(10)

Unnamed: 0,"""A-Ri-Rang"": ZMcbVIEXsLO7j1Q1GXKPSw","""Activities and Recreation Center - ARC"": zAD08AT1GgX-cVNy9iMq4A","""Aldi"": tDTUSKxPUUkpFaxBZZl4Cw","""Alexander's Steakhouse"": CpNMXASiwtJv5eCDf0n63g","""Alexandra's"": rOOsFYRPiTOcOnaiChJklQ","""All Creatures Animal Hospital"": zkieW2F82--bbOIRhOCj0g","""Alto Vineyards"": M8G8S2takaE_NOVutGzEkg","""Am-Ko Oriental Foods & Gifts"": zPPw_vUatLmvnRpW1OoX6w","""Amtrak"": jZeLcFDCfBhAxjq1Owe65A","""Anderson Dental"": N-IwlZQimaUrxYF_jGUCOg","""Animal Emergency Clinic of Champaign County"": BhfQ8Rh4L-ca_so-3QjWoA","""Anita Purves Nature Center"": vV6oNhupyM8gEhjmhqLPjQ","""Antonio's Pizza"": BoTQepQTjGbTXA4g4Hj-WA","""Applebee's Neighborhood Grill & Bar"": r_QcMIgY2zxdcV40kK_rxw","""Armored Gopher Games"": cn999G2xZNMj8ZYDot0C8w","""Aroma Cafe"": 0FrYsoVHheQGoXEQsH2d2Q","""Art Coop"": gZI007iLBe_LZbrHSDxHtg","""Art Mart"": dVBp-ayxdvwG95BYfqKwDw","""Assembly Hall"": V7CbZXLAG_wg-wZfQxCZcA","""Atlanta Bread"": nG59soBP2SaQ25rf2n-0bA","""B Won"": C_Zj7H7bUJ5kGCwFmEwW6w","""BJ Grand Salon & Spa"": ChjKd6HgLKmUAQnsrpDSBA","""Bacaro"": jeTfL2kCyBtmFGSrSQHqVw","""Bacca Cigar"": u68_uFbvsqrwZVeHo-a71g","""Bagelmen's Inc"": xo_53Ec0MacJEC5ux7dWhQ","""Bar Guiliani"": tEkUEOsxj9h0pzClYrGXMQ","""Barfly"": 67STv2NXNvt7K8sdbjqetQ","""Basil Thai Cafe"": XtJj67rKT16a4tQw7bxtyw","""Basil Thai Urbana"": w5Y_FiGPSlPemJVxAZZToQ","""Basmati Indian Cuisine"": 7U7uLS9YjjhBM1SoyNH30w","""Bella Mia Boutique"": 2t8Z3VwvV6YpW2gnP-VHig","""Bentley's Pub"": u4D2VypIx3nMsRRbOwx-sA","""Bevande Coffee"": N5PfEojrY4rFqpqzno4aZg","""Biaggi's Ristorante Italiano"": qeJnMI5RmyJ2TZvkqvjn6Q","""Big Lots - Champaign"": OYpRlyukZZh9zH2mxxv4rg","""Big Mouth's"": Z2W3K8x9cRGXvlGWerZH-w","""Billy Barooz Bar & Grill"": PmVxbit6HDDsEUS-j9aDfg","""Blain's Farm & Fleet"": tLgKerW16F9V239MjNA2PA","""Blues"": Q2bnRzJ8AC-3lWyQY8DZqA","""Bo Bo China"": EVkytEhlC1nswqmgrHdviw",...,"""The Bread Company"": Ah4i15g8Ow_zphzcpulTxQ","""The Cake Artist's Studio"": 
rNbAIbTn0zN8yNk0IV7dfQ","""The Canopy Club"": SVXpyYPAuvJVKcfZ0nMKyg","""The Estate Sale"": pWxCiLYvlUHd9TVJ7n0mEQ","""The Fubar Lounge"": PWs6xJQJPHxknd3FcdKn9Q","""The Great Impasta"": pH72Y8aqqJlq3bgtj-d3UA","""The Habitat for Humanity ReStore"": fZQ3QjGMRELHG9f0WnN8vw","""The Home Depot"": 6vsjWxIMHs-34L9wuTBJRw","""The New Sweet Indulgence"": nplkC6vnh4qT9xH-vhup6w","""The Pita Pit"": DS3-yphtWDHAdXLtIoqdAA","""The Red Herring Vegetarian Restaurant"": kKCwp86xU9XKRnAALQDhrw","""The Ribeye"": PBmfdx-tC2D54FI3HtcKww","""The Y Eatery"": aKiE0aZ6vGyOlH-uVOHONw","""Thomas M. Siebel Center for Computer Science"": jDlvFXuxis4rC2NcWQbqig","""Timpone's"": 0grgvnq4GgoY-estWytUhg","""Tuesday Morning"": RDHH5aVSsblGfOf17P2Tjg","""UPS"": LQRcuOgluaRRMyTH4i3FPw","""University Group"": Hsvqv6AQQ13SvO0uGFu1zg","""University High School"": AqzK7Dr-9zMoxBpsZNlkoA","""University of Illinois Veterinary Teaching Hospital"": D5quwbQTguD4hqMAEARnBg","""University of Illinois"": ZiLOXVloAOtr9_cpQTpr8g","""Urbana Dog Park"": FM6hRJtjDwNeZWa64wpdRg","""Urbana Free Library"": wmhFEY8IctqfSBO0NDcb1w","""Wal-Mart"": 4VYi3I-nVttZeOmrGEXHjQ","""Walnut Street Tea"": pLBF8QrRkMicJcsjAqW7ag","""Ward & Associates Realtors"": yL0PKzDCONu3jHpJG4tQSQ","""Wienerschnitzel"": QHbrfA9nSNLFcevsfQk6dQ","""Will Am-Fm Radio"": YBNS9AHuSSMSVPGlsgvqbA","""William M Staerkel Planetarium"": DNBKTu7qoN_18JIybyIVPw","""Wingstop"": 4ZAy5ZCTWbduzzhtR0egHQ","""Women's Health Practice"": c_pScQ7eKYTesWkcqDMn-g","""Wonderdogs"": SuQpsHxcxCAB8kxPEIXiBg","""Woori Jib Restaurant"": GWS5s8HSm1zjqMLJD7wQLg","""World Harvest Foods"": LElQBiDw8HyU_LrbjP8C1A","""Yellow Checker Cab"": J6TGHzwKKq5bqBR4_WsnUA","""Yellowfin Japanese Restaurant"": C31ExBTn_6UxbTVkWPtNkg","""Yuko Hair Salon"": 4LXNVWV_Yp8HbkikmRQ2gg","""Za's Italian Cafe"": FUr2uEolARu7rv2TQYKPqg","""Zelma's"": h2vFVV5pJjn8arA7GR6mQg","""Zorba's Restaurant"": kyXEnWKQGWSThY6EcjORuw"
0,0.589713,0.189171,0.282108,0.008324,0.924193,6.457445e-06,0.934173,0.765077,0.151248,0.008078,1.772545e-05,0.902004,0.652465,2.787796e-05,0.293195,0.710923,1.376052,0.710693,0.485849,0.900254,1.648884,0.011148,1.093057,0.204138,0.306971,0.024081,0.076519,0.108534,0.884378,0.107771,1.136343,0.840836,0.000286,1.013348,0.119995,0.241378,0.654435,0.702194,0.445306,0.058285,...,0.176774,0.188668,2.953435,0.86718,0.439567,0.463981,0.257662,0.868923,2.466121,0.371996,0.952288,0.161906,0.403039,0.53997,1.017565,0.082497,0.073979,0.518484,1.102034,0.995094,0.631408,0.802216,0.665364,0.091523,0.431139,0.409727,0.422908,0.62877,0.925807,0.352822,1.026689,1.363333,0.065443,0.425378,0.455914,0.444654,0.679641,0.133681,0.418355,0.520488
1,0.391738,0.136206,0.622061,0.608884,0.64254,1.312524e-07,0.8973,0.514578,0.286113,1.432604,6.684977e-08,0.934829,0.799282,6.580644e-08,0.951006,0.34781,0.444846,0.160346,0.528664,0.146201,0.281419,0.102423,0.043053,1.379008,0.315083,0.490884,0.326649,0.063139,0.268315,0.462796,0.526077,1.114492,0.004117,0.186216,1.429729,0.052492,0.536867,0.706852,0.377622,0.11901,...,0.429592,1.070456,1.377212,0.967518,0.681314,0.279585,0.955212,0.41857,0.500346,0.165281,0.41428,0.265057,0.746115,0.828622,0.473509,0.305427,0.19936,0.09521,0.244475,0.283766,1.23562,0.45818,0.388159,0.107357,0.565355,1.38646,0.440123,0.573318,0.569842,0.804273,1.052091,0.829529,0.338789,0.502706,0.440383,0.832275,2.615745,0.131151,0.07223,0.208539
2,1.189029,0.169157,0.70695,0.710809,0.362059,0.01972971,1.206264,0.656842,0.065729,0.144091,0.03961086,0.622902,1.09208,0.0234092,1.626076,0.104648,0.965091,0.018876,0.540213,0.585629,0.027076,0.307807,0.470368,0.560605,0.62452,0.764794,0.537788,0.293874,0.388724,0.057298,0.956498,0.740492,0.132899,0.571687,0.066783,0.026834,0.004865,0.297992,1.75172,0.113127,...,1.740426,0.892172,1.467027,0.176546,0.300571,0.207345,0.127996,0.869158,0.366832,0.939188,0.38477,0.627552,0.482316,1.1521,0.8699,0.342133,0.381838,0.863323,0.042251,0.759708,0.719868,0.878097,0.514145,0.141478,0.921244,0.081631,0.42912,0.406442,0.702434,1.34972,0.218797,0.515582,0.150749,0.192148,0.466017,0.685539,0.283234,0.810259,0.135706,0.387302
3,0.055995,0.535808,0.770573,1.110093,0.609252,0.0001596103,0.941448,0.778174,0.289376,1.738331,0.000398102,0.393492,0.847321,6.767752e-05,0.819652,0.11772,0.886114,1.31392,0.277134,0.001106,1.212168,0.422252,0.606266,0.825224,0.268252,0.489355,0.619689,0.218576,0.622371,0.206777,0.690922,0.511288,0.149617,0.325459,0.590811,0.786785,1.07178,0.300297,0.077989,0.250182,...,0.254752,0.537518,0.339379,1.023681,0.809596,0.579334,0.310806,0.645065,2.889778,0.92669,0.315694,0.459882,0.201945,0.595135,0.275094,0.925972,0.107772,0.323629,1.309226,0.431501,0.859781,0.622321,1.742378,0.174696,0.966679,0.627103,0.794772,0.167325,0.577108,0.597989,0.895877,1.169047,0.352029,0.88364,0.033798,0.838171,0.123009,0.01964,0.242258,0.421377
4,0.661624,0.803223,1.446443,1.006778,0.951002,1.255637e-05,0.688446,0.256011,0.148048,2.834012,1.990955e-05,0.538418,0.074134,6.423253e-06,0.139665,0.760387,0.660003,1.119958,0.54429,0.933413,0.584557,0.701335,0.33555,1.222423,0.821091,0.809509,0.608977,0.153559,0.732273,0.057502,0.076084,0.553968,0.179098,1.188253,1.59505,0.394814,0.81722,0.872954,0.859447,0.160547,...,1.588851,0.91916,0.483613,0.705884,0.321052,1.155256,0.772558,0.815612,1.21194,1.262292,1.002779,0.950409,0.720954,0.221389,0.273157,0.868095,0.034848,0.521269,1.23304,1.005153,0.606342,0.667869,0.255614,0.075825,0.644462,0.503963,0.451706,0.733073,0.353228,1.243764,0.072548,0.243086,0.549932,0.604454,0.233715,0.363174,1.482595,0.511053,0.630517,0.675801
5,0.896634,0.58813,0.48837,0.25749,0.883867,1.073903e-05,0.238392,0.506868,0.445277,3.508531,2.804425e-05,0.693484,0.426705,4.149154e-06,0.520509,0.598607,1.078254,0.041161,0.853913,0.663283,0.887214,0.613741,0.852005,0.655549,0.413661,1.218929,0.802171,0.424765,0.163801,0.145793,0.20829,0.717759,0.089006,0.280572,0.612773,0.562174,0.939384,0.608053,0.833929,0.064087,...,0.441089,0.019465,3.651987,0.816222,0.598329,1.036593,1.038865,1.00455,0.15877,0.188548,0.477352,1.033173,0.578619,0.464956,0.248637,0.180899,0.112447,0.306178,0.160855,0.627255,0.186604,1.113064,0.404103,0.090317,1.04754,1.239518,0.257769,0.061259,0.752314,1.111051,0.985725,0.101026,1.177544,0.884832,0.618423,0.358765,1.142961,1.276591,0.181649,0.960736
6,0.13538,0.419655,0.232999,0.35178,0.649391,0.4174487,0.977374,0.350075,0.413659,1.532904,0.2514423,0.480431,0.644079,0.1072541,0.753807,0.290615,0.663014,0.794164,0.423965,0.34091,0.443438,1.187625,0.996054,0.383254,0.92748,0.937881,1.016441,0.183268,0.502078,0.473377,0.410854,0.487049,0.088246,1.068596,0.078401,0.052021,1.070153,0.481936,1.395284,0.064311,...,0.128217,0.521107,3.195895,0.985736,0.781834,1.0499,0.170091,0.599052,0.19122,0.709341,0.134901,1.004306,0.596194,0.779434,0.606234,0.369211,0.233007,0.89412,1.149456,0.880705,0.092117,0.596701,1.074696,0.104892,0.623808,0.474902,0.381567,0.797127,0.9362,0.164373,0.101434,0.454118,1.258627,1.402767,0.01303,0.41884,0.869813,0.506171,0.121842,1.019517
7,0.205893,0.022479,1.173735,0.610239,0.626294,3.088715e-05,0.234649,0.706842,0.065687,1.632987,0.0006305407,0.79903,0.355066,0.0002909128,0.929188,0.576518,0.452111,0.669071,0.126197,0.529206,0.698403,0.409997,0.8617,0.305656,0.69375,0.004338,0.470956,0.438067,0.465027,1.181688,1.015902,0.606693,0.119467,0.848512,0.110736,0.465638,0.032729,0.421894,0.608745,0.207064,...,0.435401,0.154675,1.597275,0.860528,0.286159,0.395551,0.977899,0.084607,0.419121,0.963176,0.752528,0.846915,0.246007,0.291205,0.051646,0.128666,0.132177,0.210386,0.448857,0.221601,0.545115,0.645219,1.208969,0.122881,0.436406,0.831353,0.054422,0.303037,0.832902,1.071131,0.716114,0.911564,0.774466,0.512419,0.61481,0.876847,0.72866,0.636091,0.632421,0.609339
8,0.531713,0.176918,1.050099,0.337256,0.376139,0.00860019,0.212578,0.468602,0.252834,0.178402,0.02646276,0.906079,0.034391,0.01310506,0.201375,0.416404,0.459037,0.874461,0.952996,1.124244,0.119557,1.246567,0.962046,1.075295,0.434106,0.883308,0.02922,0.458986,0.119722,0.149597,0.068966,0.163759,0.125949,0.022694,0.319815,0.329042,0.357314,0.941471,0.345396,0.448417,...,0.922493,1.027512,1.179707,0.042115,0.701557,0.119606,1.390154,0.378163,0.288624,0.294043,0.084835,0.176887,1.068967,0.971554,0.073587,0.427212,0.147793,0.210525,0.365723,1.070932,0.11513,0.330118,0.579218,0.01803,0.471806,0.847626,0.750288,0.551843,0.645873,0.433706,0.391249,0.149208,0.373486,1.013338,0.670399,0.380213,1.764376,0.630877,0.507101,1.029708
9,1.085207,0.716055,1.224584,0.864598,0.942368,0.07910053,0.477108,0.517588,0.548543,0.319243,0.2242999,0.721456,0.718207,0.1145224,0.288127,0.856863,0.746808,0.761014,0.253596,0.311505,0.607095,0.552635,0.437212,0.721994,0.818724,0.008762,0.277414,0.220039,0.175931,0.383127,0.473337,0.940031,0.23337,0.02643,0.973064,1.019772,0.59922,0.999497,0.634946,0.057575,...,0.052075,0.815392,1.406875,0.338582,0.547535,0.699418,0.523015,0.942483,0.531331,0.520849,0.754136,1.058981,0.383512,0.904047,1.054376,0.011025,0.139603,0.652363,1.081893,0.43905,0.915967,0.950935,0.404433,0.294635,0.91432,0.099777,0.048483,0.787441,0.645783,1.025563,0.942776,0.282483,0.945599,0.972663,0.146385,0.6941,0.154179,0.682208,0.532489,1.297484


1. Reading a table of users and the number of years in which each has posted a rating
2. Reading the review file, which is used to find what the test users have rated

In [0]:
filepath = '/content/gdrive/My Drive/Summer Research 2019/Individual Work Folder/Jonathan/user_years_with_rating.csv'
df_y = pd.read_csv(filepath, index_col=0)

filepath = '/content/gdrive/My Drive/Summer Research 2019/Individual Work Folder/Ian/yelp_detailed_small.csv'
df_r = pd.read_csv(filepath)

df_y.head(10)

Unnamed: 0,user_name_id,years_with_rating
0,T. A.: ZT70ZzpVrWl013A-sId3Kw,9
1,Lisa: qmjoMFMZdLH69_6eGTGDZw,9
2,Jamie: D7vqQ2D1mmj_1EV1lNXsnQ,8
3,Mariam: VQKHUfAhPJ5wmNthq62mfQ,8
4,J: BsK9Oy3pTJ0SCdaKsgx_AQ,8
5,Edward: 5QOtcHU1SoqEqBCRR6FhsA,8
6,Heather: ymi0YEAIyZm9DN0H-OBb5A,8
7,Grace: Jiw1kr7W1IbbqL1p_V7d1A,8
8,Tony: StYuEMDMWO1IOdMz5zgE8Q,8
9,Kelly: 1TVhmXcDVqFTtyJNKEFVxw,8


## Getting the user test set
Here we extract the top x users as candidates for testing.

In [0]:
# From the top users, find their reviews and make a dictionary <User, {set of businesses the user reviewed}>
# x is the number of top users we want to search through
x = 100

df_y = df_y.iloc[0:x]
df_y = df_y['user_name_id'].tolist()

df_y[0:5]

['T. A.: ZT70ZzpVrWl013A-sId3Kw',
 'Lisa: qmjoMFMZdLH69_6eGTGDZw',
 'Jamie: D7vqQ2D1mmj_1EV1lNXsnQ',
 'Mariam: VQKHUfAhPJ5wmNthq62mfQ',
 'J: BsK9Oy3pTJ0SCdaKsgx_AQ']

In [0]:
dict_ub = {}
for i in df_y:
  df_temp = df_r[df_r['user_name_id'] == i]
  s = set(df_temp['business_name_id'])
  dict_ub[i] = s
len(dict_ub[df_y[0]])

54

## Comparing user reviews
We intersect each user's set of reviewed businesses with the set of businesses in the V matrix, and store the result as that user's new set in the dict.

In [0]:
s = set(df_v.columns)
for i in dict_ub.keys():
  dict_ub[i] = s.intersection(dict_ub[i])
len(dict_ub[df_y[0]])

17

## Generating predictions w/ cross-validation
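For each user, we iterate through their set by:

1.   Multiplying U and V (done once, before the loop) to get each cluster's predicted ratings
2.   Selecting one element of the set to exclude
3.   Running Pearson correlation between each cluster's predictions and the user's ratings on the remaining set to find the most similar cluster
4.   Comparing that cluster's predicted rating for the excluded element with the user's actual rating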
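
The steps above can be sketched on toy data as follows. This is a minimal illustration and not the notebook's exact cell: the matrix `uv_toy`, the ratings in `user_ratings`, and the helper `closest_cluster` are hypothetical, and the sketch assumes that clusters are ranked by the Pearson correlation coefficient, computed here with `np.corrcoef`.

```python
import numpy as np

# Hypothetical predicted ratings: rows are clusters, columns are businesses.
uv_toy = np.array([
    [5.0, 4.0, 1.0, 2.0],   # cluster 0
    [1.0, 2.0, 5.0, 4.0],   # cluster 1
    [3.0, 3.5, 2.5, 4.0],   # cluster 2
])
# Hypothetical ratings one test user gave to the same four businesses.
user_ratings = np.array([5.0, 4.0, 2.0, 1.0])


def closest_cluster(uv, ratings, held_out):
    """Return the row of uv that correlates best with ratings, ignoring held_out."""
    keep = [b for b in range(len(ratings)) if b != held_out]
    best_cluster, best_r = -1, -np.inf
    for cluster in range(uv.shape[0]):
        # Off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
        r = np.corrcoef(uv[cluster, keep], ratings[keep])[0, 1]
        # A nan (constant input) never compares greater, so it is skipped.
        if r > best_r:
            best_cluster, best_r = cluster, r
    return best_cluster


# Leave one business out at a time and record the absolute error.
abs_errors = []
for held_out in range(len(user_ratings)):
    cluster = closest_cluster(uv_toy, user_ratings, held_out)
    if cluster != -1:
        abs_errors.append(abs(uv_toy[cluster, held_out] - user_ratings[held_out]))

mae = sum(abs_errors) / len(abs_errors)
print(mae)
```

In this toy example cluster 0 is the best match for every held-out business, so the error comes only from the gap between cluster 0's predictions and the user's actual ratings.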

In [0]:
# multiplying U and V
# converting to numpy to multiply
u = df_u.to_numpy()
v = df_v.to_numpy()
uv = np.matmul(u, v)
# converting back to dataframe
uv = pd.DataFrame(uv, columns = df_v.columns)

# list of error 
i_abs_diff = []

# For every <User, Set> pair
for i, j in dict_ub.items():
  # we exclude any user with fewer than 3 common businesses from testing
  if len(j) < 3:
    continue
  # make a pivot table to extract the user's reviews
  temp_pivot = df_r[df_r['user_name_id'] == i]
  temp_pivot = temp_pivot.pivot(index = 'user_name_id', columns = 'business_name_id', values = 'stars')

  # For each item k in set j
  for k in j:
    temp_s = j.copy()

    # we test as if item k is not part of the set
    temp_s.remove(k)
    temp_s = list(temp_s)
    closest_cluster = (-1, -1)
    # for each row (cluster) in the UV matrix 
    for m in range(len(uv)):
      # make a list for both the UV and the user's reviews
      uv_list = [uv.loc[m][n] for n in temp_s]
      user_list = [temp_pivot.loc[i][n] for n in temp_s]
      # pearsonr returns (correlation coefficient, p-value); index 1, compared below, is the p-value
      pearsons = scipy.stats.pearsonr(uv_list, user_list)
      if (pearsons[1] > closest_cluster[1]):
        closest_cluster = (m, pearsons[1])
    # once the closest cluster has been found, we perform the tests
    if(closest_cluster[0] != -1):
      i_abs_diff.append(abs(uv.loc[closest_cluster[0]][k] - temp_pivot.loc[i][k]))
# print out MAE for entire set
print(sum(i_abs_diff) / len(i_abs_diff))



1.8392121121106115
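
The value printed above is the mean absolute error (MAE) over all held-out reviews. In the notation below, which is introduced here only for illustration, $T$ is the set of tested (user, business) pairs, $r_{ub}$ is the user's actual star rating, and $\hat{r}_{ub}$ is the rating predicted by the selected cluster:

$$
\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,b) \in T} \left| \hat{r}_{ub} - r_{ub} \right|
$$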


### Improving prediction generation
Based on Dr. Wang's feedback, we can try these changes and see whether they improve the results:


*   Use cosine similarity instead of Pearson correlation
*   Consider the top-*k* clusters for prediction instead of just the single nearest cluster
*   For optimization, multiply U only by the columns of V for the user's rated businesses, since the full product is not needed for a particular user
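
The third suggestion relies on the fact that selecting columns of V before multiplying gives the same values as selecting those columns from the full product. A minimal sketch with made-up matrices follows; the names `u_toy`, `v_toy`, and `rated` are hypothetical, and the shapes are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
u_toy = rng.random((4, 3))   # 4 clusters x 3 latent factors
v_toy = rng.random((3, 6))   # 3 latent factors x 6 businesses
rated = [1, 4, 5]            # column positions of the businesses one user rated

# Multiply everything, then keep only the rated columns.
full_then_select = (u_toy @ v_toy)[:, rated]
# Keep only the rated columns of V, then multiply.
select_then_multiply = u_toy @ v_toy[:, rated]

same = np.allclose(full_then_select, select_then_multiply)
print(same, select_then_multiply.shape)
```

The second form does less work because it never computes the columns for businesses the user has not rated.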



In [0]:
# k = number of clusters to consider
k = 3

error = []
# For every <User, Set> pair
for i, j in dict_ub.items():
  # v is the subset of V (latent factors x businesses) whose businesses
  # are in both the volunteer set and i's (the current test user's) review set.
  # Every time we want to multiply by only the user's businesses, we need to rebuild the v matrix
  v = df_v[list(j)]
  v_columns = v.columns
  
  # convert U and V to numpy arrays
  v = v.to_numpy()
  u = df_u.to_numpy()

  # matmul and convert back to DataFrame  
  uv = np.matmul(u, v)
  uv = pd.DataFrame(uv, columns = v_columns)

  # we reduce the df_r (review set) down to i's reviews and sort by date
  r = df_r[df_r['user_name_id'] == i]
  r = r.sort_values('date')
  r = r[r.business_name_id.isin(j)]

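  # Each test uses the user's first x reviews (x = 2, ..., n - 1) to predict the next review, so a user needs at least 3 reviews. Each test computes the cosine distance to every cluster, keeps k clusters, and compares the weighted prediction with the user's actual rating for the held-out business.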
  
  for x in range(2, len(r)):
    # Select the business to test
    currb = r.iloc[x, :]['business_name_id']

    # Generate the list of businesses to run cosine similarity
    testset = r.iloc[0:x, :]['business_name_id'].tolist()
    l1 = [r[r['business_name_id'] == business].iloc[0]['stars'] for business in testset]

    # For each cluster, we take the businesses currently being tested and compute the cosine distance
    # to the user's ratings. scipy's cosine() returns a distance (1 - cosine similarity),
    # so larger values mean less similar vectors.
    csd = []
    for cluster in range(len(uv)):
      l2 = [uv.iloc[cluster][business] for business in testset]
      csdistance = scipy.spatial.distance.cosine(l1, l2)
      csd.append((csdistance, cluster))
    # Sort by distance in descending order and keep the first k clusters
    csd.sort(reverse = True)
    csd = csd[0:k]
    # With the k clusters kept in csd, we generate a prediction on currb, multiplying each cluster's score by its normalized weight
    weights = [cluster[0] for cluster in csd]
    sw = sum(weights)
    weights = map(lambda weight: weight / sw, weights)
    prediction = 0
    for weight, cluster in zip(weights, csd):
      prediction += (uv.iloc[cluster[1]][currb] * weight)
    error.append(abs(r.iloc[x, :]['stars'] - prediction))
sum(error) / len(error)

1.4307514088615507
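
For comparison, a top-*k* prediction weighted by cosine similarity can be sketched on toy data as follows. This is an illustrative variant and not the cell above: it assumes clusters are ranked by similarity (1 minus the cosine distance), so the most similar clusters receive the largest weights. The functions `cosine_similarity` and `predict_top_k` and all values are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def predict_top_k(uv, known_idx, known_ratings, target_idx, k):
    """Predict the target rating from the k clusters most similar to the user."""
    sims = [
        (cosine_similarity(uv[cluster, known_idx], known_ratings), cluster)
        for cluster in range(uv.shape[0])
    ]
    sims.sort(reverse=True)            # highest similarity first
    top = sims[:k]
    total = sum(sim for sim, _ in top)
    # Weighted average of the clusters' predictions; weights sum to 1.
    # Assumes total is non-zero, which holds for positive ratings.
    return sum(sim / total * uv[cluster, target_idx] for sim, cluster in top)


# Hypothetical predicted ratings: rows are clusters, columns are businesses.
uv_toy = np.array([
    [4.0, 2.0, 5.0],
    [2.0, 4.0, 1.0],
    [3.0, 2.0, 3.0],
])
# The user has rated businesses 0 and 1; we predict business 2.
prediction = predict_top_k(uv_toy, [0, 1], [4.0, 2.0], target_idx=2, k=2)
print(prediction)
```

Here clusters 0 and 2 are the two most similar to the user, so the prediction falls between their values for business 2 and sits closer to cluster 0's value of 5.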