## Spreadsheet Layout

This spreadsheet has two sheets: an Items sheet and a Users sheet.

The Items sheet contains the feature values for 100 items and 15 features, along with the weight (singular value) for each feature.

The Users sheet contains the feature values for 20 users and those 15 features.

### Deliverables

Your submission consists of 3 parts: the top movies for each of the first 2 features, and the recommendations for one user.

#### Top Movies

For each of the first 2 features, provide the top 5 movies for that feature. Only provide the movie IDs.

#### Recommendations

Compute the top 5 movies for user 4469. Each movie should be scored with the function

$$s_{ui} = \sum_{f} a_{uf} \sigma_f b_{if}$$

where $a_{uf}$ is the user weight for feature $f$, $b_{if}$ is the item weight for feature $f$, and $\sigma_f$ is the weight (singular value) of feature $f$.

Only provide the movie IDs of the 5 recommendations.
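
To make the scoring function concrete, here is a minimal sketch for a single user and a single movie with two features. All numbers are made up for illustration and the variable names `a_u`, `sigma`, and `b_i` are assumptions; none of them come from the spreadsheet.

```python
import numpy as np

# Toy values, made up for illustration (not taken from the spreadsheet).
a_u = np.array([0.5, -1.0])    # user weights a_uf for two features
sigma = np.array([2.0, 1.0])   # feature weights (singular values) sigma_f
b_i = np.array([1.0, 3.0])     # item weights b_if for the same two features

# s_ui = sum over f of a_uf * sigma_f * b_if
score = float(np.sum(a_u * sigma * b_i))
print(score)  # 0.5*2*1 + (-1)*1*3 = -2.0
```

The score is a dot product of the user and item vectors in which each feature is scaled by its weight, so features with larger singular values contribute more.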


In [27]:
import pandas as pd
import numpy as np

In [28]:
df_items = pd.read_excel('assignment.xlsx', sheet_name='Items')

df_items.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,,Weight,203.615022,121.021257,102.797882,96.830823,85.133738,82.763808,79.835,77.66642,74.607018,73.10941,69.09656,68.136588,66.480305,64.513088,63.569436
1,Movie ID,Title,,,,,,,,,,,,,,,
2,11,Star Wars: Episode IV - A New Hope (1977),-0.106651,-0.094401,0.199949,0.025196,-0.064437,0.276259,-0.206928,0.054093,-0.02595,-0.088964,-0.117164,0.104474,0.049532,0.140639,-0.031206
3,12,Finding Nemo (2003),-0.02398,-0.02696,-0.038067,0.248296,-0.118894,-0.068092,0.064117,0.03739,-0.082457,0.037841,-0.067603,-0.017252,-0.069223,0.063107,0.159913
4,13,Forrest Gump (1994),-0.157042,-0.039889,-0.2244,0.062452,0.098026,0.078255,-0.047104,0.147743,-0.10054,-0.018727,0.076231,0.016129,-0.040507,-0.275025,-0.052564


In [29]:
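# Row 0 holds the feature weights (singular values); columns 2 onward are the 15 features
item_weights = df_items.iloc[0, 2:].values

# Give the columns descriptive names
col_names = ['movie_id', 'movie_name'] + [f'feat_{i}' for i in range(1, 16)]
df_items.columns = col_names

# Drop the weight row and the header row, then index by movie
df_items = df_items.iloc[2:].set_index(['movie_id', 'movie_name'])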

df_items.head()

movie_id,movie_name,feat_1,feat_2,feat_3,feat_4,feat_5,feat_6,feat_7,feat_8,feat_9,feat_10,feat_11,feat_12,feat_13,feat_14,feat_15
11,Star Wars: Episode IV - A New Hope (1977),-0.106651,-0.094401,0.199949,0.025196,-0.064437,0.276259,-0.206928,0.054093,-0.02595,-0.088964,-0.117164,0.104474,0.049532,0.140639,-0.031206
12,Finding Nemo (2003),-0.02398,-0.02696,-0.038067,0.248296,-0.118894,-0.068092,0.064117,0.03739,-0.082457,0.037841,-0.067603,-0.017252,-0.069223,0.063107,0.159913
13,Forrest Gump (1994),-0.157042,-0.039889,-0.2244,0.062452,0.098026,0.078255,-0.047104,0.147743,-0.10054,-0.018727,0.076231,0.016129,-0.040507,-0.275025,-0.052564
14,American Beauty (1999),-0.044468,0.198715,-0.006577,0.00876,0.056093,-0.02037,-0.04615,0.007244,0.008791,-0.067562,0.063375,-0.177547,-0.100531,0.068353,-0.085437
22,Pirates of the Caribbean: The Curse of the Black Pearl (2003),0.036256,-0.132421,0.03641,0.00751,-0.010657,-0.065372,0.092768,0.003816,-0.0819,0.083958,0.16774,-0.09534,0.175285,0.106542,-0.28754


#### Ex. 1) Top Movies

For each of the first 2 features, provide the top 5 movies for that feature. Only provide the movie IDs.


In [30]:
df_items.feat_1.sort_values(ascending=False).head()

movie_id  movie_name                       
4327      Charlie's Angels (2000)              0.281990
414       Batman Forever (1995)                0.218089
3049      Ace Ventura: Pet Detective (1994)    0.190879
8467      Dumb & Dumber (1994)                 0.190638
854       The Mask (1994)                      0.158601
Name: feat_1, dtype: float64

In [31]:
df_items.feat_2.sort_values(ascending=False).head()

movie_id  movie_name                                  
14        American Beauty (1999)                          0.198715
680       Pulp Fiction (1994)                             0.189565
24        Kill Bill: Vol. 1 (2003)                        0.181570
275       Fargo (1996)                                    0.161559
38        Eternal Sunshine of the Spotless Mind (2004)    0.161058
Name: feat_2, dtype: float64
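
Since the deliverable asks only for movie IDs, the two cells above could be folded into one helper that returns the IDs directly. The following is a sketch on a toy frame: `toy_items` and `top_movie_ids` are hypothetical names, and the values are made up, although the `(movie_id, movie_name)` index mirrors the one used for `df_items`.

```python
import pandas as pd

# Toy stand-in for df_items: same index levels, made-up values.
toy_items = pd.DataFrame(
    {"feat_1": [0.3, -0.1, 0.2], "feat_2": [-0.2, 0.4, 0.1]},
    index=pd.MultiIndex.from_tuples(
        [(11, "A"), (12, "B"), (13, "C")], names=["movie_id", "movie_name"]
    ),
)


def top_movie_ids(df, feature, n=5):
    """Return the movie IDs with the n largest values of `feature`."""
    top = df[feature].nlargest(n)
    return top.index.get_level_values("movie_id").tolist()


# Top 2 (instead of 5) because the toy frame only has 3 movies.
top_ids = {f: top_movie_ids(toy_items, f, n=2) for f in ["feat_1", "feat_2"]}
print(top_ids)
```

`Series.nlargest` needs a numeric column, which holds for `df_items` here because its feature columns are reported as `float64`.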

#### Ex. 2) Recommendations

Compute the top 5 movies for user 4469. Each movie should be scored with the function

$$s_{ui} = \sum_{f} a_{uf} \sigma_f b_{if}$$

where $a_{uf}$ is the user weight for feature $f$, $b_{if}$ is the item weight for feature $f$, and $\sigma_f$ is the weight (singular value) of feature $f$.

Only provide the movie IDs of the 5 recommendations.


In [32]:
# Scale each item feature by that feature's weight (singular value)
df_items_weighted = pd.DataFrame(df_items.values * np.array(item_weights))
df_items_weighted.columns = df_items.columns 
df_items_weighted.index = df_items.index

In [33]:
print(df_items_weighted.shape)
df_items_weighted.head()

(100, 15)


movie_id,movie_name,feat_1,feat_2,feat_3,feat_4,feat_5,feat_6,feat_7,feat_8,feat_9,feat_10,feat_11,feat_12,feat_13,feat_14,feat_15
11,Star Wars: Episode IV - A New Hope (1977),-21.715813,-11.424585,20.554339,2.439722,-5.485769,22.864286,-16.520092,4.201176,-1.936072,-6.504127,-8.095605,7.118496,3.2929,9.07308,-1.98374
12,Finding Nemo (2003),-4.882766,-3.262738,-3.913246,24.042696,-10.121929,-5.635559,5.118772,2.90398,-6.151891,2.766519,-4.671159,-1.175484,-4.601981,4.07123,10.165575
13,Forrest Gump (1994),-31.976067,-4.827393,-23.067861,6.047319,8.345353,6.476658,-3.760545,11.474657,-7.500995,-1.369106,5.267317,1.098988,-2.692918,-17.742731,-3.341438
14,American Beauty (1999),-9.054366,24.048763,-0.676076,0.8482,4.7754,-1.685928,-3.684385,0.562623,0.655904,-4.939413,4.379018,-12.097423,-6.683317,4.409654,-5.431168
22,Pirates of the Caribbean: The Curse of the Black Pearl (2003),7.382226,-16.025797,3.742896,0.727223,-0.907291,-5.410461,7.406146,0.296338,-6.110314,6.138135,11.590246,-6.496168,11.652975,6.873359,-18.27873
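
The weighting step relies on broadcasting a length-15 weight vector across the columns of the item matrix. As a sketch of the same idea on a toy frame, `DataFrame.mul` with `axis="columns"` does the scaling while keeping the index and column labels, so they do not have to be reattached afterwards. The names `toy_items` and `toy_weights` and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy item-feature matrix: 2 movies x 2 features (made-up values).
toy_items = pd.DataFrame(
    {"feat_1": [1.0, -2.0], "feat_2": [0.5, 3.0]},
    index=pd.Index([11, 12], name="movie_id"),
)
# One weight per feature column, in column order.
toy_weights = np.array([10.0, 2.0])

# Multiply column j by toy_weights[j]; labels are preserved.
toy_weighted = toy_items.mul(toy_weights, axis="columns")
print(toy_weighted)
```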


In [34]:
df_users = pd.read_excel('assignment.xlsx', sheet_name='Users', index_col=0)

df_users.columns = [f'feat_{c}' for c in df_users.columns]
df_users.index.name = 'user_id'

df_users.head()

user_id,feat_1,feat_2,feat_3,feat_4,feat_5,feat_6,feat_7,feat_8,feat_9,feat_10,feat_11,feat_12,feat_13,feat_14,feat_15
4768,-0.014298,0.014642,-0.008921,0.014074,-0.017659,0.018015,0.005764,-0.010051,0.014683,0.007715,0.010227,0.043056,0.000872,-0.014774,-0.004907
156,-0.013291,-0.016269,-0.009024,-8.4e-05,-0.003538,0.019479,-0.010982,-0.007748,-0.00134,0.014136,-0.001195,-0.005888,0.005631,0.014152,0.02256
5323,-0.008081,-0.008262,-0.00524,0.001877,-0.007379,-0.000531,0.012647,0.011586,0.004024,8.1e-05,-0.008868,-0.002357,0.013291,0.006782,-0.01374
174,-0.015941,-0.024773,-0.001699,0.005521,0.023275,-0.007985,-0.003707,-0.009816,-0.015222,0.021099,0.011536,-0.009982,0.004509,-0.020597,0.006358
4529,-0.001024,-0.009292,-0.010646,0.015831,-0.01337,-0.012996,-0.001516,-0.005744,0.006796,-0.018666,-0.017129,-0.016093,-0.004893,0.016069,0.012877


In [39]:
# Cross join on a constant key to pair every movie with every user
df_cross_joined = pd.merge(
    df_items_weighted.reset_index().assign(key=1),
    df_users.reset_index().assign(key=1)
, on='key')

print(df_cross_joined.shape)
df_cross_joined.head()

(2500, 34)


Unnamed: 0,movie_id,movie_name,feat_1_x,feat_2_x,feat_3_x,feat_4_x,feat_5_x,feat_6_x,feat_7_x,feat_8_x,...,feat_6_y,feat_7_y,feat_8_y,feat_9_y,feat_10_y,feat_11_y,feat_12_y,feat_13_y,feat_14_y,feat_15_y
0,11,Star Wars: Episode IV - A New Hope (1977),-21.715813,-11.424585,20.554339,2.439722,-5.485769,22.864286,-16.520092,4.201176,...,0.018015,0.005764,-0.010051,0.014683,0.007715,0.010227,0.043056,0.000872,-0.014774,-0.004907
1,11,Star Wars: Episode IV - A New Hope (1977),-21.715813,-11.424585,20.554339,2.439722,-5.485769,22.864286,-16.520092,4.201176,...,0.019479,-0.010982,-0.007748,-0.00134,0.014136,-0.001195,-0.005888,0.005631,0.014152,0.02256
2,11,Star Wars: Episode IV - A New Hope (1977),-21.715813,-11.424585,20.554339,2.439722,-5.485769,22.864286,-16.520092,4.201176,...,-0.000531,0.012647,0.011586,0.004024,8.1e-05,-0.008868,-0.002357,0.013291,0.006782,-0.01374
3,11,Star Wars: Episode IV - A New Hope (1977),-21.715813,-11.424585,20.554339,2.439722,-5.485769,22.864286,-16.520092,4.201176,...,-0.007985,-0.003707,-0.009816,-0.015222,0.021099,0.011536,-0.009982,0.004509,-0.020597,0.006358
4,11,Star Wars: Episode IV - A New Hope (1977),-21.715813,-11.424585,20.554339,2.439722,-5.485769,22.864286,-16.520092,4.201176,...,-0.012996,-0.001516,-0.005744,0.006796,-0.018666,-0.017129,-0.016093,-0.004893,0.016069,0.012877
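
The constant `key` column is a common workaround for building a cross join. In pandas 1.2 and later, `merge` also accepts `how="cross"`, which produces the same pairing without the helper column. Here is a sketch on toy frames; the frame names and values are made up, and the pandas version is an assumption about the environment.

```python
import pandas as pd

# Toy frames (made-up values): 3 movies and 2 users sharing one feature name.
toy_items = pd.DataFrame({"movie_id": [11, 12, 13], "feat_1": [1.0, 2.0, 3.0]})
toy_users = pd.DataFrame({"user_id": [4768, 156], "feat_1": [0.1, 0.2]})

# how="cross" pairs every row of toy_items with every row of toy_users.
# The overlapping column gets the default _x / _y suffixes.
toy_pairs = toy_items.merge(toy_users, how="cross")
print(toy_pairs.shape)  # 3 movies x 2 users = 6 rows
print(toy_pairs.columns.tolist())
```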


In [56]:
# Multiply each weighted item feature by the matching user feature
df_user_movie_factors = pd.DataFrame({
    f'feat_{i}': df_cross_joined[f'feat_{i}_x'] * df_cross_joined[f'feat_{i}_y']
    for i in range(1, 16)
})

df_user_movie_factors.head()

Unnamed: 0,feat_1,feat_2,feat_3,feat_4,feat_5,feat_6,feat_7,feat_8,feat_9,feat_10,feat_11,feat_12,feat_13,feat_14,feat_15
0,0.310494,-0.167281,-0.183361,0.034337,0.096872,0.411909,-0.095222,-0.042228,-0.028428,-0.05018,-0.082791,0.306493,0.002871,-0.134044,0.009735
1,0.288621,0.185871,-0.185474,-0.000204,0.01941,0.445373,0.18142,-0.032551,0.002594,-0.091941,0.009672,-0.041917,0.018544,0.128404,-0.044754
2,0.175481,0.094386,-0.107706,0.00458,0.040477,-0.012135,-0.208935,0.048675,-0.007791,-0.000527,0.071789,-0.016782,0.043767,0.061531,0.027257
3,0.346162,0.283026,-0.034932,0.013469,-0.127682,-0.18256,0.061239,-0.041239,0.029471,-0.137229,-0.093388,-0.071057,0.014849,-0.186882,-0.012612
4,0.022244,0.106154,-0.218814,0.038624,0.073344,-0.297138,0.02505,-0.024132,-0.013157,0.121405,0.138668,-0.114555,-0.016113,0.145792,-0.025545


In [63]:
# Sum the per-feature products to get one score per (user, movie) pair
user_movie_preference = pd.concat([
    df_cross_joined[['user_id', 'movie_id', 'movie_name']],
    df_user_movie_factors.sum(axis=1).rename('preference')
], axis=1).set_index(['user_id', 'movie_id', 'movie_name'])['preference']

user_movie_preference.head()

user_id  movie_id  movie_name                               
4768     11        Star Wars: Episode IV - A New Hope (1977)    0.389174
156      11        Star Wars: Episode IV - A New Hope (1977)    0.883069
5323     11        Star Wars: Episode IV - A New Hope (1977)    0.214068
174      11        Star Wars: Episode IV - A New Hope (1977)   -0.139366
4529     11        Star Wars: Episode IV - A New Hope (1977)   -0.038172
Name: preference, dtype: object
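
The `dtype: object` in the output above indicates that the scores are stored as generic Python objects rather than as floats. A likely cause, stated here as an assumption rather than something verified against the workbook, is that the weight row was pulled from a frame with mixed column types. Sorting still works, but converting to a numeric dtype is usually safer. A minimal sketch on a toy series (the name `toy_pref` is hypothetical):

```python
import pandas as pd

# Toy Series with object dtype, mimicking floats stored as generic objects.
toy_pref = pd.Series(
    [0.389174, 0.883069, -0.139366], dtype=object, name="preference"
)

# pd.to_numeric converts the values to a proper numeric dtype.
toy_pref_numeric = pd.to_numeric(toy_pref)
print(toy_pref.dtype, "->", toy_pref_numeric.dtype)
```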

In [66]:
# top recommendations for user 4469
user_movie_preference.loc[4469].sort_values(ascending=False).head()

movie_id  movie_name                     
278       The Shawshank Redemption (1994)     0.20768
453       A Beautiful Mind (2001)            0.183286
98        Gladiator (2000)                   0.173611
238       The Godfather (1972)                0.17218
13        Forrest Gump (1994)                0.170744
Name: preference, dtype: object
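
As an alternative to the cross join, the same scores can be computed for every user and movie at once as a matrix product, since $s_{ui} = \sum_f a_{uf} \sigma_f b_{if}$ is entry $(u, i)$ of $A \, \mathrm{diag}(\sigma) \, B^\top$. The sketch below uses small random matrices, not the spreadsheet data; `U`, `B`, `sigma`, and the IDs are all made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_users, n_items, n_feats = 3, 4, 2

# Toy user and item feature matrices (random values, hypothetical IDs).
U = pd.DataFrame(
    rng.normal(size=(n_users, n_feats)),
    index=pd.Index([101, 102, 103], name="user_id"),
)
B = pd.DataFrame(
    rng.normal(size=(n_items, n_feats)),
    index=pd.Index([11, 12, 13, 14], name="movie_id"),
)
sigma = np.array([3.0, 1.5])  # one weight per feature

# scores[u, i] = sum_f U[u, f] * sigma[f] * B[i, f]
scores = pd.DataFrame(
    (U.to_numpy() * sigma) @ B.to_numpy().T,
    index=U.index,
    columns=B.index,
)

# Top 2 movie IDs for one user (2 instead of 5 because the toy set is small).
top_for_user = scores.loc[102].nlargest(2).index.tolist()
print(top_for_user)
```

This avoids materializing one row per (user, movie) pair, which matters little at this size but scales better for larger matrices.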