### Cofactor

Cofactor is Liang et al.'s extension of the Alternating Least Squares (ALS) algorithm, proposed in [Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence](https://dl.acm.org/doi/10.1145/2959100.2959182).

It co-factorizes two matrices with a shared item factor matrix: the user-item interaction matrix and the SPPMI (shifted positive pointwise mutual information) matrix, which is built from item-item co-occurrence. The authors argue that the two matrices reveal different information, so exploiting both should improve the learned item factors.
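To make the shared item matrix concrete, the Cofactor objective in the paper adds a loss on the non-zero SPPMI entries to the usual weighted matrix-factorization loss. The item vector $\beta_i$ appears in both terms, while $\gamma_j$, $w_i$ and $c_j$ are context vectors and biases used only by the SPPMI term:

$$\mathcal{L} = \sum_{u,i} c_{ui}\,(y_{ui} - \theta_u^\top \beta_i)^2 + \sum_{m_{ij} \neq 0} (m_{ij} - \beta_i^\top \gamma_j - w_i - c_j)^2 + \lambda_\theta \sum_u \|\theta_u\|^2 + \lambda_\beta \sum_i \|\beta_i\|^2 + \lambda_\gamma \sum_j \|\gamma_j\|^2$$

The dense NumPy sketch below evaluates this objective for toy sizes. The function name and argument layout are illustrative, not buffalo's API, and mapping buffalo's `reg_u`, `reg_i`, `reg_c` to $\lambda_\theta$, $\lambda_\beta$, $\lambda_\gamma$ is an assumption.

```python
import numpy as np


def cofactor_loss(Y, C, M, theta, beta, gamma, w, c,
                  lam_theta=0.1, lam_beta=0.1, lam_gamma=0.1):
    """Dense toy version of the Cofactor objective (illustrative only).

    Y: (U, I) implicit feedback, C: (U, I) confidence weights,
    M: (I, I) SPPMI matrix, theta: (U, d) user factors,
    beta: (I, d) item factors shared by both terms,
    gamma: (I, d) context factors, w / c: (I,) item / context biases.
    """
    # Weighted matrix-factorization term over every user-item cell.
    mf_term = np.sum(C * (Y - theta @ beta.T) ** 2)
    # Item-embedding term over the non-zero SPPMI entries only.
    pred = beta @ gamma.T + w[:, None] + c[None, :]
    nonzero = M != 0
    emb_term = np.sum((M - pred)[nonzero] ** 2)
    # L2 penalties; mapping these to reg_u / reg_i / reg_c is an assumption.
    reg = (lam_theta * np.sum(theta ** 2)
           + lam_beta * np.sum(beta ** 2)
           + lam_gamma * np.sum(gamma ** 2))
    return mf_term + emb_term + reg
```

Because `beta` is shared, a gradient or ALS step on `beta` has to balance fitting the interactions against fitting the co-occurrence structure, which is the regularizing effect the paper's title refers to.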

In [1]:
from buffalo.algo.cfr import CFR
from buffalo.algo.options import CFROption
from buffalo.data.stream import StreamOptions
from buffalo.misc import aux
from buffalo.misc import log

In [2]:
opt = CFROption().get_default_option()  # initialize the default Cofactor options
opt                                     # see buffalo/algo/options.py for what each option means

{'evaluation_on_learning': True,
 'compute_loss_on_training': True,
 'early_stopping_rounds': 0,
 'save_best': False,
 'evaluation_period': 1,
 'save_period': 10,
 'random_seed': 0,
 'validation': {},
 'save_factors': False,
 'd': 20,
 'num_iters': 10,
 'num_workers': 1,
 'num_cg_max_iters': 3,
 'cg_tolerance': 1e-10,
 'eps': 1e-10,
 'reg_u': 0.1,
 'reg_i': 0.1,
 'reg_c': 0.1,
 'alpha': 8.0,
 'l': 1.0,
 'optimizer': 'manual_cg',
 'model_path': '',
 'data_opt': {}}
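The options above include `alpha`. Assuming it plays the same role as in implicit-feedback ALS (Hu, Koren and Volinsky, 2008), each observed interaction $r_{ui}$ gets confidence $c_{ui} = 1 + \alpha \, r_{ui}$ and every unobserved cell gets confidence 1. How buffalo's CFR applies `alpha` internally is not confirmed here, and `confidence` is a hypothetical helper:

```python
import numpy as np


def confidence(R, alpha=8.0):
    """Hypothetical helper: implicit-ALS style confidence c = 1 + alpha * r.

    Zero (unobserved) entries keep confidence 1, so they still pull
    predictions toward 0, but far more weakly than observed entries.
    """
    R = np.asarray(R, dtype=float)
    return 1.0 + alpha * R
```

If every observed interaction is mapped to 1, observed cells end up with weight `1 + alpha` (9.0 with the default `alpha`) and unobserved ones with 1.0.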

In [3]:
data_opt = StreamOptions().get_default_option()
data_opt.data.sppmi = {"windows": 5, "k": 10}
data_opt.input.main = 'data/ml-1m/stream'
data_opt.input.uid = 'data/ml-1m/uid'
data_opt.input.iid = 'data/ml-1m/iid'
data_opt.data.value_prepro = aux.Option({'name': 'OneBased'})
data_opt.data.path = './3-cfr.h5py'
data_opt.data.internal_data_type = 'matrix'
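`data_opt.data.sppmi` configures how the SPPMI matrix is built. A plausible reading, not confirmed against buffalo's source, is that `windows` bounds how far apart two items in a user's stream may be to count as co-occurring, and `k` is the shift in the SPPMI definition $\mathrm{SPPMI}(i, j) = \max(\mathrm{PMI}(i, j) - \log k, 0)$ with $\mathrm{PMI}(i, j) = \log \frac{\#(i, j) \cdot D}{\#(i)\,\#(j)}$. The dense sketch below follows that reading; `sppmi_from_streams` is hypothetical and assumes 0-based integer item ids:

```python
import numpy as np


def sppmi_from_streams(streams, n_items, window=5, k=10):
    """Hypothetical sketch: dense SPPMI matrix from per-user item streams."""
    counts = np.zeros((n_items, n_items))
    for seq in streams:
        for pos, i in enumerate(seq):
            # Pair item i with the items that follow it within `window` positions.
            for j in seq[pos + 1: pos + 1 + window]:
                if i != j:
                    counts[i, j] += 1
                    counts[j, i] += 1
    total = counts.sum()                       # D: total co-occurrence count
    row = counts.sum(axis=1, keepdims=True)    # #(i)
    col = counts.sum(axis=0, keepdims=True)    # #(j)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    # Shift by log k and clip at zero; zero counts give -inf or NaN, mapped to 0.
    sppmi = np.maximum(pmi - np.log(k), 0.0)
    return np.nan_to_num(sppmi, nan=0.0, neginf=0.0, posinf=0.0)
```

A larger `k` zeroes out more weakly associated pairs, so the SPPMI matrix gets sparser. For a catalogue the size of MovieLens a real implementation would use sparse matrices instead of dense `n_items x n_items` arrays.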

In [4]:
cofactor = CFR(opt, data_opt=data_opt)

[INFO    ] 2019-10-04 10:40:41 [stream.py:278] Create database from stream data
[INFO    ] 2019-10-04 10:40:41 [stream.py:101] gathering itemids from data/ml-1m/stream...
[INFO    ] 2019-10-04 10:40:41 [stream.py:125] Found 3706 unique itemids
[INFO    ] 2019-10-04 10:40:41 [stream.py:287] Creating working data...
[INFO    ] 2019-10-04 10:40:49 [stream.py:295] Building data part...
[INFO    ] 2019-10-04 10:40:49 [base.py:362] Building compressed triplets for rowwise...
[INFO    ] 2019-10-04 10:40:49 [base.py:363] Preprocessing...
[INFO    ] 2019-10-04 10:40:49 [base.py:366] In-memory Compressing ...
[INFO    ] 2019-10-04 10:40:50 [base.py:249] Load triplet files. Total job files: 7
[INFO    ] 2019-10-04 10:40:50 [base.py:396] Finished
[INFO    ] 2019-10-04 10:40:50 [base.py:362] Building compressed triplets for colwise...
[INFO    ] 2019-10-04 10:40:50 [base.py:363] Preprocessing...
[INFO    ] 2019-10-04 10:40:50 [base.py:366] In-memory Compressing ...
[INFO    ] 2019-10-04 10:40:51 [b

In [5]:
cofactor.initialize()

In [6]:
cofactor.train()

[INFO    ] 2019-10-04 10:41:03 [buffered_data.py:71] Set data buffer size as 67108864(minimum required batch size is 245).
[INFO    ] 2019-10-04 10:41:03 [cfr.py:207] Iteration 1: Loss 0.000 Elapsed 0.518 secs
[INFO    ] 2019-10-04 10:41:04 [cfr.py:207] Iteration 2: Loss 0.000 Elapsed 0.477 secs
[INFO    ] 2019-10-04 10:41:04 [cfr.py:207] Iteration 3: Loss 0.000 Elapsed 0.486 secs
[INFO    ] 2019-10-04 10:41:05 [cfr.py:207] Iteration 4: Loss 0.000 Elapsed 0.490 secs
[INFO    ] 2019-10-04 10:41:05 [cfr.py:207] Iteration 5: Loss 0.000 Elapsed 0.489 secs
[INFO    ] 2019-10-04 10:41:06 [cfr.py:207] Iteration 6: Loss 0.000 Elapsed 0.415 secs
[INFO    ] 2019-10-04 10:41:06 [cfr.py:207] Iteration 7: Loss 0.000 Elapsed 0.267 secs
[INFO    ] 2019-10-04 10:41:06 [cfr.py:207] Iteration 8: Loss 0.000 Elapsed 0.181 secs
[INFO    ] 2019-10-04 10:41:06 [cfr.py:207] Iteration 9: Loss 0.000 Elapsed 0.320 secs
[INFO    ] 2019-10-04 10:41:07 [cfr.py:207] Iteration 10: Loss 0.000 Elapsed 0.380 secs


{'train_loss': 0.0}
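The default options set `optimizer` to `manual_cg` together with `num_cg_max_iters` and `cg_tolerance`, which suggests that each ALS step solves its per-row linear system approximately with conjugate gradient (CG) instead of an exact inverse. The sketch below is an assumption about that design, not buffalo's code: `item_system` builds the normal equations for one shared item vector $\beta_i$ from the Cofactor objective (the user term plus the SPPMI term), and `conjugate_gradient` solves them with a few iterations warm-started from the current vector. Comparing `tol` against the squared residual norm is a guess at how `cg_tolerance` is used.

```python
import numpy as np


def item_system(theta, c_i, y_i, gamma, m_i, w_i, c_ctx, lam_beta):
    """Hypothetical: normal equations A @ beta_i = b for one item i."""
    nz = m_i != 0                      # contexts j with non-zero SPPMI
    G = gamma[nz]
    d = theta.shape[1]
    # sum_u c_ui theta_u theta_u^T + sum_j gamma_j gamma_j^T + lam * I
    A = (theta.T * c_i) @ theta + G.T @ G + lam_beta * np.eye(d)
    # sum_u c_ui y_ui theta_u + sum_j (m_ij - w_i - c_j) gamma_j
    b = theta.T @ (c_i * y_i) + G.T @ (m_i[nz] - w_i - c_ctx[nz])
    return A, b


def conjugate_gradient(A, b, x0, max_iters=3, tol=1e-10):
    """Plain CG for a symmetric positive-definite A, warm-started at x0."""
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        if rs_old < tol:               # squared residual norm is small enough
            break
        Ap = A @ p
        step = rs_old / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```

Capping CG at a few iterations trades per-step exactness for speed; the idea is that an approximate solve may be enough because ALS revisits every factor on the next pass.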

### Recommendation for users

In [7]:
uids = [str(x) for x in range(61, 70)]
recommendation_result = cofactor.topk_recommendation(uids, topk=3)
for uid, iids in recommendation_result.items():
    print(f"for user {uid}, recommendations are ", f"\nitems {iids}.\n")

for user 61, recommendations are  
items ['Patriot,_The_(2000)', 'Perfect_Storm,_The_(2000)', 'Scary_Movie_(2000)'].

for user 62, recommendations are  
items ['Rear_Window_(1954)', 'Witness_(1985)', 'Chinatown_(1974)'].

for user 63, recommendations are  
items ['Austin_Powers:_The_Spy_Who_Shagged_Me_(1999)', 'Blair_Witch_Project,_The_(1999)', 'American_Pie_(1999)'].

for user 64, recommendations are  
items ['Jurassic_Park_(1993)', 'Terminator_2:_Judgment_Day_(1991)', 'American_Beauty_(1999)'].

for user 65, recommendations are  
items ['Braveheart_(1995)', 'Saving_Private_Ryan_(1998)', 'Jurassic_Park_(1993)'].

for user 66, recommendations are  
items ['Braveheart_(1995)', 'American_Beauty_(1999)', 'Airplane!_(1980)'].

for user 67, recommendations are  
items ['Bridge_on_the_River_Kwai,_The_(1957)', 'To_Kill_a_Mockingbird_(1962)', 'Graduate,_The_(1967)'].

for user 68, recommendations are  
items ['Shakespeare_in_Love_(1998)', 'Groundhog_Day_(1993)', 'Toy_Story_2_(1999)'].

for use
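Assuming `topk_recommendation` ranks items by the inner product between a user's factor vector $\theta_u$ and each item vector $\beta_i$, as in standard matrix factorization, the core of the ranking looks like the sketch below. Whether buffalo filters out items the user has already seen is not stated here, so `exclude` is an optional, hypothetical parameter, as is the helper name `topk_for_user`:

```python
import numpy as np


def topk_for_user(theta_u, beta, item_names, topk=3, exclude=()):
    """Hypothetical: rank all items for one user by dot-product score."""
    scores = np.asarray(beta, dtype=float) @ np.asarray(theta_u, dtype=float)
    # Push excluded item rows (e.g. already-seen items) to the bottom.
    blocked = np.asarray(list(exclude), dtype=int)
    scores[blocked] = -np.inf
    # Stable sort keeps ties in item order, so results are deterministic.
    order = np.argsort(-scores, kind="stable")[:topk]
    return [item_names[i] for i in order]
```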

### Recommendation for users within a given pool

In [9]:
pool = ['Rules_of_Engagement_(2000)', 
        'Remember_the_Titans_(2000)', 
        'Skulls,_The_(2000)', 
        '28_Days_(2000)', 
        'Frequency_(2000)', 
        'Gone_in_60_Seconds_(2000)', 
        'What_Lies_Beneath_(2000)', 
        'Reindeer_Games_(2000)', 
        'Final_Destination_(2000)', 
        'Shanghai_Noon_(2000)']
uids = [str(x) for x in range(5)]
recommendation_result = cofactor.topk_recommendation(uids, topk=3, pool=pool)
for uid, iids in recommendation_result.items():
    print(f"for user {uid}, recommendations are ", f"\nitems {iids}.\n")

for user 1, recommendations are  
items ['Shanghai_Noon_(2000)', 'Frequency_(2000)', 'Remember_the_Titans_(2000)'].

for user 2, recommendations are  
items ['Remember_the_Titans_(2000)', 'Shanghai_Noon_(2000)', 'Frequency_(2000)'].

for user 3, recommendations are  
items ['Shanghai_Noon_(2000)', '28_Days_(2000)', 'Frequency_(2000)'].

for user 4, recommendations are  
items ['Shanghai_Noon_(2000)', 'Final_Destination_(2000)', 'Frequency_(2000)'].
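With `pool`, the same scoring is presumably restricted to the listed candidates, so each user's top 3 is drawn only from those 10 titles. A minimal sketch, assuming a hypothetical name-to-row mapping `item_index` and silently skipping pool names that are not in the item vocabulary:

```python
import numpy as np


def topk_in_pool(theta_u, beta, item_index, pool, topk=3):
    """Hypothetical: rank only the pool items for one user."""
    # Keep pool names that exist in the vocabulary, preserving pool order.
    names = [n for n in pool if n in item_index]
    idx = np.array([item_index[n] for n in names], dtype=int)
    scores = np.asarray(beta, dtype=float)[idx] @ np.asarray(theta_u, dtype=float)
    order = np.argsort(-scores, kind="stable")[:topk]
    return [names[k] for k in order]
```

Scoring only the pool rows costs O(|pool| * d) per user instead of O(n_items * d), which is one reason candidate pools are useful in serving.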



### Find most similar items

In [10]:
print('Similar movies to Toy_Story_2_(1999) in similar items')
similar_items = cofactor.most_similar('Toy_Story_2_(1999)', 10)
print(similar_items)
for rank, (movie_name, score) in enumerate(similar_items):
    print(f'{rank + 1:02d}. {score:.3f} {movie_name}')


Similar movies to Toy_Story_2_(1999) in similar items
[("Bug's_Life,_A_(1998)", 0.93695074), ('Toy_Story_(1995)', 0.91278535), ('Babe_(1995)', 0.8598581), ('Shakespeare_in_Love_(1998)', 0.84673494), ('Being_John_Malkovich_(1999)', 0.83271587), ('Election_(1999)', 0.8022457), ('American_Beauty_(1999)', 0.788048), ('South_Park:_Bigger,_Longer_and_Uncut_(1999)', 0.77576375), ('Groundhog_Day_(1993)', 0.7618997), ('Aladdin_(1992)', 0.7471918)]
01. 0.937 Bug's_Life,_A_(1998)
02. 0.913 Toy_Story_(1995)
03. 0.860 Babe_(1995)
04. 0.847 Shakespeare_in_Love_(1998)
05. 0.833 Being_John_Malkovich_(1999)
06. 0.802 Election_(1999)
07. 0.788 American_Beauty_(1999)
08. 0.776 South_Park:_Bigger,_Longer_and_Uncut_(1999)
09. 0.762 Groundhog_Day_(1993)
10. 0.747 Aladdin_(1992)
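The scores returned by `most_similar` are all at most 1, which is consistent with cosine similarity between item factor vectors, but the exact measure buffalo uses is an assumption here. A sketch of cosine top-k over a dense item-factor matrix `beta`; the function below is a stand-in, not buffalo's implementation:

```python
import numpy as np


def most_similar(query, beta, item_names, topk=10):
    """Hypothetical: cosine top-k neighbours of one item."""
    beta = np.asarray(beta, dtype=float)
    index = {n: i for i, n in enumerate(item_names)}
    q = index[query]
    norms = np.linalg.norm(beta, axis=1)
    # Guard against zero-norm vectors before dividing.
    sims = beta @ beta[q] / np.maximum(norms * norms[q], 1e-12)
    sims[q] = -np.inf  # never return the query item itself
    order = np.argsort(-sims, kind="stable")[:topk]
    return [(item_names[i], float(sims[i])) for i in order]
```

Normalizing the vectors makes the ranking depend on direction only, so popular items with large factor norms do not dominate every neighbour list.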


### Find most similar items within a given pool

In [12]:
pool = ['Rules_of_Engagement_(2000)', 
        'Remember_the_Titans_(2000)', 
        'Skulls,_The_(2000)', 
        '28_Days_(2000)', 
        'Frequency_(2000)', 
        'Gone_in_60_Seconds_(2000)', 
        'What_Lies_Beneath_(2000)', 
        'Reindeer_Games_(2000)', 
        'Final_Destination_(2000)', 
        'Shanghai_Noon_(2000)']
similar_items = cofactor.most_similar('Toy_Story_2_(1999)', 5, pool=pool)
for rank, (movie_name, score) in enumerate(similar_items):
    print(f'{rank + 1:02d}. {score:.3f} {movie_name}')

01. 0.385 Shanghai_Noon_(2000)
02. 0.379 28_Days_(2000)
03. 0.364 Frequency_(2000)
04. 0.297 Gone_in_60_Seconds_(2000)
05. 0.224 Final_Destination_(2000)
06. 0.195 What_Lies_Beneath_(2000)
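Restricting the similarity search to a pool changes only the candidate set, which is why the scores here are much lower than in the unrestricted search: none of the pool titles is a close neighbour of `Toy_Story_2_(1999)`. The hypothetical `most_similar_in_pool` below assumes cosine similarity, scores just the pool items (excluding the query itself), and returns at most `topk` of them:

```python
import numpy as np


def most_similar_in_pool(query, beta, item_names, pool, topk=5):
    """Hypothetical: cosine similarity of `query` against pool items only."""
    beta = np.asarray(beta, dtype=float)
    index = {n: i for i, n in enumerate(item_names)}
    q = beta[index[query]]
    # Candidates: known pool items other than the query itself.
    cand = [n for n in pool if n in index and n != query]
    if not cand:
        return []
    C = beta[[index[n] for n in cand]]
    denom = np.maximum(np.linalg.norm(C, axis=1) * np.linalg.norm(q), 1e-12)
    sims = C @ q / denom
    order = np.argsort(-sims, kind="stable")[:topk]
    return [(cand[k], float(sims[k])) for k in order]
```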
