## 1. Matrix Factorization
This notebook shows how to run [ALS](http://yifanhu.net/PUB/cf.pdf) and [BPR-MF](https://arxiv.org/pdf/1205.2618.pdf) with Buffalo.

In [1]:
import buffalo
from buffalo import ALS, BPRMF
from buffalo import aux, log
from buffalo import ALSOption, BPRMFOption
from buffalo import MatrixMarketOptions
log.set_log_level(1)  # set the log level to 3 or higher to see more detailed output

In [2]:
MODEL_TO_USE = "ALS"
# MODEL_TO_USE = "BPR"  # uncomment this line if you want to use BPR

### To run a Buffalo model, you have to set two options
- model option
- data option

### Model Option

In [3]:
if MODEL_TO_USE == "ALS":
    opt = ALSOption().get_default_option()  
elif MODEL_TO_USE == "BPR":
    opt = BPRMFOption().get_default_option()

You may change the other option values as follows:
```
   opt.key = val
```
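
The `opt.key = val` pattern treats the option object like a dictionary whose keys are also attributes. A minimal sketch of that idea follows; `AttrDict` is a hypothetical stand-in written for illustration and is not Buffalo's `aux.Option` class, whose behavior may differ.

```python
class AttrDict(dict):
    """Hypothetical stand-in for an options object; NOT Buffalo's aux.Option."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            value = self[key]
        except KeyError:
            raise AttributeError(key) from None
        # Wrap nested plain dicts so chained access such as opt.a.b works.
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            value = AttrDict(value)
            self[key] = value
        return value

    def __setattr__(self, key, value):
        # Attribute assignment writes straight into the underlying dict.
        self[key] = value


demo_opt = AttrDict({"d": 20, "num_iters": 10, "validation": {}})
demo_opt.d = 40                               # same effect as demo_opt["d"] = 40
demo_opt.validation = AttrDict({"topk": 10})  # nested option
print(demo_opt.d, demo_opt.validation.topk)
```

Because the values live in an ordinary dictionary, displaying the object shows every key at once, which is how the option dump later in this notebook reads.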

For example, one can set the validation option.

In [4]:
opt.evaluation_on_learning =  True
opt.validation = aux.Option({'topk': 10})

`opt.validation = aux.Option({'topk': 10})` means the model is evaluated on validation data using top-10 metrics.

`opt.evaluation_on_learning = True` makes the Buffalo model run evaluation during training.
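
To make the top-k idea concrete, here is a minimal sketch of NDCG@k with binary relevance, one common ranking metric of this kind. The function name `ndcg_at_k` and the toy data are assumptions for illustration; Buffalo's own metric computation may differ in its details.

```python
import math


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@k with binary relevance (illustrative sketch, not Buffalo's code)."""
    relevant = set(relevant_items)
    # Discounted gain of each relevant item found in the top k positions.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Best possible DCG: all relevant items placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # model's ranking for one user
held_out = {"b", "d"}           # validation items for that user
score = ndcg_at_k(ranked, held_out, k=3)
print(f"NDCG@3 = {score:.4f}")
```

Only the first `k` positions of the ranking contribute, so a held-out item ranked below position `k` earns no credit.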


#### Options are shown below

In [5]:
opt

{'evaluation_on_learning': True,
 'compute_loss_on_training': True,
 'early_stopping_rounds': 0,
 'save_best': False,
 'evaluation_period': 1,
 'save_period': 10,
 'random_seed': 0,
 'validation': {'topk': 10},
 'adaptive_reg': False,
 'save_factors': False,
 'accelerator': False,
 'd': 20,
 'num_iters': 10,
 'num_workers': 1,
 'hyper_threads': 256,
 'num_cg_max_iters': 3,
 'reg_u': 0.1,
 'reg_i': 0.1,
 'alpha': 8,
 'optimizer': 'manual_cg',
 'cg_tolerance': 1e-10,
 'block_size': 32,
 'eps': 1e-10,
 'model_path': '',
 'data_opt': {}}

For a full description of the options, see `AlgoOption`, `ALSOption`, and `BPRMFOption` in `buffalo/algo/options.py`.

Each model type has its own set of options, so the options of one model differ from those of another.
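
For intuition about what options such as `d`, `alpha`, `reg_u`, and `reg_i` typically control, the sketch below evaluates the confidence-weighted objective from the ALS paper linked at the top of this notebook. Mapping Buffalo's `alpha`, `reg_u`, and `reg_i` onto this formula is an assumption made for illustration; the function name and toy data are hypothetical, and the option descriptions in `buffalo/algo/options.py` remain the reference.

```python
def implicit_als_loss(ratings, user_factors, item_factors, alpha, reg_u, reg_i):
    """Confidence-weighted squared loss over every (user, item) pair.

    Illustrative sketch of the objective in Hu et al.; not Buffalo's code.
    """
    loss = 0.0
    for u, x_u in enumerate(user_factors):
        for i, y_i in enumerate(item_factors):
            r_ui = ratings.get((u, i), 0.0)
            p_ui = 1.0 if r_ui > 0 else 0.0   # binary preference
            c_ui = 1.0 + alpha * r_ui         # confidence grows with the rating
            pred = sum(a * b for a, b in zip(x_u, y_i))
            loss += c_ui * (p_ui - pred) ** 2
    # L2 regularization, with separate weights for user and item factors.
    loss += reg_u * sum(v * v for x_u in user_factors for v in x_u)
    loss += reg_i * sum(v * v for y_i in item_factors for v in y_i)
    return loss


# Two users, two items, factor dimension d = 2.
ratings = {(0, 0): 1.0, (1, 1): 2.0}
user_factors = [[1.0, 0.0], [0.0, 1.0]]
item_factors = [[1.0, 0.0], [0.0, 1.0]]
loss = implicit_als_loss(ratings, user_factors, item_factors,
                         alpha=8, reg_u=0.1, reg_i=0.1)
print(loss)
```

In this toy case the factors reproduce the preferences exactly, so only the regularization terms contribute to the loss.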

### Data Option

In [6]:
data_opt = MatrixMarketOptions().get_default_option()

As with the model option, data option values can be set this way:
```
    data_opt.key = val
```


You must set the `data_opt.input.main` option.

This should be the path to the input data (Matrix Market or stream).

In [7]:
data_opt.input.main = 'data/ml-1m/main.mtx'

Additionally, we can set a list of item IDs and a list of user IDs.

By doing so, you can query similar users/items or recommendations by item ID or user ID.

In [8]:
data_opt.input.iid = 'data/ml-1m/iid'
data_opt.input.uid = 'data/ml-1m/uid'

In [9]:
data_opt

{'type': 'matrix_market',
 'input': {'main': 'data/ml-1m/main.mtx',
  'uid': 'data/ml-1m/uid',
  'iid': 'data/ml-1m/iid'},
 'data': {'internal_data_type': 'matrix',
  'validation': {'name': 'sample', 'p': 0.01, 'max_samples': 500},
  'batch_mb': 1024,
  'use_cache': False,
  'tmp_dir': '/tmp/',
  'path': './mm.h5py',
  'disk_based': False}}
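
For readers unfamiliar with the input files referenced above, the sketch below writes a tiny ratings matrix in Matrix Market coordinate format, together with ID files. The helper names are hypothetical, and the one-ID-per-line layout for the `uid` and `iid` files (in row and column order) is an assumption; check the Buffalo documentation for the exact format it expects.

```python
import os
import tempfile


def write_matrix_market(path, num_rows, num_cols, entries):
    """Write (row, col, value) triples, 1-indexed, in coordinate format."""
    with open(path, "w") as f:
        f.write("%%MatrixMarket matrix coordinate integer general\n")
        f.write(f"{num_rows} {num_cols} {len(entries)}\n")  # size line: rows cols nnz
        for row, col, value in entries:
            f.write(f"{row} {col} {value}\n")


def write_ids(path, ids):
    """Write one identifier per line (assumed layout for uid/iid files)."""
    with open(path, "w") as f:
        f.write("\n".join(ids) + "\n")


demo_dir = tempfile.mkdtemp()
entries = [(1, 1, 5), (1, 3, 3), (2, 2, 4)]  # (user row, item column, rating)
write_matrix_market(os.path.join(demo_dir, "main.mtx"), 2, 3, entries)
write_ids(os.path.join(demo_dir, "uid"), ["user_a", "user_b"])
write_ids(os.path.join(demo_dir, "iid"), ["item_x", "item_y", "item_z"])
print(sorted(os.listdir(demo_dir)))
```

Rows correspond to users and columns to items, so the third entry records that the second user rated the second item with a 4.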

### Open Data

#### You can open data in two ways
- open data when initializing model
- open data directly

There is no difference between the two.

#### open data when initializing model

In [10]:
model = ALS(ALSOption().get_default_option(), data_opt=data_opt)
del model

This opens the data when the model is initialized (the indirect way).

#### open data directly

In [11]:
import buffalo

In [12]:
data = buffalo.data.load(data_opt)
data.create()
model = ALS(ALSOption().get_default_option(), data=data)
del data
del model

This opens the data directly and passes the opened data to the model.
From here on, we will use the directly opened data.

In [13]:
data = buffalo.data.load(data_opt)
data.create()

In [14]:
if MODEL_TO_USE == "ALS":
    model = ALS(opt, data=data)
elif MODEL_TO_USE == "BPR":
    model = BPRMF(opt, data=data)
model.initialize()

In [15]:
val_res = model.train()

In [16]:
val_res

{'train_loss': 0.2804447780030053,
 'val_ndcg': 0.053509737512824056,
 'val_map': 0.036605582307829496,
 'val_accuracy': 0.10280898876404494,
 'val_auc': 0.5500847197037205,
 'val_rmse': 2.9060066759494854,
 'val_error': 2.713486196756363}

### Saving and Loading model

In [17]:
!mkdir model

mkdir: model: File exists


In [18]:

model.save("model/model-ml-1m")
del model
if MODEL_TO_USE == "ALS":
    model = ALS()
elif MODEL_TO_USE == "BPR":
    model = BPRMF()
model.load("model/model-ml-1m")

### Recommendation for users

In [19]:
uids = [str(x) for x in range(61, 70)]
recommendation_result = model.topk_recommendation(uids, topk=3)
for uid, iids in recommendation_result.items():
    print(f"for user {uid}, recommendations are ", f"\nitems {iids}.\n")

for user 61, recommendations are  
items ['Rules_of_Engagement_(2000)', 'Remember_the_Titans_(2000)', 'Skulls,_The_(2000)'].

for user 62, recommendations are  
items ['Midnight_in_the_Garden_of_Good_and_Evil_(1997)', 'Bonnie_and_Clyde_(1967)', 'Coming_Home_(1978)'].

for user 63, recommendations are  
items ['Eyes_Wide_Shut_(1999)', 'Summer_of_Sam_(1999)', 'Go_(1999)'].

for user 64, recommendations are  
items ['Jurassic_Park_(1993)', 'Braveheart_(1995)', 'Star_Wars:_Episode_VI_-_Return_of_the_Jedi_(1983)'].

for user 65, recommendations are  
items ['Air_Force_One_(1997)', 'Patriot,_The_(2000)', 'Backdraft_(1991)'].

for user 66, recommendations are  
items ['American_Beauty_(1999)', 'Star_Wars:_Episode_VI_-_Return_of_the_Jedi_(1983)', 'Braveheart_(1995)'].

for user 67, recommendations are  
items ['12_Angry_Men_(1957)', 'Grapes_of_Wrath,_The_(1940)', 'Bridge_on_the_River_Kwai,_The_(1957)'].

for user 68, recommendations are  
items ['Wrong_Trousers,_The_(1993)', 'Close_Shave,_A_(1

### Recommendation for users in given pools

In [20]:
pool = ['Rules_of_Engagement_(2000)', 
        'Remember_the_Titans_(2000)', 
        'Skulls,_The_(2000)', 
        '28_Days_(2000)', 
        'Frequency_(2000)', 
        'Gone_in_60_Seconds_(2000)', 
        'What_Lies_Beneath_(2000)', 
        'Reindeer_Games_(2000)', 
        'Final_Destination_(2000)', 
        'Shanghai_Noon_(2000)']
uids = [str(x) for x in range(5)]
recommendation_result = model.topk_recommendation(uids, topk=3, pool=pool)
for uid, iids in recommendation_result.items():
    print(f"for user {uid}, recommendations are ", f"\nitems {iids}.\n")

for user 1, recommendations are  
items ['Shanghai_Noon_(2000)', 'Frequency_(2000)', 'Remember_the_Titans_(2000)'].

for user 2, recommendations are  
items ['Remember_the_Titans_(2000)', 'Rules_of_Engagement_(2000)', 'Frequency_(2000)'].

for user 3, recommendations are  
items ['Shanghai_Noon_(2000)', 'Frequency_(2000)', 'Remember_the_Titans_(2000)'].

for user 4, recommendations are  
items ['Shanghai_Noon_(2000)', 'Frequency_(2000)', 'Gone_in_60_Seconds_(2000)'].



Recommendation results are chosen only from the items in the given pool.
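
Conceptually, a pool restricts the candidate set before ranking. The sketch below assumes each item is scored by the inner product of a user factor and an item factor, as is usual for matrix factorization models; `topk_in_pool` and the toy vectors are hypothetical and do not reflect Buffalo's internal implementation.

```python
def topk_in_pool(user_vector, item_vectors, k, pool=None):
    """Rank items by inner product with the user vector, optionally within a pool."""
    if pool is None:
        candidates = item_vectors
    else:
        # Keep only pool members that the model actually knows about.
        candidates = {name: item_vectors[name] for name in pool if name in item_vectors}
    scored = [
        (sum(a * b for a, b in zip(user_vector, vec)), name)
        for name, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [name for _, name in scored[:k]]


item_vectors = {
    "A": [1.0, 0.0],
    "B": [0.5, 0.5],
    "C": [0.0, 1.0],
    "D": [0.9, 0.1],
}
user_vector = [1.0, 0.2]
print(topk_in_pool(user_vector, item_vectors, k=2))
print(topk_in_pool(user_vector, item_vectors, k=2, pool=["B", "C", "D"]))
```

The second call can never return `"A"`, even though it has the highest score overall, because it is not in the pool.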

### Find Most similar items

In [21]:
print('Similar movies to Toy_Story_2_(1999)')
similar_items = model.most_similar('Toy_Story_2_(1999)', 5)
for rank, (movie_name, score) in enumerate(similar_items):
    print(f'{rank + 1:02d}. {score:.3f} {movie_name}')


Similar movies to Toy_Story_2_(1999)
01. 0.958 Toy_Story_(1995)
02. 0.957 Bug's_Life,_A_(1998)
03. 0.949 Shakespeare_in_Love_(1998)
04. 0.945 Being_John_Malkovich_(1999)
05. 0.935 Sixth_Sense,_The_(1999)


### Find Most similar items given pool

In [22]:
pool = ['Rules_of_Engagement_(2000)', 
        'Remember_the_Titans_(2000)', 
        'Skulls,_The_(2000)', 
        '28_Days_(2000)', 
        'Frequency_(2000)', 
        'Gone_in_60_Seconds_(2000)', 
        'What_Lies_Beneath_(2000)', 
        'Reindeer_Games_(2000)', 
        'Final_Destination_(2000)', 
        'Shanghai_Noon_(2000)']
similar_items = model.most_similar('Toy_Story_2_(1999)', 5, pool=pool)
for rank, (movie_name, score) in enumerate(similar_items):
    print(f'{rank + 1:02d}. {score:.3f} {movie_name}')

01. 0.467 Shanghai_Noon_(2000)
02. 0.435 Frequency_(2000)
03. 0.354 Gone_in_60_Seconds_(2000)
04. 0.320 28_Days_(2000)
05. 0.259 What_Lies_Beneath_(2000)
06. 0.186 Final_Destination_(2000)
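
The similarity queries above, with and without a pool, can be pictured as ranking item factors by their similarity to the target item's factor. The sketch below uses cosine similarity; that Buffalo's scores are cosine similarities is an assumption here, and `most_similar_sketch` with its toy vectors is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def most_similar_sketch(target, item_vectors, k, pool=None):
    """Return up to k (name, score) pairs most similar to target, optionally within a pool."""
    if pool is None:
        names = list(item_vectors)
    else:
        names = [name for name in pool if name in item_vectors]
    scored = [
        (name, cosine(item_vectors[target], item_vectors[name]))
        for name in names
        if name != target  # never return the query item itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


item_vectors = {
    "A": [1.0, 0.0],
    "B": [2.0, 0.1],
    "C": [0.0, 1.0],
    "D": [1.0, 1.0],
}
for rank, (name, score) in enumerate(most_similar_sketch("A", item_vectors, 2)):
    print(f"{rank + 1:02d}. {score:.3f} {name}")
```

Passing `pool=["C", "D"]` to the same function limits the candidates to those two items, mirroring the pooled query in the notebook.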
