#### [Matrix Factorization with Tensorflow](http://katbailey.github.io/post/matrix-factorization-with-tensorflow/) by Katherine Bailey
* Data is located at /home/fred/notebooks/data/movielens-100k
* Code is located at /home/fred/Documents/code/factorizer
* Also see: [Matrix Factorization Techniques for Recommender Systems](file:///home/fred/Documents/articles/recommender_systems/mat_factorization_techs_for_rec_systems_koren_2009.pdf) (MFTRS), which the article references
* Requirements:
  * `pip install feather-format`
  * That did not work; it failed with the error described [here](https://github.com/wesm/feather/issues/268)
  * Instead had to:
    1. Confirm the active Conda environment with `conda info --envs` [per here](http://conda.pydata.org/docs/using/envs.html)
    2. `export CPPFLAGS='-fabi-version=10'` per [here](https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gcc/C_002b_002b-Dialect-Options.html)
    3. `conda install feather-format -c conda-forge`
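
After the install steps above, it may be worth confirming that the package is visible to the interpreter the notebook kernel is actually running. A minimal sketch using only the standard library; the helper name `module_available` is made up for illustration:

```python
import importlib.util
import sys

def module_available(name):
    """Return True if a top-level module called `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None

# The interpreter path shows which environment the kernel is using.
print("interpreter:", sys.executable)
# feather-format installs a module named `feather`.
print("feather importable:", module_available("feather"))
```

If this prints `False` inside the notebook but the import works in a terminal, the kernel is probably running in a different environment from the one where the package was installed.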

In [45]:
import feather  # installed as feather-format (the name feather was already taken, I suppose)
import os
import pandas as pd
datapath = os.path.join(os.getcwd(), 'data', 'movielens-100k', 'u.data')

# http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
# these column names are required by factorizer.py
df = pd.read_csv(datapath, sep='\t', header=None, names=['user_id', 'item_id', 'rating', 'timestamp'])

# The raw ids are 1-based; shift them to 0-based. Without the shift, factorizer fails with:
# InvalidArgumentError: indices[420] = 1586911 is not in [0, 1586126)
df['user_id'] = df['user_id'] - 1
df['item_id'] = df['item_id'] - 1

featherpath = os.path.join(os.getcwd(), 'data', 'movielens-100k', 'u.feather')
feather.write_dataframe(df, featherpath)
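
The bound in that `InvalidArgumentError` equals 943 * 1682, the number of users times the number of items in MovieLens 100K. That suggests the lookup is into a flattened user-by-item matrix, though this is an inference from the numbers, not something confirmed from factorizer.py. Under that assumption, a row-major flat index (the helper `flat_index` is hypothetical) reproduces the failing value from 1-based ids:

```python
n_users, n_items = 943, 1682  # MovieLens 100K sizes

def flat_index(user_id, item_id):
    """Row-major position of (user_id, item_id) in a flattened n_users x n_items matrix."""
    return user_id * n_items + item_id

size = n_users * n_items  # valid flat indices are 0 .. size - 1

# With 1-based ids, user 943 / item 785 lands past the end of the matrix.
one_based = flat_index(943, 785)
# The same rating with 0-based ids stays in range.
zero_based = flat_index(942, 784)

print(size, one_based, one_based < size)
print(zero_based, zero_based < size)
```

Any rating by the last user overflows when ids start at 1, which is why only some indices trigger the error.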

In [46]:
df[0:3]

   user_id  item_id  rating  timestamp
0      195      241       3  881250949
1      185      301       3  891717742
2       21      376       1  878887116


In [47]:
df.shape

(100000, 4)

In [48]:
df.max()

user_id            942
item_id           1681
rating               5
timestamp    893286638
dtype: int64

In [49]:
df.min()

user_id              0
item_id              0
rating               1
timestamp    874724710
dtype: int64
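
The `min` and `max` outputs above give the size of the ratings matrix: ids run from 0, so the counts are the maxima plus one. A small worked calculation of the matrix size and how sparsely it is filled, assuming each (user, item) pair appears at most once in the data:

```python
n_users = 942 + 1     # user_id ranges over 0..942
n_items = 1681 + 1    # item_id ranges over 0..1681
n_ratings = 100000    # rows in df

n_cells = n_users * n_items
density = n_ratings / n_cells

print(n_users, n_items, n_cells)
print(f"{density:.2%} of user-item cells have a rating")
```

Roughly 6% of the cells are observed, so the factorization has to predict the remaining ~94%.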

In [50]:
import sys
sys.path.append('/home/fred/Documents/code/factorizer')
import factorizer

In [78]:
import importlib
importlib.reload(factorizer)

<module 'factorizer' from '/home/fred/Documents/code/factorizer/factorizer.py'>

In [73]:
# Alternative: mock.patch, see
# http://stackoverflow.com/questions/18668947/how-do-i-set-sys-argv-so-i-can-unit-test-it
origargv = sys.argv
sys.argv = ['factorizer.py', featherpath, 100, 'user_bias', 10., 5]
try:
    factorizer.main()
finally:
    sys.argv = origargv  # restore even if main() raises
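
The Stack Overflow link in the cell above points at `mock.patch` as a way to set `sys.argv` temporarily. A minimal sketch of that approach with the standard library's `unittest.mock`; the `main()` here is a hypothetical stand-in that only echoes its arguments, not factorizer's, and the argument values are written as strings, as they would arrive from a real command line:

```python
import sys
from unittest import mock

def main():
    # Hypothetical stand-in for factorizer.main(): just returns the arguments it sees.
    return sys.argv[1:]

fake_argv = ['factorizer.py', 'u.feather', '100', 'user_bias', '10.', '5']
saved_argv = list(sys.argv)

# patch.object replaces sys.argv inside the block and restores the original
# on exit, including when main() raises.
with mock.patch.object(sys, 'argv', fake_argv):
    seen = main()

print(seen)
print(sys.argv == saved_argv)
```

This removes the need to save and restore `origargv` by hand in every cell.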

In [70]:
origargv = sys.argv
sys.argv = ['factorizer.py', featherpath, 100, 'item_bias', 10., 5]
try:
    factorizer.main()
finally:
    sys.argv = origargv  # restore even if main() raises

In [79]:
origargv = sys.argv
sys.argv = ['factorizer.py', featherpath, 100, 'features', 10., 5]
try:
    factorizer.main()
finally:
    sys.argv = origargv  # restore even if main() raises

Final training RMSE 1.07456
Final validation RMSE 1.08329
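
For context on the RMSE figures: MFTRS models a rating as the global mean plus a user bias, an item bias, and the dot product of a user and an item factor vector. RMSE is the square root of the mean squared difference between predicted and actual ratings. The sketch below shows both on made-up toy data with NumPy. It is not factorizer.py's code, and every name, size, and value in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, rank = 4, 5, 2  # toy sizes, not the MovieLens ones

mu = 3.5                                          # global mean rating
b_u = rng.normal(scale=0.1, size=n_users)         # user biases
b_i = rng.normal(scale=0.1, size=n_items)         # item biases
P = rng.normal(scale=0.1, size=(n_users, rank))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, rank))   # item factors

def predict(users, items):
    """Biased matrix-factorization prediction for paired arrays of user and item ids."""
    return mu + b_u[users] + b_i[items] + np.sum(P[users] * Q[items], axis=1)

def rmse(pred, actual):
    """Root mean squared error between two equal-length arrays."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Three made-up observed ratings: (user, item, rating).
users = np.array([0, 1, 3])
items = np.array([2, 4, 0])
ratings = np.array([4.0, 3.0, 5.0])

print(rmse(predict(users, items), ratings))
```

On a 1-to-5 scale, an RMSE near 1.08 means predictions are off by roughly one star in the root-mean-square sense.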
