Experiment applying the xLearn factorization machine (FM) model to the MovieLens dataset.
git clone --recurse-submodules git@github.com:king0980692/xLearn_ml-latest.git
Using venv:
python3 -m venv xlearn_ml
source xlearn_ml/bin/activate
pip install -r requirements.txt
Alternatively, using pyenv and Poetry (Python version 3.8.1):
pyenv local 3.8.1
Tell Poetry to use Python 3.8 and install the dependency packages:
poetry env use python3.8
poetry install
Enter the virtual environment so the scripts can be run more easily:
poetry shell
The steps below use the MovieLens 100K dataset to describe this experiment in detail.
mkdir ./data
wget https://files.grouplens.org/datasets/movielens/ml-100k.zip -P ./data
unzip ./data/ml-100k.zip -d ./data
Next, create the training data in libsvm format. Also create a test file containing all user-item pairs, whose probabilities the model will predict:
python3 ./encoderder/encoderder.py -c ./100k.json
Training file:
$ head ./exp/ml.train
5 1:1 944:1
3 1:1 1736:1
4 1:1 1847:1
3 1:1 1958:1
3 1:1 2069:1
5 1:1 2180:1
4 1:1 2291:1
1 1:1 2402:1
5 1:1 2513:1
3 1:1 945:1
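Each training line follows the libsvm layout "label index:value index:value ...". The rating comes first, followed by the active features. Judging from the config described below, those features encode the user and the item as one-hot categorical values.

The Python sketch below reads such a line back. parse_libsvm_line is a hypothetical helper, not part of this repository. Its has_label=False mode also covers the unlabeled all-pair lines.

```python
def parse_libsvm_line(line, has_label=True):
    """Split a libsvm-format line into (label, {feature_index: value}).

    Hypothetical helper for illustration; not part of this repository.
    """
    tokens = line.split()
    # In labeled files the first token is the rating; all-pair lines have none.
    label = float(tokens[0]) if has_label else None
    feature_tokens = tokens[1:] if has_label else tokens
    features = {}
    for token in feature_tokens:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features


print(parse_libsvm_line("5 1:1 944:1"))
print(parse_libsvm_line("1:1 945:1", has_label=False))
```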
All-pair test file:
$ head ./exp/ml.test.all_pair
1:1 944:1
1:1 945:1
1:1 946:1
1:1 947:1
1:1 948:1
1:1 949:1
1:1 950:1
1:1 951:1
1:1 952:1
1:1 953:1
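The all-pair file holds one unlabeled line for every user-item combination, so the model can score each candidate item for each user. Its size therefore grows as users × items.

The sketch below shows how such lines could be produced. It assumes the feature index of every user and item is already known. all_pair_lines and the indices in the example are hypothetical.

```python
def all_pair_lines(user_indices, item_indices):
    """Yield one unlabeled libsvm line per (user, item) combination.

    Hypothetical helper; the indices are assumed to be the one-hot feature
    indices already assigned to each user and item.
    """
    for user in user_indices:
        for item in item_indices:
            yield f"{user}:1 {item}:1"


# Hypothetical small example: 2 users and 3 items give 6 lines.
lines = list(all_pair_lines([1, 2], [944, 945, 946]))
print(len(lines))
print(lines[:3])
```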
The ./100k.json file passed with -c is the config that tells encoderder how to generate the sparse-format dataset. Its train section looks like:
"train": {
    "input": "./data/ml-100k/ua.base",
    "output": "./exp/ml.train",
    "cached": true,
    "seperator": "\t",
    "header": false,
    "sparse": false,
    "target_columns": [
        {
            "index": 0,
            "type": "cat"
        },
        {
            "index": 1,
            "type": "cat"
        },
        {
            "index": 2,
            "type": "truth"
        }
    ]
}
A few fields are worth explaining:
- input: the raw input file to convert into sparse format
- output: the path of the generated file
- target_columns: the columns you want to encode, each with its column type:
  - cat: categorical data
  - num: numerical data
  - truth: the label column
- other options: see the encoderder repository for more information
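The hypothetical sketch below illustrates how target_columns could drive the encoding. encode_row is not encoderder's actual code. It applies three rules:

- Each distinct value of a cat column gets its own one-hot index.
- A num column gets a single index that carries the raw number.
- The truth column becomes the leading label.

Indices here are assigned in order of first appearance, so they will not match the numbering in ./exp/ml.train. The two input rows are made up but follow the ua.base layout (user, item, rating, timestamp, tab-separated).

```python
def encode_row(fields, target_columns, vocab):
    """Encode one raw row into a libsvm line (hypothetical sketch).

    vocab maps (column_index, raw_value) -> feature index and grows as new
    values are seen; indices start at 1.
    """
    label = None
    features = []
    for column in target_columns:
        raw = fields[column["index"]]
        if column["type"] == "truth":
            label = raw  # the label is written first on the line
        elif column["type"] == "cat":
            # One-hot: every distinct (column, value) pair gets its own index.
            key = (column["index"], raw)
            if key not in vocab:
                vocab[key] = len(vocab) + 1
            features.append(f"{vocab[key]}:1")
        elif column["type"] == "num":
            # Numerical: one index per column, with the raw number as value.
            key = (column["index"], None)
            if key not in vocab:
                vocab[key] = len(vocab) + 1
            features.append(f"{vocab[key]}:{raw}")
    parts = ([label] if label is not None else []) + features
    return " ".join(parts)


target_columns = [
    {"index": 0, "type": "cat"},
    {"index": 1, "type": "cat"},
    {"index": 2, "type": "truth"},
]
vocab = {}
for raw_line in ["1\t1\t5\t874965758", "1\t2\t3\t876893171"]:
    print(encode_row(raw_line.split("\t"), target_columns, vocab))
```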
Use the training file above to train the model and predict the probability of every pair:
python3 train_predict.py --train ./exp/ml.train --test ./exp/ml.test.all_pair --output ./result/output.txt
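train_predict.py relies on xLearn's FM model. For reference, a factorization machine scores a feature vector x as

ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j

The pairwise term can be rewritten so that it costs O(k·n) rather than O(k·n²).

The pure-Python sketch below evaluates that standard formula on a sparse feature dict, such as one parsed from ml.train. It is not xLearn's implementation, and the weights are made-up toy values.

```python
def fm_score(features, w0, w, v):
    """Score a sparse feature dict {index: value} with a factorization machine.

    w0: global bias, w: {index: linear weight}, v: {index: list of k factors}.
    The pairwise term uses the O(k*n) identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    linear = w0 + sum(w[i] * x for i, x in features.items())
    k = len(next(iter(v.values())))  # number of latent factors
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x for i, x in features.items())
        s_sq = sum((v[i][f] * x) ** 2 for i, x in features.items())
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


# Toy weights (made up) for one user feature (1) and one item feature (944).
w0 = 3.5
w = {1: 0.2, 944: 0.3}
v = {1: [0.5, 1.0], 944: [1.0, 0.5]}
print(fm_score({1: 1.0, 944: 1.0}, w0, w, v))
```

With exactly one active user feature and one active item feature (both equal to 1), the score reduces to w0 + w_user + w_item + ⟨v_user, v_item⟩. That is why an FM on this data behaves like a biased matrix factorization model.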
Generate per-user predictions for the users in the test file:
python3 gen_user_pred.py --score_file ./result/output.txt --truth_file ./exp/ml.test.all_pair
The command above generates a pickle file at ./result/user_pred.pkl. It holds a Python dict that collects every user's prediction results, in the following format:
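import pickle

# Load the per-user prediction dict produced by gen_user_pred.py
with open("./result/user_pred.pkl", "rb") as f:
    user_pred = pickle.load(f)

print(user_pred[1])
# output will look like:
# [('1101', 4.78123), ('312', 4.18312), ....]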
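The sketch below shows how such a dict could be built from the two input files. group_user_predictions is a hypothetical helper, not the contents of gen_user_pred.py. It assumes:

- line i of the score file is the prediction for line i of the all-pair file;
- each pair line is "<user_index>:1 <item_index>:1".

Each user's items are then sorted by descending score. In this sketch the users and items are feature-index strings.

```python
from collections import defaultdict


def group_user_predictions(score_lines, pair_lines, top_k=None):
    """Group scores by user and sort each user's items by descending score.

    Hypothetical sketch: assumes line i of the score file is the prediction
    for line i of the all-pair file, and that each pair line is
    "<user_index>:1 <item_index>:1".
    """
    user_pred = defaultdict(list)
    for score_line, pair_line in zip(score_lines, pair_lines):
        user_token, item_token = pair_line.split()[:2]
        user = user_token.split(":")[0]
        item = item_token.split(":")[0]
        user_pred[user].append((item, float(score_line)))
    for items in user_pred.values():
        items.sort(key=lambda pair: pair[1], reverse=True)
        if top_k is not None:
            del items[top_k:]  # keep only the best top_k items
    return dict(user_pred)


scores = ["3.1", "4.7", "2.0"]
pairs = ["1:1 944:1", "1:1 945:1", "2:1 944:1"]
print(group_user_predictions(scores, pairs))
```

The resulting dict could then be saved with pickle.dump for the evaluation step.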
Finally, read the pickle file and evaluate the predictions against the ground-truth test file:
python3 eval.py --predict ./result/user_pred_dict.pkl --truth ./exp/ml.test
Dataset | MAP@10 |
---|---|
ml-latest | 0.006496 |
ml-100k | 0.0036496 |
ml-10m | 0.0023612 |
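MAP@10 averages, over users, the average precision of each user's top-10 ranked list. The sketch below implements that metric under a few assumptions:

- A user's relevant items are the items they have in ./exp/ml.test.
- The ranked list comes from user_pred as [item for item, _ in user_pred[user]].

eval.py may normalize differently. Some variants divide by the number of relevant items instead of min(number of relevant items, k). Treat this as a reference definition, not the script's exact code.

```python
def average_precision_at_k(ranked_items, relevant_items, k=10):
    """AP@k: sum of precision@i at each rank i <= k holding a relevant item,
    divided by min(len(relevant_items), k). Assumes ranked_items is unique."""
    if not relevant_items:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant_items:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant_items), k)


def mean_average_precision_at_k(predictions, truth, k=10):
    """MAP@k over the users in truth.

    predictions: user -> ranked list of items (best first)
    truth: user -> iterable of relevant items
    """
    scores = [
        average_precision_at_k(predictions.get(user, []), set(items), k)
        for user, items in truth.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: user "1" hits at ranks 1 and 3, user "2" has no hits.
predictions = {"1": ["a", "b", "c"], "2": ["x", "y"]}
truth = {"1": ["a", "c"], "2": ["z"]}
print(mean_average_precision_at_k(predictions, truth, k=10))
```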