king0980692/xLearn_ml-latest

xLearn ml-latest

An experiment that applies the xLearn FM (factorization machine) model to the MovieLens dataset.


Environment setup

Clone this project

git clone --recurse-submodules git@github.com:king0980692/xLearn_ml-latest.git

Prerequisites

pip

Using venv:

python3 -m venv xlearn_ml
source xlearn_ml/bin/activate
pip install -r requirements.txt

poetry

Using Python version 3.8.1:

pyenv local 3.8.1

Tell Poetry to use Python 3.8 and install the dependencies:

poetry env use python3.8
poetry install

Enter the virtual environment so the scripts can be run directly:

poetry shell

Experiment step

The steps below use the MovieLens 100K dataset to describe the experiment in detail.

Prepare data

mkdir -p ./data
wget https://files.grouplens.org/datasets/movielens/ml-100k.zip -P ./data
unzip ./data/ml-100k.zip -d ./data

Using encoderder to generate the sparse format data

This step creates the training data in libsvm format, plus a test file containing every (user, item) pair, so that the model can predict a score for each pair.

python3 ./encoderder/encoderder.py -c ./100k.json

training file:

$ head ./exp/ml.train

5 1:1  944:1
3 1:1  1736:1
4 1:1  1847:1
3 1:1  1958:1
3 1:1  2069:1
5 1:1  2180:1
4 1:1  2291:1
1 1:1  2402:1
5 1:1  2513:1
3 1:1  945:1

all_pair file:

$ head ./exp/ml.test.all_pair
1:1 944:1
1:1 945:1
1:1 946:1
1:1 947:1
1:1 948:1
1:1 949:1
1:1 950:1
1:1 951:1
1:1 952:1
1:1 953:1
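Both listings follow the libsvm convention: an optional label followed by index:value pairs, where every active one-hot feature has the value 1. The sketch below shows one way such lines could be produced once feature indices have been assigned. The helper names `format_libsvm_line` and `all_pairs` are hypothetical and are not part of encoderder.

```python
from itertools import product


def format_libsvm_line(indices, label=None):
    """Render one-hot features as a libsvm line; the label is optional."""
    features = " ".join(f"{i}:1" for i in indices)
    return features if label is None else f"{label} {features}"


def all_pairs(user_indices, item_indices):
    """Yield one unlabeled line per (user, item) combination."""
    for user, item in product(user_indices, item_indices):
        yield format_libsvm_line([user, item])


# A labeled training line and a tiny all-pairs file (2 users x 2 items).
train_line = format_libsvm_line([1, 944], label=5)
pair_lines = list(all_pairs([1, 2], [944, 945]))
```

The all-pairs file grows as users times items, so for larger datasets it may be preferable to stream the lines to disk rather than hold them in a list.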

100k.json

This is the JSON config file that encoderder reads to generate the sparse format of the dataset. It looks like:

"train": {
    "input": "./data/ml-100k/ua.base",
    "output": "./exp/ml.train",
    "cached": true,
    "seperator": "\t",
    "header": false,
    "sparse": false,
    "target_columns": [
        {
            "index": 0,
            "type": "cat"
        },
        {
            "index": 1,
            "type": "cat"
        },
        {
            "index": 2,
            "type": "truth"
        }
    ]
}

Some points worth noting:

  • input: the input file to convert to the sparse format

  • output: the path of the generated file

  • target_columns: the columns you want to encode, each with its column type:

    • cat: categorical data
    • num: numerical data
    • truth: the label
  • other options: see the encoderder repository for more information
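To make the `cat` and `truth` column types concrete, here is a minimal sketch of one-hot encoding categorical columns into libsvm lines. The numbering scheme (sorted distinct values, columns numbered one after another, 1-based) is an assumption for illustration; encoderder may assign indices differently. The functions `build_index` and `encode_row` are hypothetical names.

```python
def build_index(rows, cat_cols):
    """Assign each distinct (column, value) pair a 1-based feature index.

    Columns are numbered consecutively, so the second column's indices
    start right after the last index of the first column.
    """
    index = {}
    next_id = 1
    for col in cat_cols:
        for value in sorted({row[col] for row in rows}):
            index[(col, value)] = next_id
            next_id += 1
    return index


def encode_row(row, index, cat_cols, truth_col):
    """Turn one raw row into 'label idx:1 idx:1 ...'."""
    feats = " ".join(f"{index[(c, row[c])]}:1" for c in cat_cols)
    return f"{row[truth_col]} {feats}"


# Toy rows shaped like ua.base: user, item, rating (made-up values).
rows = [["u1", "m7", "5"], ["u1", "m3", "3"], ["u2", "m7", "4"]]
cat_cols, truth_col = [0, 1], 2
index = build_index(rows, cat_cols)
encoded = [encode_row(r, index, cat_cols, truth_col) for r in rows]
```

Here the two users take indices 1 and 2 and the two items take indices 3 and 4, which is the same offset pattern visible in the training file, where item indices start after the last user index.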

Training and Testing

Use the training file above to train the model, then predict a score for every pair in the all_pair file:

python3 train_predict.py --train ./exp/ml.train --test ./exp/ml.test.all_pair --output ./result/output.txt
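For intuition about what the model computes for each (user, item) line, the sketch below evaluates the textbook second-order factorization machine equation on one-hot input. It is not xLearn's code and not the contents of train_predict.py, and all parameter values are made up for illustration.

```python
def fm_score(active, w0, w, v):
    """Score one sample whose active one-hot features are `active`.

    score = w0 + sum_i w[i] + sum_{i<j} <v[i], v[j]>
    All feature values are 1, as in the libsvm files of this experiment.
    """
    linear = sum(w[i] for i in active)
    k = len(v[active[0]])  # number of latent factors
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} a_i * a_j = ((sum a)^2 - sum a^2) / 2
        s = sum(v[i][f] for i in active)
        s_sq = sum(v[i][f] ** 2 for i in active)
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical parameters for user feature 1 and item feature 944.
w0 = 3.0
w = {1: 0.5, 944: -0.25}
v = {1: [1.0, 2.0], 944: [0.5, 0.25]}
score = fm_score([1, 944], w0, w, v)
```

With only two active features, the pairwise term reduces to the dot product of the user and item latent vectors, which is why FM on this data behaves like biased matrix factorization.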

Generate the user pred pickle

Generate the predictions for each user that appears in the test file:

python3 gen_user_pred.py --score_file ./result/output.txt --truth_file ./exp/ml.test.all_pair

The command above generates a pickle file at ./result/user_pred.pkl that contains a Python dict. It collects every user's predictions as a list of (item, score) tuples, and its format looks like:

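import pickle

# Load the dict written by gen_user_pred.py
with open("./result/user_pred.pkl", "rb") as f:
    user_pred = pickle.load(f)

print(user_pred[1])

'''
output will look like
[('1101', 4.78123), ('312', 4.18312), ...]
'''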
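The grouping step can be sketched as follows, assuming that line n of the score file is the prediction for line n of the all_pair file and that the first two features on each line are the user and the item. The function `build_user_pred` is a hypothetical stand-in; the real gen_user_pred.py may differ, for example by mapping encoded indices back to original item IDs.

```python
from collections import defaultdict


def build_user_pred(score_lines, pair_lines):
    """Group (item, score) tuples per user, highest score first."""
    user_pred = defaultdict(list)
    for score, pair in zip(score_lines, pair_lines):
        # "1:1 944:1" -> [1, 944]: keep the feature index of each token.
        indices = [int(tok.split(":")[0]) for tok in pair.split()]
        user, item = indices[0], indices[1]
        user_pred[user].append((str(item), float(score)))
    for preds in user_pred.values():
        preds.sort(key=lambda p: p[1], reverse=True)
    return dict(user_pred)


# Made-up scores for three (user, item) pairs.
pairs = ["1:1 944:1", "1:1 945:1", "2:1 944:1"]
scores = ["3.5", "4.75", "2.0"]
grouped = build_user_pred(scores, pairs)
```

Sorting once per user at build time means the evaluation step can simply take the first k entries of each list.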

Evaluation

Finally, read the pickle file and compare it against the ground-truth file to evaluate the prediction results.

python3 eval.py --predict ./result/user_pred_dict.pkl --truth ./exp/ml.test
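The metric reported in this README is MAP@10. As a rough sketch of how it can be computed from the per-user prediction dict, the code below uses one common definition: average precision is normalized by min(number of relevant items, k), and users without relevant items are skipped. eval.py may use a different normalization, and the function names here are hypothetical.

```python
def average_precision_at_k(ranked_items, relevant, k=10):
    """AP@k: sum of precision@i at each hit rank i, normalized."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0


def map_at_k(user_pred, truth, k=10):
    """Mean AP@k over users that have at least one relevant item."""
    ap_values = []
    for user, relevant in truth.items():
        if not relevant:
            continue
        ranked = [item for item, _ in user_pred.get(user, [])]
        ap_values.append(average_precision_at_k(ranked, relevant, k))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


# Toy data: predictions are already sorted by score, best first.
toy_pred = {1: [("a", 4.9), ("b", 4.1), ("c", 3.0)],
            2: [("x", 4.0), ("y", 3.0)]}
toy_truth = {1: {"a", "c"}, 2: {"y"}}
result = map_at_k(toy_pred, toy_truth, k=10)
```

In the toy data, user 1 has hits at ranks 1 and 3 (AP = 5/6) and user 2 has one hit at rank 2 (AP = 1/2), so the mean is 2/3.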

Performance

Dataset      MAP@10
ml-latest    0.006496
ml-100k      0.0036496
ml-10m       0.0023612
