
swapUniba/PTCB_wikiembs_KARS


Evaluating Content-based Pre-Training Strategies for a Knowledge-aware Recommender System based on Graph Neural Networks

Requirements

Needed libraries:

  • torch
  • pandas
  • pykeen
  • SPARQLWrapper
  • wikipedia2vec
  • tqdm
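
These can be installed via pip, e.g.:

pip install torch pandas pykeen SPARQLWrapper wikipedia2vec tqdm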

Usage

This repository contains four scripts. Each of them has tunable parameters that can be changed by editing the relevant code section in the file.

data_to_wikiname.py

This script converts a list of entities, provided in a tsv file with two columns (id, url), where url is the DBpedia URL, into the corresponding Wikipedia entity names. In some cases the Wikipedia PageID extracted from DBpedia is faulty, so it must be corrected manually by following the prompt instructions. A rough sketch of the conversion is shown after the parameter list.

Parameters:

  • input_file tsv file with two columns: id, url
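
As orientation only, the conversion could look like the sketch below. The real script resolves names through the Wikipedia PageID extracted from DBpedia; this sketch takes a shortcut via the foaf:isPrimaryTopicOf link, and all file names are assumptions:

```python
# Hedged sketch, not the repository's exact code: map DBpedia URLs to Wikipedia names.
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON

INPUT_FILE = "entities.tsv"  # assumed: tsv with columns id, url

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

def wikiname_from_dbpedia(url: str) -> str:
    # Ask DBpedia for the Wikipedia page describing the resource.
    sparql.setQuery(f"SELECT ?page WHERE {{ <{url}> foaf:isPrimaryTopicOf ?page }}")
    bindings = sparql.query().convert()["results"]["bindings"]
    if bindings:
        # e.g. http://en.wikipedia.org/wiki/The_Matrix -> "The Matrix"
        return bindings[0]["page"]["value"].rsplit("/", 1)[-1].replace("_", " ")
    # Fallback: derive the title from the DBpedia URL itself.
    return url.rsplit("/", 1)[-1].replace("_", " ")

df = pd.read_csv(INPUT_FILE, sep="\t", names=["id", "url"])
df["wikiname"] = df["url"].map(wikiname_from_dbpedia)
df[["id", "wikiname"]].to_csv("id2wikiname.tsv", sep="\t", index=False, header=False)
```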

wikiname_to_emb.py

The output of the previous script should be passed as input to this script, together with a training file if you want to include user embeddings. The script outputs a dictionary of (id, embedding) pairs, which is dumped to the wiki2vec_embeddings.pkl file; a sketch of this step is shown after the parameter list.

Parameters:

  • id2wikiname_file file generated by data_to_wikiname.py script
  • training_file training set used to compute user embeddings; set to None if you don't want to include them
  • wiki2vec_dump Wikipedia2Vec pre-trained embeddings file
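
A minimal sketch of this lookup step, assuming tab-separated files with the formats noted in the comments (the real script's formats may differ):

```python
# Hedged sketch of wikiname_to_emb.py: look up Wikipedia2Vec vectors for entities
# and optionally average item vectors to obtain user embeddings.
import pickle
import numpy as np
from wikipedia2vec import Wikipedia2Vec

ID2WIKINAME_FILE = "id2wikiname.tsv"        # assumed output of data_to_wikiname.py
TRAINING_FILE = "train.tsv"                 # set to None to skip user embeddings
WIKI2VEC_DUMP = "enwiki_20180420_300d.pkl"  # pre-trained Wikipedia2Vec model file

wiki2vec = Wikipedia2Vec.load(WIKI2VEC_DUMP)

id2emb = {}
with open(ID2WIKINAME_FILE) as f:
    for line in f:
        ent_id, wikiname = line.rstrip("\n").split("\t")
        if wiki2vec.get_entity(wikiname) is not None:
            id2emb[ent_id] = wiki2vec.get_entity_vector(wikiname)

# Optional user embeddings: average the vectors of each user's positively rated items.
if TRAINING_FILE is not None:
    user_items = {}
    with open(TRAINING_FILE) as f:
        for line in f:  # assumed format: user_id \t item_id \t rating
            user, item, rating = line.rstrip("\n").split("\t")
            if int(rating) > 0 and item in id2emb:
                user_items.setdefault(user, []).append(id2emb[item])
    for user, vecs in user_items.items():
        id2emb[user] = np.mean(vecs, axis=0)

with open("wiki2vec_embeddings.pkl", "wb") as out:
    pickle.dump(id2emb, out)
```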

train_embeddings.py

This script learns the graph embeddings; several parameters can be adjusted. The output of the previous script can optionally be used as pre-trained embeddings for the model. An illustrative pykeen setup is shown after the parameter list.

Parameters:

  • dataset either dbbook or movielens
  • emb_dim embedding dimension
  • n_layers number of CompGCN layers
  • epochs number of epochs to learn the embeddings
  • wiki2vec_embeddings_file Wikipedia2Vec pre-trained embeddings (optional)
  • output_path output path
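
For orientation, a CompGCN training run with pykeen along these lines might look as follows; the dataset file layout, output format, and the exact model_kwargs are assumptions to be checked against pykeen's documentation:

```python
# Illustrative pykeen setup for CompGCN embedding training, not the script's exact code.
import json
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

DATASET = "dbbook"  # or "movielens"
EMB_DIM = 100
N_LAYERS = 2
EPOCHS = 100
OUTPUT_PATH = "output/"

# CompGCN in pykeen requires inverse triples.
triples = TriplesFactory.from_path(
    f"{DATASET}/triples.tsv",  # assumed: head \t relation \t tail
    create_inverse_triples=True,
)

result = pipeline(
    training=triples,
    testing=triples,  # sketch only: reuse the training split as the test split
    model="CompGCN",
    model_kwargs=dict(
        embedding_dim=EMB_DIM,
        encoder_kwargs=dict(num_layers=N_LAYERS),
    ),
    training_kwargs=dict(num_epochs=EPOCHS),
)

result.save_to_directory(OUTPUT_PATH)

# Dump the entity-to-id mapping (the real ent2id file format may differ).
with open(OUTPUT_PATH + "ent2id.json", "w") as f:
    json.dump(triples.entity_to_id, f)
```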

train_recommender.py

This script trains the recommender model and generates the predictions; many training parameters can be tuned. Among these, embeddings_file should be the file generated by the previous script. A minimal sketch of such a training loop follows the parameter list.

Parameters:

  • dataset either dbbook or movielens
  • batch_size training batch size
  • epochs number of epochs to train the recommender
  • learning_rate learning rate value
  • embeddings_file embeddings learned by train_embeddings.py
  • concat True to concatenate the embeddings
  • ent2id_file ent2id file generated by train_embeddings.py
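
The repository's actual recommender is knowledge-aware and GNN-based; purely as orientation, a minimal torch loop wiring these parameters together could look like the sketch below, where the dot-product scorer, file formats, and paths are all assumptions (the concat option, which would merge graph and wiki2vec embeddings, is omitted):

```python
# Hedged sketch of a training loop over pre-trained embeddings, not the real model.
import pickle
import torch
from torch import nn

BATCH_SIZE = 256
EPOCHS = 30
LEARNING_RATE = 1e-3
EMBEDDINGS_FILE = "output/graph_embeddings.pkl"  # assumed: (n_nodes, emb_dim) matrix

with open(EMBEDDINGS_FILE, "rb") as f:
    weights = torch.as_tensor(pickle.load(f), dtype=torch.float32)

# Initialize the embedding table from the pre-trained vectors and fine-tune it.
emb = nn.Embedding.from_pretrained(weights, freeze=False)

def score(users: torch.Tensor, items: torch.Tensor) -> torch.Tensor:
    # Dot-product interaction between user and item embeddings.
    return (emb(users) * emb(items)).sum(dim=-1)

# train_data: LongTensor of [user_idx, item_idx, label] rows built from the
# dataset's training split; a toy placeholder stands in for the real loader.
train_data = torch.tensor([[0, 1, 1], [0, 2, 0]])

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(emb.parameters(), lr=LEARNING_RATE)

for epoch in range(EPOCHS):
    for batch in torch.split(train_data, BATCH_SIZE):
        users, items = batch[:, 0], batch[:, 1]
        labels = batch[:, 2].float()
        loss = criterion(score(users, items), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```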

Evaluation

The output lists generated by train_recommender.py can be evaluated using Elliot. Elliot needs a test file (we provide test_elliot.tsv for each dataset) containing only the ground truth (positive ratings), which should be specified in the Elliot config file along with the folder generated by the recommender. We also provide the sample config file elliot/proxy_rec.yml, which should be edited and placed in Elliot's config_files/ folder; there we can specify the dataset name, the metrics to compute, and other parameters. Then, Elliot can be executed through the following command:

python start_experiments.py --config=proxy_rec.yml
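
For reference, a configuration skeleton along these lines might look as follows. The keys loosely follow Elliot's documented experiment layout, but the model section and all paths here are assumptions; the provided elliot/proxy_rec.yml remains the authoritative example:

```yaml
experiment:
  dataset: dbbook                        # or movielens
  data_config:
    strategy: fixed
    train_path: ../data/dbbook/train.tsv
    test_path: ../data/dbbook/test_elliot.tsv
  top_k: 5
  evaluation:
    cutoffs: [5]
    simple_metrics: [Precision, Recall, F1, nDCG]
  models:
    RecommendationFolder:                # evaluates pre-computed recommendation lists
      folder: ../results/dbbook          # folder generated by train_recommender.py
```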

Notice that we edited Elliot's code, specifically the F1 and Precision calculation: the original code computes these metrics ignoring the fact that a recommendation list for some users might be shorter than the cutoff value k, so we modified the files elliot/elliot/evaluation/metrics/accuracy/f1/f1.py and elliot/elliot/evaluation/metrics/accuracy/precision/precision.py to address this issue.

Our custom files can be found in the elliot/ folder.
