Higher-order Relation Schema Induction using Tensor Factorization with Back-off and Aggregation


TFBA

Code for Tensor Factorization with Back-off and Aggregation (TFBA).

Prerequisites:

Install sktensor (https://github.com/mnick/scikit-tensor).

This package contains the following files:

  • dataGen.py -- Used to generate tensors from the set of tuples.
  • factorize.py -- Joint tensor factorization.
  • cliqueMine.py -- Constrained Clique mining.

Running Instructions:

  • python2.7 dataGen.py <tuples_file> <output_dir>
    --- Each line in the input file is a tab-separated 4-tuple followed by its frequency, in the format subject "\t" relation "\t" object "\t" other "\t" frequency.
    --- 3-tuples can also be provided in the same file along with 4-tuples, in which case use the string "" for other.
    --- This script will create pkl files in the output directory.

  • python2.7 factorize.py <data_dir> <output_dir> [other options]
    --- Performs the factorization and stores the latent factor matrices and core tensors in the <output_dir> directory.
    --- <data_dir> should be the same as the <output_dir> of dataGen.py.
    optional arguments:
    -h, --help show this help message and exit
    --minLambda MINLAMBDA [MINLAMBDA ...]
    ** Enter the min lambda (list), default = 0.1 0.1 0.1
    --maxLambda MAXLAMBDA [MAXLAMBDA ...]
    ** Enter the max lambda (list), needed only for grid search. If no grid search, provide only minLambda option.
    --step STEP Enter the step size for grid search (default = 0.5)
    --maxIters MAXITERS Enter the maximum iterations (default = 10)
    --rank1 RANK1 Enter rank1 (default = 10)
    --rank2 RANK2 Enter rank2 (default = 10)
    --rank3 RANK3 Enter rank3 (default = 10)
    --fit FIT Y/N, default = N. Give Y for fit computation.
    --cores CORES Number of Threads

  • python2.7 cliqueMine.py <data_dir> <output_dir> --rank r1 r2 r3
    --- Performs constrained clique mining and stores the schemas in <output_dir>
    --- <data_dir> should be the same as the <data_dir> used to run factorize.py.
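
The steps above can be sketched end to end. The snippet below is only an illustration under stated assumptions, not part of this package: `write_tuples` and `pipeline_commands` are hypothetical helpers, the example rows are made up, and the directory names `data`, `factors` and `schemas` are placeholders. The placeholder string used for `other` in 3-tuples is taken literally from the instructions above and should be checked against dataGen.py. The README does not say where cliqueMine.py looks for the factor matrices, so the separate `factors` directory is also an assumption. The helper code is written for Python 3, and it only builds and prints the commands without running them.

```python
import os
import tempfile


def write_tuples(path, rows, placeholder=""):
    """Write rows as: subject \t relation \t object \t other \t frequency.

    A row whose `other` is None is treated as a 3-tuple and gets
    `placeholder` in the `other` column (assumption: verify the exact
    placeholder string expected by dataGen.py).
    """
    with open(path, "w") as handle:
        for subj, rel, obj, other, freq in rows:
            if other is None:
                other = placeholder
            handle.write("\t".join([subj, rel, obj, other, str(freq)]) + "\n")


def pipeline_commands(tuples_file, data_dir, factor_dir, schema_dir,
                      ranks=(10, 10, 10)):
    """Build the three command lines in the order given in the instructions."""
    r1, r2, r3 = [str(r) for r in ranks]
    return [
        # Step 1: tuples file -> pkl tensors in data_dir.
        ["python2.7", "dataGen.py", tuples_file, data_dir],
        # Step 2: factorize the tensors found in data_dir.
        ["python2.7", "factorize.py", data_dir, factor_dir,
         "--rank1", r1, "--rank2", r2, "--rank3", r3],
        # Step 3: clique mining reads the same data_dir as step 2.
        ["python2.7", "cliqueMine.py", data_dir, schema_dir,
         "--rank", r1, r2, r3],
    ]


# Made-up example rows: one 4-tuple and one 3-tuple.
work_dir = tempfile.mkdtemp()
tuples_file = os.path.join(work_dir, "tuples.tsv")
write_tuples(tuples_file, [
    ("player", "scored", "goal", "match", 3),
    ("team", "won", "game", None, 5),
])

with open(tuples_file) as handle:
    lines = handle.read().splitlines()

commands = pipeline_commands(tuples_file, "data", "factors", "schemas")
for command in commands:
    print(" ".join(command))
```

Each printed command can then be run from the directory containing the scripts. The ranks passed to cliqueMine.py are kept identical to those passed to factorize.py here, which seems the natural reading of `--rank r1 r2 r3`.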

References:

[1] Madhav Nimishakavi, Manish Gupta and Partha Talukdar. Relation Schema Induction using Tensor Factorization with Back-off and Aggregation. Proceedings of the 2018 Conference of the Association for Computational Linguistics (ACL 2018).