Latent Keyphrase Inference (LAKI)

Publication

Notes

The current implementation requires SegPhrase to extract domain keyphrases. It has been added under this repository as a submodule.
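Because SegPhrase is a submodule, a plain clone leaves its directory empty until the submodule is fetched. The following is a minimal sketch using standard git commands, assuming it is run from the repository root; the `submodule_status` variable is only illustrative.

```shell
# Fetch the SegPhrase submodule after cloning (run from the repository root).
if [ ! -f .gitmodules ]; then
  submodule_status="no .gitmodules here; run this from the repository root"
elif git submodule update --init --recursive; then
  submodule_status="submodules ready"
else
  submodule_status="git submodule update failed"
fi
echo "$submodule_status"
```

Alternatively, passing `--recursive` to `git clone` fetches submodules in the same step.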

Requirements

The instructions below use Ubuntu as an example.

  • g++ 4.8
$ sudo apt-get install g++-4.8
  • python 2.7
$ sudo apt-get install python
  • scikit-learn
$ sudo apt-get install python-pip
$ sudo pip install scikit-learn
  • nltk
$ sudo pip install nltk
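Before building, it can help to confirm that the tools listed above are visible on the PATH. This is a rough sketch, not part of LAKI: it only checks that the commands and Python modules can be found, not their versions, and it assumes the compiler is reachable as `g++` (on some systems it is installed as `g++-4.8`).

```shell
# Collect anything from the requirements list that cannot be found.
missing=""
for tool in g++ python make; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    missing="$missing $tool"
  fi
done

# Only probe the Python modules if a python interpreter exists.
if command -v python >/dev/null 2>&1; then
  for module in sklearn nltk; do
    if ! python -c "import $module" >/dev/null 2>&1; then
      missing="$missing python-module:$module"
    fi
  done
fi

if [ -z "$missing" ]; then
  echo "all requirements found"
else
  echo "missing:$missing"
fi
```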

Build

Build LAKI by running make from the repository root.

$ make

Default Run

$ ./train_dblp.sh  # train a LAKI model on the DBLP dataset
$ ./test/test_inference  # reads a query string and returns the top-ranked document keyphrases

Parameters

All parameters are set in train_dblp.sh.

INPUT=data/AMiner-Paper.txt

INPUT is the input file for LAKI; the default dataset can be downloaded from AMiner. For other datasets, follow the format of the file indicated by RAW_TEXT (one document per line) and comment out lines 25-28.
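For a custom dataset, the one-document-per-line format can be produced with standard tools. The sketch below is only an illustration under assumed names: `demo_docs` and `demo_raw_text.txt` are hypothetical, and the real target is whatever path RAW_TEXT points to in train_dblp.sh.

```shell
# Build two toy documents, one of which spans several lines.
mkdir -p demo_docs
printf 'First doc\nwith two lines\n' > demo_docs/a.txt
printf 'Second doc\n' > demo_docs/b.txt

# Join the lines of each file with spaces so that every document
# occupies exactly one line of the output file.
: > demo_raw_text.txt
for f in demo_docs/*.txt; do
  paste -s -d ' ' "$f" >> demo_raw_text.txt
done

# The line count should equal the number of documents.
wc -l < demo_raw_text.txt
```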

OMP_NUM_THREADS=4

Number of threads.

NUM_KEYPHRASES=40000

Number of domain keyphrases extracted by SegPhrase.

MIN_PHRASE_SUPPORT=10

Minimum number of occurrences in the corpus for a phrase to count as a valid domain keyphrase.
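To get a feel for what a support threshold means, one can count how often a candidate phrase appears in a corpus file. This is a rough approximation with `grep`, not SegPhrase's own counting (which works on its own tokenization); the file name `demo_corpus.txt` and the helper `count_phrase` are hypothetical.

```shell
MIN_PHRASE_SUPPORT=10
corpus=demo_corpus.txt  # hypothetical toy corpus, one document per line

# Twelve documents mention "data mining", one mentions "topic models".
: > "$corpus"
i=0
while [ "$i" -lt 12 ]; do
  echo "a paper about data mining number $i" >> "$corpus"
  i=$((i + 1))
done
echo "one paper about topic models" >> "$corpus"

# Count case-insensitive, literal occurrences of a phrase in a file.
count_phrase() {
  grep -o -i -F -- "$1" "$2" | wc -l | tr -d ' '
}

for phrase in "data mining" "topic models"; do
  support=$(count_phrase "$phrase" "$corpus")
  if [ "$support" -ge "$MIN_PHRASE_SUPPORT" ]; then
    echo "$phrase: support $support, kept"
  else
    echo "$phrase: support $support, dropped"
  fi
done
```

With the toy corpus above, "data mining" reaches the threshold and "topic models" does not.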

For other parameters specific to each individual module, please check the corresponding cpp files.