Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text

This repository contains the code for the JCDL paper Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text. It is based on and extends the multi-label classification framework Quadflor.

Installation

Install Python 3.4 or higher, then install the required system packages and Python modules:

#install necessary packages
sudo apt-get install libatlas-base-dev gfortran python3.4-dev python3.4-venv build-essential

#install python modules in a virtual environment with pip (this may take a while):
python3 -m venv lucid_ml_environment
source lucid_ml_environment/bin/activate
cd Code
pip install -r requirements.txt
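
To verify the setup, you can run a quick sanity check inside the activated virtual environment (the exact packages installed depend on requirements.txt):

# confirm the interpreter version and list the installed modules
python --version
pip list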

Replicating the results

To enhance the reproducibility of our study, we have uploaded a copy of the title datasets to Kaggle. Moreover, we provide the configurations used to produce the results reported in the paper.

To rerun any of the (title) experiments, do the following:

  1. Download the econbiz.csv and pubmed.csv files from Kaggle and copy them to the Resources folder.
  2. Open the .cfg file of the method you want to run (MLP, BaseMLP, CNN, or LSTM) in the Experiments folder. Copy the command from the third line (to evaluate on a single fold) or the fifth line (to run a full 10-fold cross-validation).
  3. In the command, adjust the --tf-model-path option (it specifies where the model weights are saved, which can amount to several gigabytes, so make sure you have enough disk space) and set the --pretrained_embeddings option to the location of the GloVe model in your environment.
  4. cd to the folder Code/lucid_ml and run the command (a sketch of the resulting invocation is shown below).
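
For illustration only, the resulting invocation has roughly the following shape. The actual command, including the classifier flag and all remaining options, must be copied from the respective .cfg file; the script name run.py and the paths shown here are placeholder assumptions.

cd Code/lucid_ml
# command copied from the .cfg file (abbreviated here as ...), with the two paths adjusted:
python run.py ... \
  --tf-model-path /path/with/enough/disk/space \
  --pretrained_embeddings /path/to/glove/embeddings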