
Semantic Relations between Wikipedia Articles

Implementation, trained models, and result data for the paper Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles (PDF on arXiv). The supplemental material is available for download from GitHub Releases or Zenodo.

Getting started

Requirements:

  • Python >= 3.7 (Conda)
  • Jupyter notebook (for evaluation)
  • GPU with CUDA support (for training Transformer models)

We recommend first creating a new virtual environment for Python 3.7 with Conda:

conda create -n docrel python=3.7
conda activate docrel

Install all Python dependencies:

pip install -r requirements.txt

Download the dataset (and pretrained models):

# Navigate to data directory
cd data

# Wikipedia corpus
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2

# - decompress 
bzip2 -d enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2

# Train and test data
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/train_testdata__4folds.tar.gz

# - decompress
tar -xzf train_testdata__4folds.tar.gz

# Models
# - download
wget https://github.com/malteos/semantic-document-relations/releases/download/1.0/model_wiki.bert_base__joint__seq512.tar.gz

# - decompress
tar -xzf model_wiki.bert_base__joint__seq512.tar.gz
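To check the download, it can help to look at a few records of the Wikipedia corpus. The following is a minimal Python sketch, assuming the file follows the usual JSONL convention of one JSON object per line. The helper name `iter_jsonl` is hypothetical and not part of this repository, and no field names are assumed. It reads either the decompressed `.jsonl` file or the original `.bz2` archive.

```python
import bz2
import itertools
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Optional


def iter_jsonl(path: str, limit: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSONL file.

    Hypothetical helper (not part of the repository). Files ending in
    .bz2 are read through the bz2 module, so they do not have to be
    decompressed to disk first.
    """
    opener = bz2.open if Path(path).suffix == ".bz2" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        # Skip blank lines, then stop after `limit` records (None = all).
        lines = (line for line in handle if line.strip())
        for line in itertools.islice(lines, limit):
            yield json.loads(line)


# Example usage (run from the data directory after the download):
# for record in iter_jsonl("enwiki-20191101-pages-articles.weighted.10k.jsonl", limit=3):
#     print(sorted(record.keys()))
```

Printing only the keys of the first few records keeps the output short and shows which fields the corpus actually provides.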

Experiments

Run a predefined experiment (settings can be found in experiments/predefined/wiki):

# Config: wiki.bert_base__joint__seq512
# GPU ID: 1 (set via CUDA_VISIBLE_DEVICES=1)
# Output dir: ./output
python cli.py run ./output 1 wiki.bert_base__joint__seq512
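The command above takes three positional arguments after `run`: the output directory, the GPU ID, and the config name. To launch it from a script, the argument list could be assembled as in the following sketch. The helper `build_run_command` is hypothetical and not part of this repository. It only mirrors the argument order of the command shown above and assumes it is run from the repository root, where `cli.py` is located.

```python
import sys
from typing import List


def build_run_command(output_dir: str, gpu_id: int, config: str) -> List[str]:
    """Build the argument list for `python cli.py run <output_dir> <gpu_id> <config>`.

    Hypothetical helper: it only reproduces the positional argument
    order of the documented command and does not start anything itself.
    """
    return [sys.executable, "cli.py", "run", output_dir, str(gpu_id), config]


# Example usage (starts the experiment; requires the repository checkout):
# import subprocess
# cmd = build_run_command("./output", 1, "wiki.bert_base__joint__seq512")
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an exception if the run exits with a non-zero status.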

Demo

You can run the demo Jupyter notebook on Google Colab via the "Open In Colab" badge in the repository.

How to cite

If you are using our code, please cite our paper:

@InProceedings{Ostendorff2020,
  title = {Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles},
  booktitle = {Proceedings of the {ACM}/{IEEE} {Joint} {Conference} on {Digital} {Libraries} ({JCDL})},
  author = {Ostendorff, Malte and Ruas, Terry and Schubotz, Moritz and Gipp, Bela},
  year = {2020},
  month = {Aug.},
}

License

MIT
