Expert annotated Hallmarks of Cancer Corpus

Hallmarks-of-Cancer

The Hallmarks of Cancer Corpus for text classification

The Hallmarks of Cancer (HOC) Corpus consists of 1852 PubMed publication abstracts manually annotated by experts according to a taxonomy. The taxonomy consists of 37 classes in a hierarchy. Zero or more class labels are assigned to each sentence in the corpus. The labels are found under the "labels" directory, while the tokenized text is found under the "text" directory. Each file is named after the corresponding PubMed ID (PMID).
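As a rough illustration of the layout described above, the following Python sketch pairs each file under "text" with the file of the same name under "labels". It assumes, without confirmation from this README, that both files hold one line per sentence in the same order; the function name `load_hoc` is hypothetical, and label lines are kept as raw strings because the label format is not specified here.

```python
from pathlib import Path


def load_hoc(root):
    """Pair sentences with their label lines, keyed by PMID (the file name stem).

    Assumption: text and label files are line-aligned, one line per sentence.
    """
    root = Path(root)
    text_files = {p.stem: p for p in (root / "text").iterdir() if p.is_file()}
    label_files = {p.stem: p for p in (root / "labels").iterdir() if p.is_file()}

    corpus = {}
    # Only keep PMIDs that have both a text file and a label file.
    for pmid in sorted(text_files.keys() & label_files.keys()):
        sentences = text_files[pmid].read_text(encoding="utf-8").splitlines()
        labels = label_files[pmid].read_text(encoding="utf-8").splitlines()
        # zip stops at the shorter list, so a length mismatch is truncated silently.
        corpus[pmid] = list(zip(sentences, labels))
    return corpus
```

Calling `load_hoc(".")` from the repository root would then return a dictionary mapping each PMID to a list of (sentence, label line) pairs. An empty label line would correspond to a sentence with no assigned class, if the line-aligned assumption holds.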

In addition to the HOC corpus, we also have the Cancer Hallmarks Analytics Tool, which classifies all of PubMed according to the HoC taxonomy.

Please cite the following publications

We would very much appreciate it if you cite our publications:

Automatic semantic classification of scientific literature according to the hallmarks of cancer

@article{baker2015automatic,
  title={Automatic semantic classification of scientific literature according to the hallmarks of cancer},
  author={Baker, Simon and Silins, Ilona and Guo, Yufan and Ali, Imran and H{\"o}gberg, Johan and Stenius, Ulla and Korhonen, Anna},
  journal={Bioinformatics},
  volume={32},
  number={3},
  pages={432--440},
  year={2015},
  publisher={Oxford University Press}
}

Cancer Hallmarks Analytics Tool (CHAT): a text mining approach to organize and evaluate scientific literature on cancer

@article{baker2017cancer,
  title={Cancer Hallmarks Analytics Tool (CHAT): a text mining approach to organize and evaluate scientific literature on cancer},
  author={Baker, Simon and Ali, Imran and Silins, Ilona and Pyysalo, Sampo and Guo, Yufan and H{\"o}gberg, Johan and Stenius, Ulla and Korhonen, Anna},
  journal={Bioinformatics},
  volume={33},
  number={24},
  pages={3973--3981},
  year={2017},
  publisher={Oxford University Press}
}

Cancer hallmark text classification using convolutional neural networks

@article{baker2016cancer,
  title={Cancer hallmark text classification using convolutional neural networks},
  author={Baker, Simon and Korhonen, Anna-Leena and Pyysalo, Sampo},
  year={2016}
}

Initializing neural networks for hierarchical multi-label text classification

@article{baker2017initializing,
  title={Initializing neural networks for hierarchical multi-label text classification},
  author={Baker, Simon and Korhonen, Anna},
  journal={BioNLP 2017},
  pages={307--315},
  year={2017}
}