This repository contains datasets for the automatic keyphrase extraction task.


* MAUI.tar.gz  data from the University of Waikato (KEA and MAUI systems)
* Wan2008.tar.gz  data from Wan:2008
* Schutz2008.tar.gz  data from Schutz:2008 (only the answer sets and a readme are provided; the papers are available at
*    data from Nguyen:2007
* Hulth2003.tar.gz  data from Hulth:2003
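
Each dataset above ships as a gzip-compressed tar archive. As a minimal sketch of unpacking one, assuming Python 3 is available: the helper name `extract_dataset` and the default `data` output directory are illustrative and not part of this repository, and the internal layout of each archive is not documented here, so the function only reports whatever member names it finds.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path, dest_dir="data"):
    """Unpack a .tar.gz dataset archive into dest_dir and return its member names.

    Hypothetical helper for illustration; not part of the repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Refuse members that would be written outside dest_dir
        # (absolute paths or ".." components).
        for name in names:
            target = (root / name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        archive.extractall(root)
    return names
```

For example, `extract_dataset("Hulth2003.tar.gz")` would unpack that archive under `data/` and return the list of paths it contained.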


  author = {Xiaojun Wan and Jianguo Xiao},
  title = {CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction},
  booktitle = {Proceedings of the 22nd International Conference on Computational Linguistics},
  year = {2008},
  address = {Manchester, UK},
  pages = {969--976}

  author = {Alexander Thorsten Schutz}, 
  title = {Keyphrase Extraction from Single Documents in the Open Domain Exploiting Linguistic and Statistical Methods},
  booktitle = {National University of Ireland},
  year = {2008}

  author = {Thuy Dung Nguyen and Min-Yen Kan},
  title = {Keyphrase Extraction in Scientific Publications},
  booktitle = {Proceedings of the International Conference on Asian Digital Libraries},
  year = {2007},
  pages = {317--326}

  author = {Olena Medelyan and Ian Witten},
  title = {Thesaurus based automatic keyphrase indexing},
  booktitle = {Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries},
  year = {2006},
  pages = {296--297}

  author = {Anette Hulth},
  title = {Improved automatic keyword extraction given more linguistic knowledge},
  booktitle = {Proceedings of the 2003 conference on Empirical methods in natural language processing},
  year = {2003},
  pages = {216--223}

  author = {Eibe Frank and Gordon W. Paynter and Ian H. Witten and Carl Gutwin and Craig G. Nevill-Manning},
  title = {Domain-Specific Keyphrase Extraction},
  booktitle = {Proceedings of the 16th International Joint Conference on Artificial Intelligence},
  year = {1999},
  pages = {668--673}

  author = {Ian Witten and Gordon Paynter and Eibe Frank and Carl Gutwin and Craig Nevill-Manning},
  title = {KEA: Practical Automatic Keyphrase Extraction},
  booktitle = {Proceedings of the Fourth ACM Conference on Digital Libraries},
  year = {1999},
  pages = {254--256}

If you have a dataset for the automatic keyphrase extraction task and want to share it with others, please contact me for commit rights.