
Retrieve paper metadata from conference proceedings and journals indexed in DBLP. Currently, retrieval of the following properties is supported:

  • paper title
  • authors
  • heading (corresponds to journal issue or conference session name)
  • page range
  • paper length
  • link to electronic edition of paper

The tool validates the page ranges and adds a log message to the comment column if it detects possible inconsistencies. It has been tested with ICSE, FSE, TSE, and TOSEM 2014-2018.
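To illustrate how a paper length can be derived from a page range and how an inconsistent range might be detected, here is a minimal sketch. It is not the tool's actual implementation: the helper name `paper_length` and the accepted formats (plain ranges such as `106-116` and article-prefixed ranges such as `18:1-18:33`) are assumptions based on the sample output in this README.

```python
import re
from typing import Optional

# Hypothetical helper for illustration only, not taken from the tool's code.
# Accepts "first-last" with an optional "article:" prefix on both sides.
_PAGE_RANGE = re.compile(r"^(?:(\d+):)?(\d+)-(?:(\d+):)?(\d+)$")


def paper_length(pages: str) -> Optional[int]:
    """Return the number of pages in a range, or None if it looks inconsistent."""
    match = _PAGE_RANGE.match(pages.strip())
    if match is None:
        return None  # not a recognizable page range
    first_article, first, last_article, last = match.groups()
    if first_article != last_article:
        return None  # article prefixes differ or only one side has a prefix
    first_page, last_page = int(first), int(last)
    if last_page < first_page:
        return None  # range runs backwards
    return last_page - first_page + 1


print(paper_length("106-116"))     # 11
print(paper_length("18:1-18:33"))  # 33
print(paper_length("116-106"))     # None
```

Returning `None` instead of raising an exception lets a caller record a comment for the affected paper and continue with the next one.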



Python 3 is required. The dependencies are specified in requirements.txt. To install them, run:

pip3 install -r requirements.txt

Optional: Set up a virtual environment with pyenv and virtualenv before executing the above command:

pyenv install 3.7.2
pyenv virtualenv 3.7.2 dblp-retriever_3.7.2
pyenv activate dblp-retriever_3.7.2

pip3 install --upgrade pip


Basic usage:

python3 -i <path_to_input_file> -o <path_to_output_dir>

Call the tool without parameters to see the available options:


usage: [-h] -i INPUT_FILE -o OUTPUT_DIR [-d DELIMITER]
error: the following arguments are required: -i/--input-file, -o/--output-dir


As input, the tool expects a CSV file with the following three columns: venue, year, and identifier. The venue column is a custom name for the conference or journal, year is the year of publication, and identifier is the DBLP identifier of a particular journal volume or conference proceedings.

This identifier can be extracted from the DBLP URL. For example, conf/sigsoft/fse2018 is the identifier of the ESEC/FSE 2018 proceedings.
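The extraction can also be done programmatically. The following sketch assumes that a DBLP table-of-contents URL has the form `https://dblp.org/db/<identifier>.html`. Both that URL shape and the helper name `dblp_identifier` are assumptions made for illustration, not part of the tool.

```python
from urllib.parse import urlparse


def dblp_identifier(url: str) -> str:
    """Extract the identifier from a DBLP URL (assumed form: .../db/<identifier>.html)."""
    path = urlparse(url).path  # e.g. "/db/conf/sigsoft/fse2018.html"
    prefix, suffix = "/db/", ".html"
    if not path.startswith(prefix):
        raise ValueError(f"Unexpected DBLP URL: {url}")
    identifier = path[len(prefix):]
    if identifier.endswith(suffix):
        identifier = identifier[:-len(suffix)]
    return identifier


print(dblp_identifier("https://dblp.org/db/conf/sigsoft/fse2018.html"))
# conf/sigsoft/fse2018
```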


An excerpt from an example input file:

| venue | year | identifier |
| --- | --- | --- |
| ICSE | 2014 | conf/icse/icse2014 |
| ... | ... | ... |
| FSE | 2018 | conf/sigsoft/fse2018 |
| ... | ... | ... |
| TSE | 2018 | journals/tse/tse44 |
| ... | ... | ... |
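To check an input file before running the tool, a small reader along the following lines may help. It is a sketch, not the tool's own import code: the function name `read_venues` is hypothetical, and the default delimiter (a comma) is an assumption. The only grounded parts are the three required column names and the fact that the delimiter is configurable via -d.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ("venue", "year", "identifier")


def read_venues(csv_text: str, delimiter: str = ",") -> List[Dict[str, str]]:
    """Parse venue rows and verify that the three required columns exist."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"Missing column(s): {', '.join(missing)}")
    # Keep only the required columns; treat absent cells as empty strings.
    return [
        {column: (row[column] or "").strip() for column in REQUIRED_COLUMNS}
        for row in reader
    ]


sample = (
    "venue,year,identifier\n"
    "ICSE,2014,conf/icse/icse2014\n"
    "TSE,2018,journals/tse/tse44\n"
)
for venue in read_venues(sample):
    print(venue["venue"], venue["year"], venue["identifier"])
```

To read from disk, pass the file's contents, for example `read_venues(open("input/venues.csv", encoding="utf-8").read())`.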

To retrieve the paper metadata for the configured venues, run the following command:

python3 -i input/venues.csv -o output/

The tool logs the retrieval process:

2019-01-22 10:53:02,584 dblp-retriever_logger INFO: Reading venues from input/venues.csv...
2019-01-22 10:53:02,588 dblp-retriever_logger INFO: 20 venues have been imported.
2019-01-22 10:53:02,847 dblp-retriever_logger INFO: Successfully retrieved TOC of venue: conf/icse/icse2014
2019-01-22 10:53:02,977 dblp-retriever_logger INFO: Successfully parsed TOC of venue: conf/icse/icse2014
2019-01-22 10:53:03,121 dblp-retriever_logger INFO: Successfully retrieved TOC of venue: conf/icse/icse2015-1
2019-01-22 10:53:07,530 dblp-retriever_logger INFO: Successfully parsed TOC of venue: journals/tosem/tosem27
2019-01-22 10:53:07,532 dblp-retriever_logger INFO: Exporting papers to output/venues.csv...
2019-01-22 10:53:07,548 dblp-retriever_logger INFO: 1564 papers have been exported.

It then writes the retrieved data to the configured output directory:

| venue | year | identifier | heading | title | authors | pages | length | electronic_edition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ICSE | 2014 | conf/icse/icse2014 | Perspectives on Software Engineering | Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development? | Emerson R. Murphy-Hill; Thomas Zimmermann; Nachiappan Nagappan | 1-11 | 11 | |
| ICSE | 2014 | conf/icse/icse2014 | Modeling | TradeMaker: automated dynamic analysis of synthesized tradespaces. | Hamid Bagheri; Chong Tang; Kevin J. Sullivan | 106-116 | 11 | |
| TOSEM | 2018 | journals/tosem/tosem27 | Volume 27, Number 4, November 2018 | Variability-Aware Static Analysis at Scale: An Empirical Study. | Alexander von Rhein; Jörg Liebig; Andreas Janker; Christian Kästner; Sven Apel | 18:1-18:33 | 33 | |
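In the sample rows, the authors column joins names with a semicolon followed by a space. For downstream analysis, the field can be split back into individual names. The helper `split_authors` below is a hypothetical post-processing sketch, not part of the tool, and it assumes that author names never contain a semicolon.

```python
from typing import List


def split_authors(authors: str) -> List[str]:
    """Split a semicolon-separated authors field into a list of names."""
    return [name.strip() for name in authors.split(";") if name.strip()]


names = split_authors("Hamid Bagheri; Chong Tang; Kevin J. Sullivan")
print(len(names), names[0])  # 3 Hamid Bagheri
```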