pisenberg/vispubdata

VisPubData Reproducibility and Update

The code in this repository relates to the VisPubData collection of publications in the field of visualization. For more information please check the corresponding publication about the project:

Petra Isenberg, Florian Heimerl, Steffen Koch, Tobias Isenberg, Panpan Xu, Charles D. Stolper, Michael Sedlmair, Jian Chen, Torsten Möller, and John Stasko. vispubdata.org: A Metadata Collection about IEEE Visualization (VIS) Publications. IEEE Transactions on Visualization and Computer Graphics, 23(9):2199–2206, September 2017. doi: 10.1109/TVCG.2016.2615308

If you use this data, we would appreciate a citation to our paper:

@article{Isenberg:2017:VMC,
  author      = {Petra Isenberg and Florian Heimerl and Steffen Koch and Tobias Isenberg and Panpan Xu and Charles D. Stolper and Michael Sedlmair and Jian Chen and Torsten M{\"o}ller and John Stasko},
  title       = {vispubdata.org: A Metadata Collection about {IEEE} Visualization ({VIS}) Publications},
  journal     = {IEEE Transactions on Visualization and Computer Graphics},
  year        = {2017},
  volume      = {23},
  number      = {9},
  month       = sep,
  pages       = {2199--2206},
  doi         = {10.1109/TVCG.2016.2615308},
  shortdoi    = {10/dx8vv8},
  oa_hal_url  = {https://hal.science/hal-01376597},
  url         = {http://www.vispubdata.org/site/vispubdata/},
}

Contributors

Contributors to the code are Petra Isenberg, Natkamon Tovanich, and Tobias Isenberg.

Code Purposes

There are two types of code in this repository: code for reproducing one of the figures shown in the paper mentioned above (to demonstrate reproducibility via the Graphics Replicability Stamp Initiative), and code for supporting the continued update of the dataset.

Paper reproducibility

The code in the reproducibility/ subdirectory facilitates the reproduction of the plot in Figure 1 of the corresponding paper, albeit adjusted to the updated data in the dataset (as of writing this text, years 2016–2023 of the IEEE VIS conference have been added). The other figures in the paper are a manually created overview of the conference evolution (Figure 2; for that see the files in the naming-graphic/ subdirectory of this repository) and screenshots of the dataset in use by other tools (Figures 3–5). The original version of Figure 1 from the paper looks as follows:

Figure 1 of VisPubData publication (image is in the public domain)

Prerequisites

Running the script

The reproducibility/reproducibility.py script essentially loads the current state of the dataset and then directly produces the new version of the figure in the local directory. There are two ways to get the data. By default, the data is pulled directly from the respective Google Spreadsheets, and the script should run without any further configuration.

Alternatively, one can also download the data to local csv files and then run everything locally. For that, please set

useFiles = True

in the configuration section at the top of the reproducibility/reproducibility.py script, and then download the data as follows. First, go to the shared Google Spreadsheet that contains the VisPubData dataset, make sure that the first tab at the bottom ("Main dataset") is selected, download the data via the menu File > Download > Comma Separated Values (.csv), and save the file as reproducibility/vispubdata.csv. Next, go to the shared Google Spreadsheet that contains the data about the journal presentations, download that dataset the same way, and save the file as reproducibility/vis-journal-presentations.csv. Then everything is in place to run the script.
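Before running the script with local files, it can help to confirm that both downloads ended up in the right place. The following is a minimal sketch and not part of the repository: the helper name missing_local_files is hypothetical, and only the two file names are taken from the instructions above.

```python
from pathlib import Path

# File names as given in the instructions above; the helper itself is hypothetical.
EXPECTED_FILES = ("vispubdata.csv", "vis-journal-presentations.csv")


def missing_local_files(directory="."):
    """Return the expected CSV files that are absent or empty in `directory`."""
    base = Path(directory)
    missing = []
    for name in EXPECTED_FILES:
        path = base / name
        # An empty file usually means the download was interrupted.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_local_files(".")
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("Both CSV files are in place.")
```

Saved under a name of your choice and run from within the reproducibility/ directory, this only reports on the two files; it does not download or modify anything.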

To run the script in either case, simply do:

cd reproducibility/
python3 reproducibility.py

The script then produces the equivalent of Figure 1 of the paper as reproducibility/reproducibility.pdf, but updated to the most recent version of the dataset, which looks like this (2023 version):

Updated version of Figure 1 of VisPubData publication (the image is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license; please attribute the image to the contributors named above and cite the mentioned journal paper)

Note that the labels have been reworded slightly, both to reflect changes to the conference in the meantime and to make the distinction between journal conference papers and pure journal papers presented at the conference clearer. The labels are also ordered differently from the original figure because a new plotting tool is used.

How to update the VisPubData dataset

This code allows you to create an update of the VisPubData dataset. If you only have small fixes to the data to report, it may be easier to leave a comment on the Google spreadsheet with the data and to e-mail petra.isenberg@inria.fr, who can make the change for you. If, however, you would like to, for example, add a new year to the dataset or create a local update for yourself, then read on.

Prerequisites

Start the Jupyter notebook server

  • Navigate to the main folder of this repository (the top folder) using the command line. If you use Anaconda Python, it is best to do this from the Anaconda Prompt.
  • Start the Jupyter notebook server there by calling:
    jupyter notebook
    
    This will open a browser window and show the subfolders of the repository and the included files.
  • If this does not work out of the box, you can get more help here: https://docs.jupyter.org/en/latest/install.html

Ready? Let's go...

Then use the newly opened browser window to open the Jupyter notebooks in the respective folders in the order in which they are listed below (double-click on the folder, then double-click on the respective ipynb file, which opens it in a new browser tab). Each Jupyter notebook contains additional prerequisites and the instructions for running it.

  1. dblp-data-extraction/ParseDBLP-VIS-Authors.ipynb
    • The last step in this notebook will take several minutes, depending on your machine and the size of the data. The script, however, shows its progress in iterations (in batches of 100,000) and iterations per second.
    • Unfortunately, it is not possible to compute a percentage because the total number of needed iterations is not known ahead of time. It depends on the size of the DBLP data when downloaded and should be on the order of 97,000,000 (at the time of writing these instructions) or more.
    • So just wait as long as these iteration counts continue to be updated.
    • If error messages appear, check that you have downloaded the DBLP data files (according to the instructions above), placed them into the dblp-data-extraction/data/ subfolder, and uncompressed the dblp.xml.gz file.
    • The script is done when you see a note that reports how many articles were found and the script prints DONE.
    • You can then close the window with the Jupyter notebook, as the results have been saved as an updated version of dblp-data-extraction/data/VIS-author-articles.csv.
  2. vispubdata-update/Vispubdata update IEEE VIS papers.ipynb
    • This process is quite involved, so carefully read and follow the instructions in the notebook.
    • Again, some of its processing may take a while (e.g., the CrossRef data check takes on the order of an hour or more), but there are once again progress indicators.
    • Once done, many of the data files in the subdirectory are updated and you can close the browser tab.
  3. vispubdata-update/journals-at-vis-update.ipynb
    • Again, some of its processing may take a while (e.g., the CrossRef data check takes on the order of 10 minutes or more), but there are once again progress indicators.
    • Once done, many of the data files in the subdirectory are updated and you can close the browser tab.
  4. aminer-citation-update/first_scan.ipynb
  5. aminer-citation-update/merge_data.ipynb
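Two points from the first notebook's notes can be illustrated with a short sketch: uncompressing dblp.xml.gz, and reporting progress when the total number of iterations is unknown. This is not the notebook's actual code; the function names decompress_gzip and with_progress are hypothetical, and only the batch size of 100,000 and the file name come from the notes above.

```python
import gzip
import shutil
import time
from pathlib import Path


def decompress_gzip(src, dst=None):
    """Write the uncompressed contents of the .gz file `src` to `dst`.

    If `dst` is omitted, the ".gz" suffix is dropped (dblp.xml.gz -> dblp.xml).
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix("")
    # Stream the data in chunks so that a large file never sits in memory at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def with_progress(items, batch=100_000, report=print):
    """Yield items unchanged, reporting count and rate after every `batch` items.

    No percentage is shown because the total length of `items` is not known.
    """
    start = time.monotonic()
    for count, item in enumerate(items, start=1):
        yield item
        if count % batch == 0:
            elapsed = max(time.monotonic() - start, 1e-9)
            report(f"{count:,} iterations ({count / elapsed:,.0f} it/s)")
```

For example, decompress_gzip("dblp-data-extraction/data/dblp.xml.gz") would write dblp.xml next to the archive, and wrapping any iterable in with_progress(...) prints a running count for as long as work is still being done.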

The final results of the process can then be found in the files vispubdata-update/results/vispubdata-update.csv and vispubdata-update/results/vispubdata-update-journals.csv. The Aminer citations are in vispubdata-update/results/vispubdata-update.csv-aminer.csv and vispubdata-update/results/vispubdata-update-journals.csv-aminer.csv. These files could then be used to update the VisPubData Google spreadsheet (by those with access to it).
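As a quick sanity check after the last notebook, one might count the records in each result file. The sketch below is only an illustration: the helper count_data_rows is hypothetical, and it assumes that each file is UTF-8 encoded and starts with a single header row, which the instructions above do not state.

```python
import csv
from pathlib import Path

# Result paths as listed above, relative to the top folder of the repository.
RESULT_FILES = (
    "vispubdata-update/results/vispubdata-update.csv",
    "vispubdata-update/results/vispubdata-update-journals.csv",
    "vispubdata-update/results/vispubdata-update.csv-aminer.csv",
    "vispubdata-update/results/vispubdata-update-journals.csv-aminer.csv",
)


def count_data_rows(path):
    """Return the number of CSV records after the header, or None if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the (assumed) header row
        return sum(1 for _ in reader)


if __name__ == "__main__":
    for name in RESULT_FILES:
        rows = count_data_rows(name)
        status = "missing" if rows is None else f"{rows} rows"
        print(f"{name}: {status}")
```

A file reported as missing, or one with noticeably fewer rows than the previous version of the dataset, suggests that one of the notebooks did not run to completion.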

If you run into any problems, please contact petra.isenberg@inria.fr.
