This data repository contains the replication data for the paper "Images of the arXiv: reconfiguring large scientific image datasets".
Hardware and operating system used:
- Linux: Ubuntu 18.04
- Intel i7 CPU
- 500GB NVMe solid state drive
- 4TB 7200 rpm hard disk
- 32GB DDR3 RAM
- NVIDIA RTX 2080 graphics card, 8GB VRAM
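Ubuntu ships with SQLite (if the sqlite3 command is missing, install it with sudo apt-get install sqlite3). To open the database from the command line, run:
sqlite3 /path/to/database.sqlite3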
Python includes SQLite support in its standard library through the sqlite3 module:
import sqlite3
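A minimal sketch of inspecting the database from Python, assuming only the standard-library sqlite3 module. The helper name list_tables is hypothetical (it is not part of this repository). The file is opened read-only through an SQLite URI, so a mistyped path raises an error instead of silently creating a new, empty database.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in the SQLite file at db_path.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Read-only URI: fails on a missing file instead of creating one.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, list_tables("/path/to/database.sqlite3") returns the table names, which can then be queried with ordinary SQL.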
DB Browser for SQLite (https://sqlitebrowser.org/dl/) provides a graphical way to examine the SQLite database and can also be used to run SQL commands. On Ubuntu, install it from the project's PPA:
sudo add-apt-repository -y ppa:linuxgndu/sqlitebrowser
sudo apt-get update
sudo apt-get install sqlitebrowser
Required software:
- Anaconda (recommended for installing and managing Python packages)
- Python (2 and 3)
- ImageMagick (for convert and identify)
- Jupyter Notebook
- SQLite interfaces for Python and Bash
- tensorflow-gpu
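Before running the scripts, it can help to confirm that the command-line tools from the software list are on the PATH. This is a sketch, not part of the repository: the function name missing_tools and the exact executable names checked (conda, convert, identify, sqlite3, jupyter) are assumptions based on that list.

```python
import shutil

# Assumed executable names for the tools listed above (hypothetical selection).
REQUIRED_TOOLS = ("conda", "convert", "identify", "sqlite3", "jupyter")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    # shutil.which returns None when an executable is not on PATH.
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling print(missing_tools()) prints an empty list when everything is installed.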
We used two different conda environments to run the scripts. The first is py37, which contains basic Python 3 packages, matplotlib, and other utilities. The second is tf_gpu, which is configured to run TensorFlow 1.14 with GPU acceleration. This environment takes longer to install, so it is provided separately. See the YAML files in the conda folder.
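To check which environment is active, a small sketch like the following reports whether TensorFlow 1.14 is importable. The helper names are hypothetical, and treating "TensorFlow 1.14 is importable" as "tf_gpu is active" is an assumption, not something the YAML files guarantee.

```python
import importlib.util


def tensorflow_version():
    """Return the installed TensorFlow version string, or None if absent."""
    # find_spec checks importability without importing the (slow) package.
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    return tf.__version__


def is_tf_gpu_env(expected=(1, 14)):
    """Heuristic (hypothetical): True if TensorFlow major.minor matches expected."""
    version = tensorflow_version()
    if version is None:
        return False
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor == expected
```

Inside tf_gpu, TensorFlow 1.x also provides tf.test.is_gpu_available() to confirm that the GPU is visible.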
The database is provided in SQLite format and contains metadata on articles, images, and figure captions up to the end of 2018.
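Without assuming anything about the schema, a quick first look at the database is to count the rows in every table. The sketch below uses only the standard library; table_row_counts is a hypothetical helper, and SQLite's internal tables (names starting with sqlite_) are skipped.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table (hypothetical helper)."""
    # Open read-only so a wrong path cannot create an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
            if not name.startswith("sqlite_")  # skip SQLite's internal tables
        ]
        # Table names cannot be bound as parameters, so quote them as identifiers.
        return {
            name: conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(name.replace('"', '""'))
            ).fetchone()[0]
            for name in names
        }
```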
See dataset_method.md.
See sqlite_method.md.
See image_credits.md.
Scripts for generating the plots are in the sqlite-scripts folder.
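The plotting scripts themselves are not reproduced here. As a rough sketch of the general pattern (query the database, then plot the result with matplotlib), assuming a query that returns (label, value) rows: both helper names are hypothetical, as is the example query in the usage note.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def query_pairs(db_path, query, params=()):
    """Run a read-only query that returns (label, value) rows (hypothetical helper)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return [(label, value) for label, value in conn.execute(query, params)]


def save_bar_chart(pairs, out_path, xlabel="", ylabel=""):
    """Save a bar chart of (label, value) pairs to out_path (hypothetical helper)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    labels = [str(label) for label, _ in pairs]
    values = [value for _, value in pairs]
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

For instance, save_bar_chart(query_pairs(db, "SELECT year, COUNT(*) FROM images GROUP BY year"), "images_per_year.png"), where the table images and the column year are hypothetical placeholders. Check the real schema first, for example with .schema in the sqlite3 shell.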