README

Replication Data for Images of the arXiv: reconfiguring large scientific image datasets

This data repository contains the replication data for the paper "Images of the arXiv: reconfiguring large scientific image datasets".

Computer Setup Instructions

Computer Specs

OS

Linux: Ubuntu 18.04

Hardware

Intel i7 CPU
500GB NVMe solid state drive
4TB 7200 rpm hard disk
32GB DDR3 RAM
NVIDIA RTX 2080 graphics card with 8GB VRAM

Installing software

Metha

https://github.com/miku/metha

SQLite (command line)

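Ubuntu ships with SQLite. If the sqlite3 command-line shell is missing, install it with sudo apt-get install sqlite3. To open the database, run: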

sqlite3 /path/to/database.sqlite3

Python SQLite

The sqlite3 module is part of the Python standard library:

import sqlite3
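As a minimal sketch, the following lists the tables in the database using only the standard library. list_tables is a hypothetical helper, not one of the provided scripts, and the path you pass it is a placeholder for wherever you saved the database file.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path, sorted."""
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back a transaction, it does not close.
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, print(list_tables("/path/to/database.sqlite3")) prints the table names. Note that sqlite3.connect creates a new, empty file if the path does not exist, so an empty list usually means the path is wrong.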

DBBrowser for SQLite (optional)

This software provides a graphical way to examine the SQLite database and can also be used to run SQL commands. Downloads are listed at https://sqlitebrowser.org/dl/; on Ubuntu it can be installed from a PPA:

sudo add-apt-repository -y ppa:linuxgndu/sqlitebrowser
sudo apt-get update
sudo apt-get install sqlitebrowser

Other software

Anaconda (recommended for installing and managing Python packages)
Python (2 and 3)
ImageMagick (for convert and identify)
Jupyter Notebook
SQLite interfaces for Python and Bash
tensorflow-gpu
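Before running any scripts, it may help to confirm that the command-line tools listed above are on PATH. The following is a minimal sketch using Python's shutil.which. The command names in REQUIRED_COMMANDS (sqlite3, convert, identify, jupyter, conda) are our assumption of what the software above provides, and check_commands and missing_commands are hypothetical helpers, so adjust them to your setup.

```python
import shutil

# Assumed command names for the tools listed above; edit to match your
# installation (for example, ImageMagick 7 also provides a "magick" command).
REQUIRED_COMMANDS = ("sqlite3", "convert", "identify", "jupyter", "conda")


def check_commands(commands=REQUIRED_COMMANDS):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {cmd: shutil.which(cmd) for cmd in commands}


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that could not be found, in the given order."""
    return [cmd for cmd, path in check_commands(commands).items() if path is None]
```

Calling missing_commands() with no arguments returns an empty list when every assumed tool is installed.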

Environments

We used two conda environments to run the required scripts. The first, py37, contains basic Python 3 packages, matplotlib, and other utilities. The second, tf_gpu, is configured to run TensorFlow 1.14 with GPU acceleration; because it takes longer to install, it is provided as a separate file. See the YAML files in the conda folder.
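Because some scripts need py37 and others need tf_gpu, a script can check which environment is active before doing any work. This is a minimal sketch, assuming the scripts are run from an environment activated with conda activate, which sets CONDA_DEFAULT_ENV to the environment's name. require_conda_env is a hypothetical helper, not part of the provided scripts.

```python
import os


def require_conda_env(expected):
    """Raise RuntimeError unless the active conda environment is `expected`.

    Relies on the CONDA_DEFAULT_ENV variable that `conda activate` sets to
    the name of the active environment; returns that name on success.
    """
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if active != expected:
        raise RuntimeError(
            f"expected conda environment {expected!r}, but {active!r} is active"
        )
    return active
```

For example, a TensorFlow script could call require_conda_env("tf_gpu") at the top. Inside tf_gpu, tf.test.is_gpu_available() from TensorFlow 1.14 can then confirm that the GPU is visible.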

Instructions

Database

The database is provided in SQLite format and contains metadata on articles, images, and figure captions up to the end of 2018.
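A quick way to get an overview of the database is to count the rows in each table. The following sketch assumes nothing about the schema: it reads the table names from sqlite_master. It opens the file read-only through an SQLite URI (mode=ro), so a mistyped path raises an error instead of silently creating an empty database. table_row_counts is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table, opening the file read-only."""
    # mode=ro refuses to create a new database if the path is wrong and
    # prevents accidental writes to the replication data.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Table names cannot be bound as query parameters, so quote them
            # as SQL identifiers (doubling any embedded double quotes).
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

For example, table_row_counts("/path/to/database.sqlite3") returns a dictionary mapping each table name to its number of rows.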

Downloading data

See dataset_method.md.

Creating database

See sqlite_method.md.

Image credits for paper

See image_credits.md.

Plots

Scripts for generating the plots are in the sqlite-scripts folder.
