Diversity of ISCB Honorees

This is a data analysis repository for the study at


Datasets are stored in the data directory. This repository uses Git LFS to store large / binary datasets. Make sure to have Git LFS installed locally before cloning the repository, if you'd like to download the datasets. You can also download datasets directly from the GitHub website by clicking "Raw".
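If the repository is cloned without Git LFS installed, the large files arrive as small text pointer stubs instead of the real data. The following is a minimal sketch for spotting such stubs, assuming the standard Git LFS pointer format (a file whose first line is the `version https://git-lfs.github.com/spec/v1` header). The helper names `is_lfs_pointer` and `find_unfetched` are hypothetical and not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file (spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched(data_dir: Path) -> list[Path]:
    """List files under data_dir that are still LFS pointer stubs."""
    return sorted(
        p for p in data_dir.rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_unfetched(Path("data"))` from the repository root should return an empty list once the datasets have been downloaded; any paths it does return are files that still need to be fetched, for example with `git lfs pull`.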

The source code saves large files using XZ compression (denoted by an .xz extension). Because not all users are familiar with XZ compression, we also provide gzip exports of all XZ-compressed files, created with the convert-xz-to-gzip.bash script. These files are placed alongside their XZ source in the data directory. The source code pipelines use XZ compression because gzip encodes a timestamp in its header, which makes output files non-deterministic.
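The timestamp issue can be illustrated with the Python standard library alone. This is a sketch, not the repository's pipeline code: the payload is made up, and `convert_xz_to_gzip` is a hypothetical Python counterpart to the idea behind convert-xz-to-gzip.bash, not a transcription of that script.

```python
import gzip
import lzma
from pathlib import Path

payload = b"name,year\nexample,2020\n"  # made-up sample data

# gzip stores a 4-byte modification time in header bytes 4-7, so the same
# payload compressed at two different times produces different bytes.
gz_first = gzip.compress(payload, mtime=1_000_000_000)
gz_second = gzip.compress(payload, mtime=1_000_000_001)
gzip_differs = gz_first != gz_second

# The XZ format has no timestamp field; repeated runs are byte-identical.
xz_identical = lzma.compress(payload) == lzma.compress(payload)


def convert_xz_to_gzip(xz_path: Path) -> Path:
    """Hypothetical helper: write a gzip copy next to an .xz file."""
    gz_path = xz_path.with_suffix(".gz")  # e.g. table.tsv.xz -> table.tsv.gz
    with lzma.open(xz_path, "rb") as src:
        data = src.read()
    # mtime=0 pins the header timestamp, so the export itself is repeatable.
    gz_path.write_bytes(gzip.compress(data, mtime=0))
    return gz_path
```

In a notebook, an .xz dataset can be read directly with `lzma.open(path, "rt")`, so the gzip copies are only a convenience for tools that lack XZ support.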


This repository has a corresponding Docker image with the required dependencies. See environment for the Docker image specification.

Note that the following Docker commands have a --mount argument to give the Docker container access to files in this repository. Therefore, any changes to the repository content created while running the Docker container will persist in this directory after the container is stopped.

The Docker image is automatically built and published by a GitHub Action. Even though this repository is public, GitHub requires authentication to download from its package registry. Therefore, you will need a GitHub account to pull the image.

Use the following steps to authenticate your local Docker client with your GitHub account. Go to and create a new personal access token, selecting only the read:packages scope. You can name the token anything, for example "docker login read-only token". Then run the following command, substituting your username and the token you just created:

docker login --username USERNAME --password TOKEN


For interactive development in Python notebooks, run the following command:

# This command must be run with the repository root as your working directory.
# Requires docker version >= 17.06.
docker run \
  --name iscb-diversity \
  --detach --rm \
  --env JUPYTER_TOKEN=ksbegpqzrurktbkikyo \
  --publish 8899:8888 \
  --mount type=bind,source="$(pwd)",target=/user/jupyter \

Then navigate to the following URL in your browser: http://localhost:8899?token=ksbegpqzrurktbkikyo

You should see a Jupyter Notebook landing page where you can open, edit, and run any of the notebooks.

When you are done, shut down the Jupyter notebook server and remove the Docker container by running docker stop iscb-diversity in a new terminal.

Similarly, for the R notebooks:

# This command must be run with the repository root as your working directory.
# Requires docker version >= 17.06.
docker run \
  --name iscb-diversity-r \
  --detach --rm \
  --publish 8787:8787 \
  --env DISABLE_AUTH=true \
  --mount type=bind,source="$(pwd)",target=/home/rstudio/repo \

Navigate to http://localhost:8787 and you should be logged into RStudio as the rstudio user. When you are done, shut down the RStudio server and remove the Docker container by running docker stop iscb-diversity-r.

GitHub Pages

The docs directory is used as the GitHub Pages source for

To regenerate outputs in the docs directory, run the following command:

python utils/ --nbviewer --readme

Edit utils/ to change the template for docs/


The entire repository is released under the CC BY 4.0 License available in All code files and snippets are additionally released under the BSD 3-Clause License available in