Merge pull request #23 from mwalmsley/main
Bring old dev branch up-to-date
mwalmsley committed Jul 7, 2022
2 parents 3f926dd + a3f88d5 commit 397f618
Showing 136 changed files with 4,168 additions and 7,159 deletions.
4 changes: 4 additions & 0 deletions .dockerignore
@@ -0,0 +1,4 @@
.git/
data/example_images/
data/pretrained_models/

3 changes: 2 additions & 1 deletion .gitignore
@@ -131,6 +131,7 @@ dmypy.json
*.csv

wandb_api.txt
wandb/

*.pdf
*.png
@@ -152,4 +153,4 @@ checkpoint*

.vscode

data/example_images/advanced/images
data/example_images/advanced/images
24 changes: 24 additions & 0 deletions .travis.yml
@@ -0,0 +1,24 @@
env:
- EXTRA_DEPENDENCIES=pytorch,tensorflow
# - EXTRA_DEPENDENCIES=pytorch
# tests need both pytorch and tensorflow as they check they agree mathematically
# TODO add skip option for tests so I can consider single build versions
language: python
python:
# - "3.8" # tf 2.8 requires Python 3.7 and above, pytest requires pluggy 1.0.0 which requires python 3.8
- "3.9"
# command to install dependencies
before_install:
- python --version
- pip install -U pip
- python setup.py install
install:
# let's check all three permutations resolve/install okay
- pip install -U "pluggy>=1.0.0" # required to avoid a cryptic error when running tests, requires py>=3.8
- pip install .[$EXTRA_DEPENDENCIES]

# command to run tests
script:
- pytest

# see https://docs.travis-ci.com/user/languages/python/
51 changes: 51 additions & 0 deletions CITATION.cff
@@ -0,0 +1,51 @@
cff-version: 1.1.0
message: "Please cite the following works when using this software: https://ui.adsabs.harvard.edu/abs/2022MNRAS.509.3966W and https://doi.org/10.5281/zenodo.6483175"
authors:
- family-names: Walmsley
given-names: Mike
orcid: https://orcid.org/0000-0002-6408-4181
- family-names: Lintott
given-names: Chris
- family-names: Géron
given-names: Tobias
- family-names: Kruk
given-names: Sandor
- family-names: Krawczyk
given-names: Coleman
- family-names: Willett
given-names: Kyle W.
- family-names: Bamford
given-names: Steven
- family-names: Kelvin
given-names: Lee S.
- family-names: Fortson
given-names: Lucy
- family-names: Gal
given-names: Yarin
- family-names: Keel
given-names: William
- family-names: Masters
given-names: Karen L.
- family-names: Mehta
given-names: Vihang
- family-names: Simmons
given-names: Brooke D.
- family-names: Smethurst
given-names: Rebecca
- family-names: Smith
given-names: Lewis
- family-names: Baeten
given-names: Elisabeth M.
- family-names: Macmillan
given-names: Christine
title: "Zoobot: Deep learning galaxy morphology classifier"
version: 0.0.3
date-released: 2022-04-01
identifiers:
- type: "ascl-id"
value: "2203.027"
- type: "doi"
value: 10.5281/zenodo.6483176
- type: "bibcode"
value: "2022ascl.soft03027W"
abstract: "Zoobot classifies galaxy morphology with Bayesian CNN. Deep learning models were trained on volunteer classifications; these models were able to both learn from uncertain volunteer responses and predict full posteriors (rather than point estimates) for what volunteers would have said. The code reproduces and improves Galaxy Zoo DECaLS automated classifications, and can be finetuned for new tasks."
22 changes: 22 additions & 0 deletions Dockerfile
@@ -0,0 +1,22 @@
FROM python:3.7-slim

ENV LANG=C.UTF-8

WORKDIR /usr/src/zoobot

RUN apt-get update && apt-get -y upgrade && \
apt-get install --no-install-recommends -y \
build-essential \
git && \
apt-get clean && rm -rf /var/lib/apt/lists/*

# install dependencies
COPY README.md .
COPY setup.py .
RUN pip install -U .[pytorch]
# install the zoobot locally as a package
# COPY setup.py .
# RUN pip install -e .

# install package
COPY . .
14 changes: 14 additions & 0 deletions Dockerfile.tf
@@ -0,0 +1,14 @@
FROM tensorflow/tensorflow:2.8.0

# if you have a supported nvidia GPU and https://github.com/NVIDIA/nvidia-docker
# FROM tensorflow/tensorflow:2.8.0-gpu

WORKDIR /usr/src/zoobot

# install dependencies but remove tensorflow as it's in the base image
COPY README.md .
COPY setup.py .
RUN pip install -U .[tensorflow]

# install package
COPY . .
54 changes: 38 additions & 16 deletions README.md
@@ -1,6 +1,9 @@
# Zoobot

[![Documentation Status](https://readthedocs.org/projects/zoobot/badge/?version=latest)](https://zoobot.readthedocs.io/en/latest/?badge=latest)
[![Documentation Status](https://readthedocs.org/projects/zoobot/badge/?version=latest)](https://zoobot.readthedocs.io/)
[![Build Status](https://app.travis-ci.com/mwalmsley/zoobot.svg?branch=main)](https://app.travis-ci.com/mwalmsley/zoobot)
[![DOI](https://zenodo.org/badge/343787617.svg)](https://zenodo.org/badge/latestdoi/343787617)
<a href="https://ascl.net/2203.027"><img src="https://img.shields.io/badge/ascl-2203.027-blue.svg?colorB=262255" alt="ascl:2203.027" /></a>

Zoobot classifies galaxy morphology with deep learning. This code will let you:

@@ -32,32 +35,43 @@ training_config.train_estimator(

You can finetune Zoobot with a free GPU using this [Google Colab notebook](https://colab.research.google.com/drive/1miKj3HVmt7NP6t7xnxaz7V4fFquwucW2?usp=sharing). To install locally, keep reading.

Install using git and pip:
Download the code using git:

# I recommend using a virtual environment, see below
git clone git@github.com:mwalmsley/zoobot.git
pip install -r zoobot/requirements.txt
pip install -e zoobot

And then install Zoobot using pip, specifying either the pytorch dependencies, the tensorflow dependencies, or both:

pip install -e zoobot[pytorch] # pytorch dependencies
pip install -e zoobot[tensorflow] # tensorflow dependencies
pip install -e zoobot[pytorch,tensorflow] # both

I recommend installing in a virtual environment like anaconda. For example, `conda create --name zoobot python=3.7`, then `conda activate zoobot`.
Do not install directly with anaconda itself (e.g. `conda install tensorflow`). Anaconda currently installs tensorflow 2.0.0, which is too old for the latest features used here.
Use pip instead, as above.
Do not install directly with anaconda itself (e.g. `conda install tensorflow`) as Anaconda may install older versions.
Use pip instead, as above. Python 3.7 or greater is required.

The `main` branch is for stable-ish releases. The `dev` branch includes the shiniest features but may change at any time.

To get started, see the [documentation](https://zoobot.readthedocs.io/).
To get started, see the [documentation](https://zoobot.readthedocs.io/). For pretrained model weights, precalculated representations, catalogues, and so forth, see the [data notes](https://zoobot.readthedocs.io/data_notes.html) in particular.

I also include some working examples for you to copy and adapt:

- [decals_dr5_to_shards.py](https://github.com/mwalmsley/zoobot/blob/main/decals_dr5_to_shards.py) (only necessary to train from scratch)
- [train_model.py](https://github.com/mwalmsley/zoobot/blob/main/train_model.py) (similarly)
- [make_predictions.py](https://github.com/mwalmsley/zoobot/blob/main/make_predictions.py)
- [finetune_minimal.py](https://github.com/mwalmsley/zoobot/blob/main/finetune_minimal.py)
- [finetune_advanced.py](https://github.com/mwalmsley/zoobot/blob/main/finetune_advanced.py)
- [gz_decals_data_release_analysis_demo.ipynb](https://github.com/mwalmsley/zoobot/blob/main/gz_decals_data_release_analysis_demo.ipynb) (to better understand Zoobot's statistical outputs)
- [tensorflow/examples/decals_dr5_to_shards.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/decals_dr5_to_shards.py) (only necessary to train from scratch)
- [tensorflow/examples/train_model_on_shards.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/train_model_on_shards.py) (only necessary to train from scratch)
- [tensorflow/examples/make_predictions.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/make_predictions.py)
- [tensorflow/examples/finetune_minimal.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/finetune_minimal.py)
- [tensorflow/examples/finetune_advanced.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/tensorflow/examples/finetune_advanced.py)
- [pytorch/examples/train_model_on_catalog.py](https://github.com/mwalmsley/zoobot/blob/main/zoobot/pytorch/examples/train_model_on_catalog.py) (only necessary to train from scratch)

I also include some examples which record how the models in W+22a (the GZ DECaLS data release) were trained:
- [replication/tensorflow/train_model_on_decals_dr5_splits.py](https://github.com/mwalmsley/zoobot/blob/main/replication/tensorflow/train_model_on_decals_dr5_splits.py)
- [replication/pytorch/train_model_on_decals_dr5_splits.py](https://github.com/mwalmsley/zoobot/blob/main/replication/pytorch/train_model_on_decals_dr5_splits.py)

There's also the [gz_decals_data_release_analysis_demo.ipynb](https://github.com/mwalmsley/zoobot/blob/main/gz_decals_data_release_analysis_demo.ipynb), which describes Zoobot's statistical predictions. When trained from scratch, it predicts the parameters for distributions, not simple class labels!

Latest features:
### Latest features

- PyTorch version! Integrates with PyTorch Lightning and WandB. Multi-GPU support. Trains on jpeg images, rather than TFRecords, and does not yet have a finetuning example script.
- Train on colour (3-band) images: Add --color (American-friendly) to `train_model.py`
- Select which EfficientNet variant to train using the `get_effnet` arg in `define_model.py` - or replace with a func. returning your own architecture!
- New `predict_on_dataset.py` and `save_predictons.py` modules with useful functions for making predictions on large sets of images. Predictions are now saved to .hdf5 by default, which is much more convenient than csv for multi-forward-pass predictions. If using .hdf5, `reformat_predictions.py` is no longer needed.
@@ -66,11 +80,19 @@ Latest features:
- Support for Weights and Biases (wandb)
- Worked examples for custom representations
- [Colab notebook](https://colab.research.google.com/drive/1miKj3HVmt7NP6t7xnxaz7V4fFquwucW2?usp=sharing) for GZ predictions and fine-tuning
- Schemas (questions and answers for the decision trees) extended to include DECaLS DR1/2 and DR8, in various combinations. See `zoobot.label_metadata.py`.
- Schemas (questions and answers for the decision trees) extended to include DECaLS DR1/2 and DR8, in various combinations. See `zoobot.shared.label_metadata.py`.
- Test time augmentations are now off by default but can be enabled with `--test-time-augs` on `train_model.py`
- `create_shards.py` has been refactored. Use the new example script `decals_dr5_to_shards.py` to replicate Zoobot on DECaLS, and `create_shards.py` for general creation of TFRecords from catalogs. `decals_dr5_to_shards.py` now includes train/val/test splits, which it should have had in the first place.
- `zoobot/data_utils/image_datasets.py` will optionally check if the image paths provided really exist (slightly slower, but sometimes useful). `tfrecord_datasets` and `image_datasets` now serve equivalent purposes.

Contributions are welcome and will be credited in any future work.

If you use this repo for your research, please cite [the paper](https://arxiv.org/abs/2102.08414).
### Replication

For replication of the GZ DECaLS classifier see /replicate. This contains slurm scripts to:
- Create training TFRecords equivalent to those used to train the published classifier
- Train the classifier itself (by calling `zoobot/tensorflow/examples/train_model.py`)

### Citing

If you use this repo for your research, please cite [the paper](https://arxiv.org/abs/2102.08414) and the [code](https://doi.org/10.5281/zenodo.6483175) (via Zenodo).
2 changes: 0 additions & 2 deletions data/pretrained_models/decals_dr_train_set_only_m0/checkpoint

This file was deleted.
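
Note on the pip commands in this commit: several of them pass requirement specifiers or extras on the command line (for example `pluggy>=1.0.0`, `.[pytorch]`, `zoobot[pytorch,tensorflow]`). The sketch below is not part of the commit. It uses `echo` as a stand-in for `pip install` (an assumption, so that it runs offline) to show how a POSIX shell treats an unquoted `>=` compared with a quoted one.

```shell
#!/bin/sh
# Work in a scratch directory so the stray file is easy to see and clean up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Unquoted: the shell parses ">=1.0.0" as "redirect stdout to a file named
# =1.0.0", so the command itself only receives "pluggy" as its argument.
# echo stands in for "pip install -U" here (illustrative assumption).
echo pluggy>=1.0.0
unquoted_file_contents=$(cat ./=1.0.0)

# Quoted: the whole requirement specifier reaches the command as one argument.
quoted_output=$(echo "pluggy>=1.0.0")

echo "unquoted wrote to file: $unquoted_file_contents"
echo "quoted passed through:  $quoted_output"
```

With a real `pip install -U pluggy>=1.0.0`, the same parsing would install the latest pluggy with no version constraint and send pip's output to a file called `=1.0.0`. Quoting also helps with extras: shells such as zsh treat `[...]` as a glob pattern, so `pip install -e "zoobot[pytorch]"` is the more portable spelling.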
