
Merge pull request #12 from profjsb/dataset_sim_decam
Added DECam model and API for data generator training
profjsb committed Oct 2, 2019
2 parents c10fb38 + f84b63f · commit 5045d79
Showing 20 changed files with 536 additions and 122 deletions.
2 changes: 2 additions & 0 deletions CHANGES.txt
@@ -5,3 +5,5 @@ v0.1.1, v0.1.2, July 21, 2019 -- documentation for PyPi
v0.1.3, v0.1.4, July 22, 2019 -- arXiv release version

v0.1.5, August 13, 2019 -- added new module: deepCR.train

v0.2.0, September 28, 2019 -- added DECam model and additional features for the training API
56 changes: 24 additions & 32 deletions README.md
@@ -5,19 +5,35 @@

Identify and remove cosmic rays from astronomical images using trained convolutional neural networks.

Documentation and tutorials: https://deepcr.readthedocs.io/

This is the installable package which implements the methods described in the paper: Zhang & Bloom (2019), submitted.
This package implements the method described in the paper:
> [deepCR: Cosmic Ray Rejection with Deep Learning](https://arxiv.org/abs/1907.09500)\
> Keming Zhang & Joshua Bloom\
> _arXiv:1907.09500; ApJ in press_

If you use this package, please cite the paper above and consider including a
link to this repository.

Code to reproduce benchmarking results in the paper is at: https://github.com/kmzzhang/deepCR-paper
[Documentation and tutorials](https://deepcr.readthedocs.io)

If you use this package, please cite Zhang & Bloom (2019): https://arxiv.org/abs/1907.09500 and consider including a
link to this repository.
[Currently available models](https://deepcr.readthedocs.io/en/latest/model_zoo.html)

Note: the current release includes only a model for HST ACS/WFC.

<img src="https://raw.githubusercontent.com/profjsb/deepCR/master/imgs/postage-sm.jpg" width="90%">

### New for v0.2.0

[DECam](https://deepcr.readthedocs.io/en/latest/model_zoo.html#decam) deepCR model now available!

```python
from deepCR import deepCR
decam_model = deepCR(mask='decam', device='CPU')
```
Note 1: The model is trained on g-band images but is expected to work on
other filters as well. We are benchmarking it on other filters; until that is
done, please proceed with caution when working with other filters.

Note 2: An inpainting model for DECam is TBA.
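
A minimal usage sketch (the input file name below is hypothetical; `threshold`, `inpaint`, and `binary` follow the `clean` API used in the Quick Start):

```python
from deepCR import deepCR
import numpy as np

# hypothetical input: a 2D DECam g-band image in units of electrons
image = np.load('decam_g_image.npy')

decam_model = deepCR(mask='decam', device='CPU')

# with inpaint=False, clean returns only the CR mask;
# binary=True thresholds the probability map at the given threshold
cr_mask = decam_model.clean(image, threshold=0.5, inpaint=False, binary=True)
```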

### Installation

```bash
@@ -29,7 +45,7 @@ Or you can install from source:
```bash
git clone https://github.com/profjsb/deepCR.git
cd deepCR/
python setup.py install
pip install .
```

### Quick Start
@@ -82,30 +98,6 @@ mask, cleaned_image = mdl.clean(image, threshold = 0.5, parallel = True, n_jobs=

Note that this won't speed things up if you're using GPU!

### Currently available models

mask:

ACS-WFC-F606W-2-4

ACS-WFC-F606W-2-32(*)

inpaint:

ACS-WFC-F606W-2-32(*)

ACS-WFC-F606W-3-32

Recommended models are marked with (*). Larger numbers indicate larger capacity.

Input images should come from *_flc.fits* files which are in units of electrons.

### Limitations and Caveats

The currently included models are trained and benchmarked on HST ACS/WFC images in the F606W filter.

Visual inspection shows that these models also work well on filters from F435W to F814W. However, users should adopt a higher threshold (e.g. 0.9) for short-wavelength filters to minimize false detections.

### Contributing

We are very interested in getting bug fixes, new functionality, and new trained models from the community (especially for ground-based imaging and spectroscopy). Please fork this repo and issue a PR with your changes. It will be especially helpful if you add some tests for your changes.
6 changes: 3 additions & 3 deletions deepCR/__init__.py
@@ -11,8 +11,8 @@
"""
from deepCR.model import deepCR
from deepCR.training import train
from deepCR.evaluate import roc
from deepCR.evaluate import roc, roc_lacosmic

__all__ = ["deepCR", "train", "roc"]
__all__ = ["deepCR", "train", "roc", 'roc_lacosmic']

__version__ = '0.1.5'
__version__ = '0.2.0'
130 changes: 123 additions & 7 deletions deepCR/dataset.py
@@ -1,6 +1,122 @@
import numpy as np
from torch.utils.data import Dataset

__all__ = ['dataset', 'DatasetSim']


class DatasetSim(Dataset):
    def __init__(self, image, cr, sky=None, aug_sky=(0, 0), aug_img=(1, 1), noise=False, saturation=1e5,
                 n_mask=1, norm=False, percentile_limit=50, part=None, f_val=0.1, seed=1):
        """ Custom PyTorch Dataset class to load deepCR-mask training data.
        :param image: list of complete paths to npy arrays, each containing one 2D image
        :param cr: list of complete paths to npy arrays, each containing one 2D mask
        :param sky: np.ndarray [N,] or float, sky background level
        :param aug_sky: (float, float). If sky is provided, use a random sky background in the range
          [aug_sky[0] * sky, aug_sky[1] * sky]. This serves as a regularizer, allowing the trained model to adapt
          to a wider range of sky backgrounds (or, equivalently, exposure times), and remedies the fact that
          exposure times in the training set are discrete and limited.
        :param aug_img: (float, float). Multiply the image (after sky augmentation) by a factor sampled
          log-uniformly from this range.
        :param noise: bool. Whether to add Poisson noise to the image.
        :param saturation: float. Maximum pixel value; images are clipped at this level.
        :param n_mask: number of masks to stack for each image
        :param norm: bool. Whether to scale the generated image by the standard deviation and median of
          the part of the image below percentile_limit.
        :param percentile_limit: float. Upper percentile limit for computing std and median when norm=True
        :param part: either 'train' or 'val'; the dataset is split into fractions (1 - f_val, f_val)
        :param f_val: fraction of the dataset reserved as the validation set
        :param seed: fix the numpy random seed to seed, for reproducibility
        """

        np.random.seed(seed)
        self.len_image = len(image)
        self.len_mask = len(cr)
        self.n_mask = n_mask
        self.noise = noise
        self.saturation = saturation

        assert 0 < f_val < 1
        f_train = 1 - f_val
        if part == 'train':
            s = np.s_[:int(self.len_image * f_train)]
            s_cr = np.s_[:int(self.len_mask * f_train)]
        elif part == 'val':
            s = np.s_[int(self.len_image * f_train):]
            s_cr = np.s_[int(self.len_mask * f_train):]  # held-out CR masks for validation
        else:
            s = np.s_[0:]
            s_cr = np.s_[0:]

        if sky is None:
            sky = np.zeros(self.len_image)
        elif not isinstance(sky, np.ndarray):
            sky = np.array([sky] * self.len_image)

        self.image = image[s]
        self.cr = cr[s_cr]
        self.sky = sky[s]
        self.aug_sky = aug_sky
        self.aug_img = (np.log10(aug_img[0]), np.log10(aug_img[1]))
        self.norm = norm
        self.percentile_limit = percentile_limit

        self.len_image = len(self.image)
        self.len_mask = len(self.cr)

    def get_cr(self):
        """
        Generate cosmic ray images from stacked dark frames.
        Sample self.n_mask CR masks and add them together.
        :return: (cr_img, cr_mask): sampled-and-added dark image containing only CRs, and the cumulative CR mask
        """
        cr_id = np.random.randint(0, self.len_mask, self.n_mask)
        crs = []
        masks = []
        for i in cr_id:
            arr = np.load(self.cr[i])
            crs.append(arr[0][None, :])
            masks.append(arr[1][None, :])
        masks = np.concatenate(masks).sum(axis=0) > 0
        crs = np.concatenate(crs).sum(axis=0)
        return crs, masks

    def get_image(self, i):
        data = np.load(self.image[i])
        if len(data.shape) == 3:
            return data[0], data[1]
        else:
            return data, np.zeros_like(data)

    def __len__(self):
        return self.len_image

    def __getitem__(self, i):
        cr, mask = self.get_cr()
        image, ignore = self.get_image(i)
        # sample the sky-background and image-scaling augmentation factors
        f_bkg_aug = self.aug_sky[0] + np.random.rand() * (self.aug_sky[1] - self.aug_sky[0])
        f_img_aug = self.aug_img[0] + np.random.rand() * (self.aug_img[1] - self.aug_img[0])
        f_img_aug = 10 ** f_img_aug
        bkg = f_bkg_aug * self.sky[i]
        img = (image + bkg) * f_img_aug + cr
        scale = img.copy()
        scale[scale < 1] = 1.
        scale = scale ** 0.5
        if self.noise:
            # Gaussian approximation to Poisson noise, with sigma = sqrt(counts)
            noise = np.random.normal(0, scale, image.shape)
        else:
            noise = np.zeros_like(image)
        img += noise
        img[img > self.saturation] = self.saturation

        if self.norm:
            limit = np.percentile(img, self.percentile_limit)
            clip = img[img < limit]
            scale = clip.std()
            median = np.percentile(clip, 50)
            img -= median
            img /= scale

        return img, mask, ignore


class dataset(Dataset):
    def __init__(self, image, mask, ignore=None, sky=None, aug_sky=[0, 0], part=None, f_val=0.1, seed=1):
@@ -28,16 +144,16 @@ def __init__(self, image, mask, ignore=None, sky=None, aug_sky=[0, 0], part=None
            ignore = np.zeros_like(image)

        if part == 'train':
            slice = np.s_[:int(len * f_train)]
            s = np.s_[:int(len * f_train)]
        elif part == 'val':
            slice = np.s_[int(len * f_train):]
            s = np.s_[int(len * f_train):]
        else:
            slice = np.s_[0:]
            s = np.s_[0:]

        self.image = image[slice]
        self.mask = mask[slice]
        self.ignore = ignore[slice]
        self.sky = sky[slice]
        self.image = image[s]
        self.mask = mask[s]
        self.ignore = ignore[s]
        self.sky = sky[s]

        self.aug_sky = aug_sky

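For reference, a minimal sketch of how `DatasetSim` might be constructed per the docstring above — the file paths and keyword values are illustrative, not from this commit:

```python
import glob
from deepCR.dataset import DatasetSim

# hypothetical .npy files: each image file holds one 2D image (or a [2, H, W]
# image + ignore-mask stack); each CR file holds a [2, H, W] CR image + CR mask stack
images = sorted(glob.glob('data/images/*.npy'))
crs = sorted(glob.glob('data/dark_crs/*.npy'))

train_set = DatasetSim(images, crs, sky=100., aug_sky=(-0.9, 3.), aug_img=(0.5, 2.),
                       noise=True, n_mask=2, part='train', f_val=0.1, seed=1)
img, mask, ignore = train_set[0]  # each access draws a fresh random CR stack
```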
102 changes: 89 additions & 13 deletions deepCR/evaluate.py
@@ -2,13 +2,42 @@
from tqdm import tqdm as tqdm

from deepCR.util import maskMetric
from deepCR.dataset import dataset
from deepCR.dataset import dataset, DatasetSim
from skimage.morphology import disk, dilation
import astroscrappy.astroscrappy as lac


__all__ = ['roc']
__all__ = ['roc', 'roc_lacosmic']


def _roc(model, data, thresholds):
def _roc_lacosmic(data, sigclip, objlim=2, dilate=None, gain=1):
    nROC = sigclip.size
    metric = [np.zeros((nROC, 4)), np.zeros((nROC, 4))]
    for t in tqdm(range(len(data))):
        dat = data[t]
        image = dat[0]
        msk = dat[1]
        ignore = dat[2]
        for i in range(nROC):
            pdt_mask, cleanArr = lac.detect_cosmics(image, sigclip=sigclip[i], sigfrac=0.3, objlim=objlim,
                                                    gain=gain, readnoise=5, satlevel=np.inf, sepmed=False,
                                                    cleantype='medmask', niter=4)
            pdt_mask *= (1 - ignore).astype(bool)  # exclude ignored (bad/saturated) pixels
            metric[0][i] += maskMetric(pdt_mask, msk * (1 - ignore))
            if dilate is not None:
                pdt_mask = dilation(pdt_mask, dilate)
                metric[1][i] += maskMetric(pdt_mask, msk * (1 - ignore))
    TP, TN, FP, FN = metric[0][:, 0], metric[0][:, 1], metric[0][:, 2], metric[0][:, 3]
    TP1, TN1, FP1, FN1 = metric[1][:, 0], metric[1][:, 1], metric[1][:, 2], metric[1][:, 3]
    TPR = TP / (TP + FN)
    FPR = FP / (FP + TN)
    TPR1 = TP1 / (TP1 + FN1)
    FPR1 = FP1 / (FP1 + TN1)

    return (TPR * 100, FPR * 100), (TPR1 * 100, FPR1 * 100)


def _roc(model, data, thresholds, dilate=None):
""" internal function called by roc
:param model:
@@ -17,22 +46,58 @@ def _roc(model, data, thresholds):
    :return: tpr, fpr
    """
    nROC = thresholds.size
    metric = np.zeros((nROC, 4))
    metric = [np.zeros((nROC, 4)), np.zeros((nROC, 4))]
    for t in tqdm(range(len(data))):
        dat = data[t]
        pdt_mask = model.clean(dat[0], inpaint=False, binary=False)
        msk = dat[1]
        ignore = dat[2]
        for i in range(nROC):
            binary_mask = np.array(pdt_mask > thresholds[i]) * (1 - ignore)
            metric[i] += maskMetric(binary_mask, msk * (1 - ignore))
    TP, TN, FP, FN = metric[:, 0], metric[:, 1], metric[:, 2], metric[:, 3]
    tpr = TP / (TP + FN)
    fpr = FP / (FP + TN)
    return tpr * 100, fpr * 100
            metric[0][i] += maskMetric(binary_mask, msk * (1 - ignore))
            if dilate is not None:
                binary_mask = dilation(binary_mask, dilate)
                metric[1][i] += maskMetric(binary_mask, msk * (1 - ignore))

    TP, TN, FP, FN = metric[0][:, 0], metric[0][:, 1], metric[0][:, 2], metric[0][:, 3]
    TP1, TN1, FP1, FN1 = metric[1][:, 0], metric[1][:, 1], metric[1][:, 2], metric[1][:, 3]
    TPR = TP / (TP + FN)
    FPR = FP / (FP + TN)
    TPR1 = TP1 / (TP1 + FN1)
    FPR1 = FP1 / (FP1 + TN1)

    return (TPR * 100, FPR * 100), (TPR1 * 100, FPR1 * 100)


def roc(model, image, mask, ignore=None, sky=None, n_mask=1, seed=1, thresholds=np.linspace(0.001, 0.999, 500),
        dilate=False, rad=1):
    """ evaluate model on test set with the ROC curve
    :param model: deepCR object
    :param image: np.ndarray((N, W, H)) image array, or a list of paths to npy image files
    :param mask: np.ndarray((N, W, H)) CR mask array, or a list of paths to npy CR stacks
    :param ignore: np.ndarray((N, W, H)) bad pixel array incl. saturation, etc.
    :param sky: np.ndarray [N,] or float, sky background level (used with path-list input)
    :param n_mask: number of CR masks to stack per image (used with path-list input)
    :param seed: numpy random seed, for reproducibility
    :param thresholds: np.ndarray(N) probability thresholds at which to evaluate the ROC curve
    :param dilate: bool. Whether to also evaluate the ROC curve computed on dilated predicted masks
    :param rad: int. Radius of the disk structuring element used for dilation
    :return: np.ndarray(N), np.ndarray(N): TPR and FPR; if dilate=True, also the dilated-mask pair
    """
    kernel = None
    if dilate:
        kernel = disk(rad)
    if isinstance(image, np.ndarray) and len(image.shape) == 3:
        data = dataset(image, mask, ignore)
    elif isinstance(image[0], str):
        data = DatasetSim(image, mask, sky=sky, n_mask=n_mask, seed=seed)
    else:
        raise TypeError('Input must be numpy data arrays or list of file paths!')
    (tpr, fpr), (tpr_dilate, fpr_dilate) = _roc(model, data, thresholds=thresholds, dilate=kernel)
    if dilate:
        return (tpr, fpr), (tpr_dilate, fpr_dilate)
    else:
        return tpr, fpr


def roc(model, image, mask, ignore=None, thresholds=np.linspace(0.001, 0.999, 500)):
def roc_lacosmic(image, mask, sigclip, ignore=None, sky=None, n_mask=1, seed=1, objlim=2, gain=1,
                 dilate=False, rad=1):
""" evaluate model on test set with the ROC curve
:param model: deepCR object
@@ -42,6 +107,17 @@ def roc(model, image, mask, ignore=None, thresholds=np.linspace(0.001, 0.999, 50
    :param thresholds: np.ndarray(N) probability thresholds at which to evaluate the ROC curve
    :return: np.ndarray(N), np.ndarray(N): TPR and FPR
    """
    data = dataset(image=image, mask=mask, ignore=ignore)
    tpr, fpr = _roc(model, data, thresholds=thresholds)
    return tpr, fpr
    kernel = None
    if dilate:
        kernel = disk(rad)
    if isinstance(image, np.ndarray) and len(image.shape) == 3:
        data = dataset(image, mask, ignore)
    elif isinstance(image[0], str):
        data = DatasetSim(image, mask, sky=sky, n_mask=n_mask, seed=seed)
    else:
        raise TypeError('Input must be numpy data arrays or list of file paths!')
    (tpr, fpr), (tpr_dilate, fpr_dilate) = _roc_lacosmic(data, sigclip=sigclip, objlim=objlim, dilate=kernel, gain=gain)
    if dilate:
        return (tpr, fpr), (tpr_dilate, fpr_dilate)
    else:
        return tpr, fpr
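
As a usage sketch (the paths and parameter values below are hypothetical), the two evaluators might be called like this on simulated test data:

```python
import glob
import numpy as np
from deepCR import deepCR, roc, roc_lacosmic

mdl = deepCR(mask='decam', device='CPU')
test_images = sorted(glob.glob('data/test_images/*.npy'))
test_crs = sorted(glob.glob('data/test_crs/*.npy'))

# deepCR ROC over a grid of probability thresholds;
# dilate=True also returns the ROC computed on dilated masks
(tpr, fpr), (tpr_d, fpr_d) = roc(mdl, test_images, test_crs, sky=100.,
                                 n_mask=2, dilate=True, rad=1)

# LACosmic baseline: the ROC is traced over a grid of sigclip values
sigclip = np.linspace(3, 20, 20)
tpr_lac, fpr_lac = roc_lacosmic(test_images, test_crs, sigclip=sigclip, sky=100.)
```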
