
Deblender "overfitting"? #27

Closed
lmezini opened this issue Jan 25, 2018 · 37 comments
@lmezini

lmezini commented Jan 25, 2018

We think that the deblender may be overfitting images by including some of the noise from the original image in the model. Here are two images where we see this effect (with background rms = 10). The residual after the model has been subtracted from the original image looks a bit smoothed out.


We use metacalibration on the 'Original - Model of Neighbor' image to measure shear and we also calculate any bias in the process. We measured a bias of -0.0215581 +/- 0.00180595 from a set of 1e6 images with background rms = 10. In a control test with 1e6 images and a much lower background rms of 0.001, we measured a smaller bias of -0.00534828 +/- 9.46265e-05. Finally, we also ran another control in which we didn't use deblender and ran metacalibration on images of only a single object in the center. We used 1e6 images and a background rms = 10 for this test and measured a smaller bias of 0.000828241 +/- 0.00177186.

Do you have any suggestions as to how we can prevent this 'overfitting'?

@esheldon
Collaborator

Also, do you developers have other ideas for what might be going wrong, or tests we could do?

@pmelchior
Owner

The model is non-parametric. Without PSF (de)convolution, this means every pixel assumes the best-fit value that is allowed under the constraints (normally symmetry and monotonicity), which is very likely achievable despite the noise: you get the closest approximation of the data (signal + noise) that satisfies the constraints. This is not a flaw of the model; it's simply its non-parametric character.

To suppress noise effects one can apply

  • a partial deconvolution with a difference kernel of ~1/2 the width of the science PSF, which couples neighboring pixels together
  • an l0 sparsity penalty, which imposes a minimum threshold for any pixel to be significant (cuts the outskirts)
  • an l1 sparsity penalty, which compresses spuriously high fluctuations (reducing the effect of the Poisson noise in the center)

The second and third options will result in different biases: l0 will reduce the total flux, l1 will reduce the inner profile slopes (and probably the flux). Since our default setup is for photometry, we don't use a sparsity penalty, but it might be valuable for shapes. I would try something like constraints['l0'] = bg_rms.sum().

Also, applying the PSF deconvolution makes the code slower, but it is particularly useful for compact sources. It should also reduce the per-pixel noise sensitivity.
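For intuition, the two sparsity penalties act as different thresholding operations on the model pixels. A minimal sketch (the prox_* functions here are illustrative stand-ins, not scarlet's API):

```python
import numpy as np

def prox_l0(x, thresh):
    """Hard thresholding: zero out every pixel below `thresh`
    (this is what an l0 penalty does in the proximal step)."""
    out = x.copy()
    out[np.abs(out) < thresh] = 0.0
    return out

def prox_l1(x, thresh):
    """Soft thresholding: shrink every pixel toward zero by `thresh`
    (the proximal operator of an l1 penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

# Following the suggestion above: with per-band sky RMS values,
bg_rms = np.array([10.0])            # single band in this simulation
constraints = {'l0': bg_rms.sum()}   # cutoff at the (summed) noise level
```

Hard thresholding removes faint outskirts entirely, while soft thresholding also shrinks bright pixels, which is why l1 reduces the inner profile slopes as well.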

@esheldon
Collaborator

Thanks Peter. I have some questions

  1. why is bg_rms an array in your example? Should it not be the variance of the background, implied by rms? If we have constant noise bg_rms=constant, does this imply we should use constraints['l0'] = bg_rms/sqrt(npixel)?
  2. How would we implement this "difference kernel"? How is it defined? Why use half the size of the true psf?

I see this code in the example:

pdiff = [PSF[b] for b in range(B)]
psf = scarlet.transformations.GammaOp(shape, pdiff)
blend = scarlet.Blend(sources, img, bg_rms=bg_rms, psf=psf)

Is PSF[b] an image, and how do we construct it from the true psf?

@pmelchior
Owner

  1. bg_rms is the per-band sky RMS (not per-pixel). If there are several bands, the shape model is the sum over all of them (if the SED is flat), so the cutoff should be roughly bg_rms.sum(). In detail, one would want to take the SED into account, sum in quadrature, and maybe even worry about the radial profile, but the flat sum is simple and works.
  2. We use a linearized form of the PSF convolution, which means that both the PSF and the deconvolved object need to be well-sampled. Moreover, a full deconvolution leads to artifacts and inverse-noise problems, so the suggestion is to deconvolve partially with a difference kernel. Because that difference kernel itself still needs to be well-sampled, we recommend setting its width to ~1/2 of the actual PSF's.

The PSF difference kernels are given as images and can be constructed any way you like. We have a GalSim implementation that I created (scarlet/psf_match.py), which does the job but is very likely inferior to what people who build difference kernels for a living could do. Since you work in one band, it should be sufficient to simply run the function getDiffKernel from my code.
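For illustration only, here is a naive Fourier-ratio construction of a difference kernel, assuming Gaussian PSFs on a common pixel grid; this is not the GalSim-based getDiffKernel, just a sketch of the idea:

```python
import numpy as np

def gaussian_psf(sigma, size=25):
    """Normalized circular Gaussian PSF image."""
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    p = np.exp(-0.5 * (x**2 + y**2) / sigma**2)
    return p / p.sum()

def difference_kernel(psf, target, eps=1e-8):
    """Kernel k with psf ≈ k * target: convolving the partially
    deconvolved model (at `target` resolution) with k recovers the
    science PSF. The eps guard avoids division blow-ups where the
    target spectrum is tiny."""
    P = np.fft.fft2(np.fft.ifftshift(psf))
    T = np.fft.fft2(np.fft.ifftshift(target))
    k = np.fft.fftshift(np.fft.ifft2(P / (T + eps))).real
    return k / k.sum()
```

For a science PSF of width sigma and a target of width sigma/2, the resulting kernel is itself roughly a Gaussian of width sqrt(sigma**2 - (sigma/2)**2), so it stays comfortably well-sampled.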

@esheldon
Collaborator

I still didn't follow the bg rms thing. So if we have one band and we added noise this way

im += rng.normal(scale=bg_rms, size=im.shape)

Then would we set constraints['l0'] = bg_rms?

I also didn't understand how to actually construct the difference kernel, but I'll revisit that later.

@pmelchior
Owner

Yes, that's right. In one band, set the cutoff at the noise level, or thereabouts.

@lmezini
Author

lmezini commented Feb 5, 2018

We did a new run using the same setup as before, except this time we included the constraint constraints['l0'] = bg_rms. Our bias measurement was initially -0.0215581 +/- 0.00180595, but now it is -0.0404289 +/- 0.00179187. We would like to try the l1 sparsity penalty next. Could you explain how to implement it? Thank you.

@pmelchior
Owner

I'm a bit confused by your findings, or maybe by what you're trying to do. It seems that the bulk of the bias is coming from the noise (see your first message). This should be taken out by metacal, at least if the objects were isolated. That means it's the combination of noise and blending that somehow breaks metacal. Is this also how you see it?

@esheldon
Collaborator

esheldon commented Feb 5, 2018

Some of the noise is absorbed by the model of the neighbor, so when the neighbor is subtracted, the noise model for the new image is wrong. We would have a similar issue if we used the model of the central object and added some noise to it (rather than subtracting the neighbor from the original image).

For metacal to work the noise model must be known very accurately.

This is my current working explanation.

@pmelchior
Owner

I suspect that the problem is in the setup. You're using the scarlet model to define the neighbor, subtracting it, and then measuring the reference object. When you make the neighbor more compact (by increasing the l0 sparsity penalty), there's more light left over that affects the reference. We have done tests with the same approach for photometry, and they're not pretty. It's much better to instead determine the shape directly from the scarlet model of the reference object.

If you don't use any PSF model in scarlet, you get a WYSIWYG model of the object, with some regularity conditions (symmetry, monotonicity, maybe sparsity), that can be used as input to whatever ellipticity measurement method you've got. There's no uncertainty on the model, so running a model fitter on it afterwards may be tricky, but moments will work just fine.
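As a concrete example of the moments route, here is a minimal unweighted-moments ellipticity estimate on a model image (a sketch; the function name is hypothetical, and the centroiding, weighting, and PSF correction that a real pipeline needs are omitted):

```python
import numpy as np

def moments_ellipticity(model):
    """Ellipticity (e1, e2) from unweighted second moments of a
    2D model image."""
    y, x = np.indices(model.shape, dtype=float)
    flux = model.sum()
    cx = (model * x).sum() / flux        # flux-weighted centroid
    cy = (model * y).sum() / flux
    dx, dy = x - cx, y - cy
    qxx = (model * dx * dx).sum() / flux  # quadrupole moments
    qyy = (model * dy * dy).sum() / flux
    qxy = (model * dx * dy).sum() / flux
    denom = qxx + qyy
    return (qxx - qyy) / denom, 2.0 * qxy / denom
```

On a noisy image, unweighted moments diverge, but on a noiseless model the sums are well-behaved.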

Could you try to run a test with isolated and blended sources with this setup? I think they should come out better. However,

> For metacal to work the noise model must be known very accurately.

this could be an issue.

@esheldon
Collaborator

esheldon commented Feb 5, 2018

It sounds like we might expect some improvement using the model and then adding noise to it (because metacal needs that noise). The noise will still be wrong in the center, but maybe the overall performance will still be better.

@pmelchior
Owner

pmelchior commented Feb 5, 2018

Why? ngmix needs noise, but something like KSB doesn't, and you can metacal the latter, right?

@esheldon
Collaborator

esheldon commented Feb 5, 2018

This is a metacal issue actually.

It needs to know the noise model very accurately. In this case there will still be noise in the model, but it isn't easy to determine exactly what it is: the model absorbs some noise in some areas, maybe not in others, etc. The only noise model we can know well in this case is the noise in the original image. This is why we initially subtracted the model of the neighbor from the original image: to try to preserve the noise.

If you think using the model of the central is the best approach, then I think we need to try to restore the original noise level as best as possible before feeding into metacal.

@pmelchior
Owner

I see. I'm not sure that it is the best approach, but at least for photometry we get much better results than from neighbor subtraction.

Here's a random thought: if you can estimate the noise and its correlations in the scarlet model, one could re-whiten it to correspond to the noise in the input image (or to an uncorrelated white one for convenience).

@esheldon
Collaborator

esheldon commented Feb 5, 2018

Agreed with the random thought: but how to estimate it? It will be spatially variable.

If we knew the true underlying image we could do it easily, but otherwise I think it would be model dependent.

@pmelchior
Owner

Check out the images Lorena posted first. Original - Model has a different noise power spectrum than Original. Evaluating O-M at the location of the model components, and assuming O is stationary (so that you can measure it somewhere else), should get you an estimate of the noise in M.
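A sketch of that estimate in code (all array and function names hypothetical): compare the residual variance inside the model footprint to the stationary noise variance measured in an empty region; if O is stationary, the deficit is the noise the model absorbed.

```python
import numpy as np

def absorbed_noise_rms(orig, model, footprint, empty_region):
    """Estimate the RMS of the noise absorbed by `model`.

    footprint: boolean mask where the model has support.
    empty_region: boolean mask of source-free pixels, used to measure
    the stationary noise level of the original image.
    """
    resid = orig - model
    var_resid = resid[footprint].var()     # noise left after subtraction
    var_orig = orig[empty_region].var()    # full stationary noise level
    return np.sqrt(max(var_orig - var_resid, 0.0))
```

This treats the absorbed and residual noise as independent, which is only approximately true, but it gives a first number to work with.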

@esheldon
Collaborator

esheldon commented Feb 5, 2018

It will be highly spatially variable from what I can see. How would you propose to deal with that?

@pmelchior
Owner

You mean even within the footprint of the model?

@esheldon
Collaborator

esheldon commented Feb 5, 2018

Are you suggesting that, within the boundary of the model sub-image, O-M might be stationary noise? If that's the case then I think what you suggest might be worth trying.

@pmelchior
Owner

The constraints don't explicitly care about flux, so to first order: yes.

In this case, I would maybe remove the l0 penalty again because it's hard on the outskirts, but the model itself might also have mostly constant noise over the more compact footprint. So, I'm not sure which way to lean.

@esheldon
Collaborator

esheldon commented Feb 5, 2018

OK, we'll play around with it. Thanks.

@lmezini
Author

lmezini commented Feb 8, 2018

Hi, thanks for all your help. We tried estimating the extra noise and adding it back to Orig-Neighbor in the region where the neighbor was located. Running metacal on 1e6 images like this resulted in a bias of -0.0133093 +/- 0.00195421 when using background rms = 10. This is smaller than before.

We've also been doing some further analysis of Orig-Model in the region associated with the neighboring object. We computed this difference for 1e6 realizations and then calculated the average standard deviation of each pixel in this region. This was done for images with background rms of 10 and 0.001, which had shear biases of -0.0215581 +/- 0.00180595 and -0.00534828 +/- 9.46265e-05, respectively. The results from these tests are shown in the images.

image
image

@esheldon
Collaborator

esheldon commented Feb 8, 2018

To clarify this a bit: the image at the bottom shows the standard deviation of original-model as a function of distance from the center of the neighbor.

If the model were not absorbing any of the noise, this would always equal 10, the input noise level. So the model is absorbing some noise, and the absorption is not stationary, since there is structure here.
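For reference, the radial profile in the bottom plot can be computed by azimuthally binning the per-pixel standard-deviation map (a sketch; the function name is hypothetical, and the std map itself comes from the many noise realizations described above):

```python
import numpy as np

def radial_std_profile(std_map, center, nbins=20):
    """Azimuthally average a per-pixel standard-deviation map into
    radial bins around `center` = (row, col)."""
    y, x = np.indices(std_map.shape, dtype=float)
    r = np.hypot(y - center[0], x - center[1])
    edges = np.linspace(0.0, r.max(), nbins + 1)
    # bin index per pixel; clip puts the single r == r.max() pixel
    # into the last bin
    idx = np.clip(np.digitize(r.ravel(), edges) - 1, 0, nbins - 1)
    prof = np.array([std_map.ravel()[idx == i].mean()
                     for i in range(nbins)])
    radii = 0.5 * (edges[:-1] + edges[1:])
    return radii, prof
```

If the noise absorption were stationary, this profile would be flat; the structure in the plot is exactly the deviation from flatness.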

@pmelchior
Owner

pmelchior commented Feb 9, 2018

That's fascinating! So, do I understand the numbers here correctly that the model absorbs 25-30% of the noise? Sounds plausible to me. Btw, the structure in the bottom plot has a variation of <10%, so my statement that it's stationary to first order was about right.

@lorena21395 I don't understand the top plot, though. How come the measured STD is larger than the noise RMS of 0.001?

@esheldon
Collaborator

esheldon commented Feb 9, 2018 via email

@pmelchior
Owner

zilch

@pmelchior
Owner

Are these bias results for isolated or neighbor-subtracted sims?

@lmezini
Author

lmezini commented Feb 12, 2018

@pmelchior We isolate the central object by subtracting the model of the neighbor from the simulated image.

@pmelchior
Owner

Is the bias still there if you work on isolated galaxies, but estimate the noise the way you do now?

I'd like to know whether the deblender fails at the level of the model, or whether we fail to provide the correct noise estimate.

@lmezini
Author

lmezini commented Feb 12, 2018

We did a test where we added noise back to the model of the central object only. We ran this through metacal and measured a bias of -0.0374567 +/- 0.00178735.

We calculated the noise to add as sqrt( bg_rms**2 - (orig-model).var() ). Model here refers to the model of both objects.

Is this what you were describing?

@pmelchior
Owner

It depends.

  • Is orig-model evaluated over the entire postage stamp or only over the footprint of model?
  • Was there only one object in the image?

@lmezini
Author

lmezini commented Feb 13, 2018

Yes, orig-model is evaluated over the entire image. There were two objects in the original image.

@pmelchior
Owner

Then I have a number of objections:

  • orig-model should only be evaluated over the footprint of the model. This is where the reduction in noise happens, so currently you're underestimating the noise you need to add.
  • I think you're adding the noise to the entire postage stamp. Again, the treatment is only needed where the model is present.
  • I would really like to isolate the noise-estimation part from the deblending part. Using a single object per image would achieve that. I understand that you want to deblend galaxies, but we're trying to understand where the bias comes from, so getting rid of one problem at a time is useful.
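Putting those corrections together, a sketch of a footprint-restricted noise top-up after neighbor subtraction (all names hypothetical; assumes the residual inside the footprint is noise-dominated and roughly white):

```python
import numpy as np

def subtract_with_noise(orig, neighbor_model, bg_rms, rng, thresh=0.0):
    """Subtract the neighbor model, then add white noise inside its
    footprint so the per-pixel noise variance there returns to
    bg_rms**2. Pixels outside the footprint are left untouched."""
    sub = orig - neighbor_model
    footprint = neighbor_model > thresh
    resid_var = sub[footprint].var()   # noise left after the model absorbed some
    extra = np.sqrt(max(bg_rms**2 - resid_var, 0.0))
    out = sub.copy()
    out[footprint] += rng.normal(scale=extra, size=int(footprint.sum()))
    return out
```

Note the residual variance is measured only over the footprint, and the compensating noise is added only there, which addresses the first two objections above.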

@lmezini
Author

lmezini commented Feb 16, 2018

Regarding your last point, we did a control test where we input images of a single galaxy directly into metacal. We specified the noise as the background rms. This resulted in a bias of 0.000828241 +/- 0.00177186. Is this what you were suggesting?

@pmelchior
Owner

Not quite, but reading what I wrote above, I must admit that this wasn't clear. You're using scarlet to model the scene and subtract the neighbor, right? I'd suggest using the scarlet model itself as input to metacal. One has to worry about a number of things, but we found that to work better (for fluxes) than the neighbor-subtraction method. It's best if we briefly talk about it with Erin.

@esheldon
Collaborator

esheldon commented Feb 16, 2018 via email

@pmelchior
Owner

Erin, let's talk next week at some point, but I'd like to better understand that last statement.
