Ringing in low magnitude galsim sims #222

Open
herjy opened this issue Nov 5, 2020 · 30 comments

@herjy (Collaborator) commented Nov 5, 2020

I recently noticed significant ringing in the residuals of fits to very bright galaxies simulated with galsim.
I initially thought that monotonicity was not the culprit, but it seems that the proxes are making things a lot worse.
Switching ALL constraints off makes things a lot better, but I'm not convinced that there isn't something wrong. Each constraint, except for CenterOnConstraint, does something awful.
No constraint & CenterOnConstraint:
[screenshot]
Positivity only:
[screenshot]
Normalization("max") only:
[screenshot]
Monotonicity only:
[screenshot]
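Rough stand-ins for what some of these constraints do as proximal operators on the morphology (my sketches, not scarlet's implementations; monotonicity is omitted since its radial projection is more involved):

```python
import numpy as np

def prox_positivity(m):
    # clip negative morphology pixels to zero
    return np.maximum(m, 0)

def prox_max_normalization(m):
    # Normalization("max"): rescale so the peak pixel equals 1
    return m / m.max()

def prox_center_on(m):
    # rough stand-in for CenterOnConstraint: keep the central pixel non-zero
    out = m.copy()
    center = tuple(s // 2 for s in out.shape)
    out[center] = max(out[center], 1e-12)
    return out

m = np.array([[-0.1, 0.2], [0.4, 0.8]])
m = prox_positivity(m)          # [[0, 0.2], [0.4, 0.8]]
m = prox_max_normalization(m)   # [[0, 0.25], [0.5, 1.0]]
```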

@herjy (Collaborator) commented Nov 5, 2020

It hints again at an issue with band-limitedness. In particular, it looks like the edges create some issues.

@herjy (Collaborator) commented Nov 5, 2020

Note: I get the same results with Gaussians generated from scarlet.GaussianPSF.

@pmelchior (Owner)

Good example. This is the same problem as in #215. As we have said there, it is unlikely that errors in the diff kernel itself have enough power to do that. With this new test, it's indeed more likely that undersampling is responsible.

Here's what I see. Both positivity and monotonicity become important in the outskirts and can change pixel values there even if the gradients are band-limited (which they are). Thus, their artifacts look similar. But it seems that both have a step within the significant region of the source. I have no idea why, because the proxes don't do anything there.

Normalization and CenterOn don't do anything for that bright a source (and in the former case don't affect the band limit). So their effects should be much weaker, if any. What we see are minor checkerboard patterns at the edges of the box, plus some ringing towards the center. These could come from backpropagating noise. Whenever you have some power at the edge, Gibbs will kick in and create artifacts.

So, in both cases I think the proxes cause the solutions to move past the band limit set by the model kernel. Can you check the spectrum of the solutions and the model kernel?
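A quick way to run that check (a sketch in plain numpy with stand-in Gaussian arrays, not actual scarlet solutions or its model kernel):

```python
import numpy as np

def radial_power_spectrum(image):
    """Azimuthally averaged power spectrum of a 2D image."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    ny, nx = image.shape
    y, x = np.indices(power.shape)
    r = np.hypot(y - ny // 2, x - nx // 2).astype(int)
    # mean power in each integer radial-frequency bin
    return np.bincount(r.ravel(), weights=power.ravel()) / np.bincount(r.ravel())

# stand-in arrays: a broad "solution" and a narrow model PSF
yy, xx = np.mgrid[-32:32, -32:32]
morph = np.exp(-(xx**2 + yy**2) / (2 * 5.0**2))
model_psf = np.exp(-(xx**2 + yy**2) / (2 * 1.0**2))

p_morph = radial_power_spectrum(morph)
p_psf = radial_power_spectrum(model_psf)
# significant power in p_morph at frequencies where p_psf has already dropped
# to zero would indicate a violation of the band limit
```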

@pmelchior (Owner)

I would try two separate things, with the same set up as above:

  • Increase the minimum box size to something like 75 pixels.
  • Run the same with real-space convolution instead of FFTs.
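For the second point, the difference matters because an FFT on the model grid performs circular convolution: power near the box edge wraps around to the opposite side. A minimal illustration with plain numpy/scipy (not scarlet's internals):

```python
import numpy as np
from scipy.signal import convolve2d

n = 33
yy, xx = np.mgrid[:n, :n]
# source deliberately placed near a corner of the box
source = np.exp(-((xx - 2.0)**2 + (yy - 2.0)**2) / (2 * 3.0**2))
psf = np.exp(-((xx - n // 2)**2 + (yy - n // 2)**2) / (2 * 1.5**2))
psf /= psf.sum()

# linear (real-space) convolution: flux beyond the box edge is simply lost
direct = convolve2d(source, psf, mode="same", boundary="fill")
# circular convolution, as implicitly performed by an FFT on the same grid
circular = np.real(np.fft.ifft2(np.fft.fft2(source) * np.fft.fft2(np.fft.ifftshift(psf))))

# the two agree in the interior but differ near the edges, where the FFT
# wraps the source flux around to the opposite corner
```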

@pmelchior (Owner)

Also, what was the initial guess for this model?

@herjy (Collaborator) commented Nov 9, 2020

Sorry, I missed your posts. I already tried the boxes; nothing changed. Convolutions are already in real space. The spectrum and initial guess look fine (no ringing in the morphology).

@pmelchior (Owner)

Another idea: can you initialize both spectrum and morphology at the truth (i.e. the shape convolved with the model PSF)? Then let the optimizer run. It should converge within a few steps, but if there is some error in the code, or noise is too important, it can walk away from the true minimum.
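A sketch of this test with stand-in arrays rather than scarlet objects; the "truth" initialization here is an assumed true profile convolved with the model PSF, peak-normalized in the spirit of the Normalization("max") convention used above:

```python
import numpy as np
from scipy.signal import fftconvolve

yy, xx = np.mgrid[-25:26, -25:26]
true_shape = np.exp(-(xx**2 + yy**2) / (2 * 4.0**2))   # stand-in "true" profile
model_psf = np.exp(-(xx**2 + yy**2) / (2 * 0.8**2))
model_psf /= model_psf.sum()

# the truth in model space: the true shape convolved with the model PSF
morph_init = fftconvolve(true_shape, model_psf, mode="same")
morph_init /= morph_init.max()                 # peak-normalized morphology
spectrum_init = np.array([10.0, 20.0, 30.0])   # stand-in per-band amplitudes

# starting the fit from (spectrum_init, morph_init) should converge within a
# few steps if the gradients and the forward path are consistent
```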

@herjy (Collaborator) commented Nov 10, 2020

Initialisations at the truth don't converge, which is super weird:
[screenshot]

Though I checked that the sources projected into the image plane fit the data perfectly (i.e. the forward path is clear). Actually, the final fit to the data is decent and it shows exactly the same power spectrum as the true source. However, it has tiny sharp features that produce the rings even if there are no constraints (with constraints the sharp features are still there but organised as a ring):
[screenshot]
(The above is modeled source - true source; below are the residuals.)
[screenshot]

This actually makes no sense with regard to the optimisation problem. Indeed, the signal we reconstruct is supposed to be convolved with the model PSF and should be band-limited. However, nothing guarantees that.

@herjy (Collaborator) commented Nov 10, 2020

These tests are performed on Gaussian profiles generated with scarlet, not galsim. I just realised that I forgot to apply inverse-variance weighting. Now that it is on, the results are better but still show some structures that come from these strange edges:
[screenshot]

@herjy (Collaborator) commented Nov 10, 2020

Typically, this happens to the power spectrum, when it shouldn't:
[screenshot]

@pmelchior (Owner)

Not so fast. I think this is to be expected. The optimizer will make a step in the direction of the gradient, which is non-zero because the best-fit solution to the noisy image isn't the true solution. The amplitudes of these steps are set by us: for ExtendedSource each pixel can vary by as much as 10% of the average pixel value. So, yes, the solver will move to some location by making a step away from the truth, and then attempt to recover the best solution, which, looking at the loss curve, seems to work.

However, it cannot simply get rid of pixel power because we don't like it. It's an almost degenerate solution, which you can see when you compare the rendered model with the observation. The fluctuations in the model of the last example are correlated, but they also seem to have sharp edges. Did you use positivity here?

Also, the comparison of the power needs to be made wrt the model PSF, not the true source. Since our model PSFs are very compact, I'm not sure how much excess power there really is in our model.

The important thing for me is that there are no large scale artifacts in this example. The residuals are correlated but don't ring. This to me suggests that our forward path and the gradients are clean, and that the optimizer works as expected when started very close to the optimum.

So, what remains to be determined is:

  1. do these ringing artifacts arise because of the proxes not obeying the band limit of the model PSF, or
  2. do they arise because the optimizer can't find a better solution (namely one without the artifacts) when started at an inferior initial guess, regardless of the proxes.

Given the very first example from above, which used only CenterOn (irrelevant here), I am guessing that it's a combination of 1 and 2. When started at some half-good guess, the gradients (band-limited) will move that solution, and the proxes will chop off what they don't like (not band-limited). Now even the forward path isn't clean anymore, which will lead to artifacts if we use FFTs. But even without them, it's possible that the optimizer is now on a trajectory that amplifies some combination of noise pattern and penalty function, to the detriment of the gradient updates. If this is the failure mode, we could apply the proxes on every second iteration to break the cycle.
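The last suggestion could look roughly like this (a toy proximal-gradient loop on a stand-in problem, not scarlet's Blend.fit):

```python
import numpy as np

def fit(x, grad, prox, step=0.1, n_iter=100, prox_every=2):
    """Gradient descent that applies the prox only on every
    `prox_every`-th iteration, to break a prox/gradient cycle."""
    for it in range(n_iter):
        x = x - step * grad(x)
        if it % prox_every == 0:
            x = prox(x)
    return x

# toy problem: minimize ||x - target||^2 under a positivity constraint
target = np.array([1.0, -0.5, 2.0])
grad = lambda x: 2 * (x - target)
prox = lambda x: np.maximum(x, 0)
x = fit(np.zeros(3), grad, prox)
```

Skipping the prox lets the unconstrained gradient step act on its own half the time; the constrained components still hover near the feasible set instead of being pinned to it every iteration.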

@herjy (Collaborator) commented Nov 10, 2020

In the last example: No constraint -> No positivity
In the power spectrum, the true source is convolved by the model psf.

Looking at my last example, the ringing depends on the resolution. I doubled the size of the psf (aka went from Euclid to Rubin) and I saw more rings than I did when I watched Julia Roberts' entire filmography.

I don't see how the optimizer can converge to these solutions; the 2-norm isn't minimised, and it rings coherently, which seems to only make things worse, not better.

@pmelchior (Owner)

Could you add the power spectrum of the model PSF? I believe you're right that the model has power where it shouldn't, but comparing it to the true galaxy (even convolved) doesn't confirm that. We expect small-scale power from the noise; the question is whether we get some contribution that isn't band-limited by the model PSF.

You doubled the size of which PSF: observed or model?

@herjy (Collaborator) commented Nov 11, 2020

Here's the full plot:
[screenshot]

I increased the size of the observation PSF.

@pmelchior (Owner)

So, we believe now that the rings are not artifacts (of some undersampling effect), but local minima of the objective function, which might become traps if the initial guess is quite far off, and the step sizes are too small. This statement is based on tests with relatively large errors on overall amplitude of the SED (but correct morphology), where the rings don't vanish even after 1000 iterations. It also makes sense that this would occur for bright galaxies where the local minima would be more pronounced and not wiped out by noise.

The practical solution to this problem is better initialization (coming soon), but that's not always possible. In the absence of that, we could allow for larger steps if needed (this is the topic of #211); adopt a variable step-size schedule (which is a common thing in neural-net training but not normally used with Adam); or use other trickery to optimize more robustly.
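For reference, a variable step-size schedule of the kind mentioned could be as simple as (an illustrative assumption, not an existing scarlet option):

```python
def step_size(t, base=0.1, tau=50.0):
    """Inverse-square-root decay: equals `base` at t = 0 and halves by t = 3*tau."""
    return base / (1.0 + t / tau) ** 0.5
```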

I prefer to fix this problem with better initialization and maybe #211.

@herjy (Collaborator) commented Nov 12, 2020

Removing the PSF "corrections", i.e. these lines (1) & (2), makes things a lot better, but it is still not perfect. The initialisations are now wrong by a few points (absolute, not relative; I'm assuming this depends on the PSF norm) but lower than the final value (instead of higher), which leads to slightly better results, but the ringing is still there. This is concerning because initialisation in crowded areas, even if improved, might not give a good enough result and could lead to artefacts.

On the bright-ish side (and this is a very optimistic way of phrasing things), the effect is much worse on LRO, despite the initialisation yielding the same values as for Observations, which might explain why the performance of joint processing seemed worse in the bright end on previous tests.

@pmelchior (Owner)

The first line you linked needs to stay in place because otherwise the return value isn't the spectrum of a point source. The second line is indeed problematic for very extended sources. I'm thinking about a way to make these work.

@herjy (Collaborator) commented Nov 12, 2020

The first one is just as bad for extended sources. It's taking a point measurement where there is an integration. Typically, for a test where PSFs and sources are Gaussians with max 1 for the source and sum 1 for the PSF, the first line takes the spectrum from 1 to 46...

@pmelchior (Owner)

Sure, no question about that, but read the docstring. The return value of this method is the SED of a point source! This needs to be stated (and obeyed) because otherwise the spectrum is completely ill-defined when the PSF has width variations across bands.

@pmelchior (Owner)

I have an idea for this problem. In weak-lensing land we use moment-based estimates like the galaxy resolution R = 1 - T_psf / T_gal as an indicator of how close a source is to the PSF (in width). T is the trace of the second-moment matrix, and T_gal is measured from the observed (= convolved) image.

get_pixel_spectrum has two options: one that doesn't correct for the observed PSF, and one that does, assuming that the source is a point source. For very extended galaxies this correction is not necessary because the peak height, which we see in the peak pixel, is unaffected by the PSF (think: convolution of a constant gives the same constant). So, we should interpolate between these two regimes as a function of R.

If we allow ourselves to assume a Gaussian shape for the convolved source, then the computation of the peak height is analytic given the moments of the PSF and the moments of the source (either convolved or preconvolved). That's the only option that avoids a forward convolution for every source and every observation. This would give us the correct factor to interpret the spectrum of the peak pixel even for extended galaxies.
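The pieces of this proposal can be sketched as follows. The formulas are the ones stated above (R = 1 - T_psf / T_gal, and the analytic Gaussian peak height flux / (2π √det C)); the function names and test arrays are my own, not existing scarlet code:

```python
import numpy as np

def second_moments(image):
    """Centroid-subtracted second-moment matrix of a 2D image."""
    y, x = np.indices(image.shape)
    flux = image.sum()
    cy, cx = (y * image).sum() / flux, (x * image).sum() / flux
    dy, dx = y - cy, x - cx
    return np.array([
        [(dy * dy * image).sum(), (dy * dx * image).sum()],
        [(dx * dy * image).sum(), (dx * dx * image).sum()],
    ]) / flux

def resolution(gal_image, psf_image):
    """R = 1 - T_psf / T_gal: ~0 for point sources, -> 1 for very extended ones."""
    T_psf = np.trace(second_moments(psf_image))
    T_gal = np.trace(second_moments(gal_image))
    return 1 - T_psf / T_gal

def gaussian_peak(flux, cov):
    """Peak value of a Gaussian with given total flux and covariance matrix."""
    return flux / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

yy, xx = np.mgrid[-30:31, -30:31]
psf = np.exp(-(xx**2 + yy**2) / (2 * 1.5**2))
gal = np.exp(-(xx**2 + yy**2) / (2 * 5.0**2))
R = resolution(gal, psf)              # close to 1 - 1.5**2 / 5.0**2 = 0.91
peak = gaussian_peak(1.0, np.eye(2))  # unit-flux, unit-covariance Gaussian
```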

@herjy (Collaborator) commented Jan 19, 2021

The latest changes to initialisation did not improve on this. I still observe artefacts for sources below 20 mags.

@pmelchior (Owner)

Are you running set_spectra_to_match?

@herjy (Collaborator) commented Jan 19, 2021

Yup! Actually, going down to mag 19 seems OK, but below that the artefacts ring a lot, especially for well-resolved sources. The cutoff is not as sharp as it used to be, from the experiments I've made, but the problem is still there. I'll run my sims and see what happens, but I am afraid that the bright end will still be affected.

@fred3m (Collaborator) commented Jan 19, 2021

I ran some tests late last week and I think that undersampling may still be a part of this. The discussion is a bit lengthy, so we can discuss in our meeting tomorrow and I'll update this issue after that (if necessary).

@pmelchior (Owner)

Then it isn't the mechanism we identified. Remy's report indicated that the mismatch in the spectrum amplitude is driving the ringing. This mismatch is now absent. We also concluded that FFT artifacts (e.g. from undersampling the models) are orders of magnitude too small to explain our findings. Let's talk about this tomorrow.

@herjy (Collaborator) commented Jan 26, 2021

And for some reason, this is worse for multi-resolution. This is sort of understandable if the issue comes from likelihood exploration, but still, yuck!
The shape of the feature is very strange, and it is concerning that the optimizer does not pick up on it (never mind the colour):
Multi-resolution:
[screenshot]
Single resolution:
[screenshot]
[screenshot]

@pmelchior (Owner)

What's the difference between the top and bottom row in each case?

@pmelchior (Owner)

And how is the second set not also multi-resolution? There are clearly different numbers of pixels involved.

@herjy (Collaborator) commented Jan 27, 2021

The top 2 rows are multi-resolution. The bottom rows are two independent single-resolution runs (Euclid and Rubin).

@pmelchior (Owner) commented Jan 27, 2021

This is a different version of the earlier problem: massive mismatch of the initial model. In this case it happened since the morphology of the model in multi-resolution is determined from the high-res channel, where source 2 is essentially absent.

But the mechanism that prevents the low-resolution channel from fitting is interesting: the subgradients from the proxes null the gradient update. The residuals make for a gradient that wants to decrease the model in the center and increase it in a ring around the center. Because the model is flat, this leads to a center that is lower than its perimeter, which monotonicity will undo. The result is a flat pancake whose mean amplitude is sort of correct to fit the model.

I believe this is the true origin of the problems we keep seeing. I suggest we remedy this by switching the proxes off occasionally. We have experimented with random perturbations of the gradients, or with omissions of gradient updates, all meant to prevent spurious features and cyclic update dependencies. I suggest that we modify Blend.fit to bypass the proximal mapping 10% of the time, or so.

What we need to determine is:

  • does that suffice to break this deadlock
  • does that derail blends where it works fine as is
  • is there a better metric (e.g. very strong non-vanishing gradients, cf. Reporting and resetting step sizes #211) to determine which parameter would benefit from the prox bypass

I will take a stab at this.
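Sketches of both ideas, the random prox bypass and the "leaky prox" of the follow-up commit (illustrative assumptions about their shape, not the committed implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def prox_positivity(x):
    return np.maximum(x, 0)

def maybe_prox(x, prox, p_bypass=0.1):
    """Skip the proximal mapping with probability p_bypass."""
    return x if rng.random() < p_bypass else prox(x)

def leaky_prox(x, prox, leak=0.2):
    """Keep a fraction `leak` of the unconstrained value, so a hard
    projection cannot completely null the gradient update."""
    px = prox(x)
    return px + leak * (x - px)

x = np.array([-1.0, 0.5])
lx = leaky_prox(x, prox_positivity)   # [-0.2, 0.5]: negative pixel only partially clipped
```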

@pmelchior pmelchior mentioned this issue May 2, 2022
pmelchior added a commit that referenced this issue May 2, 2022
* created leaky prox (possible remedy for #222)

* moved leaky prox to constraint to allow customization

Co-authored-by: herjy <remyjoseph@ymail.com>