
Select bounding box for each peak #14

Closed
pmelchior opened this issue Aug 28, 2017 · 4 comments

@pmelchior
Owner

Currently the model of each component covers all pixels in the image. That's overkill for the evaluation of the likelihood gradients. Instead one could work with a subset of pixels, loosely based on previously detected footprints.

One can think of a few ways of doing that:

  1. Each component exists only in its box, so that constraints only have to check that box (we'd have to insist that the models are zero outside of the bounding box, otherwise each component can do whatever it pleases outside of its own box). For all procedures that require the actual model, e.g. likelihood gradients, the components need to be translated accordingly. That means that S is internally always represented by a vector (not a matrix), which we already partially implement in the dot products of block-diagonal constraint matrices, and the generation of the model is more involved. The downside is that we need to build the constraint matrices separately for each component (or work with fixed box-size increments to reduce that number). It seems to me that this will increase initialization time but reduce runtime.
  2. Use a sparse matrix form for S that allows us to quickly update a set of predefined elements (for every component: those inside of the box, the rest is zero) as long as we don't change the structure of the matrix. The models would automatically have the right format, and the changes to the code would be minimal. There is probably a slightly larger runtime cost. This approach seems fairly straightforward to me, but I can't seem to find a way to do this within scipy.sparse.
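For option 2, one possible route within scipy.sparse (whether it covers our use case is unclear): a CSR matrix retains explicitly stored zeros in its structure, so the nonzero pattern can be fixed once per component box and later updates only touch the .data array. A minimal sketch, with made-up box indices:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical sizes: K components, N pixels in the flattened image.
K, N = 3, 100

# Predefine the sparsity structure once: for each component k, the
# flattened pixel indices inside its bounding box (invented here).
box_pixels = [np.arange(10, 30), np.arange(40, 55), np.arange(70, 95)]

rows = np.concatenate([np.full(len(px), k) for k, px in enumerate(box_pixels)])
cols = np.concatenate(box_pixels)
data = np.zeros(len(cols))  # explicit zeros are kept as stored entries
S = sp.csr_matrix((data, (rows, cols)), shape=(K, N))

# As long as the structure is unchanged, updates only touch S.data,
# never the indices/indptr arrays.
S.data[:] = np.random.rand(S.nnz)
```

The structure stays fixed because CSR does not prune explicit zeros unless eliminate_zeros() is called, so S @ x and friends keep working on the predefined pattern.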

In both cases, we'd automatically prevent the degeneracy between two remote objects having the same color.

@fred3m
Collaborator

fred3m commented Aug 29, 2017

I have an idea about how to implement number 1, outlined in this ipython notebook, which uses a real HSC image to illustrate how this might work. It doesn't actually solve for the solution, but shows how one might go about choosing the bounding boxes for each object.

The basic idea is that the SED in neighboring pixels changes very little, and in fact differs very little from the peak's except in blended regions. So we can mask out the image except for the region that encloses a peak, and then allow that region to grow by some reasonable factor to ensure that all of the flux is enclosed. This works well for the complicated blend I tested it on, but more testing is needed to validate the method. For example, some of the objects are given significantly larger regions because nearby sources have similar colors, but the important thing is that (at least in this example) none of the objects appear to be missing any flux. It would be interesting to see how this method works on spiral galaxies that have multiple colors.
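A rough sketch of that idea (the function name, tolerance, and growth factor are invented for illustration, not what's in the notebook):

```python
import numpy as np

def sed_bbox(image, peak, tol=0.1, grow=1.5):
    """Bounding box of pixels whose SED resembles the peak's SED.

    image: array of shape (B, H, W) over B bands.
    peak: (y, x) position of the detected peak.
    tol, grow: hypothetical tuning parameters.
    """
    B, H, W = image.shape
    flux = image.sum(axis=0)
    # per-pixel SED, normalized by total flux (guard against empty pixels)
    sed = image / np.maximum(flux, 1e-12)
    peak_sed = sed[:, peak[0], peak[1]]
    # pixels whose SED is close to the peak's
    match = np.linalg.norm(sed - peak_sed[:, None, None], axis=0) < tol
    ys, xs = np.nonzero(match)
    y0, y1 = ys.min(), ys.max()
    x0, x1 = xs.min(), xs.max()
    # grow the box by a factor about its center to catch the faint wings
    cy, cx = (y0 + y1) / 2, (x0 + x1) / 2
    hy, hx = grow * (y1 - y0 + 1) / 2, grow * (x1 - x0 + 1) / 2
    y0, y1 = max(0, int(cy - hy)), min(H, int(cy + hy) + 1)
    x0, x1 = max(0, int(cx - hx)), min(W, int(cx + hx) + 1)
    return y0, y1, x0, x1
```

On a blended scene, the SED-similarity mask would merge nearby sources with similar colors, which matches the over-sized regions seen in the test.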

@pmelchior
Owner Author

That looks quite good already. I don't fully understand the logic that decides the size/shape of the bounding box. However, this is not really what this ticket is about. Even if we had bounding boxes, we wouldn't have a mechanism to use them.

@pmelchior
Owner Author

pmelchior commented Sep 26, 2017

Here's a different thought that should solve all of the problems we've had with separable objects: run bSDMM with an extended set of variables [S1,S2,...,SK,A], where Si is the spatial part of the i-th object. This way they can be treated independently (in whatever boxes we want) and convergence can be checked for each of them separately. This last part should be very useful for diagnostic purposes.

This requires a few changes.

  1. deblender calls proxmin.bsdmm directly.
  2. a new class for building the model and calculating likelihood gradients.
  3. initialization of constraints for S needs to be split.

The second item will be the most important. There are several redundant computations in the likelihood gradients. By pulling those together into a class, we can store the results of previous calculations and control when updates to internal variables (such as the fully assembled S matrix or the A matrix) are done. It would also be the interface for the user to inspect and render the deblender outputs.
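A skeleton of what such a class might look like (names and shapes are hypothetical, and the PSF convolution P is omitted for brevity):

```python
import numpy as np

class Blend:
    """Hypothetical sketch of the model/gradient class (item 2).

    Caches the assembled S matrix and the model so the redundant
    computations in the likelihood gradients happen once per iteration.
    """
    def __init__(self, Y, A, components):
        self.Y = Y                    # data, shape (B, N)
        self.A = A                    # mixing matrix, shape (B, K)
        self.components = components  # list of (slice, S_k) per object
        self._S = None                # cached assembled S, shape (K, N)
        self._model = None            # cached A @ S

    def assemble_S(self):
        if self._S is None:
            K, N = self.A.shape[1], self.Y.shape[1]
            self._S = np.zeros((K, N))
            for k, (sl, S_k) in enumerate(self.components):
                self._S[k, sl] = S_k
        return self._S

    def model(self):
        if self._model is None:
            self._model = self.A @ self.assemble_S()
        return self._model

    def grad_S(self):
        # gradient of 1/2 ||Y - A S||^2 w.r.t. S (convolution omitted)
        return -self.A.T @ (self.Y - self.model())

    def grad_A(self):
        return -(self.Y - self.model()) @ self.assemble_S().T

    def invalidate(self):
        # call after any update to A or the components
        self._S = None
        self._model = None
```

The invalidate() hook is where we'd control when the internal variables get rebuilt, which is the point of pulling the gradient code into one place.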

So, the action of the deblend method would be to take a list/tree of peaks (with constraints, i.e. a PeakCatalog) and turn it into a list/tree of children with flags, plus a mixing matrix. I think we can alter the currently half-useful Deblend class to incorporate the necessary functions. I would, however, rename the class to Blend because it is the model of the blended scene. And that should be returned from a call to deblend.

Lastly, we already have a class Step_AS that acts in a similar way by storing A and S to get quicker estimates of the Lipschitz constants. We should fold that class into Blend.

@pmelchior
Owner Author

Another piece of logic for item 2), and how we deal with the boxes:

  • We treat each component separately: [S1,...,SK]
  • Put it into the full matrix S
  • Compute the residual R = (Y - APS)
  • Compute the updated S' = S + 1/L_S (AP)^T (Y - APS)
  • Slice out the Sk from S' and report those individually as the result of prox_f(Xk), which would then go into prox_g.

The question is: which of the elements do we need to store/update for the A update, A' = A + 1/L_A (Y - APS)(PS)^T? Note that the convolved sources appear in the last term. Formally, every occurrence of S needs to be replaced with S', so we'd have to compute the model twice per iteration. I think that the differences in gradient direction (with and without the update of S) will be rather modest, at least after a few iterations. Given that the convolution operation to build the entire model is expensive, this suggests that we keep either PS or R fixed for the updates of S and A.

The simplest thing would thus be to use the old S matrix for the A update. This is a deviation from the coordinate descent method, but we'd only have to build the model APS once.
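One iteration with that old-S shortcut could be sketched as follows (P omitted, names and shapes hypothetical; the residual R is computed once and reused for both gradients):

```python
import numpy as np

def iterate(Y, A, S, boxes, L_S, L_A):
    """One bSDMM-style step reusing a single model evaluation.

    Y: data (B, N); A: mixing matrix (B, K); S: full source matrix (K, N);
    boxes: per-component slices into the pixel axis; L_S, L_A: Lipschitz
    constants for the two gradient steps.
    """
    model = A @ S                 # built once per iteration
    R = Y - model                 # residual
    grad_S = -A.T @ R             # gradient of 1/2 ||Y - A S||^2 w.r.t. S
    S_new = S - grad_S / L_S      # full S update
    # slice out each component's box as the result of prox_f(S_k)
    S_parts = [S_new[k, sl] for k, sl in enumerate(boxes)]
    # A update uses the *old* S (same residual): a deviation from strict
    # coordinate descent that avoids a second model evaluation
    grad_A = -R @ S.T
    A_new = A - grad_A / L_A
    return S_parts, A_new
```

At an exact solution (R = 0) both updates are no-ops, which is a quick sanity check that the shared-residual shortcut at least preserves fixed points.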
