DM-24556: Add normalize method to Defects #200

plazas · 2020-05-05T14:08:13Z

No description provided.

mfisherlevine · 2020-05-05T14:26:09Z

python/lsst/meas/algorithms/defects.py

+
+        Parameters
+        ----------
+        region : `lsst.geom.Box2I


Missing closing backtick

mfisherlevine · 2020-05-05T14:28:07Z

python/lsst/meas/algorithms/defects.py

+        self.maskPixels(mi, maskName="BAD")
+        normalizedDefects = Defects.fromMask(mi, "BAD")
+
+        return normalizedDefects


I wonder, given that this is returning an object, whether it should be changed to be a class-method? Either than, or make it normalize the defects themselves.
But I think I'd like to defer to someone who knows more about both this object and good programming in general, so pinging @timj for his opinion on this.

It can't be a classmethod because you are using self.

@mfisherlevine and I were wondering if it should or should not be a class method? Does it matter in this case?

I don't understand. If it's a class method then you would have to pass in the defects you want to normalize as a parameter:

normalized = Defects.normalize(myDefects)

It seems more natural to me for you to be normalizing the defects you already have:

normalized = myDefects.normalize()

I'm okay with it return a new Defects rather than modifying the existing one. We should consider whether the defects should always be normalized when being saved using writeText or writeFits. That seems like a question that can only be answered from the science case -- maybe you do sometimes want a way to see how you constructed the defects.

It seems surprising to me that this returns a new object rather than modifying in place; compare (this implementation of) Defects.normalize() with e.g. list.sort().

If this is really the behaviour we want (why?), then consider:

Making this clear in the docstring. This is not just “recalculating defect bounding boxes”, it is “returning a normalized copy” (see the docstring for Defects.transpose()).

Personally, I'd call the method “Defects.normalized()” to help make the semantics clearer, but that's not a hard requirement (and may not be consistent with the existing transpose() API).

By my reading of the code (not having actually tried it), defects.normalize() != defects in general (the bounding boxes may have changed). If that's the case, I don't think you can automatically normalize on write, regardless of the science case: if would be super weird if defects.readFits gave me back something different from the data I'd just written.

I was torn about the mutability. I had assumed initially that normalize would change in place (so after the fromMask call you'd replace the bounding boxes in self with the new bounding boxes). People tend to prefer new objects be returned though. You make a good case that transpose returns a new object so there is precedent for that. The counter argument is that before and after calling normalize the effective defects are the same. You are right that transpose not being transposed makes prior art for this not being called normalized. As I said, I'm torn between new object and mutability (it's not like Defects are immutable though given that you can add new bounding boxes at any time).

Regarding defects != defects.normalize -- at some level they probably should compare equal. It depends on the semantics of what we mean by equality. In most cases "convert to the same pixel mask" is what we mean by equality and that's not what happens at the moment. We could make that be true by comparing the normalized versions.

I'm fine with write writing whatever it has without normalizing.

I'd be quite tempted to maintain it as an invariant that the defects are normalized, rather than exposing it to the user as part of the API. Perhaps with a special context for bulk-updates. Something like (excuse my toy API):

class Defects(object): def __init__(self): # ... self._bulk_mode = False def _normalize(self): # ... def append(self, defect): self._defectList.append(defect) if not self._bulk_mode: self._normalize() @contextmanager def bulk_mode(self): self._bulk_mode = True try: yield finally: self._bulk_mode = False self._normalize()

Perhaps that would be too disruptive a change at this point, though.

That's a really interesting idea. Having Defects always be normalized behind the scenes would make equality much more useful and the context manager stops it being slow in a loop (and we'd have to use the context manager when reading from file).

timj

Two main comments:

The Defects know their region of interest so they should work out that themselves.
There are no unit tests. At the very least create two Defects from a single rectangular region where you split it in different ways: normalize then check that the Defects compare equal.

timj · 2020-05-05T14:39:24Z

python/lsst/meas/algorithms/defects.py

+        self.maskPixels(mi, maskName="BAD")
+        normalizedDefects = Defects.fromMask(mi, "BAD")
+
+        return normalizedDefects


It can't be a classmethod because you are using self.

timj · 2020-05-05T14:39:44Z

python/lsst/meas/algorithms/defects.py

+        """
+        mi = lsst.afw.image.MaskedImageF(region)
+        self.maskPixels(mi, maskName="BAD")
+        normalizedDefects = Defects.fromMask(mi, "BAD")


return this directly. There is no need for the normalizedDefects variable.

python/lsst/meas/algorithms/defects.py

timj · 2020-05-06T22:36:42Z

python/lsst/meas/algorithms/defects.py

+
+        Returns
+        -------
+        normalizedDefects : `lsst.meas.algorithms.Defect`


Defects not Defect (and I don't think you need to qualify it with lsst.meas.algorithms

timj · 2020-05-06T22:38:11Z

tests/test_interp.py

+        newDefects = defects.normalize()
+        newDefects2 = defects2.normalize()
+
+        self.assertEqual(len(newDefects), len(newDefects2))


You just need to self.assertEqual(newDefects, newDefects2)

No need to compare length and the individual defects since __eq__ already does that for you.

Thanks, I'll change that. This would compare newDefects and newDefects2, but I think I should still leave the lines below where each set is compared to the list expectedDefects.

jdswinbank · 2020-05-07T17:20:44Z

python/lsst/meas/algorithms/defects.py

@@ -151,6 +151,33 @@ def __eq__(self, other):
    def __str__(self):
        return "Defects(" + ",".join(str(d.getBBox()) for d in self) + ")"

+    def normalize(self):
+        """Recalculate defects bounding boxes to use minimal set.


I'm not sure this is true.

Specifically, I don't think that the normalized bounding boxes are necessarily a minimal set.

You can check this yourself, based on DefectsTestCase.testLsstTextFile:

with lsst.utils.tests.getTempFilePath(".txt") as tmpFile: with open(tmpFile, "w") as fh: print("""# X0 Y0 width height 996 0 56 24 0 4156 2048 20 0 0 17 4176 1998 4035 50 141 1023 0 2 4176 2027 0 21 4176 0 4047 37 129 # Some rows without fixed column widths 14 20 2000 50 10 10 10 10 """, file=fh) defects = algorithms.Defects.readLsstDefectsFile(tmpFile)

At this point:

>>> len(defects) 9 >>> len(defects.normalize()) 11

Based on a quick visual inspection, I think the normalized set is equivalent to the inputs, but it's obviously not “minimal”.

Thoughts on whether this is actually a problem? I suspect the answer is “no”; we shouldn't assert that it is minimal, but it should be less pathological in most cases, and that's probably all we need. Right? Thoughts? @timj @plazas ?

That's very interesting. It implies that Defects.fromMask is not as clever as we thought it would be. It would be a fun exercise to plot the different bounding boxes to see how they differ but that should be a whole new ticket that looks into why fromMask is not optimal. We have no other option in the short term though, so tweaking the documentation is the best option -- it does imply that if we fix fromMask we would get a better but different normalization.

mfisherlevine

It's a lot of lines of code, but I can't actually find anything here to say I don't think! Looks great!

jdswinbank · 2020-05-13T18:57:55Z

python/lsst/meas/algorithms/defects.py

+    metadata : `lsst.daf.base.PropertyList`, optional
+        Metadata to associate with the defects.  Will be copied and
+        overwrite existing metadata.  If not supplied the existing
+        metadata will be reset.


I wrote (well, copied and pasted) this comment, but I'm not sure it makes sense — if I am constructing a new Defects, what existing metadata would be replaced?

Should I add "if any" to "Will be copied and overwrite existing metadata.", in case that we have a new Defects, as you say?

What “any” could there be if you're creating a new object? There is nothing that could be over-written.

This logic avoids the need to store or aggregate lists of values.

plazas requested a review from mfisherlevine May 5, 2020 14:08

mfisherlevine reviewed May 5, 2020

View reviewed changes

timj requested changes May 5, 2020

View reviewed changes

timj reviewed May 6, 2020

View reviewed changes

jdswinbank reviewed May 7, 2020

View reviewed changes

mfisherlevine approved these changes May 13, 2020

View reviewed changes

jdswinbank reviewed May 13, 2020

View reviewed changes

plazas and others added 14 commits May 13, 2020 14:48

Add normalize method for Defects

adc4503

Use a predefined mask plane

754e7ba

Have normalize() in Defects calculate region

282e874

Fix bug in normalize() method

009abb2

Move Defects.normalize() test from cp_pipe

5b63b99

Add more tests for Defects.normalize()

3aa12aa

Fix flake8 error

03b0031

Fix typos in docstring

3ce0269

Compare defects sets directly in test

73e7cd5

Remove incomplete comment.

24179f7

Simplifiy defect region finding.

c5669f9

This logic avoids the need to store or aggregate lists of values.

Hold defects in normalized form.

7e35765

Change 'normalize' to 'normalize_on_init'

a699647

Update doctring of Defects

5247d8c

plazas force-pushed the tickets/DM-24556 branch from e1481f6 to 5247d8c Compare May 13, 2020 19:51

plazas merged commit b97ba8b into master May 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-24556: Add normalize method to Defects #200

DM-24556: Add normalize method to Defects #200

plazas commented May 5, 2020

mfisherlevine May 5, 2020

mfisherlevine May 5, 2020

timj May 5, 2020

plazas May 5, 2020

timj May 5, 2020

jdswinbank May 5, 2020 •

edited

timj May 5, 2020

jdswinbank May 5, 2020 •

edited

timj May 5, 2020

timj left a comment

timj May 5, 2020

timj May 5, 2020

timj May 6, 2020

timj May 6, 2020

plazas May 6, 2020

jdswinbank May 7, 2020

timj May 7, 2020

jdswinbank May 7, 2020

mfisherlevine left a comment

jdswinbank May 13, 2020

plazas May 13, 2020

jdswinbank May 13, 2020

DM-24556: Add normalize method to Defects #200

DM-24556: Add normalize method to Defects #200

Conversation

plazas commented May 5, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdswinbank May 5, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdswinbank May 5, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

timj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mfisherlevine left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdswinbank May 5, 2020 •

edited

jdswinbank May 5, 2020 •

edited