Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-24556: Add normalize method to Defects #200

Merged
merged 14 commits into from May 13, 2020
Merged

DM-24556: Add normalize method to Defects #200

merged 14 commits into from May 13, 2020

Conversation

plazas
Copy link
Contributor

@plazas plazas commented May 5, 2020

No description provided.

@plazas plazas requested a review from mfisherlevine May 5, 2020 14:08

Parameters
----------
region : `lsst.geom.Box2I
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing closing backtick

self.maskPixels(mi, maskName="BAD")
normalizedDefects = Defects.fromMask(mi, "BAD")

return normalizedDefects
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, given that this is returning an object, whether it should be changed to be a class-method? Either than, or make it normalize the defects themselves.
But I think I'd like to defer to someone who knows more about both this object and good programming in general, so pinging @timj for his opinion on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be a classmethod because you are using self.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfisherlevine and I were wondering if it should or should not be a class method? Does it matter in this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. If it's a class method then you would have to pass in the defects you want to normalize as a parameter:

normalized = Defects.normalize(myDefects)

It seems more natural to me for you to be normalizing the defects you already have:

normalized = myDefects.normalize()

I'm okay with it return a new Defects rather than modifying the existing one. We should consider whether the defects should always be normalized when being saved using writeText or writeFits. That seems like a question that can only be answered from the science case -- maybe you do sometimes want a way to see how you constructed the defects.

Copy link
Contributor

@jdswinbank jdswinbank May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems surprising to me that this returns a new object rather than modifying in place; compare (this implementation of) Defects.normalize() with e.g. list.sort().

If this is really the behaviour we want (why?), then consider:

  • Making this clear in the docstring. This is not just “recalculating defect bounding boxes”, it is “returning a normalized copy” (see the docstring for Defects.transpose()).
  • Personally, I'd call the method “Defects.normalized()” to help make the semantics clearer, but that's not a hard requirement (and may not be consistent with the existing transpose() API).

By my reading of the code (not having actually tried it), defects.normalize() != defects in general (the bounding boxes may have changed). If that's the case, I don't think you can automatically normalize on write, regardless of the science case: if would be super weird if defects.readFits gave me back something different from the data I'd just written.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was torn about the mutability. I had assumed initially that normalize would change in place (so after the fromMask call you'd replace the bounding boxes in self with the new bounding boxes). People tend to prefer new objects be returned though. You make a good case that transpose returns a new object so there is precedent for that. The counter argument is that before and after calling normalize the effective defects are the same. You are right that transpose not being transposed makes prior art for this not being called normalized. As I said, I'm torn between new object and mutability (it's not like Defects are immutable though given that you can add new bounding boxes at any time).

Regarding defects != defects.normalize -- at some level they probably should compare equal. It depends on the semantics of what we mean by equality. In most cases "convert to the same pixel mask" is what we mean by equality and that's not what happens at the moment. We could make that be true by comparing the normalized versions.

I'm fine with write writing whatever it has without normalizing.

Copy link
Contributor

@jdswinbank jdswinbank May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be quite tempted to maintain it as an invariant that the defects are normalized, rather than exposing it to the user as part of the API. Perhaps with a special context for bulk-updates. Something like (excuse my toy API):

class Defects(object):
    def __init__(self):
        # ...
        self._bulk_mode = False

    def _normalize(self):
        # ...

    def append(self, defect):
        self._defectList.append(defect)
        if not self._bulk_mode:
            self._normalize()

    @contextmanager
    def bulk_mode(self):
        self._bulk_mode = True
        try:
            yield
        finally:
            self._bulk_mode = False
            self._normalize()

Perhaps that would be too disruptive a change at this point, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really interesting idea. Having Defects always be normalized behind the scenes would make equality much more useful and the context manager stops it being slow in a loop (and we'd have to use the context manager when reading from file).

Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two main comments:

  • The Defects know their region of interest so they should work out that themselves.
  • There are no unit tests. At the very least create two Defects from a single rectangular region where you split it in different ways: normalize then check that the Defects compare equal.

self.maskPixels(mi, maskName="BAD")
normalizedDefects = Defects.fromMask(mi, "BAD")

return normalizedDefects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be a classmethod because you are using self.

"""
mi = lsst.afw.image.MaskedImageF(region)
self.maskPixels(mi, maskName="BAD")
normalizedDefects = Defects.fromMask(mi, "BAD")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return this directly. There is no need for the normalizedDefects variable.

python/lsst/meas/algorithms/defects.py Outdated Show resolved Hide resolved

Returns
-------
normalizedDefects : `lsst.meas.algorithms.Defect`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defects not Defect (and I don't think you need to qualify it with lsst.meas.algorithms

newDefects = defects.normalize()
newDefects2 = defects2.normalize()

self.assertEqual(len(newDefects), len(newDefects2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just need to self.assertEqual(newDefects, newDefects2)

No need to compare length and the individual defects since __eq__ already does that for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll change that. This would compare newDefects and newDefects2, but I think I should still leave the lines below where each set is compared to the list expectedDefects.

@@ -151,6 +151,33 @@ def __eq__(self, other):
def __str__(self):
return "Defects(" + ",".join(str(d.getBBox()) for d in self) + ")"

def normalize(self):
"""Recalculate defects bounding boxes to use minimal set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is true.

Specifically, I don't think that the normalized bounding boxes are necessarily a minimal set.

You can check this yourself, based on DefectsTestCase.testLsstTextFile:

        with lsst.utils.tests.getTempFilePath(".txt") as tmpFile:
            with open(tmpFile, "w") as fh:
                print("""# X0  Y0  width height
     996        0       56       24
       0     4156     2048       20
       0        0       17     4176
    1998     4035       50      141
    1023        0        2     4176
    2027        0       21     4176
       0     4047       37      129
# Some rows without fixed column widths
14 20 2000 50
10 10 10 10
""", file=fh)

            defects = algorithms.Defects.readLsstDefectsFile(tmpFile)

At this point:

>>> len(defects)
9
>>> len(defects.normalize())
11

Based on a quick visual inspection, I think the normalized set is equivalent to the inputs, but it's obviously not “minimal”.

Thoughts on whether this is actually a problem? I suspect the answer is “no”; we shouldn't assert that it is minimal, but it should be less pathological in most cases, and that's probably all we need. Right? Thoughts? @timj @plazas ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very interesting. It implies that Defects.fromMask is not as clever as we thought it would be. It would be a fun exercise to plot the different bounding boxes to see how they differ but that should be a whole new ticket that looks into why fromMask is not optimal. We have no other option in the short term though, so tweaking the documentation is the best option -- it does imply that if we fix fromMask we would get a better but different normalization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DM-24781.

Copy link
Contributor

@mfisherlevine mfisherlevine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a lot of lines of code, but I can't actually find anything here to say I don't think! Looks great!

metadata : `lsst.daf.base.PropertyList`, optional
Metadata to associate with the defects. Will be copied and
overwrite existing metadata. If not supplied the existing
metadata will be reset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote (well, copied and pasted) this comment, but I'm not sure it makes sense — if I am constructing a new Defects, what existing metadata would be replaced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add "if any" to "Will be copied and overwrite existing metadata.", in case that we have a new Defects, as you say?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What “any” could there be if you're creating a new object? There is nothing that could be over-written.

@plazas plazas merged commit b97ba8b into master May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants