New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-24556: Add normalize method to Defects #200
Conversation
|
||
Parameters | ||
---------- | ||
region : `lsst.geom.Box2I |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing closing backtick
self.maskPixels(mi, maskName="BAD") | ||
normalizedDefects = Defects.fromMask(mi, "BAD") | ||
|
||
return normalizedDefects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, given that this is returning an object, whether it should be changed to be a class-method? Either than, or make it normalize the defects themselves.
But I think I'd like to defer to someone who knows more about both this object and good programming in general, so pinging @timj for his opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be a classmethod because you are using self
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfisherlevine and I were wondering if it should or should not be a class method? Does it matter in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. If it's a class method then you would have to pass in the defects you want to normalize as a parameter:
normalized = Defects.normalize(myDefects)
It seems more natural to me for you to be normalizing the defects you already have:
normalized = myDefects.normalize()
I'm okay with it return a new Defects
rather than modifying the existing one. We should consider whether the defects should always be normalized when being saved using writeText or writeFits. That seems like a question that can only be answered from the science case -- maybe you do sometimes want a way to see how you constructed the defects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems surprising to me that this returns a new object rather than modifying in place; compare (this implementation of) Defects.normalize()
with e.g. list.sort()
.
If this is really the behaviour we want (why?), then consider:
- Making this clear in the docstring. This is not just “recalculating defect bounding boxes”, it is “returning a normalized copy” (see the docstring for
Defects.transpose()
). - Personally, I'd call the method “Defects.normalized()” to help make the semantics clearer, but that's not a hard requirement (and may not be consistent with the existing
transpose()
API).
By my reading of the code (not having actually tried it), defects.normalize() != defects
in general (the bounding boxes may have changed). If that's the case, I don't think you can automatically normalize on write, regardless of the science case: if would be super weird if defects.readFits
gave me back something different from the data I'd just written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was torn about the mutability. I had assumed initially that normalize
would change in place (so after the fromMask call you'd replace the bounding boxes in self
with the new bounding boxes). People tend to prefer new objects be returned though. You make a good case that transpose
returns a new object so there is precedent for that. The counter argument is that before and after calling normalize the effective defects are the same. You are right that transpose
not being transposed
makes prior art for this not being called normalized
. As I said, I'm torn between new object and mutability (it's not like Defects
are immutable though given that you can add new bounding boxes at any time).
Regarding defects != defects.normalize -- at some level they probably should compare equal. It depends on the semantics of what we mean by equality. In most cases "convert to the same pixel mask" is what we mean by equality and that's not what happens at the moment. We could make that be true by comparing the normalized versions.
I'm fine with write writing whatever it has without normalizing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be quite tempted to maintain it as an invariant that the defects are normalized, rather than exposing it to the user as part of the API. Perhaps with a special context for bulk-updates. Something like (excuse my toy API):
class Defects(object):
def __init__(self):
# ...
self._bulk_mode = False
def _normalize(self):
# ...
def append(self, defect):
self._defectList.append(defect)
if not self._bulk_mode:
self._normalize()
@contextmanager
def bulk_mode(self):
self._bulk_mode = True
try:
yield
finally:
self._bulk_mode = False
self._normalize()
Perhaps that would be too disruptive a change at this point, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a really interesting idea. Having Defects always be normalized behind the scenes would make equality much more useful and the context manager stops it being slow in a loop (and we'd have to use the context manager when reading from file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two main comments:
- The Defects know their region of interest so they should work out that themselves.
- There are no unit tests. At the very least create two
Defects
from a single rectangular region where you split it in different ways: normalize then check that the Defects compare equal.
self.maskPixels(mi, maskName="BAD") | ||
normalizedDefects = Defects.fromMask(mi, "BAD") | ||
|
||
return normalizedDefects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be a classmethod because you are using self
.
""" | ||
mi = lsst.afw.image.MaskedImageF(region) | ||
self.maskPixels(mi, maskName="BAD") | ||
normalizedDefects = Defects.fromMask(mi, "BAD") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return this directly. There is no need for the normalizedDefects
variable.
|
||
Returns | ||
------- | ||
normalizedDefects : `lsst.meas.algorithms.Defect` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defects
not Defect
(and I don't think you need to qualify it with lsst.meas.algorithms
tests/test_interp.py
Outdated
newDefects = defects.normalize() | ||
newDefects2 = defects2.normalize() | ||
|
||
self.assertEqual(len(newDefects), len(newDefects2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just need to self.assertEqual(newDefects, newDefects2)
No need to compare length and the individual defects since __eq__
already does that for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll change that. This would compare newDefects
and newDefects2
, but I think I should still leave the lines below where each set is compared to the list expectedDefects
.
@@ -151,6 +151,33 @@ def __eq__(self, other): | |||
def __str__(self): | |||
return "Defects(" + ",".join(str(d.getBBox()) for d in self) + ")" | |||
|
|||
def normalize(self): | |||
"""Recalculate defects bounding boxes to use minimal set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is true.
Specifically, I don't think that the normalized bounding boxes are necessarily a minimal set.
You can check this yourself, based on DefectsTestCase.testLsstTextFile
:
with lsst.utils.tests.getTempFilePath(".txt") as tmpFile:
with open(tmpFile, "w") as fh:
print("""# X0 Y0 width height
996 0 56 24
0 4156 2048 20
0 0 17 4176
1998 4035 50 141
1023 0 2 4176
2027 0 21 4176
0 4047 37 129
# Some rows without fixed column widths
14 20 2000 50
10 10 10 10
""", file=fh)
defects = algorithms.Defects.readLsstDefectsFile(tmpFile)
At this point:
>>> len(defects)
9
>>> len(defects.normalize())
11
Based on a quick visual inspection, I think the normalized set is equivalent to the inputs, but it's obviously not “minimal”.
Thoughts on whether this is actually a problem? I suspect the answer is “no”; we shouldn't assert that it is minimal, but it should be less pathological in most cases, and that's probably all we need. Right? Thoughts? @timj @plazas ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very interesting. It implies that Defects.fromMask
is not as clever as we thought it would be. It would be a fun exercise to plot the different bounding boxes to see how they differ but that should be a whole new ticket that looks into why fromMask
is not optimal. We have no other option in the short term though, so tweaking the documentation is the best option -- it does imply that if we fix fromMask we would get a better but different normalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DM-24781.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a lot of lines of code, but I can't actually find anything here to say I don't think! Looks great!
metadata : `lsst.daf.base.PropertyList`, optional | ||
Metadata to associate with the defects. Will be copied and | ||
overwrite existing metadata. If not supplied the existing | ||
metadata will be reset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote (well, copied and pasted) this comment, but I'm not sure it makes sense — if I am constructing a new Defects
, what existing metadata would be replaced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I add "if any" to "Will be copied and overwrite existing metadata.", in case that we have a new Defects
, as you say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What “any” could there be if you're creating a new object? There is nothing that could be over-written.
This logic avoids the need to store or aggregate lists of values.
No description provided.