DM-21380: Add a galaxy photometric repeatability metric to validate_drp #107
Conversation
These are virtually identical to PA1 but deliberately kept separate in case of future divergence.
Some scattered comments.
The major comment asks whether some of the functions can be generalized rather than writing near-copies for the galaxy work.
The secondary comment is a request to clearly explain what's going on with the different bins.
config/hscConfigGalaxies.py
Outdated
# these commissioning data do not have the correct header info to apply the stray light correction
config.processCcd.isr.doStrayLight = False
# Run CModel
config.processCcd.calibrate.measurement.plugins.names |= [
I think it's fine to just add these to the overall configs: HSC, DECam, and CFHT. But also do so in the lsst_ci repo.
Running CModel significantly increases the processing time (an extra ~50% for the HSC example). Are we okay with that?
Oh, right. I remember the discussion now.
Yes, I think that's fine. But check with @SimonKrughoff .
For the CI cases, it still seems like it's the build time that dominates. But @SimonKrughoff is responsible for running these things.
@@ -190,6 +194,17 @@ def _loadAndMatchCatalogs(repo, dataIds, matchRadius,
                                       'PSF magnitude'))
    mapper.addOutputField(Field[float]('base_PsfFlux_magErr',
                                       'PSF magnitude uncertainty'))
    if not skipGalaxies:
Can you re-phrase this as an interactive check to see if the necessary columns are there in the data?
safeSnr : `float`, optional
    Minimum median SNR for a match to be considered "safe".
goodSnr : `float`, optional
    Minimum median SNR for a match to be considered "good"; default 3.
I think goodSnr should default to at least 5. There are a lot of 3-sigma detections in ~a billion pixels.
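The scale of the reviewer's point is easy to sketch. A minimal back-of-envelope estimate, assuming ~1e9 independent Gaussian-noise pixels (both the pixel count and the independence are simplifying assumptions, not numbers from this PR):

```python
import math

def upper_tail(nsigma):
    """One-sided Gaussian tail probability P(X > nsigma * sigma)."""
    return 0.5 * math.erfc(nsigma / math.sqrt(2))

# Expected number of pure-noise fluctuations above each threshold,
# assuming ~1e9 independent Gaussian pixels (an illustrative assumption).
n_pixels = 1_000_000_000
false_3sigma = upper_tail(3) * n_pixels  # on the order of a million
false_5sigma = upper_tail(5) * n_pixels  # only a few hundred
```

So moving the cut from 3 to 5 sigma reduces the spurious-detection budget by roughly four orders of magnitude, which is the motivation behind the comment.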
safeMatches = goodMatches.where(safeFilter)

def snrFilter(cat):
    # Note that this also implicitly checks for psfSnr being non-nan.
    snr = np.median(cat.get(snrKey))
If you're worried about SNR potentially being NaN, then you should calculate the median with np.nanmedian. np.median does what you expect with Inf, because Inf is sortable, but if there is a NaN in the array np.median will return NaN.
If a = np.array([1]) and b = np.array([0]), then a/b is np.array([Inf]) while b/b is np.array([NaN]).
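That example runs as stated; the errstate context just suppresses NumPy's divide-by-zero warnings for the demonstration:

```python
import numpy as np

a = np.array([1.0])
b = np.array([0.0])
with np.errstate(divide="ignore", invalid="ignore"):
    inf_result = a / b  # 1/0 for floats gives Inf
    nan_result = b / b  # 0/0 for floats gives NaN
```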
This comment was already in the code but I accidentally shifted it up one line; it was referring to the comparison to snrMin/Max. So the existing behavior is to get a NaN median if any of the SNRs are NaN and hence return False. Do we want to tolerate some fraction of the SNRs being NaN?
No, I think it's fine to not expect NaNs and to not tolerate any.
The executed code is fine. The comment likely just needs to be moved.
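A minimal sketch of the agreed fix, with the comment moved to the comparison it actually describes. The names snrMin/snrMax follow the thread; make_snr_filter and FakeCat are hypothetical stand-ins, not validate_drp code:

```python
import numpy as np

def make_snr_filter(snrKey, snrMin, snrMax):
    """Hypothetical wrapper around the filter closure discussed above."""
    def snrFilter(cat):
        snr = np.median(cat.get(snrKey))
        # This comparison is False when snr is NaN, so any NaN among the
        # SNRs implicitly rejects the match (the existing behavior).
        return snrMin <= snr <= snrMax
    return snrFilter

class FakeCat:
    """Minimal stand-in exposing the get() accessor used above."""
    def __init__(self, values):
        self._values = np.asarray(values, dtype=float)
    def get(self, key):
        return self._values
```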
python/lsst/validate/drp/validate.py
Outdated
@@ -289,6 +290,8 @@ def runOneFilter(repo, visitDataIds, metrics, brightSnr=100,
        PsfShape measurements).
    verbose : bool, optional
        Output additional information on the analysis steps.
    skipGalaxies : bool, optional
        Whether to skip processing galaxies even if model measurements are available; default False.
I don't think it's necessary to support this option. If we can calculate them we should.
I think we can since there should always be a slot_modelFlux, but if CModel hasn't been run then that defaults to base_GaussianFlux, which isn't especially useful.
-    np.random.shuffle(copy)
-    return copy[0] - copy[1]
+    a, b = random.sample(range(len(array)), 2)
+    return array[a] - array[b]
Good improvement.
Perhaps even simpler, in place of return array[a] - array[b]:
a, b = random.sample(array, 2)
return a - b
Sadly, that only works on sets or sequences, not numpy arrays.
Oh, right. I hadn't fully thought that through.
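The incompatibility is easy to verify directly. random.sample requires a Sequence or Set, and numpy arrays register as neither, so sampling indices (as the change under review does) is the workable approach:

```python
import random
import numpy as np

arr = np.array([1.0, 2.0, 3.0, 4.0])

# Passing an ndarray to random.sample raises TypeError because ndarray
# is not registered as a collections.abc.Sequence.
try:
    random.sample(arr, 2)
    sample_accepted_array = True
except TypeError:
    sample_accepted_array = False

# Sampling two distinct indices works for any indexable container:
a, b = random.sample(range(len(arr)), 2)
diff = arr[a] - arr[b]
```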
@@ -270,8 +273,7 @@ def getRandomDiffRmsInMmags(array):
     >>> print(rms)
     212.132034
     """
-    # For scalars, math.sqrt is several times faster than numpy.sqrt.
-    return (1000/math.sqrt(2)) * getRandomDiff(array)
+    return thousandDivSqrtTwo * getRandomDiff(array)
How much does this optimization matter?
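One way to answer that question empirically; a timeit sketch under assumed workloads (results vary by machine, and this is only a measurement harness, not a claim about the actual speedup):

```python
import math
import timeit

# The constant being discussed: precompute 1000/sqrt(2) once.
thousandDivSqrtTwo = 1000 / math.sqrt(2)

# Compare evaluating the expression inline versus using the constant.
t_inline = timeit.timeit("(1000 / math.sqrt(2)) * x",
                         setup="import math; x = 0.5",
                         number=200_000)
t_precomputed = timeit.timeit("c * x",
                              setup="import math; x = 0.5; c = 1000 / math.sqrt(2)",
                              number=200_000)
```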
bin.src/validateDrp.py
Outdated
@@ -73,6 +73,8 @@
                    help='Skip making plots of performance.')
parser.add_argument('--level', type=str, default='design',
                    help='Level of SRD requirement to meet: "minimum", "design", "stretch"')
parser.add_argument('--skipGalaxies', dest='skipGalaxies', default=False, action='store_true',
--skipGalaxies is likely ambiguous if you're just thinking about the KPMs. You might be like "Well, duh, of course I don't want galaxies in my stellar photometry KPMs".
Is it actually necessary to have this option? When computing things in {{calcnonsrc}}, can you just log a simple message that these metrics aren't available and then skip them?
python/lsst/validate/drp/validate.py
Outdated
_reduceSources(matchedDataset, matchedDataset._matchedCatalog, extended=source == 'Gal',
               nameFluxKey=name_flux, goodSnr=snr, goodSnrMax=snr*2,
               safeSnr=snr*2, safeSnrMax=snr*4)
for bin_offset in range(2):
I think for bin_offset in [0, 1]: would be more clear.
python/lsst/validate/drp/validate.py
Outdated
name_flux_all.append('slot_ModelFlux')
for name_flux in name_flux_all:
    key_model_mag = matchedDataset._matchedCatalog.schema.find(f"{name_flux}_mag").key
    bin_base = 1
Insert a paragraph explaining the binning. This is pretty different from the way other metrics are calculated, so it deserves a special call-out.
LGTM.
Two minor comments.
I haven't run this latest version. But I trust that you have.
@@ -2,3 +2,7 @@
config.processCcd.isr.doAttachTransmissionCurve = False
# these commissioning data do not have the correct header info to apply the stray light correction
config.processCcd.isr.doStrayLight = False
# Run meas_modelfit to compute CModel fluxes
Should you explicitly import lsst.meas.modelfit here as well?
I will just in case; obs_subaru just happens to be configured to load it by default (in config/*.py).
@@ -190,6 +194,17 @@ def _loadAndMatchCatalogs(repo, dataIds, matchRadius,
                                       'PSF magnitude'))
    mapper.addOutputField(Field[float]('base_PsfFlux_magErr',
                                       'PSF magnitude uncertainty'))
    if not skipNonSrd:
        # Needed because addOutputField(... 'slot_ModelFlux_mag') will add a field with that literal name
"Need to check for column existence through an alias because addOutputField(... ..."
Force-pushed from f468359 to a28823f.
Add/modify code to compute metrics defined in verify_metrics