
DM-21380: Add a galaxy photometric repeatability metric to validate_drp #107

Merged
merged 11 commits into master on Dec 2, 2019

Conversation

Contributor

@taranu taranu commented Oct 24, 2019

Add/modify code to compute metrics defined in verify_metrics

Contributor

@wmwv wmwv left a comment

Some scattered comments.

The major one is asking whether some of the functions can be generalized instead of writing near-copies of them for the galaxy work.
The secondary one is a request to clearly explain what's going on with the different bins.

# these commissioning data do not have the correct header info to apply the stray light correction
config.processCcd.isr.doStrayLight = False
# Run CModel
config.processCcd.calibrate.measurement.plugins.names |= [
Contributor

I think it's fine to just add these to the overall configs. HSC, DECam, and CFHT.
But also do so in the lsst_ci repo.

Contributor Author

Running CModel significantly increases the processing time (by an extra ~50% for the HSC example). Are we okay with that?

Contributor

Oh, right. I remember the discussion now.

Yes, I think that's fine. But check with @SimonKrughoff .

For the CI cases, it still seems like it's the build time that dominates. But @SimonKrughoff is responsible for running these things.

@@ -190,6 +194,17 @@ def _loadAndMatchCatalogs(repo, dataIds, matchRadius,
'PSF magnitude'))
mapper.addOutputField(Field[float]('base_PsfFlux_magErr',
'PSF magnitude uncertainty'))
if not skipGalaxies:
Contributor

Can you re-phrase this as a runtime check on whether the necessary columns are present in the data?

safeSnr : `float`, optional
Minimum median SNR for a match to be considered "safe".
goodSnr : `float`, optional
Minimum median SNR for a match to be considered "good"; default 3.
Contributor

I think goodSnr should default to at least 5. There are a lot of spurious 3-sigma detections in ~a billion pixels.
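The reviewer's point can be made concrete with a rough Gaussian-tail estimate (a sketch; the 10^9 pixel count is illustrative and not taken from this PR):

```python
import math


def expected_false_detections(n_pixels, n_sigma):
    """Expected number of pure-noise samples exceeding an n-sigma threshold,
    assuming independent Gaussian noise (one-sided tail probability)."""
    tail_prob = 0.5 * math.erfc(n_sigma / math.sqrt(2))
    return n_pixels * tail_prob


n_pix = 1e9  # illustrative pixel count for a large image set
print(f"3-sigma: ~{expected_false_detections(n_pix, 3):,.0f} spurious detections")
print(f"5-sigma: ~{expected_false_detections(n_pix, 5):,.0f} spurious detections")
```

At 3 sigma, noise alone yields on the order of a million spurious excursions over a billion pixels; at 5 sigma, only a few hundred.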

safeMatches = goodMatches.where(safeFilter)
def snrFilter(cat):
# Note that this also implicitly checks for psfSnr being non-nan.
snr = np.median(cat.get(snrKey))
Contributor

If you're worried about SNR potentially being NaN, then you should calculate the median with np.nanmedian.
np.median does what you expect with Inf, because Inf is sortable, but if there is a NaN in the array, np.median will return NaN.
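The difference can be demonstrated in a few lines (a minimal sketch):

```python
import numpy as np

arr_inf = np.array([1.0, 2.0, np.inf])
print(np.median(arr_inf))     # 2.0 -- inf sorts to the end, the median is well defined

arr_nan = np.array([1.0, 2.0, np.nan])
print(np.median(arr_nan))     # nan -- a single NaN poisons np.median
print(np.nanmedian(arr_nan))  # 1.5 -- np.nanmedian ignores the NaN
```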

Contributor

If

a = np.array([1])
b = np.array([0])

then a/b is np.array([inf]) and b/b is np.array([nan]).

Contributor Author

This comment was already in the code but I accidentally shifted it up one line; it was referring to the comparison to snrMin/Max. So the existing behavior is to get a NaN median if any of the SNRs are NaN and hence return False. Do we want to tolerate some fraction of the SNRs being NaN?

Contributor

No, I think it's fine to not expect NaNs and to not tolerate any.

The executed code is fine. The comment likely just needs to be moved.

@@ -289,6 +290,8 @@ def runOneFilter(repo, visitDataIds, metrics, brightSnr=100,
PsfShape measurements).
verbose : bool, optional
Output additional information on the analysis steps.
skipGalaxies : bool, optional
Whether to skip processing galaxies even if model measurements are available; default False.
Contributor

I don't think it's necessary to support this option. If we can calculate them we should.

Contributor Author

I think we can since there should always be a slot_modelFlux, but if CModel hasn't been run then that defaults to base_GaussianFlux, which isn't especially useful.

np.random.shuffle(copy)
return copy[0] - copy[1]
a, b = random.sample(range(len(array)), 2)
return array[a] - array[b]
Contributor

Good improvement.

Perhaps even simpler:

Suggested change
return array[a] - array[b]
a, b = random.sample(array, 2)
return a - b

Contributor Author

Sadly, that only works on sets or sequences, not numpy arrays.
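The limitation is easy to verify (a small sketch; the array values are illustrative):

```python
import random

import numpy as np

array = np.array([10.0, 20.0, 30.0, 40.0])

# random.sample requires a Sequence (or, historically, a set);
# a numpy ndarray is neither, so this raises TypeError.
try:
    random.sample(array, 2)
except TypeError as exc:
    print(f"random.sample rejected the ndarray: {exc}")

# Sampling two distinct indices, as in the version kept in the PR, works.
a, b = random.sample(range(len(array)), 2)
print(array[a] - array[b])
```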

Contributor

Oh, right. I hadn't fully thought that through.

@@ -270,8 +273,7 @@ def getRandomDiffRmsInMmags(array):
>>> print(rms)
212.132034
"""
# For scalars, math.sqrt is several times faster than numpy.sqrt.
return (1000/math.sqrt(2)) * getRandomDiff(array)
return thousandDivSqrtTwo * getRandomDiff(array)
Contributor

How much does this optimization matter?
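One way to answer this question empirically (a rough sketch; absolute timings vary by machine, and the loop count is arbitrary):

```python
import math
import timeit

import numpy as np

# Recomputing the constant on every call, with math.sqrt vs numpy.sqrt
t_math = timeit.timeit("1000 / math.sqrt(2)", globals=globals(), number=100_000)
t_np = timeit.timeit("1000 / np.sqrt(2)", globals=globals(), number=100_000)

# Hoisting the constant out, as the patch does
thousandDivSqrtTwo = 1000 / math.sqrt(2)
t_const = timeit.timeit("thousandDivSqrtTwo * 0.5", globals=globals(), number=100_000)

print(f"1000/math.sqrt(2) per call: {t_math:.4f} s")
print(f"1000/np.sqrt(2) per call:   {t_np:.4f} s")
print(f"precomputed constant:       {t_const:.4f} s")
```

On scalars, math.sqrt is typically several times faster than numpy.sqrt (no array dispatch overhead), and the hoisted constant skips the call entirely; whether that matters depends on how hot getRandomDiff's caller is.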

@@ -73,6 +73,8 @@
help='Skip making plots of performance.')
parser.add_argument('--level', type=str, default='design',
help='Level of SRD requirement to meet: "minimum", "design", "stretch"')
parser.add_argument('--skipGalaxies', dest='skipGalaxies', default=False, action='store_true',
Contributor

--skipGalaxies is likely ambiguous if you're just thinking about the KPMs. You might be like "Well, duh, of course I don't want galaxies in my stellar photometry KPMs".

Is it actually necessary to have this option? When computing things in {{calcnonsrc}}, can you just log a simple message that these metrics aren't available and then skip them?

_reduceSources(matchedDataset, matchedDataset._matchedCatalog, extended=source == 'Gal',
nameFluxKey=name_flux, goodSnr=snr, goodSnrMax=snr*2,
safeSnr=snr*2, safeSnrMax=snr*4)
for bin_offset in range(2):
Contributor

I think

for bin_offset in [0, 1]:

would be more clear.

name_flux_all.append('slot_ModelFlux')
for name_flux in name_flux_all:
key_model_mag = matchedDataset._matchedCatalog.schema.find(f"{name_flux}_mag").key
bin_base = 1
Contributor

Insert a paragraph explaining the binning. This is pretty different from the way other metrics are calculated, so it deserves a special call-out.

Contributor

@wmwv wmwv left a comment

LGTM.
Two minor comments.
I haven't run this latest version. But I trust that you have.

@@ -2,3 +2,7 @@
config.processCcd.isr.doAttachTransmissionCurve = False
# these commissioning data do not have the correct header info to apply the stray light correction
config.processCcd.isr.doStrayLight = False
# Run meas_modelfit to compute CModel fluxes
Contributor

Should you explicitly

import lsst.meas.modelfit

here as well?

Contributor Author

I will just in case; obs_subaru just happens to be configured to load it by default (in config/*.py).
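For reference, a sketch of what the explicit import in the config override might look like; the plugin name modelfit_CModel is an assumption about the meas_modelfit package, not taken from this PR, and the snippet requires the LSST stack to run:

```python
# Hypothetical config-override sketch; requires the LSST stack.
# Importing lsst.meas.modelfit registers the CModel measurement plugins
# explicitly, rather than relying on obs_subaru loading it by default.
import lsst.meas.modelfit  # noqa: F401

# Run meas_modelfit to compute CModel fluxes
config.processCcd.calibrate.measurement.plugins.names |= ["modelfit_CModel"]  # plugin name assumed
```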

@@ -190,6 +194,17 @@ def _loadAndMatchCatalogs(repo, dataIds, matchRadius,
'PSF magnitude'))
mapper.addOutputField(Field[float]('base_PsfFlux_magErr',
'PSF magnitude uncertainty'))
if not skipNonSrd:
# Needed because addOutputField(... 'slot_ModelFlux_mag') will add a field with that literal name
Contributor

" Need to check for column existence through an alias because addOutputField(... ..."

@taranu taranu merged commit a28823f into master Dec 2, 2019