DM-36797: Add RhoStatistics #47
Conversation
Force-pushed from badcca1 to 3b9f385
# chi-type ellipticity has a shear response of 2, so we need to
# divide by 2 so that the rho-stats do not depend on the
# ellipticity-type.
responsitivity = 2.0 if self.ellipticityType == "chi" else 1.0
The introduction of responsitivity is new and was not present in analysis_drp. This ensures that the rho statistics always correspond to the additive contamination to the shear-shear correlation function, regardless of the ellipticity type we choose.
This is only right in the limit that e is small. That's probably ok for this purpose, but the distortion responsivity is actually 2-<|e|^2>.
The e there is the intrinsic ellipticity, isn't it? The point source is essentially round, so its intrinsic ellipticity is zero. Anyway, this is also set to match what was used with the reGauss shapes in the HSC weak lensing analysis code.
No, it's the PSF ellipticity. Distortion version, to be explicit.
For HSC validation, the responsitivity has been set to 2 exactly. Given that the next big validation run will be on HSC, I'd like to keep it this way for consistency. We can make it more accurate later on, but for LSST we'd likely be using a shear estimator, not distortion, so it shouldn't matter much anyway.
So, you want to validate that you get the same (slightly) wrong answer... OK, I guess. Wouldn't be my choice though. I'd calculate the ratio you expect to be different because of this as part of the validation, with comments saying why the HSC values are a little off.
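As a numeric sanity check on the factor of 2 discussed above, here is a small standalone sketch (a hypothetical helper, not the code under review) computing both ellipticity types from second moments, using the definitions quoted later in this PR; for a nearly round source the chi-type value comes out close to twice the epsilon-type one:

```python
import math


def ellipticities(ixx, iyy, ixy):
    """Return (chi, epsilon) ellipticity moduli from second moments.

    chi (distortion): |Ixx - Iyy + 2i*Ixy| / (Ixx + Iyy)
    epsilon (shear):  |Ixx - Iyy + 2i*Ixy| / (Ixx + Iyy + 2*sqrt(Ixx*Iyy - Ixy**2))
    """
    numerator = math.hypot(ixx - iyy, 2.0 * ixy)
    chi = numerator / (ixx + iyy)
    eps = numerator / (ixx + iyy + 2.0 * math.sqrt(ixx * iyy - ixy**2))
    return chi, eps


# A mildly elliptical source: chi is very nearly 2*epsilon, which is why
# dividing chi-type ellipticities by 2 puts the two types on equal footing.
chi, eps = ellipticities(1.05, 0.95, 0.02)
```

The exact relation is chi = 2*eps/(1 + eps**2), so the factor-of-2 correction is only approximate at large ellipticity, matching the 2 - <|e|^2> caveat above.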
Force-pushed from 3d7816c to d023606
bands = ListField[str](
    doc="The bands to apply the flags in, takes precedence if band supplied in kwargs",
    default=["i"],
)
Is this what you want the default behaviour to be? If you want it to default to the band the plot is being made in, then set this to [].
def setDefaults(self):
    self.selectWhenTrue = ["{band}_calib_psf_used"]
Typically for this we have been using the generic flag selector with this option set; do we need a whole new selector? Are we imagining that there will be more columns needing to be set here? If so, then it would be good to update all the other places that plot PSF sources.
I could go with FlagSelector(selectWhenTrue=["{band}_calib_psf_used"])
It would replace the entire psf selector
for dataInfo in dataId:
    plotInfo[dataInfo.name] = dataId[dataInfo.name]

return plotInfo
Does the plotInfo function in plotUtils not work for this? They look pretty much identical except that the standard one has more information in it.
It won't, because inputs[connectionName] is a list here, and it is so because this task reads in multiple datasets (of the same type, objectTable_tract in this case), whereas every other task reads in only one dataset of a given type. Handling both cases in the base function seemed a bit ugly, given that running rho statistics is the only exception.
Also, the parsePlotInfo in plotUtils.py seems unused. There's a duplicate implementation in tasks/base.py, which is the default one. Perhaps we should remove the former?
I would have thought that this would come up more regularly; it might be good to generalise the plotUtils one.
So, generalize the parsePlotInfo in utils, remove the method with the same name from tasks/base.py, and use the utility function both here and in tasks/base.py? That sounds like a separate ticket to me.
DM-37419
I don't think not duplicating code unnecessarily is really another ticket. Having two of these will mean that some plots might end up displaying different things, and the plot-info generation will get out of sync.
There is already code duplication in the main branch. The ticket is to deduplicate that. And this plotInfo will be a bit different for this task because of the multiple tracts in it, and even more so when DM-37330 lands. It is okay in the object-oriented paradigm for simple and short methods like these to be overridden for special cases, without worrying about partial code duplication. I strongly prefer that over the base method handling all possible cases. Any addition to the base method that applies to this task as well has to be added here too. If we miss that, we will catch it in one of the metrics meetings and can quickly add it here as well.
I've reduced some duplication by refactoring the parsePlotInfo in the base class. The only duplication is the if/else branch about inputs being None. I am not sure why we need that in the first place.
A separate config class is used instead
of constructing a `~lsst.pex.config.DictField` so that mixed types can be
supported and the config can be validated at the beginning of the
execution. The ``toDict`` provides a minimal dictionary that override only
override -> overrides
of constructing a `~lsst.pex.config.DictField` so that mixed types can be
supported and the config can be validated at the beginning of the
execution. The ``toDict`` provides a minimal dictionary that override only
the default values and excludes the key-values pairs when the item is None.
values -> value
        "of the other three."
    ),
    optional=True,
)
What happens if three of them aren't set?
Assuming it gets through to TreeCorr, then TreeCorr would raise an exception in that case. Not sure if that's what you want here or not.
But it wouldn't get to treecorr, because an exception would be raised here:

analysis_tools/python/lsst/analysis/tools/actions/vector/calcRhoStatistics.py, lines 323 to 330 in 041b14c:

req_params = (self.nbins, self.bin_size, self.min_sep, self.max_sep)
num_req_params = sum(param is not None for param in req_params)
if num_req_params != 3:
    msg = (
        "You must specify exactly three of ``nbins``, ``bin_size``, ``min_sep`` and ``max_sep``"
        f" in treecorr_config. {num_req_params} parameters were set instead."
    )
    raise FieldValidationError(self.__class__.bin_size, self, msg)

This duplicates what treecorr does, but that is desirable because we want misconfiguration to be caught at config time, not at runtime, which is what would happen if we let treecorr handle it.
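For context: with treecorr's Log binning the four parameters are tied together by bin_size = ln(max_sep / min_sep) / nbins, which is why any three of them determine the fourth. A standalone sketch of the same check plus the derived value (a hypothetical helper, not the analysis_tools code):

```python
import math


def check_binning(nbins=None, bin_size=None, min_sep=None, max_sep=None):
    """Require exactly three of the four Log-binning parameters and
    derive the fourth from bin_size = ln(max_sep / min_sep) / nbins.

    Hypothetical helper mimicking the validation quoted above.
    """
    params = (nbins, bin_size, min_sep, max_sep)
    if sum(p is not None for p in params) != 3:
        raise ValueError("specify exactly three of nbins, bin_size, min_sep, max_sep")
    if nbins is None:
        # Round up so the full [min_sep, max_sep) range is covered.
        nbins = int(math.ceil(math.log(max_sep / min_sep) / bin_size))
    elif bin_size is None:
        bin_size = math.log(max_sep / min_sep) / nbins
    elif max_sep is None:
        max_sep = min_sep * math.exp(nbins * bin_size)
    else:
        min_sep = max_sep * math.exp(-nbins * bin_size)
    return nbins, bin_size, min_sep, max_sep
```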
I saw the catch below; makes sense now.
)

split_method = ChoiceField[str](
    doc=("How to split the cells in the tree when building the tree " "structure."),
Odd formatting, tree " "structure."
Also below
Ah, black seems to have converted two-line docstrings into single-line ones (rightly so), but without removing the extra quotation marks.
This is present in a couple of places.
msg = (
    "You must specify exactly three of ``nbins``, ``bin_size``, ``min_sep`` and ``max_sep``"
    f" in treecorr_config. {num_req_params} parameters were set instead."
)
I guess this answered my question. Would it be useful, though, to have defaults set for the parameters?
I don't think reasonable defaults can be set that are agnostic to the data/problem at hand. I believe that is why treecorr does not have defaults for these either. The default values for all other fields correspond to the default values of the corresponding quantities in treecorr. So for the sake of consistency, I'd like to leave them without any defaults.
    allowed={scale: scale for scale in ("linear", "log", "symlog")},
)

xLinthresh = Field[float](
Inconsistent camel case
xLinThresh?
For the others as well.
    ),
    default=None,
    optional=True,
)
I'm not sure you want num_threads to be settable either. That should probably be set based on something in the DM configuration elsewhere, rather than letting the user set it explicitly here, but I don't know enough to say what to recommend on that one.
Removed a bunch of fields, incl. num_threads, because it is easier to add fields (and even backport them) than to remove them.
.. math::

    \rho_0(\theta) &= \left\langle \frac{\delta T_{PSF}}{T_{PSF}}(x) \frac{\delta T_{PSF}}{T_{PSF}}(x+\theta) \right\rangle  # noqa: E501, W505
rho0 is traditionally what people call <e*_psf e_psf>. So I think using this name for a different statistic is likely to be confusing.
rho6 then?
Is there any publication where rho0 is defined?
I don't know if I've seen it in anything published. But the SLAC people who do PSF work use that notation pretty regularly.
Marco used it here: https://arxiv.org/pdf/2011.03408.pdf Figure 8.
As a quantity not involving any PSF residuals, I'm unhappy with the use of rho0 to mean ellipticity correlations. I could go with rho6, although I suspect that'd prove to be a bad choice as well. I could name it something like mu0, but that could lead to this plot becoming undiscoverable :-|
Maybe just call it dT_auto or something more direct like that, rather than some code letter/number? Also, what's the intended use case for it? The mean <dT> (one-point function) is relevant for WL systematics, but I don't think this combination is directly applicable to WL. Does LSS care about this?
The auto-correlation function of dT/T is what was called rho3 in Melchior et al. (2015), which you denoted as the rho'3 statistic in your 2016 paper. The reason it is interesting here, not necessarily from a scientific standpoint but from a pipelines standpoint, is that, for reasons not clear to us, we see a significant offset in PSF size on coadds. This serves as a good diagnostic tool to see how our planned changes to PSF coaddition improve the situation. It complements our other tools involving scatter plots and other one-point calculations from it.
Perhaps I'll just follow your notation and call it rho3prime or something.
Ah, yes. Peter and I disagreed at first about how dT errors propagate to xi+. Originally he thought his rho3 * <|e|^2> was the right systematic error on xi+. But I convinced him that the e terms couldn't be brought out of the angle brackets; they needed to stay, as what I ended up calling rho3: <e dT/T, e dT/T>.
Right, <e dT/T, e dT/T> makes sense scientifically. For pipeline validation, the uncorrelated nature of e on large scales suppresses the problem that rho3prime or rho3alt (what I had called rho0) would show. That's why I want to have this statistic, at least until we are confident we have resolved the issue with PSF coaddition.
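For reference, the two statistics being contrasted in this thread can be written side by side (notation follows the thread; PSF subscripts dropped for brevity — the primed version is the pure size-residual auto-correlation being added here):

```latex
% rho_3 (the 2016 convention): the ellipticity stays inside the brackets
\rho_3(\theta) = \left\langle \left(e \frac{\delta T}{T}\right)^{*}\!(x)\,
                 \left(e \frac{\delta T}{T}\right)\!(x + \theta) \right\rangle

% rho_3' (rho3prime here; Melchior et al. 2015's rho_3): size residuals only
\rho_3'(\theta) = \left\langle \frac{\delta T}{T}(x)\,
                  \frac{\delta T}{T}(x + \theta) \right\rangle
```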
\rho_5(\theta) &= \left\langle (e^*_{PSF}(x) (e_{PSF}\frac{\delta T_{PSF}}{T_{PSF}}(x+\theta)) \right\rangle  # noqa: E501, W505

The definition of ellipticity used in [1]_ correspond to ``epsilon``-type ellipticity, which is typically
smaller by a factor of 4 than using ``chi``-type ellipticity.
epsilon and chi are not standard names for what you're referring to here. The more typical names are shear vs distortion. epsilon is sometimes used to refer to shear, and chi is sometimes distortion. But not by any stretch universally. So you should at least give the names in addition to (or instead of) the Greek letters.
doc="The type of ellipticity to calculate",
allowed={
    "chi": "Distortion, defined as (Ixx - Iyy)/(Ixx + Iyy)",
    "epsilon": ("Shear, defined as (Ixx - Iyy)/" "(Ixx + Iyy + 2*sqrt(Ixx*Iyy - Ixy**2))"),
That's not actually the "definition" of shear. That's just one way to estimate shear. Shear is defined by the elements in the 2x2 transformation matrix.
Also, as per comment above, I'd prefer the "type" names be distortion and shear, rather than semi-arbitrary Greek letters.
I'm going to defer this to another ticket (DM-37325) and hence to another PR, since this notation is present elsewhere in the code as well. Adding a TODO comment for now.
Just adding a vote for understandable names, not random Greek letters. Remember that non-WL people (like me) will see this, and it would be great if it were understandable.
I have added the expression for each statistic in the y-axis label.
# Calculate the cross-correlation
xy.process(catA, catB)

_LOG.debug("Correlated %d pairs out of %d possible pairs", xy.npairs, len(ra) * (len(ra) - 1) / 2)
I would expect this to give an error, although maybe it's just a logging warning. xy.npairs is an array, not an int: it's the number of pairs in each bin of the output correlation. It's also a bit weird to report this as how many were correlated out of some total possible. The ones that weren't correlated were at different separations, so were they "possible"?
Ah, good catch. I meant to put sum(xy.npairs) there.
The total number here is to give some indication of whether we have explored a sufficient range in scales. As we plan to run this on a regular cadence as we collect data, the angular scales that we set might evolve, and this is intended to guide that.
I don't think the total of npairs, either as an absolute number or as compared to N*(N-1)/2, would say much about how well you're covering scales though. Maybe S/N of the first and last bin? If the S/N of either gets largish, it could mean that it would be useful to extend the range.
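To illustrate the point that xy.npairs is a per-bin array, here is a standalone numpy sketch (toy flat-sky data, not treecorr) showing pair counts falling into log-spaced separation bins; pairs outside the configured range are simply not counted, which is why the per-bin sum can fall short of N(N-1)/2:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(0.0, 10.0, size=(2, 200))  # toy positions in a 10x10 box

# All unique pair separations, analogous to what treecorr bins internally.
i, j = np.triu_indices(len(x), k=1)
sep = np.hypot(x[i] - x[j], y[i] - y[j])

# Log-spaced bins like treecorr's Log binning; npairs is one count per bin.
min_sep, max_sep, nbins = 0.1, 10.0, 10
edges = np.geomspace(min_sep, max_sep, nbins + 1)
npairs, _ = np.histogram(sep, bins=edges)

# Pairs closer than min_sep or farther than max_sep are not in any bin,
# so sum(npairs) <= N*(N-1)/2.
total_possible = len(x) * (len(x) - 1) // 2
```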
self.process.calculateActions.stars.treecorr.min_sep = 0.1
self.process.calculateActions.stars.treecorr.max_sep = 100.0
self.process.calculateActions.stars.treecorr.sep_units = "arcmin"
self.process.calculateActions.stars.treecorr.metric = "Arc"
Don't use Arc. It's just slower and doesn't add anything useful compared to the default Euclidean.
Force-pushed from cc10c09 to d56b045
Force-pushed from d56b045 to 6791aa1
Force-pushed from 507775f to 5c84f97
Force-pushed from 7735201 to df85018
OK, I believe I have addressed all the comments, or created TODO ticket items to work on. This is certainly a minimum viable thing, and I'd like to get it into the weekly that gets tagged tonight. @rmjarvis, do I have your approval?
Sure. Looks fine to me.
Force-pushed from df85018 to 280a136
`addPlotInfo` does not handle `plotInfo` that is None. Fix the type hints to reflect that, and fix the `makePlot` methods that call `addPlotInfo` without checking that `plotInfo` is not None.
Force-pushed from 280a136 to a2eea8b
This adds the plots to pipelines/coaddQualityCore and tests/testPlots pipelines.
Force-pushed from fc334e9 to 1ce733d