
DM-36797: Add RhoStatistics #47

Merged: 10 commits merged into main on Dec 22, 2022

Conversation

arunkannawadi (Member):

No description provided.

@rmjarvis self-requested a review December 2, 2022 18:01
@arunkannawadi force-pushed the tickets/DM-36797 branch 2 times, most recently from badcca1 to 3b9f385 on December 6, 2022 16:06
@arunkannawadi marked this pull request as ready for review December 6, 2022 16:17
# chi-type ellipticity has a shear response of 2, so we need to
# divide by 2 so that the rho-stats do not depend on the
# ellipticity-type.
responsitivity = 2.0 if self.ellipticityType == "chi" else 1.0
Member Author:

The introduction of responsitivity is new and was not present in analysis_drp. It ensures that the rho statistics always correspond to the additive contamination of the shear-shear correlation function, regardless of the ellipticity type we choose.

Reviewer:

This is only right in the limit that e is small. That's probably ok for this purpose, but the distortion responsivity is actually 2-<|e|^2>.
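For reference, a minimal sketch of the exact distortion responsivity mentioned above (a sketch only; the function name is illustrative, and e1, e2 are assumed to be chi-type ellipticity components):

import numpy as np

def distortion_responsivity(e1, e2):
    # Exact shear responsivity for chi-type (distortion) ellipticities:
    # R = 2 - <|e|^2>. For nearly round PSF stars <|e|^2> ~ 0, so R ~ 2.
    e_sq = np.asarray(e1) ** 2 + np.asarray(e2) ** 2
    return 2.0 - np.mean(e_sq)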

Member Author:

The e there is the intrinsic ellipticity, isn't it? The point source is essentially round, so its intrinsic ellipticity is zero. Anyway, this is also set to match what was used with the reGauss shapes in the HSC weak-lensing analysis code.

@rmjarvis (Dec 12, 2022):

No, it's the PSF ellipticity. Distortion version, to be explicit.

Member Author:

For HSC validation, the responsitivity has been set to exactly 2. Given that the next big validation run will be on HSC, I'd like to keep it this way for consistency. We can make it more accurate later on, but for LSST we'd likely be using a shear estimator rather than distortion, so it shouldn't matter much anyway.

Reviewer:

So, you want to validate that you get the same (slightly) wrong answer... OK, I guess. Wouldn't be my choice though. I'd calculate the ratio you expect to be different because of this as part of the validation, with comments saying why the HSC values are a little off.

@arunkannawadi force-pushed the tickets/DM-36797 branch 2 times, most recently from 3d7816c to d023606 on December 6, 2022 21:49
bands = ListField[str](
doc="The bands to apply the flags in, takes precedence if band supplied in kwargs",
default=["i"],
)
Collaborator:

Is this what you want the default behaviour to be? If you want it to default to the band the plot is being made in then set this to [].


def setDefaults(self):
    self.selectWhenTrue = ["{band}_calib_psf_used"]

Collaborator:

Typically for this we have been using the generic flag selector with this option set; do we need a whole new selector? Are we imagining that there will be more columns that need to be set here? If so, it would be good to update all the other places that plot PSF sources.

Member Author:

I could go with FlagSelector(selectWhenTrue=["{band}_calib_psf_used"])
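A minimal sketch of that suggestion, using the generic FlagSelector from analysis_tools (the import location and variable name below are assumptions, not verified against the package layout):

from lsst.analysis.tools.actions.vector import FlagSelector  # assumed import location

# Generic flag selector configured to keep only sources flagged as used in
# PSF modelling for the band being plotted, replacing a dedicated selector.
psfUsedSelector = FlagSelector()
psfUsedSelector.selectWhenTrue = ["{band}_calib_psf_used"]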

Collaborator:

It would replace the entire psf selector


for dataInfo in dataId:
    plotInfo[dataInfo.name] = dataId[dataInfo.name]

return plotInfo
Collaborator:

Does the plotInfo function in plotUtils not work for this? They look pretty much identical except that the standard one has more information in it.

Member Author:

It won't, because inputs[connectionName] is a list here: this task reads in multiple datasets of the same type (objectTable_tract in this case), whereas every other task reads in only one dataset of a given type. Handling both cases in the base function seemed a bit ugly, given that the rho-statistics task is the only exception.

Also, the parsePlotInfo in plotUtils.py seems unused. There's a duplicate implementation in tasks/base.py, which is the default one. Perhaps we should remove the former?

Collaborator:

I would have thought that this would come up more regularly, it might be good to generalise the plotUtils one.

Member Author:

So, generalize the parsePlotInfo in utils, remove the method with the same name from tasks/base.py, and use the utility function both here and in tasks/base.py? That sounds like a separate ticket to me.

Member Author:

DM-37419

Collaborator:

I don't think avoiding unnecessary code duplication is really another ticket. Having two of these will mean that some plots might end up displaying different things, and the plot-info generation will get out of sync.

Member Author:

There is already code duplication in the main branch; the ticket is to deduplicate that. This plotInfo will also be a bit different for this task because of the multiple tracts in it, and even more so when DM-37330 lands. In the object-oriented paradigm it is okay for simple, short methods like these to be overridden for special cases, without worrying about partial code duplication. I strongly prefer that over the base method handling all possible cases. Any addition to the base method that also applies to this task has to be added here as well; if we miss that, we will catch it in one of the metrics meetings and can quickly add it here.

Member Author:

I've reduced some duplication by refactoring the parsePlotInfo in the base class. The only duplication is the if/else branch about inputs being None. I am not sure why we need that in the first place.

A separate config class is used instead
of constructing a `~lsst.pex.config.DictField` so that mixed types can be
supported and the config can be validated at the beginning of the
execution. The ``toDict`` provides a minimal dictionary that override only
@sr525 (Collaborator, Dec 12, 2022):

override -> overrides

of constructing a `~lsst.pex.config.DictField` so that mixed types can be
supported and the config can be validated at the beginning of the
execution. The ``toDict`` provides a minimal dictionary that override only
the default values and excludes the key-values pairs when the item is None.
Collaborator:

values -> value

"of the other three."
),
optional=True,
)
Collaborator:

What happens if three of them aren't set?

Reviewer:

Assuming it gets through to TreeCorr, it would raise an exception in that case. Not sure if that's what you want here or not.

Member Author:

But it wouldn't get to treecorr because an exception would be raised here:

req_params = (self.nbins, self.bin_size, self.min_sep, self.max_sep)
num_req_params = sum(param is not None for param in req_params)
if num_req_params != 3:
    msg = (
        "You must specify exactly three of ``nbins``, ``bin_size``, ``min_sep`` and ``max_sep``"
        f" in treecorr_config. {num_req_params} parameters were set instead."
    )
    raise FieldValidationError(self.__class__.bin_size, self, msg)

This duplicates what treecorr does, but that is desirable because we want misconfiguration to be captured at config time, not at runtime, which is what would happen if we let treecorr handle it.
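A sketch of how this surfaces at config time rather than at runtime (the config class name TreecorrConfig is illustrative; FieldValidationError is the lsst.pex.config exception raised in the snippet above):

from lsst.pex.config import FieldValidationError

config = TreecorrConfig()  # hypothetical config class holding the fields above
config.nbins = 10
config.min_sep = 0.1
# Only two of nbins/bin_size/min_sep/max_sep are set, so validation fails
# before any catalogs are read or TreeCorr is invoked.
try:
    config.validate()
except FieldValidationError as err:
    print(err)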

Collaborator:

I saw the catch below, makes sense now.

)

split_method = ChoiceField[str](
doc=("How to split the cells in the tree when building the tree " "structure."),
Collaborator:

Odd formatting, tree " "structure."

Collaborator:

Also below

Member Author:

Ah, black seems to have converted two-line docstrings into single-line ones (rightly so) but without removing the extra quotation marks.

Collaborator:

This is present in a couple of places.

msg = (
    "You must specify exactly three of ``nbins``, ``bin_size``, ``min_sep`` and ``max_sep``"
    f" in treecorr_config. {num_req_params} parameters were set instead."
)
Collaborator:

I guess this answered my question; would it be useful, though, to have defaults set for the parameters?

Member Author:

I don't think reasonable defaults can be set that are agnostic to the data and the problem at hand. I believe that is why treecorr does not have defaults for these either. The default values for all other fields correspond to the defaults of the corresponding quantities in treecorr, so for the sake of consistency I'd like to leave these without any defaults.

allowed={scale: scale for scale in ("linear", "log", "symlog")},
)

xLinthresh = Field[float](
Collaborator:

Inconsistent camel case

Member Author:

xLinThresh?

Collaborator:

For the others as well.

),
default=None,
optional=True,
)

Reviewer:

I'm not sure you want num_threads to be settable either. That should probably be set based on something in the DM configuration elsewhere, rather than letting the user set it explicitly here, but I don't know enough to recommend anything specific.

Member Author:

Removed a bunch of fields, including num_threads, because it is easier to add fields (and even backport them) than to remove them.


.. math::

\rho_0(\theta) &= \left\langle \frac{\delta T_{PSF}}{T_{PSF}}(x) \frac{\delta T_{PSF}}{T_{PSF}}(x+\theta) \right\rangle # noqa: E501, W505

Reviewer:

rho0 is traditionally what people call <e*_psf e_psf>. So I think using this name for a different statistic is likely to be confusing.

Member Author:

rho6 then?

Member Author:

Is there any publication where rho0 is defined?

Reviewer:

I don't know if I've seen it in anything published. But the SLAC people who do PSF work use that notation pretty regularly.

Reviewer:

Marco used it here: https://arxiv.org/pdf/2011.03408.pdf Figure 8.

Member Author:

Since it does not involve any PSF residuals, I'm unhappy with using rho0 for plain ellipticity correlations. I could go with rho6, although I suspect that would prove to be a bad choice as well. I could name it something like mu0, but that could lead to this plot becoming undiscoverable :-|

@rmjarvis (Dec 20, 2022):

Maybe just call it dT_auto or something more direct like that, rather than some code letter/number? Also, what's the intended use case for it? The mean <dT> (one-point function) is relevant for WL systematics, but I don't think this combination is directly applicable to WL. Does LSS care about this?

Member Author:

The auto-correlation function of dT/T is what was called rho3 in Melchior et al. (2015), which you denoted the rho'3 statistic in your 2016 paper. The reason it is interesting here is not so much scientific as pipelines-related: for reasons not yet clear to us, we see a significant offset in PSF size on coadds. This serves as a good diagnostic of how our planned changes to PSF coaddition improve the situation, and it complements our other tools involving scatter plots and other one-point calculations.

Perhaps I'll just follow your notation and call it rho3prime or something.

Reviewer:

Ah, yes. Peter and I disagreed at first about how dT errors propagate to xi+. Originally he thought his rho3 * <|e|^2> was the right systematic error on xi+, but I convinced him that the e terms couldn't be brought out of the angle brackets. They needed to stay inside, as what I ended up calling rho3: <e dT/T, e dT/T>.

Member Author:

Right, <e dT/T, e dT/T> makes sense scientifically. For pipeline validation, though, the uncorrelated nature of e on large scales suppresses the problem that rho3prime or rho3alt (what I had called rho0) would show. That's why I want to have this statistic, at least until we are confident we have resolved the issue with PSF coaddition.
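For concreteness, a minimal sketch of the dT/T auto-correlation being discussed, computed with TreeCorr's scalar (KK) correlation (ra, dec, and dT_over_T are illustrative input arrays for the reserved stars; the binning values are only examples):

import treecorr

# Fractional PSF size residual at each star: (T_star - T_PSF_model) / T_PSF_model
cat = treecorr.Catalog(ra=ra, dec=dec, k=dT_over_T, ra_units="deg", dec_units="deg")

kk = treecorr.KKCorrelation(min_sep=0.1, max_sep=100.0, nbins=10, sep_units="arcmin")
kk.process(cat)
# kk.xi now holds <dT/T dT/T>(theta) in each separation bin.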


\rho_5(\theta) &= \left\langle e^*_{PSF}(x) \left(e_{PSF}\frac{\delta T_{PSF}}{T_{PSF}}\right)(x+\theta) \right\rangle # noqa: E501, W505

The definition of ellipticity used in [1]_ corresponds to ``epsilon``-type ellipticity, which is typically
smaller by a factor of 4 than the ``chi``-type ellipticity.

Reviewer:

epsilon and chi are not standard names for what you're referring to here. The more typical names are shear vs distortion. epsilon is sometimes used to refer to shear, and chi is sometimes distortion. But not by any stretch universally. So you should at least give the names in addition to (or instead of) the Greek letters.

doc="The type of ellipticity to calculate",
allowed={
"chi": "Distortion, defined as (Ixx - Iyy)/(Ixx + Iyy)",
"epsilon": ("Shear, defined as (Ixx - Iyy)/" "(Ixx + Iyy + 2*sqrt(Ixx*Iyy - Ixy**2))"),

Reviewer:

That's not actually the "definition" of shear. That's just one way to estimate shear. Shear is defined by the elements in the 2x2 transformation matrix.

Reviewer:

Also, as per comment above, I'd prefer the "type" names be distortion and shear, rather than semi-arbitrary Greek letters.

Member Author:

I'm going to defer this to another ticket (DM-37325) and hence to another PR, since this notation is present elsewhere in the code as well. Adding a TODO comment for now.

Collaborator:

Just adding a vote for understandable names, not random Greek letters. Remember that non-WL people like me will see this, and it would be great if it were understandable.

Member Author:

I have added the expression for each statistic in the y-axis label.
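For readers outside WL, a minimal sketch of the two ellipticity conventions quoted in the config above (a sketch from those definitions; the function names are illustrative, and the e2 components follow the standard 2*Ixy form, which is an assumption beyond the quoted text):

import numpy as np

def chi_ellipticity(ixx, iyy, ixy):
    # Distortion (chi): (Ixx - Iyy + 2i*Ixy) / (Ixx + Iyy)
    denom = ixx + iyy
    return (ixx - iyy) / denom, 2.0 * ixy / denom

def epsilon_ellipticity(ixx, iyy, ixy):
    # Shear-like (epsilon): (Ixx - Iyy + 2i*Ixy) / (Ixx + Iyy + 2*sqrt(Ixx*Iyy - Ixy**2))
    denom = ixx + iyy + 2.0 * np.sqrt(ixx * iyy - ixy**2)
    return (ixx - iyy) / denom, 2.0 * ixy / denom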


# Calculate the cross-correlation
xy.process(catA, catB)

_LOG.debug("Correlated %d pairs out of %d possible pairs", xy.npairs, len(ra) * (len(ra) - 1) / 2)

Reviewer:

I would expect this to give an error, although maybe it's just a logging warning. xy.npairs is an array, not an int. It's the number of pairs for each bin of the output correlation. It's also a bit weird to report this as being how many were correlated out of some total possible. The ones that weren't correlated were at different separations, so were they "possible"?

Member Author:

Ah, good catch. I meant to put in sum(xy.npairs) there.

Member Author:

The total number here is to give some indication of whether we have explored a sufficient range of scales. Since we plan to run this on a regular cadence as we collect data, the angular scales we set might evolve, and this is intended to guide that.

Reviewer:

I don't think the total of npairs, either as an absolute number or as compared to N*(N-1)/2, would say much about how well you're covering scales though. Maybe S/N of the first and last bin? If the S/N of either gets largish, it could mean that it would be useful to extend the range.
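A minimal sketch of that diagnostic (assuming xy is a treecorr.GGCorrelation that has already been processed; xip and varxip are its xi+ estimate and variance, and _LOG is the module logger used in the snippet above):

import numpy as np

snr = np.abs(xy.xip) / np.sqrt(xy.varxip)
_LOG.debug("S/N of xi+ in the first and last separation bins: %.2f, %.2f", snr[0], snr[-1])
# A large S/N in either edge bin suggests the chosen min_sep/max_sep range is
# too narrow and could usefully be extended.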

self.process.calculateActions.stars.treecorr.min_sep = 0.1
self.process.calculateActions.stars.treecorr.max_sep = 100.0
self.process.calculateActions.stars.treecorr.sep_units = "arcmin"
self.process.calculateActions.stars.treecorr.metric = "Arc"

Reviewer:

Don't use Arc. It's just slower and doesn't add anything useful compared to the default Euclidean.
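A sketch of the config with that suggestion applied (same attribute path as the snippet above; omitting the metric leaves TreeCorr on its default Euclidean metric):

self.process.calculateActions.stars.treecorr.min_sep = 0.1
self.process.calculateActions.stars.treecorr.max_sep = 100.0
self.process.calculateActions.stars.treecorr.sep_units = "arcmin"
# No explicit metric: the default "Euclidean" is sufficient at these scales
# and faster than "Arc".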

@arunkannawadi force-pushed the tickets/DM-36797 branch 9 times, most recently from 507775f to 5c84f97 on December 21, 2022 03:30
@arunkannawadi force-pushed the tickets/DM-36797 branch 2 times, most recently from 7735201 to df85018 on December 21, 2022 16:34
@arunkannawadi (Member Author) left a comment:

OK, I believe I have addressed all the comments, or created TODO ticket items to work on. This is certainly a minimum viable version, and I'd like to get it into the weekly that gets tagged tonight. @rmjarvis do I have your approval?


doc="The type of ellipticity to calculate",
allowed={
"chi": "Distortion, defined as (Ixx - Iyy)/(Ixx + Iyy)",
"epsilon": ("Shear, defined as (Ixx - Iyy)/" "(Ixx + Iyy + 2*sqrt(Ixx*Iyy - Ixy**2))"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the expression for each statistic in the y-axis label.

@rmjarvis:

> @rmjarvis do I have your approval?

Sure. Looks fine to me.

`addPlotInfo` does not handle `plotInfo` that is None. Fix type hints to
reflect that, and in `makePlot` methods that call `addPlotInfo` without
checking if `plotInfo` is not None.
@arunkannawadi merged commit 490ba8e into main on Dec 22, 2022
@arunkannawadi deleted the tickets/DM-36797 branch on December 22, 2022 14:46