Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-25132: Add position covariances to reference catalog converter #299

Merged
merged 3 commits into from Dec 14, 2022

Conversation

cmsaunders
Copy link
Contributor

This adds support for position covariance parameters in the reference catalog converter. It also adds a function for Gaia specifically to set the position covariance.

loader = ReferenceObjectLoader([dataRef.dataId for dataRef in self.datasetRefs],
self.handles,
name="testrefcat",
config=loaderConfig)"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why you had to override setUp (to make a v1-style refcat), but why did you have to override this method? It should "just work" as-is, I'd think.

Also, you have a commented-out block here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching the commented-out block. I had actually decided I didn't want to inherit from ReferenceObjectLoaderLoadTests, because the other tests don't need to be rerun for version one, but then I didn't change the class definition. I pushed the fix.

Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an ambitious PR, but it feels like coherence has suffered as a result -- the code, documentation, and unit tests don't quite line up. (And some of my comments are therefore mutually exclusive, depending on which way you decide to go.)

The two issues I'd like to see addressed before I sign off on this are:

  1. The API seems to require a lot of familiarity with the code to be usable: ConvertRefcatManager seems to have hidden assumptions about a specific subset of the Gaia catalog built in, and there are a lot of interlocking preconditions that will surprise users (and, in the case of config.coordinate_covariance, are neither enforced nor checked at runtime).
  2. While described as a deprecation, the new API will break existing calls, particularly those that expect properMotionErrDim to guarantee that error columns are present in the output catalog (not to mention any clients of the old astrometric error columns). Is there an RFC that agrees to just break everything without a deprecation period (I couldn't find one), and if so why are you talking about deprecation at all?

Other than that, most of my comments are style nitpicks. Thanks for taking this on!

doc/lsst.meas.algorithms/reference-catalog-schema.rst Outdated Show resolved Hide resolved
python/lsst/meas/algorithms/convertReferenceCatalog.py Outdated Show resolved Hide resolved
@@ -369,30 +370,35 @@ def makeMinimalSchema(filterNameList, *, addCentroid=False,

- If 2 or 3: add fields "coord_raErr" and "coord_decErr".
- If 3: also add field "coord_radecErr".

Ignored if ``fullPositionCovariance=True``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there plans to deprecate coordErrDim? It seems a bit odd to treat it differently from properMotionErrDim.

Comment on lines 384 to 387
fullPositionCovariance : `bool`
Include a full coordinate+PM+parallax covariance matrix. Requires
that `addProperMotion` and `addParallax` are set.
Ignores the `coordErrDim` and `properMotionErrDim` parameters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are addProperMotion and addParallax necessary? It seems to me the more natural behavior would be to create whichever submatrix is appropriate for the astrometric parameters in the output catalog. It might even require less code than the current flag-based approach.

IIRC not all Gaia stars are 5-parameter solutions (known quasars used to be 4-parameter?), and other catalogs might simply not report insignificant measurements despite the biases that introduces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're probably right that addProperMotion and addParallax should go away, too.

Our logic here is that either you have PM+parallax+all covariance (Gaia), or you don't care about any position errors beyond ra/dec (anything non-Gaia). Gaia is the only relevant astrometric catalog (until we make our own, and we won't need this code for that), and whether or not a given gaia star has the full 5-parameter covariance is irrelevant to whether the refcat should have it (since it needs to be able to hold all the covariances).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that you're adding lots of API and maintenance complexity for the sole purpose of reducing functionality compared to if you just did things the simple way.

But even if you are starting from a Gaia catalog (and this is supposed to be the generic converter, not the Gaia specialization), why should users be required to translate parallax in order to get covariances for the other parameters? I can see why LSST needs proper motions, but parallax seems irrelevant to data processing, so keeping it as an independent flag makes sense.

Comment on lines 396 to 398
warnings.warn("properMotionErrDim is deprecated and has no effect; "
"use `fullPositionCovariance` to include all position errors in the schema. "
"It will be removed after release 25.0.", category=FutureWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we have a 24.1, 25 will be the first release to contain the deprecation. Therefore, standard policy is that it can only be removed after release 26.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems like a bad idea to "deprecate" a feature by removing all functionality -- such a silent failure is potentially more harmful than just forbidding its use outright. Why not keep the old system, and make it mutually exclusive with the new system?

Copy link
Contributor

@parejkoj parejkoj Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there's no use for the current system: these hooks were in place to allow someone to put those error fields in the refcat, but we didn't actually have any code that used the =3 case. Trying to maintain the above functionality (which uses different names for the fields) along with the new full covariance seemed like excessive work for no gain. The only relevant astrometric catalog is Gaia, and until this ticket we couldn't actually load the relevant error fields for Gaia; I'm positive that removing that old "functionality" breaks nobody's code.

That said, I think you're right further above that we should just remove it directly instead of deprecating it.

tests/convertReferenceCatalogTestBase.py Outdated Show resolved Hide resolved
tests/convertReferenceCatalogTestBase.py Outdated Show resolved Hide resolved
@@ -293,6 +297,12 @@ def testGetMetadataCircle(self):
self.assertEqual(metadata['JEPOCH'], epoch.jyear)


class ConvertTestcaseManager(ConvertRefcatManager):
"""We need to be using a special-case manager to pass tests related to
covariance, but we don't actually need it to do anything.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't need covariance handling in order to pass tests related to covariance, what exactly are you testing?


def test_properMotion(self):
"""Test proper motion correction.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it acceptable for this test to diverge from ReferenceObjectLoaderLoadTests.test_properMotion? If not, I recommend factoring the test assertions into a function that can be called by both.

Comment on lines 210 to 281
with self.subTest(name="manager"):
config = makeConvertConfig(withRaDecErr=True, withParallax=True, withPm=True)
config.coordinate_covariance = True
with self.assertRaises(FieldValidationError):
config.validate()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this block be moved to it's own function? It's very confusing that name suddenly does not mean "column name".

@cmsaunders
Copy link
Contributor Author

After spending a few days thinking about how to simplify and streamline this, I think there are a few root changes that would help a lot. I am going to start with changing how the reference catalog schema is made in ReferenceObjectLoader.makeMinimalSchema, to simplify the changes and so that we don't have to add the prefix 'coord_' to all the error field, which caused a lot of downstream complications. Second, I am going to rearrange how the refcat converter triggers adding coordinate covariance and where errors are raised, which I think can be simplified.
I think these changes will make the logic more straightforward, and hopefully allow me to remove a lot of workarounds in the test suite.

@cmsaunders
Copy link
Contributor Author

I've implemented the changes proposed in my last comment, and I think the result is significantly clearer. Please disregard the status of the commits. I made atomized commits to facilitate easier squashing, pending approval of the changes.

Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I still can't approve these changes. While the flag simplification could potentially make the documented API less confusing for users, the fact that the code doesn't match the documentation makes it impossible for me to tell what the intended behavior was (all three interpretations seem a bit odd). I think better test coverage would help.

I haven't seen anything addressing my other objection, that this PR is making breaking changes to both the API and the catalog schema and calling it a deprecation. Are you authorized to break everything (in which case no deprecation is needed), or are you trying to create a transition period (in which case backwards-compatibility needs to be preserved)? The current approach gives you the worst of both worlds.

- If 3: also add field "pm_radecErr".
Deprecated (see DM-?????): Use ``fullPositionCovariance`` when proper motions
are provided, as they will be correlated with the other position
paramters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
paramters.
parameters.

@@ -370,16 +370,17 @@ def makeMinimalSchema(filterNameList, *, addCentroid=False,

- If 2 or 3: add fields "coord_raErr" and "coord_decErr".
- If 3: also add field "coord_radecErr".

Ignored if ``fullCoordinateCovariance=True``.
Ignored if ``fullCoordinateCovariance=True``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this extra indent is incorrect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the code does not match this documentation; if coordErrDim and fullCoordinateCovariance are both set, it tries to add the (differently named) columns from both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this line has an extra indent because it only applies to "If 3...".
I don't think your second comment is correct. The only column that is in common is the coord_ra/coord_dec covariance. If fullCoordinateCovariance is True, then the diagonalOnly option at line 454 will be True, so the covariance will not be added there. You are correct that the two paths to adding ra/dec covariance end up with different names, but I don't see a way to avoid this.

Copy link
Member

@kfindeisen kfindeisen Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my mistake. This is not the first time that I'd gotten confused about what diagonalOnly does, and I missed this one when I was cleaning up.

As for the indent, if it's supposed to be a continuation of "If 3", then it probably needs to align with the "I". Though I'll admit I haven't checked what the built docs look like.

schema.addField(
field="parallax_flag",
type="Flag",
doc="Set if parallax or parallax error is bad",
)
if fullCoordinateCovariance:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's a three-way inconsistency here:

  • The docs say that fullCoordinateCovariance requires addProperMotion and addParallax.
  • Input validation allows all combinations of the three flags (with a warning).
  • In the code, fullCoordinateCovariance is processed if addParallax is set (regardless of the setting of addProperMotion), and silently does nothing otherwise.

Which of the three is the intended behavior?

I recommend covering all 8 combinations of addProperMotion, addParallax, and fullCoordinateCovariance in unit tests; that will make sure the logic is correct. I see only the two combinations where all three are True and all three are False; the remaining 6 are untested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, there were a couple errors on my part here. First, the indentation of this block of code was wrong. It should not have been inside the if addParallax block. Second, the documentation for fullCoordinateCovariance wasn't fixed to reflect the refactoring. (It now reads "Include the off-diagonal elements of the full coordinate+PM+parallax covariance matrix. The diagonal elements are provided by setting coordErrDim, addProperMotion, and addParallax.")
None of the three fields require each other now. The second of your bullet points above is now the only one that is true.

@@ -107,7 +107,7 @@ def testMakeMinimalSchema(self):
self.assertEqual("photometric" in refSchema, addIsPhotometric)
self.assertEqual("coord_raErr" in refSchema, coordErrDim > 0)
self.assertEqual("coord_decErr" in refSchema, coordErrDim > 0)
self.assertEqual("coord_ra_dec_Cov" in refSchema, coordErrDim == 3)
self.assertEqual("coord_ra_coord_dec_Cov" in refSchema, coordErrDim == 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that the column name for ra-dec covariance is different depending on whether it was added through coordErrDim or fullCoordinateCovariance (let alone both)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently yes (though no, they can't both be set). I addressed this in another comment, we can discuss options.

Comment on lines 152 to 156
coordErrDim=-1)
# Again, just test a sample of the fields.
self.assertIn("coord_raErr", schema)
self.assertIn("coord_ra_dec_Cov", schema)
self.assertIn("coord_pm_ra_parallax_Cov", schema)
self.assertIn("coord_ra_coord_dec_Cov", schema)
self.assertIn("pm_ra_parallax_Cov", schema)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test should be passing after these changes...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this comment was not made on the latest version of the code...

"If true, a custom `ConvertRefcatManager` class must exist to compute the output covariances.",
# TODO: We need to check that the manager config is overridden because the default convert manager
# will not work for this.
"If true, a custom `~ConvertRefcatManager` class must exist to compute the output covariances.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this link will resolve. Did you mean

Suggested change
"If true, a custom `~ConvertRefcatManager` class must exist to compute the output covariances.",
"If true, a custom `~lsst.meas.algorithms.ConvertRefcatManager` class must exist to compute the output covariances.",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I thought this would work based on some docs I was reading. I'll try to get the documentation built to see.

@@ -507,8 +507,11 @@ def makeMinimalSchema(filterNameList, *, addCentroid=False,
doc="Set if parallax or parallax error is bad",
)
if fullCoordinateCovariance:
if not (addProperMotion and addParallax):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addParallax is guaranteed True within this branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this, so this is not in the if addParallax branch anymore.

@@ -296,7 +296,7 @@ def setUpClass(cls):

# override some field names.
config = convertReferenceCatalogTestBase.makeConvertConfig(withRaDecErr=True, withMagErr=True,
withPm=True, withParallax=True)
withPm=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for this change? The commit message doesn't give any context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a consequence of changing ReferenceObjectLoader.makeMinimalSchema so that addProperMotion adds the proper motion parameters and their errors. I removed the argument withPmErr from convertReferenceCatalogTestBase.makeConvertConfig because the proper motion errors are no longer added separately (so withPm adds both), so it is also removed here. I can include this in the commit message when I rebase and squash the messages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this change removes withParallax? 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withParallax=True is from one of the earlier commits on this PR. Including it was necessary because of the side-effects of how ReferenceObjectLoader.makeMinimalSchema worked before I refactored the ticket. Parallax is not actually tested in this class.

Comment on lines 117 to 113
self.assertEqual("parallax_flag" in refSchema, addParallax)
# We test setting thse these all together, below.
self.assertNotIn("epoch", refSchema)
self.assertNotIn("pm_ra", refSchema)
self.assertNotIn("pm_dec", refSchema)
self.assertNotIn("pm_flag", refSchema)
self.assertNotIn("parallax", refSchema)
self.assertNotIn("parallax_flag", refSchema)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should not be in the schema because the default for addParallax and addProperMotion is False, but I can remove this check if it seems confusing.

A comment that you are testing the defaults would help.

@@ -446,7 +450,7 @@ def makeMinimalSchema(filterNameList, *, addCentroid=False,
prefix="coord",
names=["ra", "dec"],
units=["rad", "rad"],
diagonalOnly=(coordErrDim == 2),
diagonalOnly=((coordErrDim == 2) or (fullCoordinateCovariance)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this produce coord_ra_dec_Cov, not coord_ra_coord_dec_Cov as advertised? In fact, it seems that if coordErrDim > 0 and fullCovarianceMatrix, then you get both columns.

Unfortunately, I don't see an easy fix short of modifying addFields to support empty prefixes; the code will inject a leading underscore if you just pass prefix="".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct that this produces coord_ra_dec_Cov, not coord_ra_coord_dec_Cov (which is the way fullCoordinateCovariance names the field). I think the only solution would be to just have a separate addFields command to add the covariance. We can discuss during the meeting today.

if fullCoordinateCovariance:
if not (addProperMotion and addParallax):
warnings.warn("Adding proper motion and parallax covariances without proper motion and "
"parallax parameters and errors.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a bad idea; why would we ever want someone to be able to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that no one will want to do this, but I don't think it's worth forcing a failure if they do, hence the warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely want this to be a failure, not a warning. No one wanting to do this means it should fail.

units = ["rad", "rad", "rad/year", "rad/year", "rad"]
for i in range(5):
i_field = fields[i]
i_unit = units[i]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this nested for-loop is an improvement over using the CovarianceMatrixXKey?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that the resulting names match the existing schema much more closely, so we don't have to add hundreds of lines elsewhere.

@parejkoj
Copy link
Contributor

I'm happy to file an RFC if one of the objections is that this is an API-breaking change for the refcat converter; I expect that RFC to have no objections at all, because the code dealing with position errors had no users other than me.

@cmsaunders
Copy link
Contributor Author

Thank you for the second look at the PR. I'll try to address the major points from your review now, and will go through the in-line comments in the next day or so:

  • I did miss addressing the deprecation of properMotionErrDim issue in the changes to the PR, so I apologize for that. As @parejkoj commented, it is highly unlikely that anyone else is using this and would object to removing it. We can do an RFC, and if it is an option to completely remove it without going through the deprecation process, that is obviously preferable. If not, I can put back the original behavior.
  • Since the recent commits, the changes in the reference catalog schema have been reduced to just the names of the coord_ra/coord_dec and pm_ra/pm_dec covariance, neither of which were actually present in any reference catalogs used in our processing. I don't know if an RFC is appropriate for changing the names, but I don't think there is any way to keep the names the same and still have overall naming consistency.
  • I do see some places where the documentation is inconsistent and can be improved, but to avoid endless going back and forth over language and the general approach to the ticket, maybe we can find a time to meet and sort out an acceptable solution.

@kfindeisen
Copy link
Member

kfindeisen commented Nov 30, 2022

I'm happy to file an RFC if one of the objections is that this is an API-breaking change for the refcat converter; I expect that RFC to have no objections at all, because the code dealing with position errors had no users other than me.

I did miss addressing the deprecation of properMotionErrDim issue in the changes to the PR, so I apologize for that. As @parejkoj commented, it is highly unlikely that anyone else is using this and would object to removing it. We can do an RFC, and if it is an option to completely remove it without going through the deprecation process, that is obviously preferable. If not, I can put back the original behavior.
Since the recent commits, the changes in the reference catalog schema have been reduced to just the names of the coord_ra/coord_dec and pm_ra/pm_dec covariance, neither of which were actually present in any reference catalogs used in our processing. I don't know if an RFC is appropriate for changing the names, but I don't think there is any way to keep the names the same and still have overall naming consistency.

I think a new RFC would be appropriate but won't insist on it, since that would add at least a week's delay to getting this merged. The main reason I mentioned it was that I assumed there already was one (you need to RFC deprecations, after all) and I just couldn't find it.

For me, the core issue here is choosing whether you are breaking things now or following the deprecation process, and committing to that choice. If you don't think anybody would be affected by just deleting/renaming the old API outright, then I'm willing to take your word for it.

- ``coord_dec_pm_dec_Cov``: covariance between coord_dec and pm_dec (rad^2/year)
- ``coord_dec_parallax_Cov``: covariance between coord_dec and parallax (rad^2/year)
- ``pm_ra_parallax_Cov``: covariance between pm_ra and parallax (rad^2/year)
- ``pm_dec_parallax_Cov``: covariance between pm_dec and parallax (rad^2/year)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure the units documented above are all correct? I guess the only one that's different is pm_ra_pm_dec_Cov (rad^2/year^2), which we did get right above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that some of the old units were wrong -- coord_ra_parallax_Cov should be rad^2, not rad^2/year -- so I've changed this.

Row from catalog being converted.
"""
raise NotImplementedError("There is no default method for setting the covariance. A function must be"
" defined in the special-case manager.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"A function ..." -> "Override this method in a subclass specialized for your dataset."

catalog.

There is no generic way to determine covariance. A specific function
must override this in the special-case manager.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not "a specific function". "Override this method in a subclass specialized for your dataset."

----------
filterNameList : `list` of `str`
List of filter names. Used to create <filterName>_flux fields.
addIsPhotometric : `bool`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addCentroid is missing from the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what copying and pasting gets you--it was missing in makeMinimalSchema too.

# Add all the off-diagonal covariance terms
fields = ["coord_ra", "coord_dec", "pm_ra", "pm_dec", "parallax"]
units = ["rad", "rad", "rad/year", "rad/year", "rad"]
for i in range(5):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you doing these loops with indices? for field, unit in zip(itertools.combinations(fields, r=2), itertools.combinations(units, r=2)): and then e.g. {field[0]}_{field[1]}_Cov, should give you all the entries, no double loop or indices necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use this because I'm not really familiar with itertools. I'm fine with changing this.

if self.full_position_information:
# Since coordinate_covariance is True, this will only pass for the
# "None" case
assertAllOrNone("full_position_information",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment above: won't this only pass for the "All" case, since all those fields must be set if full_position_information is True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, should be "all"...somehow it got reversed between my brain and fingers.


inTempDir1.cleanup()
inTempDir2.cleanup()
dataTempDir.cleanup()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path cleanups need to happen in tearDown

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what was already there, and these aren't class variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should probably make them so, otherwise if the test fails, it won't cleanup.

Copy link
Member

@kfindeisen kfindeisen Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative that should make @parejkoj happy while keeping them isolated to the test that actually uses them is to use the TemporaryDirectory calls as (triply-nested 😬) context managers, or to put the cleanup in a finally block. I certainly don't support having tearDown handle something that wasn't introduced in setUp, since such code will be hard to keep correct/in sync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put all the context managers inside the same with, so you don't have to nest them. That would work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parejkoj , could you give me a code snippet for that?

Copy link
Contributor

@parejkoj parejkoj Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with tempfile.TemporaryDirectory() as path1, with tempfile.TemporaryDirectory() as path2:, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work. It looks like you aren't supposed to repeat with, but even after fixing that, I'm not getting a TemporaryDirectory() object with this--I just get a string with the directory name. I'm just going to go back to the option of making these class variables and adding setup and teardown functions.

Copy link
Contributor

@parejkoj parejkoj Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes, you don't repeat with. When used as a context manager, the directory name is the target of the with, and then you just use that. Cleanup happens when the context manager exits.

See the docs:

https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryDirectory

catalog = self._createFakeCatalog(nOld=10, nNew=0)

with self.assertRaises(NotImplementedError):
self.worker._setCoordinateCovariance(catalog[0], self.fakeInput[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this test accomplishes much: we've defined it as not implemented in the base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this test was requested in earlier comments.

self.assertEqual(refSchema['coord_raErr'].asField().getUnits(), "rad")
self.assertEqual(refSchema['coord_ra_coord_dec_Cov'].asField().getUnits(), "rad rad")
self.assertEqual(refSchema['pm_raErr'].asField().getUnits(), "rad/year")
self.assertEqual(refSchema['pm_dec_parallax_Cov'].asField().getUnits(), "rad/year rad")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these fail with how you've reworked the units handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already fixed.

self.assertEqual(refSchema['pm_dec_parallax_Cov'].asField().getUnits(), "rad/year rad")

def testCheckFluxUnits(self):
"""Test that we can identify old style fluxes in a schema."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to still live in the loader tests, it doesn't have anything to do with the converter.

class ConvertGaiaManagerTests(convertReferenceCatalogTestBase.ConvertReferenceCatalogTestBase,
lsst.utils.tests.TestCase):
"""Unittests of methods of ConvertGaiaManager
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring is redundant.

@cmsaunders cmsaunders force-pushed the tickets/DM-25132 branch 2 times, most recently from b73e64c to acbeb2f Compare December 9, 2022 22:47
Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one minor bug and some style comments. Thank you for your patience through the multiple iterations!

Comment on lines 512 to 514
inputUnits = [self.coord_err_unit, self.coord_err_unit,
self.properMotionUnit, self.properMotionUnit,
self.parallaxUnit]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is in the wrong order -- should be parallax first, then proper motion. I recommend moving this line up next to inputParams to make it easier to check.

(Unit tests don't pick it up because properMotionUnit = parallaxUnit there).

addIsResolved : `bool`
If True then add field "resolved".
addIsVariable : `bool`
If True then add field "variable".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest removing the "Is" from these parameter names if the field names don't use it. Otherwise, it looks like a typo, and readers may waste time confirming that the documented field names are correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parameter names are copied from the original makeMinimalSchema. I'm fine with changing them since we're changing so much already, unless @parejkoj knows of some historical reason for having "Is" in the name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a little more, I think keeping the "Is" would be better, since these correspond to adding boolean fields. To me "addVariable", for example, makes me wonder if some scalar parameter is added, whereas "addIsVariable" more clearly implies that the resulting flag with correspond to "Is Variable" or "Is Not Variable".

schema.addField(
field=f"{i_field}_{j_field}_Cov",
type="F",
doc=f"uncertainty covariance between {i_field} and {j_field}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Uncertainty covariance" isn't standard statistics terminology; I'd recommend just saying "Covariance".

full_position_information = pexConfig.Field(
dtype=bool,
doc="Include epoch, proper motions, parallax, and covariances between sky coordinates, proper motion,"
" and parallax in the schema. If true, a custom `~ConvertRefcatManager` class must exist to"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broken link? At this point, it might be better to stop trying to link it and just go with

Suggested change
" and parallax in the schema. If true, a custom `~ConvertRefcatManager` class must exist to"
" and parallax in the schema. If true, a custom ``ConvertRefcatManager`` class must exist to"

@@ -262,6 +423,18 @@ def assertAllOrNone(*names):
raise ValueError(
'"epoch_name" must be specified if "pm_ra/dec_name" or "parallax_name" are specified')

# Need all the error field names set if we are including covariances.
if self.full_position_information:
# Since coordinate_covariance is True, this will only pass for the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Since coordinate_covariance is True, this will only pass for the
# Since full_position_information is True, this will only pass for the

- There are no off-diagonal covariance terms, such as covariance
between RA and Dec, or between PM RA and PM Dec. Support for such
covariance would have to be added to to the config, including consideration
of the units in the input catalog.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this with an assumption that if any covariance terms are available (even just ra-dec), the input catalog schema also has all five astrometric parameters and the covariances between all of them? There's probably a less confusing way of wording it than that.

self.assertEqual("photometric" in refSchema, addIsPhotometric)
self.assertEqual("photometric" in refSchema, addIsPhotometric)

# In the default schema none of the following should be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If fullPositionInformation is left at default"? The schemas in this test are already non-default in other ways.

config.validate()
del config

names = basicNames
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant, now that there's no longer a distinction between basic and extended names.

Comment on lines 351 to 353
coordConvert = (self.worker.coord_err_unit).to(u.radian)
pmConvert = (self.worker.config.pm_scale * u.milliarcsecond).to_value(u.radian)
parallaxConvert = (self.config.parallax_scale * u.milliarcsecond).to_value(u.radian)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be self.worker.config, or self.config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed them to all be self.worker.config. As written, this doesn't fail because self.worker.config is self.config, but obviously it ought to be consistent.

cmsaunders and others added 2 commits December 13, 2022 17:23
This removes the function `ReferenceObjectLoader.makeMinimalSchema` and
adds the free function `_makeSchema` to `convertReferenceCatalog.py`.
The new function adds the possibility to include 5-d covariance in the
reference catalog. This option is turned on using the
`ConvertRefererenceCatalogConfig` option `full_position_information`.

Update refcat schema docs to reflect the correct covariance matrix names.
@cmsaunders cmsaunders merged commit a26ab52 into main Dec 14, 2022
@cmsaunders cmsaunders deleted the tickets/DM-25132 branch December 14, 2022 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants