DM-38850: Make trailedAssociatorTask #173

Merged: 3 commits merged on Sep 28, 2023

@bsmartradio (Contributor) commented Jun 29, 2023

Make trailedAssociatorTask, which filters out trailed sources whose trail lengths exceed 0.416 arcseconds/second multiplied by the exposure time. The trailed sources are currently dropped once they are filtered out. More complex filtering will be added at a later date.
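
A minimal sketch of the filter described above, not the code in this PR. It assumes the DIASource table is a pandas.DataFrame with a trail-length column in arcseconds; the column name trailLength, the function name, and its signature are hypothetical and chosen only for illustration.

```python
import pandas as pd


def filter_trailed_sources(dia_sources, exposure_time, max_trail_rate=0.416):
    """Split a source table into kept and trailed sources.

    A source counts as trailed when its trail length (arcsec) exceeds
    max_trail_rate (arcsec/second) times exposure_time (seconds).
    The ``trailLength`` column name is an assumption of this sketch.
    """
    cutoff = max_trail_rate * exposure_time
    trailed_mask = dia_sources["trailLength"] > cutoff
    kept = dia_sources[~trailed_mask].reset_index(drop=True)
    trailed = dia_sources[trailed_mask].reset_index(drop=True)
    return kept, trailed


# Illustrative data only.
sources = pd.DataFrame({
    "diaSourceId": [1, 2, 3, 4],
    "trailLength": [0.5, 5.0, 12.0, 20.0],
})
# With a 30 s exposure the cutoff is 0.416 * 30 = 12.48 arcsec,
# so only the last source is treated as trailed.
kept, trailed = filter_trailed_sources(sources, exposure_time=30.0)
```

Returning both tables, rather than only the kept one, leaves room for the later, more complex handling of trailed sources that the description mentions.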

@bsmartradio bsmartradio force-pushed the tickets/DM-38850 branch 2 times, most recently from 56c586d to 62f1e4b Compare June 29, 2023 18:38
@parejkoj (Contributor) commented Jul 7, 2023

The pull request should have the ticket number in it: https://developer.lsst.io/work/flow.html#make-a-pull-request

@parejkoj (Contributor) left a comment

Have you run this? I think it would have failed a linter, at least. There were a lot of typos and leftover print/import statements that were clearly from debugging; please try to have those cleaned up before you send it for review.

There are no tests specific to the new Task itself: please add a file to test the new Task output.

Is this really worth having a separate Task, instead of just adding a config, branch, and method to AssociationTask? It's just ~3 lines of code; do we expect that we will be significantly expanding functionality compared to what you've implemented here, and are there any other places where we need this as a Task?

python/lsst/ap/association/association.py (resolved)
python/lsst/ap/association/association.py (outdated, resolved)
python/lsst/ap/association/association.py (outdated, resolved)
python/lsst/ap/association/association.py (outdated, resolved)
python/lsst/ap/association/association.py (outdated, resolved)
tests/test_association_task.py (outdated, resolved)
tests/test_association_task.py (outdated, resolved)
tests/test_association_task.py (outdated, resolved)
tests/test_association_task.py (outdated, resolved)
tests/test_association_task.py (outdated, resolved)
@bsmartradio bsmartradio changed the title Make trailedAssociatorTask DM-38850: Make trailedAssociatorTask Jul 21, 2023
@bsmartradio bsmartradio force-pushed the tickets/DM-38850 branch 3 times, most recently from 43b2a7d to d7c4173 Compare August 21, 2023 18:39
@bsmartradio bsmartradio force-pushed the tickets/DM-38850 branch 6 times, most recently from 3dedc44 to 2c8f5ae Compare August 23, 2023 21:37
@bsmartradio (Contributor Author)

@parejkoj Are you happy with the changes? If so I can rebase onto main and merge the changes.

python/lsst/ap/association/association.py (outdated, resolved)
python/lsst/ap/association/association.py (outdated, resolved)
python/lsst/ap/association/trailedSourceFilter.py (outdated, resolved)
python/lsst/ap/association/trailedSourceFilter.py (outdated, resolved)
python/lsst/ap/association/trailedSourceFilter.py (outdated, resolved)
python/lsst/ap/association/trailedSourceFilter.py (outdated, resolved)
- ``trailed_dia_sources`` : DIASources that have trailed more
than 0.416 arcseconds/second*exposure_time. (`pandas.DataFrame`)
"""

Contributor

Contributor Author

Ah yes, I fixed it in one place but not another. I've fixed them all now.

python/lsst/ap/association/association.py (outdated, resolved)
python/lsst/ap/association/association.py (outdated, resolved)
if len(diaTrailedResult.trailedDiaSources) > 0:
    print("Trailed sources cleaned.")
else:
    print("No trailed sources to clean.")
Contributor

It looks like you removed print statements, but didn't add a log.info: I think we absolutely want to have a log statement about how many sources were removed.
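
A hedged sketch of the kind of log statement being requested, using only the standard library logging module rather than the Task's own logger. The logger name, function name, and message wording are assumptions for illustration, not what was merged.

```python
import logging

# Hypothetical logger name; a Task would use its own self.log instead.
log = logging.getLogger("trailedSourceFilterSketch")


def log_dropped_sources(n_trailed, max_trail_length, logger=log):
    """Report at INFO level how many trailed sources were removed."""
    if n_trailed > 0:
        # Lazy %-style arguments: the message is only formatted if emitted.
        logger.info("%i DIASources exceed the maximum trail length of "
                    "%.2f arcsec; dropping them.", n_trailed, max_trail_length)
    else:
        logger.info("No trailed DIASources to drop.")


logging.basicConfig(level=logging.INFO)
log_dropped_sources(3, 12.48)
log_dropped_sources(0, 12.48)
```

Reporting the count, rather than a fixed "cleaned" message, makes it possible to spot from the logs alone when an unusual number of sources is being dropped.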

python/lsst/ap/association/trailedSourceFilter.py (outdated, resolved)
"""A simple implementation of source association task for ap_verify.
"""

__all__ = ["TrailedSourceFilterTask", "TrailedSourceFilterConfig"]
Contributor

KT has gone the other way on my dev guide suggestion, so let's not touch anything else right now while we sort that out.

lsst-dm/dm_dev_guide#632

python/lsst/ap/association/trailedSourceFilter.py (outdated, resolved)
result : `lsst.pipe.base.Struct`
Results struct with components.

- ``"dia_sources"`` : DiaSource table that is free from unwanted
Contributor

This is how they should be documented: https://developer.lsst.io/python/numpydoc.html#struct-types

Please file a ticket to go through all of ap_association and fix the Struct docstrings, if you know some are wrong.

Boolean mask for DIASources which are greater than the
cutoff length.
"""
diffIm_time = diffIm.getInfo().getVisitInfo().getExposureTime()
Member

Difference images come from visits. Given we can take 2 snaps and there is a gap between the snaps, is the relevant time here the exposure time of a snap or the duration of the visit including the gap (which is greater than the exposure time)?

(also, if diffim is an exposure then the docstring for diffIm is wrong).

Member

Is there a policy and/or actual code that defines what the "exposure time" of a snap-combined image is? The docs for VisitInfo just say:

get exposure duration (shutter open time); (sec)

Since we can't yet run CharacterizeImageTask or CalibrateImageTask on multi-snap visits, I'm pretty sure that this case is untestable either way.

Member

At the moment it's defined to be the sum of the exposure time of the snaps. Unlike ObservationInfo, VisitInfo doesn't record the start and end time, only the midpoint (which for two snaps might be a time where no data are being taken).
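
To make the distinction in this thread concrete, here is a small worked example. The snap length and the gap between snaps are hypothetical values chosen for illustration; only the 0.416 arcseconds/second rate comes from this PR.

```python
# Hypothetical two-snap visit; snap length and gap are illustrative values.
snap_exposure_times = [15.0, 15.0]  # seconds of shutter-open time per snap
gap_between_snaps = 2.0             # seconds with the shutter closed

max_trail_rate = 0.416  # arcsec/second, the default cutoff quoted in this PR

# Per the comment above, the recorded exposure time is the sum over snaps.
exposure_time = sum(snap_exposure_times)
# The wall-clock duration of the visit also includes the gap.
visit_duration = exposure_time + gap_between_snaps

cutoff_from_exposure_time = max_trail_rate * exposure_time    # about 12.48 arcsec
cutoff_from_visit_duration = max_trail_rate * visit_duration  # about 13.31 arcsec
```

Under these assumed numbers, a source moving at exactly the cutoff rate would trail slightly farther over the full visit than the exposure-time-based cutoff allows, which is the ambiguity the question raises.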

@parejkoj (Contributor) left a comment

It looks like you marked some comments resolved without making the requested changes (e.g. excessive newlines, only needing the exposure time not the full exposure), and didn't incorporate some others (e.g. referring to the default value in docstrings, making the check_dia_source_trail method private)? I've flagged some of them with eyes, but please check for others that were missed.

tests/test_trailedSourceFilter.py (outdated, resolved)
tests/test_association_task.py (outdated, resolved)
tests/test_association_task.py (outdated, resolved)
tests/test_association_task.py (outdated, resolved)
Comment on lines 102 to 109
for test_obj_id, expected_obj_id in zip(
        results.matchedDiaSources["diaObjectId"].to_numpy(),
        [1, 2, 3, 4]):
    self.assertEqual(test_obj_id, expected_obj_id)
for test_obj_id, expected_obj_id in zip(
        results.unAssocDiaSources["diaObjectId"].to_numpy(),
        [0]):
    self.assertEqual(test_obj_id, expected_obj_id)
Contributor

Why can't you write these like self.assertEqual(results.matchedDiaSources["diaObjectId"], [1,2,3,4])? Or at worst use np.testing.assert_array_equal? The loops make it hard to follow what is being tested.

Contributor Author

That results in ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). Whoever wrote the tests I followed likely used loops to get around this problem. When I looked for alternatives, the loop came up as one of the suggested ways of comparing array values in unit tests. The other option seems to be self.assertTrue(np.array_equal(results.matchedDiaSources["diaObjectId"].values, [1,2], equal_nan=True)), which I've switched to since it seems a bit more explicit about what it's doing.

Contributor

Use np.testing.assert_array_equal instead, then. That will give explicit information about which values mismatch. These cannot be NaN, so there's no need for NaN-safe comparisons.
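
The difference discussed in this thread can be reproduced outside the test suite. The sketch below uses made-up ID values and a stand-in DataFrame; it shows why taking the truth value of an element-wise comparison fails, and how np.testing.assert_array_equal behaves on a match and on a mismatch.

```python
import numpy as np
import pandas as pd

# Stand-in for results.matchedDiaSources; the values are illustrative.
matched = pd.DataFrame({"diaObjectId": [1, 2, 3, 4]})
values = matched["diaObjectId"].to_numpy()

# An element-wise comparison yields an array, whose truth value is ambiguous.
try:
    bool(values == np.array([1, 2, 3, 4]))
    ambiguous = False
except ValueError:
    ambiguous = True

# Passes silently when every element matches.
np.testing.assert_array_equal(values, [1, 2, 3, 4])

# On a mismatch, the AssertionError message describes the differing values.
try:
    np.testing.assert_array_equal(values, [1, 2, 3, 5])
    mismatch_report = None
except AssertionError as err:
    mismatch_report = str(err)
```

Compared with assertTrue(np.array_equal(...)), which can only report that the result was False, the np.testing form puts the mismatching values in the failure message.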

Contributor Author

On that note, would it add clarity if I changed the other unit tests to follow this format?

Contributor

That would definitely be helpful, yes! Please do it on a separate commit, though.

Contributor Author

Swapped to np.testing and also made a separate commit updating the prior unit tests

tests/test_trailedSourceFilter.py (outdated, resolved)
tests/test_trailedSourceFilter.py (resolved)
tests/test_trailedSourceFilter.py (outdated, resolved)

results = trailedSourceFilterTask.run(self.diaSources, self.exposure)

self.assertEqual(len(results.diaSources),
Contributor

Each of these tests should also have an assert on the contents of the other array returned in the struct (the sources that were filtered). Better to test on which ones were included than just the length: I think this is the first 3 in the first list, and the last two in the second? Similarly in test_run_short_max_trail.
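
One way to structure such a test is sketched below. It does not use the real TrailedSourceFilterTask; split_by_trail_length is a hypothetical stand-in, and the trailLength column name and data values are assumptions. The point is only the shape of the assertions: both returned tables are checked by content, not just by length.

```python
import unittest

import numpy as np
import pandas as pd


def split_by_trail_length(dia_sources, cutoff):
    """Hypothetical stand-in for the filter task: return (kept, trailed)."""
    mask = dia_sources["trailLength"] > cutoff
    return dia_sources[~mask], dia_sources[mask]


class TrailFilterSketchTest(unittest.TestCase):
    def setUp(self):
        # Illustrative table: the first three sources are short, the last two long.
        self.dia_sources = pd.DataFrame({
            "diaSourceId": [10, 11, 12, 13, 14],
            "trailLength": [1.0, 2.0, 3.0, 40.0, 50.0],
        })

    def test_both_outputs(self):
        kept, trailed = split_by_trail_length(self.dia_sources, cutoff=12.48)
        # Check which sources ended up in each table, not only how many.
        np.testing.assert_array_equal(kept["diaSourceId"].to_numpy(), [10, 11, 12])
        np.testing.assert_array_equal(trailed["diaSourceId"].to_numpy(), [13, 14])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TrailFilterSketchTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A length-only check would still pass if the filter kept the wrong three sources; asserting on the IDs in both outputs closes that gap.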

Contributor Author

Added tests similar to self.assertTrue(np.array_equal(results.matchedDiaSources["diaObjectId"].values, [1,2], equal_nan=True)) to check that the output arrays are what is expected.

Contributor Author

Updated to np.testing.

tests/test_trailedSourceFilter.py (outdated, resolved)
@bsmartradio bsmartradio force-pushed the tickets/DM-38850 branch 7 times, most recently from 1af91a5 to 19a1c29 Compare September 19, 2023 20:43
Update unit tests in test_association_task.py to use np.testing.assert_array_equal for testing array equality.
@parejkoj (Contributor) left a comment

A few more small comments, and I unresolved and put eyes on a few others that it looks like you missed. Clean these up, and you're good to go.

diaTrailedResult = self.trailedSourceFilter.run(diaSources, exposure_time)
matchResult = self.associate_sources(diaObjects, diaTrailedResult.diaSources)

self.log.warning("%i DIASources exceed maxTrailLength, dropping "
Contributor

Why a warning? Nothing went wrong, since the intent was to remove sources. log.info is probably fine.

Contributor Author

I saw that the other log message about dropping a source was a warning. However, you are right: dropping the sources is the intended behavior, so it should be a log.info.

- assocResults = self.associator.run(diaSourceTable,
-                                    loaderResult.diaObjects)
+ assocResults = self.associator.run(diaSourceTable, loaderResult.diaObjects,
+                                    exposure_time=diffIm.getInfo().getVisitInfo().getExposureTime())
Contributor

Use the properties:

Suggested change
- exposure_time=diffIm.getInfo().getVisitInfo().getExposureTime())
+ exposure_time=diffIm.visitInfo.exposureTime)

Contributor Author

Swapped to using properties.

"""Config class for TrailedSourceFilterTask.
"""

maxTrailLength = pexConfig.Field(
Contributor

I guess I missed this earlier: if we're going with snake_case throughout, we should make the configs also snake_case.

Contributor Author

I've switched to snake_case in trailedSourceFilter.py only and left camelCase in association.py.

Comment on lines 92 to 93
Creates a mask for sources with lengths greater than 0.416
arcseconds/second multiplied by the exposure time.
Contributor

Again, don't mention the default value, since it's configurable.

Suggested change
- Creates a mask for sources with lengths greater than 0.416
- arcseconds/second multiplied by the exposure time.
+ Return a mask of sources with lengths greater than ``config.maxTrailLength`` multiplied by the exposure time.

Contributor Author

Swapped the wording to what you've suggested.

Plus several edits from review

@bsmartradio bsmartradio merged commit 9b2d7c2 into main Sep 28, 2023
2 checks passed
@bsmartradio bsmartradio deleted the tickets/DM-38850 branch September 28, 2023 22:02
4 participants