Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-27476: Update raw ingest to use JSON metadata files #357

Merged
merged 34 commits into from Mar 12, 2021
Merged

Conversation

timj
Copy link
Member

@timj timj commented Feb 22, 2021

@timj timj changed the title DM-27476: Update raw ingest for JSON metadata files DM-27476: Update raw ingest to use JSON metadata files Feb 22, 2021
Copy link
Collaborator

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the addition of a yaml camera for testing: that makes me a bit more secure that yaml camera is viable long term.

I'm concerned about the broad exception clauses. For example, under what conditions do you expect reading an index file to fail? Are all such cases ones where you want the exception masked as a log message, or are there only a few that would be reasonable? I'm assuming we are in control of the generation of the index files, so we should be in control of ensuring they are properly formatted, and also of being told up front when they are not.

I found a few cases where I think the code would fail as written, so I'm concerned about test coverage; also, there are a lot of if branches here. Maybe some of those failures are being masked by the failFast config being false by default and the broad except Exception clauses? Have you looked at the coverage output for the new ingest code to see how many of the branches are being touched?

There are a lot of sentences in comments without periods, and sometimes without starting capitals. I only commented on a few at the start of the review, but please go through and try to fix them all.

python/lsst/obs/base/ingest.py Outdated Show resolved Hide resolved
python/lsst/obs/base/ingest.py Outdated Show resolved Hide resolved
python/lsst/obs/base/ingest.py Outdated Show resolved Hide resolved
python/lsst/obs/base/ingest.py Show resolved Hide resolved
python/lsst/obs/base/ingest.py Outdated Show resolved Hide resolved
tests/test_ingest.py Outdated Show resolved Hide resolved
tests/test_ingest.py Show resolved Hide resolved
tests/test_ingest.py Outdated Show resolved Hide resolved
tests/test_yamlCamera.py Show resolved Hide resolved
from lsst.obs.base.yamlCamera import makeCamera


class YamlCameraTestCase(unittest.TestCase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the idea is to flesh this out later with more tests of yaml camera? Do we have any tickets for that as part of the overall yaml camera work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no real idea about YAML camera. I'm doing the minimum possible to get a test so that I could use it in the visit definition test. I imagine that @czwa might have a plan since he's moving HSC to YAML camera.

@timj
Copy link
Member Author

timj commented Mar 10, 2021

@parejkoj I think I've dealt with all your comments. I've added some tests so now ingest.py has more than 98% coverage. The remaining bit needs real data with metadata translation (which I'm not able to do and which is tested by other obs packages).

timj added 25 commits March 10, 2021 13:09
If we are symlinking into the datastore we also need to
symlink in the sidecar file since we might be relying on the
sidecar file to extract metadata.
Sidecar files let us ingest arbitrary files without needing
to write a full metadata translator.
This requires that we use the dummy camera geom YAML camera
which also means some minor changes to detector naming in
the JSON sidecar.
Since in some contexts we have a schemeless URI being compared
with a file URI.
Rather than having two for loops reading index files, only have one.
This can make a difference when using pytest vs using python
to run the test.
This unifies the partitioning done from reading normal files
and reporting failures.
Copy link
Collaborator

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cleanups; this is much nicer. The explicit testing of a variety of failure modes is good, too. Remaining comments are mostly spelling/punctuation related.

Still some punctuation-less sentences: please check your comments for missing periods.

"""
pass


def _log_msg_counter(noun: Union[int, Iterable]) -> Tuple[int, str]:
"""Count the iterable and return the count and plural modifier.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying do it on this ticket, but this is the kind of thing that might be nice to live base or somewhere we can use it more broadly (until someone wants to make it smarter with es plurals...).

python/lsst/obs/base/ingest.py Outdated Show resolved Hide resolved
configuration item is `True`. If an error is encountered the
`_on_metadata_failure()` method will be called. If no exceptions
result and an error was encountered the returned object will have
a null-instrument class and no datasets.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for being explicit about this behavior here.

python/lsst/obs/base/ingest.py Outdated Show resolved Hide resolved
python/lsst/obs/base/ingest.py Outdated Show resolved Hide resolved
tests/test_ingest.py Outdated Show resolved Hide resolved
tests/test_ingest.py Outdated Show resolved Hide resolved
tests/test_ingest.py Outdated Show resolved Hide resolved
tests/test_ingest.py Outdated Show resolved Hide resolved
self.task.run([os.path.join(INGESTDIR, "indexed_data", "bad_implied", "dataset_2.yaml")])

def testCallbacks(self):
"""Test the callbacks for failures."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this block of tests might be clearer if they were all separate testX methods (maybe with a setup_callbacks() to prep things), instead of trying to clear the appropriate callback lists each time. I'll leave it up to you whether that's worth changing.

@timj timj merged commit 7cf335f into master Mar 12, 2021
@timj timj deleted the tickets/DM-27476 branch March 12, 2021 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants