New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-27476: Update raw ingest to use JSON metadata files #357
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the addition of a yaml camera for testing: that makes me a bit more secure that yaml camera is viable long term.
I'm concerned about the broad exception clauses. For example, under what conditions do you expect reading an index file to fail? Are all such cases ones where you want the exception masked as a log message, or are there only a few that would be reasonable? I'm assuming we are in control of the generation of the index files, so we should be in control of ensuring they are properly formatted, and also of being told up front when they are not.
I found a few cases where I think the code would fail as written, so I'm concerned about test coverage; also, there are a lot of if
branches here. Maybe some of those failures are being masked by the failFast
config being false
by default and the broad except Exception
clauses? Have you looked at the coverage output for the new ingest code to see how many of the branches are being touched?
There are a lot of sentences in comments without periods, and sometimes without starting capitals. I only commented on a few at the start of the review, but please go through and try to fix them all.
from lsst.obs.base.yamlCamera import makeCamera | ||
|
||
|
||
class YamlCameraTestCase(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the idea is to flesh this out later with more tests of yaml camera? Do we have any tickets for that as part of the overall yaml camera work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no real idea about YAML camera. I'm doing the minimum possible to get a test so that I could use it in the visit definition test. I imagine that @czwa might have a plan since he's moving HSC to YAML camera.
@parejkoj I think I've dealt with all your comments. I've added some tests so now ingest.py has more than 98% coverage. The remaining bit needs real data with metadata translation (which I'm not able to do and which is tested by other obs packages). |
If we are symlinking into the datastore we also need to symlink in the sidecar file since we might be relying on the sidecar file to extract metadata.
Sidecar files let us ingest arbitrary files without needing to write a full metadata translator.
This requires that we use the dummy camera geom YAML camera which also means some minor changes to detector naming in the JSON sidecar.
Since in some contexts we have a schemeless URI being compared with a file URI.
Rather than having two for loops reading index files, only have one.
This can make a difference when using pytest vs using python to run the test.
This unifies the partitioning done from reading normal files and reporting failures.
Previously we were only doing it if at least one file from the index was good.
Use different combinations to explore all failure modes.
Put them all in one place and use slightly better directory names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the cleanups; this is much nicer. The explicit testing of a variety of failure modes is good, too. Remaining comments are mostly spelling/punctuation related.
Still some punctuation-less sentences: please check your comments for missing periods.
""" | ||
pass | ||
|
||
|
||
def _log_msg_counter(noun: Union[int, Iterable]) -> Tuple[int, str]: | ||
"""Count the iterable and return the count and plural modifier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying do it on this ticket, but this is the kind of thing that might be nice to live base or somewhere we can use it more broadly (until someone wants to make it smarter with es plurals...).
configuration item is `True`. If an error is encountered the | ||
`_on_metadata_failure()` method will be called. If no exceptions | ||
result and an error was encountered the returned object will have | ||
a null-instrument class and no datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for being explicit about this behavior here.
self.task.run([os.path.join(INGESTDIR, "indexed_data", "bad_implied", "dataset_2.yaml")]) | ||
|
||
def testCallbacks(self): | ||
"""Test the callbacks for failures.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this block of tests might be clearer if they were all separate testX
methods (maybe with a setup_callbacks()
to prep things), instead of trying to clear the appropriate callback lists each time. I'll leave it up to you whether that's worth changing.
Depends on lsst/astro_metadata_translator#49