DM-27476: Update raw ingest to use JSON metadata files #357

timj · 2021-02-22T19:12:26Z

Depends on lsst/astro_metadata_translator#49

parejkoj

Thanks for the addition of a yaml camera for testing: that makes me a bit more secure that yaml camera is viable long term.

I'm concerned about the broad exception clauses. For example, under what conditions do you expect reading an index file to fail? Are all such cases ones where you want the exception masked as a log message, or are there only a few that would be reasonable? I'm assuming we are in control of the generation of the index files, so we should be in control of ensuring they are properly formatted, and also of being told up front when they are not.
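To make the concern concrete, here is a minimal sketch of a narrower handler. Only the one failure that seems recoverable (the index file is missing) is logged. A malformed file is left to raise, so a badly generated index shows up immediately. The helper name `read_index_file`, and the choice of which failures count as "expected", are assumptions for illustration. This is not the code under review.

```python
import json
import logging
from typing import Any, Dict, Optional

log = logging.getLogger(__name__)


def read_index_file(path: str) -> Optional[Dict[str, Any]]:
    """Read a JSON index file, masking only the expected failure.

    Hypothetical helper for illustration; not the implementation in ingest.py.
    """
    try:
        with open(path, "r", encoding="utf-8") as fd:
            content = json.load(fd)
    except FileNotFoundError:
        # Expected and recoverable: log it and let the caller move on.
        log.warning("Index file %s does not exist; skipping", path)
        return None
    # json.JSONDecodeError (a malformed file) is deliberately not caught, so a
    # badly generated index is reported to the caller up front.
    if not isinstance(content, dict):
        raise TypeError(f"Index file {path} does not contain a JSON object")
    return content
```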

I found a few cases where I think the code would fail as written, so I'm concerned about test coverage; also, there are a lot of `if` branches here. Maybe some of those failures are being masked by the `failFast` config being false by default and the broad `except Exception` clauses? Have you looked at the coverage output for the new ingest code to see how many of the branches are being touched?

There are a lot of sentences in comments without periods, and sometimes without starting capitals. I only commented on a few at the start of the review, but please go through and try to fix them all.

python/lsst/obs/base/ingest.py

tests/test_ingest.py

tests/test_yamlCamera.py

parejkoj · 2021-03-05T19:55:33Z

tests/test_yamlCamera.py

+from lsst.obs.base.yamlCamera import makeCamera
+
+
+class YamlCameraTestCase(unittest.TestCase):


I assume the idea is to flesh this out later with more tests of yaml camera? Do we have any tickets for that as part of the overall yaml camera work?

I have no real idea about YAML camera. I'm doing the minimum possible to get a test so that I can use it in the visit definition test. I imagine that @czwa might have a plan since he's moving HSC to YAML camera.

timj · 2021-03-10T01:54:46Z

@parejkoj I think I've dealt with all your comments. I've added some tests, so now ingest.py has more than 98% coverage. The remaining bit needs real data with metadata translation (which I'm not able to provide here, and which is tested by other obs packages).

parejkoj

Thanks for the cleanups; this is much nicer. The explicit testing of a variety of failure modes is good, too. Remaining comments are mostly spelling/punctuation related.

Still some punctuation-less sentences: please check your comments for missing periods.

parejkoj · 2021-03-12T19:22:17Z

python/lsst/obs/base/ingest.py

    """
    pass


+def _log_msg_counter(noun: Union[int, Iterable]) -> Tuple[int, str]:
+    """Count the iterable and return the count and plural modifier.


I'm not saying do it on this ticket, but this is the kind of thing that might be nice to live in base or somewhere we can use it more broadly (until someone wants to make it smarter with -es plurals...).

python/lsst/obs/base/ingest.py

parejkoj · 2021-03-12T19:23:09Z

python/lsst/obs/base/ingest.py

+        configuration item is `True`.  If an error is encountered the
+        `_on_metadata_failure()` method will be called. If no exceptions
+        result and an error was encountered the returned object will have
+        a null-instrument class and no datasets.


Thank you for being explicit about this behavior here.
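The behavior described in that docstring is a "report, then maybe raise" pattern. The docstring's first line is truncated in the excerpt. Based on it and on the earlier `failFast` discussion, this sketch assumes an exception is raised when `failFast` is `True`. The failure callback runs in either case. With `failFast` off, a result with no instrument class and no datasets is returned instead. `FileResult`, `MetadataExtractor` and `reader` are hypothetical stand-ins, not the task's real classes. The broad `except Exception` mirrors the docstring and is exactly what the first review round questioned.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class FileResult:
    """Hypothetical stand-in for the per-file result returned by the task."""

    filename: str
    instrument_class: Optional[type] = None  # None plays the "null instrument" role
    datasets: List[dict] = field(default_factory=list)


class MetadataExtractor:
    """Hypothetical sketch of the failFast / failure-callback behavior."""

    def __init__(self, fail_fast: bool = False,
                 on_metadata_failure: Optional[Callable[[str, Exception], None]] = None):
        self.fail_fast = fail_fast
        self.on_metadata_failure = on_metadata_failure or (lambda filename, exc: None)

    def extract(self, filename: str,
                reader: Callable[[str], Tuple[type, List[dict]]]) -> FileResult:
        try:
            instrument_class, datasets = reader(filename)
        except Exception as exc:
            # Report the failure first, whatever fail_fast is set to.
            self.on_metadata_failure(filename, exc)
            if self.fail_fast:
                raise RuntimeError(f"Problem extracting metadata from {filename}") from exc
            # No exception escapes: return a null-instrument, empty result.
            return FileResult(filename)
        return FileResult(filename, instrument_class, datasets)
```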

python/lsst/obs/base/ingest.py

tests/test_ingest.py

parejkoj · 2021-03-12T19:50:23Z

tests/test_ingest.py

+            self.task.run([os.path.join(INGESTDIR, "indexed_data", "bad_implied", "dataset_2.yaml")])
+
+    def testCallbacks(self):
+        """Test the callbacks for failures."""


I feel like this block of tests might be clearer if they were all separate testX methods (maybe with a setup_callbacks() to prep things), instead of trying to clear the appropriate callback lists each time. I'll leave it up to you whether that's worth changing.
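One way to get the separate-methods layout is to use the standard `setUp` hook in place of a hand-written `setup_callbacks()`. `unittest` calls `setUp` before every test, so each method starts with fresh callback lists and nothing has to be cleared by hand. `FakeIngestTask` and the callback names below are hypothetical stand-ins for the real task and its callbacks.

```python
import unittest
from typing import Callable, List


class FakeIngestTask:
    """Hypothetical stand-in for the ingest task that only exercises callbacks."""

    def __init__(self, on_success: Callable[[str], None],
                 on_metadata_failure: Callable[[str], None]):
        self.on_success = on_success
        self.on_metadata_failure = on_metadata_failure

    def run(self, files: List[str]) -> None:
        for f in files:
            # Pretend any file whose name starts with "bad" has bad metadata.
            if f.startswith("bad"):
                self.on_metadata_failure(f)
            else:
                self.on_success(f)


class CallbackTestCase(unittest.TestCase):
    def setUp(self):
        # Runs before every test, so each test gets its own empty lists.
        self.successes: List[str] = []
        self.failures: List[str] = []
        self.task = FakeIngestTask(on_success=self.successes.append,
                                   on_metadata_failure=self.failures.append)

    def testSuccessCallback(self):
        self.task.run(["good_1.fits"])
        self.assertEqual(self.successes, ["good_1.fits"])
        self.assertEqual(self.failures, [])

    def testMetadataFailureCallback(self):
        self.task.run(["bad_1.fits", "good_2.fits"])
        self.assertEqual(self.failures, ["bad_1.fits"])
        self.assertEqual(self.successes, ["good_2.fits"])
```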

timj changed the title ~~DM-27476: Update raw ingest for JSON metadata files~~ DM-27476: Update raw ingest to use JSON metadata files Feb 22, 2021

timj mentioned this pull request Mar 5, 2021

DM-29071: report success/failure via callbacks in RawIngestTask #360

Merged

parejkoj requested changes Mar 5, 2021

timj force-pushed the tickets/DM-27476 branch from 47ef03f to 1ff3038 Compare March 8, 2021 15:48

timj added 25 commits March 10, 2021 13:09

Add pipe base task timer to ingest

30c53b1

Add support for ingest via sidecar files

c0406e5

Put code for calculating formatter from dataId into new method

7cfb53f

Support JSON index files for raw ingest

a86921b

Allow ObservationInfo in index file to override raw metadata index

6b66363

Move DummyCam instrument class to shared location to allow reuse

98c9303

Improve ingest debug message to include whether sidecar was read

30f7316

Stop using realpath since we expect JSON files to be with the symlink

85178a4

Allow ingest test to change dataset type

3ed4bba

Allow for sidecar files for in place test

40d15d8

If we are symlinking into the datastore we also need to symlink in the sidecar file since we might be relying on the sidecar file to extract metadata.
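A minimal sketch of linking the sidecar along with the file. It assumes the sidecar shares the file's stem and uses a `.json` extension (e.g. `raw.fits` and `raw.json`). That naming convention and the helper name `link_with_sidecar` are assumptions.

```python
import os


def link_with_sidecar(src: str, dest: str) -> None:
    """Symlink ``src`` to ``dest`` and, if present, its JSON sidecar too.

    Hypothetical helper; assumes the sidecar is ``<stem>.json`` next to ``src``.
    """
    os.symlink(os.path.abspath(src), dest)
    src_sidecar = os.path.splitext(src)[0] + ".json"
    if os.path.exists(src_sidecar):
        dest_sidecar = os.path.splitext(dest)[0] + ".json"
        os.symlink(os.path.abspath(src_sidecar), dest_sidecar)
```

Because the sidecar now sits next to the symlink, later code can look for it beside `dest` without resolving the link with `os.path.realpath`, which is in line with the "Stop using realpath" commit above.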

Add proper ingest test using sidecars

7783b4b

Sidecar files let us ingest arbitrary files without needing to write a full metadata translator.
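The idea can be sketched as "prefer the sidecar, fall back to the file itself". In this sketch, `read_header` is a hypothetical placeholder for the format-specific reader and metadata translator that would otherwise be needed. The `<stem>.json` sidecar naming is likewise an assumption.

```python
import json
import os
from typing import Any, Callable, Dict


def read_metadata(filename: str,
                  read_header: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Return metadata from a JSON sidecar if one exists, else from the file.

    Hypothetical helper; ``read_header`` stands in for a format-specific reader.
    """
    sidecar = os.path.splitext(filename)[0] + ".json"
    if os.path.exists(sidecar):
        with open(sidecar, "r", encoding="utf-8") as fd:
            return json.load(fd)
    return read_header(filename)
```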

Update datastore class to remove deprecation

bfc398c

Add basic YAML Camera test

e636828

Add test for define-visit

698cce5

This requires that we use the dummy camera geom YAML camera which also means some minor changes to detector naming in the JSON sidecar.

Add ingest test using index files

15e9fd6

Update JSON __CONTENT__ key to match new upstream

b7f2874

Compare file ospaths when comparing URIs

3c32ab6

Since in some contexts we have a schemeless URI being compared with a file URI.
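The mismatch is easy to reproduce with plain strings: `file:///data/raw.fits` and `/data/raw.fits` name the same file but compare unequal. This sketch uses only the standard library in place of the `ospath` conversion the commit refers to. The helper names are hypothetical, and POSIX-style paths are assumed.

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


def to_ospath(uri: str) -> str:
    """Return a normalized local path for a file URI or a schemeless path."""
    parsed = urlparse(uri)
    if parsed.scheme == "file":
        path = url2pathname(parsed.path)
    elif parsed.scheme == "":
        path = uri
    else:
        raise ValueError(f"Not a local URI: {uri}")
    return os.path.normpath(os.path.abspath(path))


def same_file(a: str, b: str) -> bool:
    # Raw string comparison treats "file:///data/raw.fits" and "/data/raw.fits"
    # as different; comparing the OS paths does not.
    return to_ospath(a) == to_ospath(b)
```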

Docstring fixes

b2f8daa

Make some docstring cleanups

df24f80

Rearrange index reading code

e889e2b

Rather than having two for loops reading index files, only have one.

Use absolute path for test dir

c2b076d

This can make a difference when using pytest vs using python to run the test.

use defaultdict rather than setdefault

0c06eb6
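For readers unfamiliar with the idiom, both versions below build the same mapping. `defaultdict(list)` creates the empty list only when a key is first missing, whereas `setdefault(key, [])` builds a new list on every call. Grouping files by directory is an assumed example, not necessarily what this commit changed.

```python
import os
from collections import defaultdict
from typing import Dict, List


def group_by_directory_setdefault(files: List[str]) -> Dict[str, List[str]]:
    grouped: Dict[str, List[str]] = {}
    for f in files:
        # setdefault evaluates its [] argument every time, used or not.
        grouped.setdefault(os.path.dirname(f), []).append(f)
    return grouped


def group_by_directory(files: List[str]) -> Dict[str, List[str]]:
    grouped: Dict[str, List[str]] = defaultdict(list)
    for f in files:
        # A missing key gets an empty list automatically.
        grouped[os.path.dirname(f)].append(f)
    return dict(grouped)
```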

Correctly report number of files read from index files

30f39e0

This unifies the partitioning done from reading normal files and reporting failures.

Use helper function for calculating the "s" on plurals in log messages

bda5dbe

Correctly trap bad translation from index file metadata

46deffe

timj added 6 commits March 10, 2021 13:09

Always extend bad file list with bad index file content

e39c55c

Previously we were only doing it if at least one file from the index was good.
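A sketch of the fixed logic, using a hypothetical data model in which each parsed index maps a filename to whether its metadata could be translated. The point is that the bad list is extended for every index, including one where no file is good.

```python
from typing import Dict, List, Tuple


def collect_from_indexes(
        index_contents: Dict[str, Dict[str, bool]]) -> Tuple[List[str], List[str]]:
    """Gather good and bad files from several parsed index files (hypothetical)."""
    good_files: List[str] = []
    bad_files: List[str] = []
    for content in index_contents.values():
        good_files.extend(f for f, ok in content.items() if ok)
        # Extend unconditionally. Per the commit message, this used to happen
        # only when at least one file in the index was good, so an index with
        # no good files lost its failures.
        bad_files.extend(f for f, ok in content.items() if not ok)
    return good_files, bad_files
```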

Also ensure that bad index files are reported as failures

c121f6c

When reading index files retain the input file order

97387cb

Enhance ingest tests

b88bd53

Use different combinations to explore all failure modes.

Add note concerning the logic when an index is in the file list

f48853d

Reorganize the ingest test data files

aa39223

Put them all in one place and use slightly better directory names.

timj force-pushed the tickets/DM-27476 branch from 8ae8559 to aa39223 Compare March 10, 2021 20:09

timj added 2 commits March 10, 2021 13:17

Make docstrings pydocstyle compliant

1b59383

Fix docstring for ObservationInfo

7c2ac66

parejkoj approved these changes Mar 12, 2021

Fix comment issues

31b668e

timj merged commit 7cf335f into master Mar 12, 2021

timj deleted the tickets/DM-27476 branch March 12, 2021 20:53
