DM-14197: Make obs_test data ingestible #46

kfindeisen · 2018-05-02T20:10:32Z

This PR makes it possible to ingest obs_test files into an external repository, in order to test code that uses ingestion. Rather than modify the files to be ingestible with default settings, I've added a custom ParseTask and some config override files.

This PR also updates obs_test to the modern copyright format (so that I could use it when adding obs/test/ingestCalibs.py) and corrects some formatting and reproducibility issues in the data readme.

r-owen

If you added a few headers to the existing files, could you get rid of TestCalibsParseTask and some of these overrides? If so, I recommend doing that, both to simplify the approach and to make the data more realistic and a better fit for standard ingestion. That said, this approach works and is acceptable.

r-owen · 2018-05-02T21:00:11Z

python/lsst/obs/test/ingestCalibs.py

+            The date, or "unknown" if the value cannot be found or converted.
+        """
+        date = md.get("TIME-MID").strip()
+        c = date.find("T")


Consider return date.split("T")[0]. Like your code it will return everything up to, but not including, the first T. If there is no T then it returns the whole date.

Using split in this way is a common Python idiom.
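A minimal sketch of the suggested idiom, for reference. The function name `date_part` and the sample timestamps are illustrative only and do not come from the PR; the `strip()` call mirrors the one in the diff above.

```python
def date_part(timestamp):
    """Return the text before the first "T", or the whole string if there is no "T"."""
    # split("T") always returns at least one element, so [0] is safe
    # even when the separator is absent.
    return timestamp.strip().split("T")[0]


print(date_part("2018-05-02T20:10:32"))  # date followed by a time
print(date_part("2018-05-02"))           # no "T": returned unchanged
```

Unlike `str.find`, which returns -1 when there is no match and so needs an explicit check before slicing, `split` handles the missing-separator case without any branching.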

r-owen · 2018-05-02T21:03:01Z

python/lsst/obs/test/ingestCalibs.py

+        """
+        basename = os.path.basename(filename)
+        obstype = "unknown"
+        if "flat" in basename:


Consider:

    _knownTypes = frozenset(("bias", "flat", "defect"))  # known file types as a class variable

    def getCalibType(...):
        ....
        typename = basename.split("_")[0]
        if typename in self._knownTypes:
            return typename
        return "unknown"

My main justification is that this only examines the part of the file name that actually contains the type. It's not a big deal, since the names are so simple. I used a class variable so that the set only has to be constructed once, but that optimization is a bit silly for so few values. It would be plenty fast enough to use if typename in ("bias", "flat", "defect"): and not even bother with a set.

If you prefer the simplicity of examining the whole basename, consider eliminating repetition of those names with a loop:

    for typename in ("bias", "flat", "defect"):
        if typename in basename:
            return typename
    return "unknown"
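To make the difference between the two suggestions concrete, here is a standalone sketch of both as plain functions. The function names, the module-level `KNOWN_TYPES` tuple, and the file names in the usage lines are hypothetical; the real method lives on the parse task in `ingestCalibs.py`, and its full signature is not shown in this thread.

```python
import os

# Known calibration types, as listed in the review comment.
KNOWN_TYPES = ("bias", "flat", "defect")


def calib_type_from_prefix(filename):
    """Classify using only the text before the first underscore of the basename."""
    typename = os.path.basename(filename).split("_")[0]
    return typename if typename in KNOWN_TYPES else "unknown"


def calib_type_from_substring(filename):
    """Classify by searching the whole basename for each known type in turn."""
    basename = os.path.basename(filename)
    for typename in KNOWN_TYPES:
        if typename in basename:
            return typename
    return "unknown"


# Hypothetical file names, chosen to show where the two approaches disagree.
for name in ("calib/flat_g.fits", "calib/dark_flatfield.fits", "calib/bias.fits"):
    print(name, calib_type_from_prefix(name), calib_type_from_substring(name))
```

The prefix version is stricter: it ignores a type name that appears later in the file name, but it also returns "unknown" for a name with no underscore, such as `bias.fits`, because the extension stays attached to the first field.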

r-owen · 2018-05-02T21:30:53Z

config/ingest.py

@@ -0,0 +1,12 @@
+# Config override for lsst.pipe.tasks.IngestTask
+config.parse.translation = {'visit': 'OBSID',
+                            'expTime': 'EXPTIME',


Why is "expTime": "EXPTIME" necessary? Surely header name lookup is case blind?

kfindeisen

Columns must be present in translation or translators to be read at all. I tried testing the keywords just now and they are, surprisingly, case-sensitive.

(Technically I don't need the exposure time, but it's one of the few lsstSim columns that can be read without special code support so I figured I might as well leave it in.)
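As a rough illustration of why the explicit `'expTime': 'EXPTIME'` entry matters, the toy model below uses a plain dict in place of the FITS metadata. It is not the actual ParseTask implementation: the `translate` helper and the header values are made up for this sketch, and the only behavior it is meant to show is that a case-sensitive lookup silently drops a column whose keyword is spelled with the wrong case.

```python
def translate(header, translation):
    """Map column names to header values, skipping keywords not found in the header.

    Hypothetical helper: dict lookups are case-sensitive, which stands in for
    the case-sensitive keyword matching described in the reply above.
    """
    return {column: header[keyword]
            for column, keyword in translation.items()
            if keyword in header}


# Made-up header values for illustration only.
header = {"OBSID": 229388, "EXPTIME": 15.0}

# Same mapping as the config override in the diff.
print(translate(header, {"visit": "OBSID", "expTime": "EXPTIME"}))

# A lowercase keyword does not match, so the column is never read.
print(translate(header, {"visit": "OBSID", "expTime": "exptime"}))
```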

kfindeisen · 2018-05-02T21:44:32Z

The problem is that I'm not sure if I can add the right headers to the files and to assembleLsstChannels.py self-consistently, especially since I don't understand why the files have minimal headers in the first place (the script should have passed through all keywords from the original lsstSim data). Am I missing something obvious?

parejkoj · 2018-05-02T21:48:36Z

The original data probably just didn't have the missing headers.

kfindeisen · 2018-05-02T22:01:48Z

I don't think that's it. While we can't find the original data (another reason I'm reluctant to touch assembleLsstChannels.py), the test flats and biases in obs_lsstSim do have sensible-looking headers.

parejkoj · 2018-05-02T22:08:45Z

Hmm... it does look like the files in obs_lsstSim are of similar vintage. @SimonKrughoff : do you have any memory of this?

r-owen · 2018-05-02T23:28:49Z

I may have stripped the headers, though I'm surprised if so because it doesn't show up in the code.

I think adding OBSTYPE and a few things like that is perfectly safe. I would not change the format for any existing keywords.

SimonKrughoff · 2018-05-02T23:37:51Z

@kfindeisen asked about this before on Slack. Unfortunately, those data are ancient and I don't know why they are different.

For approximate self-consistency, assembleLsstChannels has been updated with the ability to add header keywords. Since I don't have the original lsstSim data, however, I have not tested this functionality and added the headers by a different route.

Modernize copyright.

c15dfeb

kfindeisen requested a review from r-owen May 2, 2018 20:10

r-owen approved these changes May 2, 2018

kfindeisen added 2 commits May 3, 2018 11:18

Update data readme.

e45f32c

The readme described paths on lsst-dev that no longer exist. The document has been rewritten to reflect that, although I've kept the instructions for the data/utils scripts in case they come in handy in the future.

kfindeisen force-pushed the tickets/DM-14197 branch from 0199471 to e45f32c Compare May 3, 2018 18:20

kfindeisen merged commit e45f32c into master May 3, 2018

kfindeisen mentioned this pull request May 18, 2018

DM-13849: Convert all ap_verify test data to obs_test #48

Merged

ktlim deleted the tickets/DM-14197 branch August 25, 2018 06:16
