Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-14197: Make obs_test data ingestible #46

Merged
merged 3 commits into from May 3, 2018
Merged

Conversation

kfindeisen
Copy link
Member

This PR makes it possible to ingest obs_test files into an external repository, in order to test code that uses ingestion. Rather than modify the files to be ingestible with default settings, I've added a custom ParseTask and some config override files.

This PR also updates obs_test to the modern copyright format (so that I could use it when adding obs/test/ingestCalibs.py) and corrects some formatting and reproducibility issues in the data readme.

@kfindeisen kfindeisen requested a review from r-owen May 2, 2018 20:10
Copy link
Contributor

@r-owen r-owen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you added a few headers to the existing files could you get rid of TestCalibsParseTask and some of these overrides? If so, I recommend doing that in order to simplify the approach and make the data more realistic and a better fit to standard ingestion. That said, this approach works and is acceptable.

The date, or "unknown" if the value cannot be found or converted.
"""
date = md.get("TIME-MID").strip()
c = date.find("T")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider return date.split("T")[0]. Like your code it will return everything up to, but not including, the first T. If there is no T then it returns the whole date.

Using split in this way is a common Python idiom.

"""
basename = os.path.basename(filename)
obstype = "unknown"
if "flat" in basename:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider:

_knownTypes = frozenset(("bias", "flat", "defect"))  # known file types as a class variable
def getCalibType(...):
    ....
    typename = basename.split("_")[0]
    if typename in self._knownTypes:
        return typename
    return "unknown"

My main justification that this only examines the bit of the file name that actually contains the type. It's no a big deal since the names are so simple. I used a class variable so the set only had to be constructed once, but that is an optimization that is a bit silly for so few values. It would be plenty fast enough to use if typename in ("bias", "flat", "defect"): and not even bother with a set.

If you prefer the simplicity of examining the whole basename, consider eliminating repetition of those names with a loop:

for typename in ("bias", "flat", "defect"):
    if typename in basename:
        return typename
return "unknown"

@@ -0,0 +1,12 @@
# Config override for lsst.pipe.tasks.IngestTask
config.parse.translation = {'visit': 'OBSID',
'expTime': 'EXPTIME',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is "expTime": "EXPTIME" necessary? Surely header name lookup is case blind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Columns must be present in translation or translators to be read at all. I tried testing the keywords just now and they are, surprisingly, case-sensitive.

(Technically I don't need the exposure time, but it's one of the few lsstSim columns that can be read without special code support so I figured I might as well leave it in.)

@kfindeisen
Copy link
Member Author

The problem is that I'm not sure if I can add the right headers to the files and to assembleLsstChannels.py self-consistently, especially since I don't understand why the files have minimal headers in the first place (the script should have passed through all keywords from the original lsstSim data). Am I missing something obvious?

@parejkoj
Copy link
Contributor

parejkoj commented May 2, 2018

The original data probably just didn't have the missing headers.

@kfindeisen
Copy link
Member Author

I don't think that's it. While we can't find the original data (another reason I'm reluctant to touch assembleLsstChannels.py), the test flats and biases in obs_lsstSim do have sensible-looking headers.

@parejkoj
Copy link
Contributor

parejkoj commented May 2, 2018

Hmm... it does look like the files in obs_lsstSim are of similar vintage. @SimonKrughoff : do you have any memory of this?

@r-owen
Copy link
Contributor

r-owen commented May 2, 2018

I may have stripped the headers, though I'm surprised if so because it doesn't show up in the code.

I think adding OBSTYPE and a few things like that is perfectly safe. I would not change the format for any existing keywords.

@SimonKrughoff
Copy link
Contributor

@kfindeisen asked about this before on slack. Unfortunately, those data are ancient and I don't know why they are different.

For approximate self-consistency, assembleLsstChannels has been
updated with the ability to add header keywords. Since I don't have
the original lsstSim data, however, I have not tested this
functionality and added the headers by a different route.
The readme described paths on lsst-dev that no longer exist. The
document has been rewritten to reflect that, although I've kept the
instructions for the data/utils scripts in case they come in handy
in the future.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants