New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-14197: Make obs_test data ingestible #46
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you added a few headers to the existing files could you get rid of TestCalibsParseTask
and some of these overrides? If so, I recommend doing that in order to simplify the approach and make the data more realistic and a better fit to standard ingestion. That said, this approach works and is acceptable.
python/lsst/obs/test/ingestCalibs.py
Outdated
The date, or "unknown" if the value cannot be found or converted. | ||
""" | ||
date = md.get("TIME-MID").strip() | ||
c = date.find("T") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider return date.split("T")[0]
. Like your code it will return everything up to, but not including, the first T
. If there is no T then it returns the whole date
.
Using split
in this way is a common Python idiom.
python/lsst/obs/test/ingestCalibs.py
Outdated
""" | ||
basename = os.path.basename(filename) | ||
obstype = "unknown" | ||
if "flat" in basename: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider:
_knownTypes = frozenset(("bias", "flat", "defect")) # known file types as a class variable
def getCalibType(...):
....
typename = basename.split("_")[0]
if typename in self._knownTypes:
return typename
return "unknown"
My main justification that this only examines the bit of the file name that actually contains the type. It's no a big deal since the names are so simple. I used a class variable so the set only had to be constructed once, but that is an optimization that is a bit silly for so few values. It would be plenty fast enough to use if typename in ("bias", "flat", "defect"):
and not even bother with a set.
If you prefer the simplicity of examining the whole basename
, consider eliminating repetition of those names with a loop:
for typename in ("bias", "flat", "defect"):
if typename in basename:
return typename
return "unknown"
@@ -0,0 +1,12 @@ | |||
# Config override for lsst.pipe.tasks.IngestTask | |||
config.parse.translation = {'visit': 'OBSID', | |||
'expTime': 'EXPTIME', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is "expTime": "EXPTIME"
necessary? Surely header name lookup is case blind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Columns must be present in translation
or translators
to be read at all. I tried testing the keywords just now and they are, surprisingly, case-sensitive.
(Technically I don't need the exposure time, but it's one of the few lsstSim columns that can be read without special code support so I figured I might as well leave it in.)
The problem is that I'm not sure if I can add the right headers to the files and to |
The original data probably just didn't have the missing headers. |
I don't think that's it. While we can't find the original data (another reason I'm reluctant to touch |
Hmm... it does look like the files in obs_lsstSim are of similar vintage. @SimonKrughoff : do you have any memory of this? |
I may have stripped the headers, though I'm surprised if so because it doesn't show up in the code. I think adding OBSTYPE and a few things like that is perfectly safe. I would not change the format for any existing keywords. |
@kfindeisen asked about this before on slack. Unfortunately, those data are ancient and I don't know why they are different. |
For approximate self-consistency, assembleLsstChannels has been updated with the ability to add header keywords. Since I don't have the original lsstSim data, however, I have not tested this functionality and added the headers by a different route.
The readme described paths on lsst-dev that no longer exist. The document has been rewritten to reflect that, although I've kept the instructions for the data/utils scripts in case they come in handy in the future.
0199471
to
e45f32c
Compare
This PR makes it possible to ingest
obs_test
files into an external repository, in order to test code that uses ingestion. Rather than modify the files to be ingestible with default settings, I've added a customParseTask
and some config override files.This PR also updates
obs_test
to the modern copyright format (so that I could use it when addingobs/test/ingestCalibs.py
) and corrects some formatting and reproducibility issues in the data readme.