New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-17029: Update LoadReferenceObjectsTask to output fluxes in nanojansky #154
Conversation
8b4dcf0
to
02d3bed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I don't have a lot of substance. The changes you've made look good in general.
|
||
if write: | ||
output.writeFits(filename) | ||
log.info(f"Wrote: {filename}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love f-strings. So compact.
bin.src/convert_refcat_to_nJy.py
Outdated
formatter_class=CustomFormatter) | ||
parser.add_argument("path", | ||
help="Directory containing the reference catalogs to overwrite." | ||
" All files with a `.fits` extension in the directory will be processed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention the directory must have been written by the reference catalog sharding task. This implies master_schema.fits
must exist.
with concurrent.futures.ProcessPoolExecutor(max_workers=args.nprocesses) as executor: | ||
futures = executor.map(process_one, files, itertools.repeat(args.write), itertools.repeat(args.quiet)) | ||
# so that exceptions don't get lost | ||
for future in futures: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this helps exceptions not getting lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's part of how exception handling in futures
behaves. I reworded it as follows: # we have to at least loop over the futures, otherwise exceptions will be lost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
configFile.write("\n\n# Updated refcat from version 0 to have nJy flux units\n") | ||
configFile.write("config.format_version=1\n") | ||
if not args.quiet: | ||
print("Added `format_version=1` to config.py") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.info
no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a Task, and all of these print statements are outside the futures loop, so I think just print
is fine here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point it's not a Task
.
bin.src/convert_refcat_to_nJy.py
Outdated
|
||
if args.write: | ||
with open(os.path.join(args.path, 'config.py'), 'a') as configFile: | ||
configFile.write("\n\n# Updated refcat from version 0 to have nJy flux units\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried that this could result in having both format_version=1
and format_version=0
in the same file since you don't check and delete if a version already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no such thing as format_version=0
in any existing files and there never will be since I introduced that version in this ticket, which also makes version 1 the default (with no way to write version 0).
I guess we could make an explicit check of format_version
part of is_old_schema
, in addition to the check of types? How does one read a config file like this outside of the pex.Config
environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this may be more complicated than you want, but the safest thing is to load the config, set the version and write it back out. This is also presumably future proof.
(untested) code snippet follows:
import lsst.meas.algorithms as meas_alg
config = meas_alg.ingestIndexReferenceTask.DatasetConfig()
config.load(os.path.join(args.path, 'config.py'))
config.format_version = 1
config.save(os.path.join(args.path, 'config.py'))
|
||
Notes | ||
----- | ||
Support for old units in reference catalogs will be removed after 18.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may not happen until after v19, right? Maybe it's safer to say it will be removed after the release of late calendar year 2019.
tests/test_htmIndex.py
Outdated
@@ -198,6 +199,7 @@ def testAgainstPersisted(self): | |||
ex2 = testCat.extract('*') | |||
self.assertEqual(set(ex1.keys()), set(ex2.keys())) | |||
for kk in ex1: | |||
print(ex1[kk], ex2[kk]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really need to be printed?
tests/test_htmIndex.py
Outdated
loader = LoadIndexedReferenceObjectsTask(butler=dafPersist.Butler(path)) | ||
self.assertEqual(loader.dataset_config.format_version, 0) | ||
result = loader.loadSkyCircle(make_coord(10, 20), 5*lsst.geom.degrees, filterName='a') | ||
# TODO: assert that log.warn messages are emitted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO associated with a ticket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. It was a todo for me while I was writing it, but I didn't actually implement that part, and I'm not sure how much it matters. Is it worth checking that we emit some kind of message for old catalogs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of place we don't check log messages. If you are not going to do it, I'm fine just removing the TODOs.
tests/test_htmIndex.py
Outdated
loader = LoadIndexedReferenceObjectsTask(butler=dafPersist.Butler(path)) | ||
self.assertEqual(loader.dataset_config.format_version, 1) | ||
result = loader.loadSkyCircle(make_coord(10, 20), 5*lsst.geom.degrees, filterName='a') | ||
# TODO: assert that no log.warn is emitted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above re: ticket for TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a suggestion about rewriting the configs, but I'm happy. Don't forget to rebase.
68fb0a0
to
b85550a
Compare
ac9e18e
to
7d430a2
Compare
""" | ||
md = PropertyList() | ||
md.set("REFCAT_FORMAT_VERSION", LATEST_FORMAT_VERSION) | ||
catalog.setMetadata(md) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TallJimbo : would this overwrite the rest of the metadata in the catalog? Should I change it to do this first?
md = catalog.getMetadata()
if md is None:
md = PropertyList()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would indeed be safer. I recall thinking that the initial PropertySet
was going to be None
in the cases that mattered here, but I could easily believe that was only the case in the first place I used this function, and then I forgot about it in the next place.
@TallJimbo , @SimonKrughoff : should we re-review after I've added and tweaked Jim's additions to improve gen3 compatibility, or is my tweaking it enough of a review? |
Your updates to my changes look good to me, and generally I feel like the spirit of code review has been met if two people have looked at the code. |
Update units in docstring and makeMinimalSchema flux units. Add format_version to Indexed DatasetConfig, and use setDefaults to override it when Ingesting new Indexed refcats. Add functions to check for and convert old refcat fluxes. Convert old refcats when they are read in, and issue a warning about them. Convert afw flux->mag code to astropy where possible, and multiply by 1e-9 for the fluxErr/magErr code that will be removed in DM-16903.
Uses the code in LoadReferenceObjects.py to check and convert. Update ups table file to reflect new bin.src/ directory and add SConscript. Add bin/ to .gitignore.
The version 0 is "implicit", as all of our version 0 catalogs are: it has no format_version in the config. The version 1 is "explicit", and has nJy flux units. Exclude new test data from flake8. Add tests of persisted version=0 and version=1 refcats.
Add a check to the reference object loader class that is used in conjunction with gen3 middleware to verify the flux units of the catalog.
adb5b0b
to
165f6c3
Compare
No description provided.