DM-17029: Update LoadReferenceObjectsTask to output fluxes in nanojansky #154

Merged
7 commits merged into master from tickets/DM-17029 on Apr 5, 2019

Conversation


@parejkoj commented on Mar 7, 2019:

No description provided.

@parejkoj force-pushed the tickets/DM-17029 branch 2 times, most recently from 8b4dcf0 to 02d3bed on March 8, 2019 at 02:04.
@SimonKrughoff left a comment:

Looks great. I don't have a lot of substance. The changes you've made look good in general.


if write:
    output.writeFits(filename)
    log.info(f"Wrote: {filename}")
Contributor:
I love f-strings. So compact.

    formatter_class=CustomFormatter)
parser.add_argument("path",
                    help="Directory containing the reference catalogs to overwrite."
                         " All files with a `.fits` extension in the directory will be processed.")
Contributor:

Maybe mention the directory must have been written by the reference catalog sharding task. This implies master_schema.fits must exist.

with concurrent.futures.ProcessPoolExecutor(max_workers=args.nprocesses) as executor:
    futures = executor.map(process_one, files, itertools.repeat(args.write), itertools.repeat(args.quiet))
    # so that exceptions don't get lost
    for future in futures:
Contributor:
I don't understand how this helps exceptions not getting lost.

@parejkoj (Mar 13, 2019):
It's part of how exception handling in futures behaves. I reworded it as follows: # we have to at least loop over the futures, otherwise exceptions will be lost
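
A small self-contained sketch of the behaviour being described, separate from the PR's script: executor.map is lazy about results, and an exception raised in a worker is only re-raised in the parent process when that result is consumed, so the results must be iterated even if they are otherwise unused.

    import concurrent.futures


    def work(x):
        if x == 2:
            raise RuntimeError("boom")  # raised in the worker process
        return x * x


    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            results = executor.map(work, range(4))
            # The workers may already be running, but a worker exception only
            # surfaces in the parent when its result is pulled from the iterator.
            for value in results:  # RuntimeError("boom") propagates here
                print(value)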

Contributor:
Sounds good.

configFile.write("\n\n# Updated refcat from version 0 to have nJy flux units\n")
configFile.write("config.format_version=1\n")
if not args.quiet:
    print("Added `format_version=1` to config.py")
Contributor:

log.info, no?

Contributor Author:
It's not a Task, and all of these print statements are outside the futures loop, so I think just print is fine here.

Contributor:
Oh, good point, it's not a Task.


if args.write:
    with open(os.path.join(args.path, 'config.py'), 'a') as configFile:
        configFile.write("\n\n# Updated refcat from version 0 to have nJy flux units\n")
Contributor:
I'm a little worried that this could result in having both format_version=1 and format_version=0 in the same file since you don't check and delete if a version already exists.

Contributor Author:
There is no such thing as format_version=0 in any existing files and there never will be since I introduced that version in this ticket, which also makes version 1 the default (with no way to write version 0).

I guess we could make an explicit check of format_version part of is_old_schema, in addition to the check of types? How does one read a config file like this outside of the pex.Config environment?

Contributor:
I understand this may be more complicated than you want, but the safest thing is to load the config, set the version and write it back out. This is also presumably future proof.

(untested) code snippet follows:

import lsst.meas.algorithms as meas_alg
config = meas_alg.ingestIndexReferenceTask.DatasetConfig()
config.load(os.path.join(args.path, 'config.py'))
config.format_version = 1
config.save(os.path.join(args.path, 'config.py'))


Notes
-----
Support for old units in reference catalogs will be removed after 18.0.
Contributor:
I think this may not happen until after v19, right? Maybe it's safer to say it will be removed after the release of late calendar year 2019.

@@ -198,6 +199,7 @@ def testAgainstPersisted(self):
        ex2 = testCat.extract('*')
        self.assertEqual(set(ex1.keys()), set(ex2.keys()))
        for kk in ex1:
            print(ex1[kk], ex2[kk])
Contributor:
Does this really need to be printed?

loader = LoadIndexedReferenceObjectsTask(butler=dafPersist.Butler(path))
self.assertEqual(loader.dataset_config.format_version, 0)
result = loader.loadSkyCircle(make_coord(10, 20), 5*lsst.geom.degrees, filterName='a')
# TODO: assert that log.warn messages are emitted.
Contributor:
Is this TODO associated with a ticket?

Contributor Author:
I don't know. It was a todo for me while I was writing it, but I didn't actually implement that part, and I'm not sure how much it matters. Is it worth checking that we emit some kind of message for old catalogs?

Contributor:
There are a lot of places we don't check log messages. If you are not going to do it, I'm fine just removing the TODOs.
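
If the log-message check were ever added, the generic standard-library pattern would look roughly like the sketch below. The logger name and message are invented for illustration, and Task logging in the LSST stack went through lsst.log rather than Python's logging at the time of this PR, so this shows only the general unittest idiom, not the PR's tests.

    import logging
    import unittest


    class OldRefcatWarningTestCase(unittest.TestCase):
        """Illustrative only: the logger name and message are invented."""

        def testWarningIsEmitted(self):
            logger = logging.getLogger("LoadIndexedReferenceObjectsTask")
            # assertLogs fails unless at least one WARNING-or-higher record is
            # emitted on the given logger inside the with block.
            with self.assertLogs(logger, level="WARNING") as cm:
                logger.warning("found format_version=0 refcat; converting fluxes to nJy")
            self.assertIn("nJy", cm.output[0])


    if __name__ == "__main__":
        unittest.main()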

loader = LoadIndexedReferenceObjectsTask(butler=dafPersist.Butler(path))
self.assertEqual(loader.dataset_config.format_version, 1)
result = loader.loadSkyCircle(make_coord(10, 20), 5*lsst.geom.degrees, filterName='a')
# TODO: assert that no log.warn is emitted.
Contributor:
Same as above re: ticket for TODO.

@SimonKrughoff left a comment:
Made a suggestion about rewriting the configs, but I'm happy. Don't forget to rebase.

"""
md = PropertyList()
md.set("REFCAT_FORMAT_VERSION", LATEST_FORMAT_VERSION)
catalog.setMetadata(md)
@parejkoj (Apr 1, 2019):
@TallJimbo: would this overwrite the rest of the metadata in the catalog? Should I change it to do this first?

    md = catalog.getMetadata()
    if md is None:
        md = PropertyList()

Member:

That would indeed be safer. I recall thinking that the initial PropertySet was going to be None in the cases that mattered here, but I could easily believe that was only the case in the first place I used this function, and then I forgot about it in the next place.
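
Putting the two fragments above together, the safer pattern being agreed on would look roughly like this (same names as in the quoted code, untested here):

    # Preserve any existing catalog metadata instead of replacing it wholesale.
    md = catalog.getMetadata()
    if md is None:
        md = PropertyList()
    md.set("REFCAT_FORMAT_VERSION", LATEST_FORMAT_VERSION)
    catalog.setMetadata(md)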

@parejkoj commented on Apr 1, 2019:

@TallJimbo, @SimonKrughoff: should we re-review after I've added and tweaked Jim's additions to improve gen3 compatibility, or is my tweaking it enough of a review?

@TallJimbo (Member):
Your updates to my changes look good to me, and generally I feel like the spirit of code review has been met if two people have looked at the code.

parejkoj and others added 7 commits on April 5, 2019:
Update units in docstring and makeMinimalSchema flux units.

Add format_version to Indexed DatasetConfig, and use setDefaults to override
it when Ingesting new Indexed refcats.

Add functions to check for and convert old refcat fluxes.
Convert old refcats when they are read in, and issue a warning about them.

Convert afw flux->mag code to astropy where possible, and multiply by 1e-9
for the fluxErr/magErr code that will be removed in DM-16903.
Uses the code in LoadReferenceObjects.py to check and convert.
Update ups table file to reflect new bin.src/ directory and add SConscript.
Add bin/ to .gitignore.
Version 0 is "implicit", as all of our version 0 catalogs are: it has
no format_version in the config.
Version 1 is "explicit", and has nJy flux units.

Exclude new test data from flake8.

Add tests of persisted version=0 and version=1 refcats.
Add a check to the reference object loader class that is used in
conjunction with gen3 middleware to verify the flux units of the
catalog.
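
As a reference point for the "Convert afw flux->mag code to astropy where possible" commit above, astropy's unit machinery handles the nJy/AB-magnitude conversion directly. The values below are arbitrary and this is not the PR's code:

    import astropy.units as u

    # 3631 Jy is the AB zero point, so 3631 nJy is exactly 22.5 mag AB.
    flux = 3631 * u.nJy
    mag = flux.to(u.ABmag)   # 22.5 mag(AB)
    back = mag.to(u.nJy)     # round-trips to 3631 nJy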
@parejkoj merged commit 5930607 into master on Apr 5, 2019.