DM-17029: Update LoadReferenceObjectsTask to output fluxes in nanojansky #154

Merged
7 commits merged into master from tickets/DM-17029 on Apr 5, 2019

Conversation


@parejkoj commented on Mar 7, 2019:

No description provided.

@parejkoj force-pushed the tickets/DM-17029 branch 2 times, most recently from 8b4dcf0 to 02d3bed on March 8, 2019 at 02:04.
@SimonKrughoff left a comment:

Looks great. I don't have a lot of substance. The changes you've made look good in general.


if write:
    output.writeFits(filename)
    log.info(f"Wrote: {filename}")
Contributor:
I love f-strings. So compact.

    formatter_class=CustomFormatter)
parser.add_argument("path",
                    help="Directory containing the reference catalogs to overwrite."
                         " All files with a `.fits` extension in the directory will be processed.")
Contributor:

Maybe mention the directory must have been written by the reference catalog sharding task. This implies master_schema.fits must exist.

with concurrent.futures.ProcessPoolExecutor(max_workers=args.nprocesses) as executor:
    futures = executor.map(process_one, files, itertools.repeat(args.write), itertools.repeat(args.quiet))
    # so that exceptions don't get lost
    for future in futures:
Contributor:
I don't understand how this helps exceptions not getting lost.

@parejkoj (Mar 13, 2019):
It's part of how exception handling in futures behaves. I reworded it as follows: # we have to at least loop over the futures, otherwise exceptions will be lost
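
A small self-contained sketch of the behaviour being described, separate from the PR's script: executor.map is lazy about results, and an exception raised in a worker is only re-raised in the parent process when that result is consumed, so the results must be iterated even if they are otherwise unused.

    import concurrent.futures


    def work(x):
        if x == 2:
            raise RuntimeError("boom")  # raised in the worker process
        return x * x


    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            results = executor.map(work, range(4))
            # The workers may already be running, but a worker exception only
            # surfaces in the parent when its result is pulled from the iterator.
            for value in results:  # RuntimeError("boom") propagates here
                print(value)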

Contributor:
Sounds good.

configFile.write("\n\n# Updated refcat from version 0 to have nJy flux units\n")
configFile.write("config.format_version=1\n")
if not args.quiet:
    print("Added `format_version=1` to config.py")
Contributor:

log.info, no?

Contributor Author:
It's not a Task, and all of these print statements are outside the futures loop, so I think just print is fine here.

Contributor:
Oh, good point, it's not a Task.


if args.write:
    with open(os.path.join(args.path, 'config.py'), 'a') as configFile:
        configFile.write("\n\n# Updated refcat from version 0 to have nJy flux units\n")
Contributor:
I'm a little worried that this could result in having both format_version=1 and format_version=0 in the same file since you don't check and delete if a version already exists.

Contributor Author:
There is no such thing as format_version=0 in any existing files and there never will be since I introduced that version in this ticket, which also makes version 1 the default (with no way to write version 0).

I guess we could make an explicit check of format_version part of is_old_schema, in addition to the check of types? How does one read a config file like this outside of the pex.Config environment?

Contributor:
I understand this may be more complicated than you want, but the safest thing is to load the config, set the version and write it back out. This is also presumably future proof.

(untested) code snippet follows:

import lsst.meas.algorithms as meas_alg
config = meas_alg.ingestIndexReferenceTask.DatasetConfig()
config.load(os.path.join(args.path, 'config.py'))
config.format_version = 1
config.save(os.path.join(args.path, 'config.py'))


Notes
-----
Support for old units in reference catalogs will be removed after 18.0.
Contributor:
I think this may not happen until after v19, right? Maybe it's safer to say it will be removed after the release of late calendar year 2019.

@@ -198,6 +199,7 @@ def testAgainstPersisted(self):
        ex2 = testCat.extract('*')
        self.assertEqual(set(ex1.keys()), set(ex2.keys()))
        for kk in ex1:
            print(ex1[kk], ex2[kk])
Contributor:
Does this really need to be printed?

loader = LoadIndexedReferenceObjectsTask(butler=dafPersist.Butler(path))
self.assertEqual(loader.dataset_config.format_version, 0)
result = loader.loadSkyCircle(make_coord(10, 20), 5*lsst.geom.degrees, filterName='a')
# TODO: assert that log.warn messages are emitted.
Contributor:
Is this TODO associated with a ticket?

Contributor Author:
I don't know. It was a todo for me while I was writing it, but I didn't actually implement that part, and I'm not sure how much it matters. Is it worth checking that we emit some kind of message for old catalogs?

Contributor:
There are a lot of places we don't check log messages. If you are not going to do it, I'm fine just removing the TODOs.
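
If the log-message check were ever added, the generic standard-library pattern would look roughly like the sketch below. The logger name and message are invented for illustration, and Task logging in the LSST stack went through lsst.log rather than Python's logging at the time of this PR, so this shows only the general unittest idiom, not the PR's tests.

    import logging
    import unittest


    class OldRefcatWarningTestCase(unittest.TestCase):
        """Illustrative only: the logger name and message are invented."""

        def testWarningIsEmitted(self):
            logger = logging.getLogger("LoadIndexedReferenceObjectsTask")
            # assertLogs fails unless at least one WARNING-or-higher record is
            # emitted on the given logger inside the with block.
            with self.assertLogs(logger, level="WARNING") as cm:
                logger.warning("found format_version=0 refcat; converting fluxes to nJy")
            self.assertIn("nJy", cm.output[0])


    if __name__ == "__main__":
        unittest.main()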

loader = LoadIndexedReferenceObjectsTask(butler=dafPersist.Butler(path))
self.assertEqual(loader.dataset_config.format_version, 1)
result = loader.loadSkyCircle(make_coord(10, 20), 5*lsst.geom.degrees, filterName='a')
# TODO: assert that no log.warn is emitted.
Contributor:
Same as above re: ticket for TODO.

@SimonKrughoff left a comment:
Made a suggestion about rewriting the configs, but I'm happy. Don't forget to rebase.

"""
md = PropertyList()
md.set("REFCAT_FORMAT_VERSION", LATEST_FORMAT_VERSION)
catalog.setMetadata(md)
@parejkoj (Apr 1, 2019):
@TallJimbo: would this overwrite the rest of the metadata in the catalog? Should I change it to do this first?

    md = catalog.getMetadata()
    if md is None:
        md = PropertyList()

Member:

That would indeed be safer. I recall thinking that the initial PropertySet was going to be None in the cases that mattered here, but I could easily believe that was only the case in the first place I used this function, and then I forgot about it in the next place.
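
Putting the two fragments above together, the safer pattern being agreed on would look roughly like this (same names as in the quoted code, untested here):

    # Preserve any existing catalog metadata instead of replacing it wholesale.
    md = catalog.getMetadata()
    if md is None:
        md = PropertyList()
    md.set("REFCAT_FORMAT_VERSION", LATEST_FORMAT_VERSION)
    catalog.setMetadata(md)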

@parejkoj commented on Apr 1, 2019:

@TallJimbo, @SimonKrughoff: should we re-review after I've added and tweaked Jim's additions to improve gen3 compatibility, or is my tweaking it enough of a review?

@TallJimbo (Member):
Your updates to my changes look good to me, and generally I feel like the spirit of code review has been met if two people have looked at the code.

parejkoj and others added 7 commits on April 5, 2019:
Update units in docstring and makeMinimalSchema flux units.

Add format_version to Indexed DatasetConfig, and use setDefaults to override
it when Ingesting new Indexed refcats.

Add functions to check for and convert old refcat fluxes.
Convert old refcats when they are read in, and issue a warning about them.

Convert afw flux->mag code to astropy where possible, and multiply by 1e-9
for the fluxErr/magErr code that will be removed in DM-16903.
Uses the code in LoadReferenceObjects.py to check and convert.
Update ups table file to reflect new bin.src/ directory and add SConscript.
Add bin/ to .gitignore.
Version 0 is "implicit", as all of our version 0 catalogs are: it has
no format_version in the config.
Version 1 is "explicit", and has nJy flux units.

Exclude new test data from flake8.

Add tests of persisted version=0 and version=1 refcats.
Add a check to the reference object loader class that is used in
conjunction with gen3 middleware to verify the flux units of the
catalog.
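
As a reference point for the "Convert afw flux->mag code to astropy where possible" commit above, astropy's unit machinery handles the nJy/AB-magnitude conversion directly. The values below are arbitrary and this is not the PR's code:

    import astropy.units as u

    # 3631 Jy is the AB zero point, so 3631 nJy is exactly 22.5 mag AB.
    flux = 3631 * u.nJy
    mag = flux.to(u.ABmag)   # 22.5 mag(AB)
    back = mag.to(u.nJy)     # round-trips to 3631 nJy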
@parejkoj merged commit 5930607 into master on Apr 5, 2019.