DM-23728: Cleanup ci_hsc_gen2 to use new convert script instead of custom one #213

parejkoj · 2020-03-25T22:58:21Z

No description provided.

python/lsst/obs/base/script/convertGen2RepoToGen3.py

timj

Looks okay. You can ignore my comment if you wish.

timj · 2020-03-27T18:09:54Z

python/lsst/obs/base/script/convertGen2RepoToGen3.py

@@ -156,6 +158,9 @@ def convert(gen2root, gen3root, instrumentClass, calibFilterType,
    calibs : `str`, optional
        Path to the gen2 calibration repository to be converted.
        If a relative path, it is assumed to be relative to ``gen2root``.
+    reruns : `list` [`str`], optional
+        List of reruns to convert. They will be placed in the
+        ``instrumentClass.getName()`` collection.


Or is it shared/instrument/converted?

Thanks for catching that.

Each rerun should go into a different collection, not the same one, and their names should be derived from the Gen2 repo names somehow.

@TallJimbo Is there a way we can codify that with some unittests in obs_base, before rolling it out to ci_hsc? If we want to enforce it, we should do so at the deepest level, probably on its own ticket.

I think the problem will become apparent as soon as reruns contain the same output datasets. You can only have one dataset with a given dataId per collection and reruns tend to have many duplicate datasets by definition.

If the script generates the chainName and runName passed to the Rerun from the path, unless explicitly overridden, that should cover it pretty well.
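
A minimal sketch of that idea, assuming a Gen2 rerun path such as rerun/ci_hsc and a shared/<instrument>/<rerun> naming scheme picked purely for illustration. `RerunSketch` and `make_rerun` are hypothetical stand-ins, not the obs_base `Rerun` class or its real signature, and the "/direct" suffix is an assumption borrowed from the existing ci_hsc run name:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RerunSketch:
    """Hypothetical stand-in for the converter's Rerun; fields are assumptions."""
    path: str
    runName: str
    chainName: str


def make_rerun(path: str, instrument_name: str,
               run_name: Optional[str] = None,
               chain_name: Optional[str] = None) -> RerunSketch:
    """Derive collection names from a Gen2 rerun path unless overridden."""
    # Drop surrounding slashes and a leading "rerun/" directory component.
    relative = path.strip("/")
    if relative.startswith("rerun/"):
        relative = relative[len("rerun/"):]
    # Assumed naming scheme: one chain per rerun, namespaced by instrument.
    if chain_name is None:
        chain_name = f"shared/{instrument_name}/{relative}"
    if run_name is None:
        run_name = f"{chain_name}/direct"
    return RerunSketch(path=path, runName=run_name, chainName=chain_name)


# "rerun/other" is a made-up second rerun, used only to show the names differ.
reruns = [make_rerun(p, "HSC") for p in ("rerun/ci_hsc", "rerun/other")]
for rerun in reruns:
    print(rerun.path, rerun.runName, rerun.chainName)
```

Because the defaults come from the path, two reruns can only collide if they share a path, and an explicit `run_name` or `chain_name` still wins when a caller needs a fixed name.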

Tests in obs_base are of course possible, but making a useful mock repo to convert is probably at least a week of work on its own, and with that ratio I'm willing to live with just coverage in ci_hsc.

timj · 2020-03-27T21:31:24Z

python/lsst/obs/base/script/convertGen2RepoToGen3.py

@@ -176,6 +181,9 @@ def convert(gen2root, gen3root, instrumentClass, calibFilterType,

    configure_translators(instrument, calibFilterType, convertRepoConfig.ccdKey)

+    rerunsArg = [Rerun(rerun, runName=f"shared/{instrument.getName()}/converted",


So should the runName be something like rerun/{instrument.getName()}{rerun}?

Can we please do this as part of another ticket? If we want to enforce this properly, we should do it with actual unittests in obs_base. The break/test/fix cycle with ci_hsc is way too long and cumbersome.

Does the script already let you pass in the Rerun arguments on the command line? It eventually needs to, even if that isn't the default someday, and then we can just use that to hard-code reasonable values (like the same ones we had before? Why are we changing them on this ticket anyhow?) in ci_hsc_gen2.

Before this ticket, the convert script did nothing with reruns (none of my test data had reruns). ci_hsc has one rerun, and I added a command-line option (--reruns=[]) to the script.

Being smarter about the reruns I think warrants some specific unittests (probably built off of a list of gen2 reruns pulled from lsst-dev:/datasets), to ensure it is handling special characters correctly.

If you want to get done with this ticket and move on to something else, I'm fine with deferring that testing to another ticket. I'm not really okay with changing the behavior of this conversion (in terms of the collections it writes) because the command-line script doesn't yet have the flexibility that this and other non-unit-test cases need; that would make this ticket a regression, not a cleanup. If you need to be done with it and don't want to defer the testing to another ticket, I think we just put this ticket on hold until someone else can work on the general-purpose script.

I'm getting a little confused now. This ticket adds a --reruns command line option, which is what we want, and I thought the debate we were having was solely over what collection those reruns should go into. We know the collection has to be different for each rerun and I don't understand why we can't do something like I suggested at the top of this thread. I don't understand why choosing a rerun name has to be blocked on unit tests since we know that the script will fail as currently written if there are multiple reruns because by definition we know that the butler won't let identical dataIds be used in the same run.
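
The failure mode described here can be illustrated with a toy model. This is only a sketch of the uniqueness constraint as stated in this thread; `ToyRegistry` and `convert_all` are hypothetical and are not the daf_butler API, and the reruns and dataIds are made up:

```python
class ToyRegistry:
    """Toy model of the uniqueness constraint; not the daf_butler API."""

    def __init__(self):
        self._seen = set()

    def insert(self, run, dataset_type, data_id):
        # A dataset is identified by its run, dataset type, and dataId.
        key = (run, dataset_type, tuple(sorted(data_id.items())))
        if key in self._seen:
            raise ValueError(f"duplicate {dataset_type} {data_id} in run {run!r}")
        self._seen.add(key)


def convert_all(reruns, run_for):
    """Insert every rerun's datasets into the run chosen by run_for(rerun)."""
    registry = ToyRegistry()
    for rerun, datasets in reruns.items():
        for dataset_type, data_id in datasets:
            registry.insert(run_for(rerun), dataset_type, data_id)
    return registry


# Two made-up reruns that both produced the same dataset.
reruns = {
    "rerunA": [("calexp", {"visit": 1, "ccd": 2})],
    "rerunB": [("calexp", {"visit": 1, "ccd": 2})],
}

# One run per rerun: both inserts succeed.
convert_all(reruns, lambda rerun: f"shared/HSC/{rerun}")

# One shared run for every rerun: the second insert is rejected.
try:
    convert_all(reruns, lambda rerun: "shared/HSC/converted")
except ValueError as err:
    print(err)
```

With a single rerun, as in ci_hsc, the shared run never sees a duplicate, which is why the problem would only surface once a second rerun is converted.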

@TallJimbo: I'm confused again: the existing ci_hsc_gen2 convert code uses runName="shared/ci_hsc/direct", chainName="shared/ci_hsc", neither of which follows your recommendations in RFC-663, which doesn't say anything about shared/. So the name of the output collection written by this conversion is already going to change to shared/HSC/something.

Are there any daf_butler restrictions on the characters in collection names? I know that dataset names have a rather hidden restriction based on VALID_NAME_REGEX (which I keep having to root around to find because I never remember what it is called). Does that also apply to collection names?

Yeah, the RFC doesn't try to specify what to do when usernames and ticket numbers aren't relevant, but I think "shared" instead of username and "ci_hsc" instead of a ticket number is a reasonable extrapolation, and it's what the name has always been. I'm not super attached to it, but I don't want it to flop around every time a new developer has a different idea on how to extrapolate the convention. Either we should leave it the same and avoid churn or extend the convention, and the latter should be discussed more broadly than a PR.

In any case, I hadn't intended for output collection names to include the instrument name. It's an interesting idea that may be worth revisiting, but it's not what was RFC'd.

"it's what the name has always been": that's only true for ci_hsc (it is undefined anywhere else), and it's currently "shared/ci_hsc/direct" on ci_hsc_gen2 master.

Are you sure we don't have different instruments with the same rerun names right now? I certainly wouldn't bet on that, which is why I think including instrument in the converted collection is important.

Remember: everything we're discussing here only applies to converting gen2 repos.

"Are you sure we don't have different instruments with the same rerun names right now? I certainly wouldn't bet on that, which is why I think including instrument in the converted collection is important."

That's a good point. I wouldn't be surprised either way, and being safe by including the instrument name in converted collection names is reasonable. And now that we've got a solid reason to change this at all, I care much less about exactly what we change it to (i.e. just that we get the "ci_hsc" from the Gen2 rerun directory included).

Sorry for being such a pain on this. I may have been reacting more to having been burned on the last collection change (to just instrument name, just before I merged DM-21849) than to this one, but I think we've now converged on a space of names that are all broadly acceptable, and we have a good reason not to just keep the one we have.

parejkoj · 2020-04-01T23:39:11Z

The update I just pushed results in this collection name for ci_hsc_gen2: shared/HSC/rerun/ci_hsc. If there are any daf_butler-based restrictions on collection names (e.g. . or + or - not allowed), this code will fall over in exciting ways.

TallJimbo · 2020-04-02T00:18:21Z

shared/HSC/rerun/ci_hsc

I can guess where the "rerun" comes from, and I'd be inclined to strip it. Where does the "HSC" come from? Just the instrument name?

"If there are any daf_butler-based restrictions on collection names (e.g. . or + or - not allowed), this code will fall over in exciting ways."

I don't think we do, though avoiding shell special characters would save quoting on command-lines.

parejkoj · 2020-04-02T00:54:24Z

"I don't think we do, though avoiding shell special characters would save quoting on command-lines."

There are existing reruns on lsst-dev:/datasets/hsc with + and . and -: private/jbosch/DM-12968+wc_new, private/czw/rc2_comp.20200217, private/pprice/DM-13553.
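
As a quick check on the shell-quoting concern, the sketch below asks Python's standard `shlex.quote` whether each of those rerun names would need quoting in a POSIX shell. The helper `needs_shell_quoting` is a hypothetical name introduced here, and this says nothing about whether daf_butler itself restricts these characters:

```python
import shlex

# Rerun names quoted from the comment above.
RERUNS = [
    "private/jbosch/DM-12968+wc_new",
    "private/czw/rc2_comp.20200217",
    "private/pprice/DM-13553",
]


def needs_shell_quoting(name: str) -> bool:
    """Return True if shlex.quote would have to wrap the name in quotes."""
    return shlex.quote(name) != name


for rerun in RERUNS:
    print(f"{rerun}: needs quoting = {needs_shell_quoting(rerun)}")
```

`shlex.quote` treats `+`, `.`, `-`, and `/` as safe, so none of these three names needs quoting; a name containing a space or `*` would.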

timj reviewed Mar 25, 2020

parejkoj force-pushed the tickets/DM-23728 branch 2 times, most recently from ce232a4 to 358f8ec Compare March 27, 2020 00:11

timj approved these changes Mar 27, 2020

timj reviewed Mar 27, 2020

parejkoj force-pushed the tickets/DM-23728 branch 2 times, most recently from 85efb9c to 73534ea Compare March 27, 2020 18:14

timj reviewed Mar 27, 2020

parejkoj force-pushed the tickets/DM-23728 branch from 73534ea to 542483d Compare April 1, 2020 23:33

add reruns argument to convert script

c1b5d8f

parejkoj force-pushed the tickets/DM-23728 branch from 542483d to c1b5d8f Compare April 2, 2020 20:09

parejkoj merged commit 58bbfa5 into master Apr 3, 2020

timj deleted the tickets/DM-23728 branch June 25, 2020 22:49

timj Mar 27, 2020

parejkoj Mar 27, 2020

TallJimbo Mar 27, 2020

parejkoj Mar 27, 2020

timj Mar 27, 2020

TallJimbo Mar 27, 2020

timj Mar 27, 2020

parejkoj Mar 27, 2020

TallJimbo Mar 28, 2020

parejkoj Mar 28, 2020

TallJimbo Mar 28, 2020

timj Mar 28, 2020

parejkoj Apr 1, 2020

TallJimbo Apr 2, 2020

parejkoj Apr 2, 2020

TallJimbo Apr 2, 2020
