-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-27685 Add butler make gen3 dcr subfilters subcommand #439
Conversation
short_help="Add subfilters for chaotic modeling.") | ||
@repo_argument(required=True) | ||
@num_subfilters_argument() | ||
@filter_names_argument() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these are band names and not filter names wouldn't it be less confusing to call this band_names_argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, and one question for @isullivan on whether we should tweak the behavior slightly.
callback=split_commas, | ||
required=True, | ||
nargs=-1, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I don't think there's any chance these arguments would be reused in any other butler or pipetask script, at least not without substantial changes, so if moving them into the file that uses them is consistent with how you'd like to organize things, feel free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far I've kept the not-ad-hoc argument definitions in the arguments.py
file(s). Even tho it's not likely to get reused it still seems more organized (and easy to search for later) than keeping it in commands.py
.
butler.registry.insertDimensionData("subfilter", *record) | ||
except IntegrityError as err: | ||
raise click.ClickException(f"Subfilters for at least one filter of {filter_names} " | ||
f"are already defined in {repo}.") from err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@isullivan, would it be more convenient if registering the same subfilter multiple times was a harmless no-op insteadof an error? If so, we can just change this from insertDimensionData
to syncDimensionData
(and pass it one record at a time), and we'll get that behavior instead (with no need to try/except
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be fine, as long as it prints a warning if the subfilter is already defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@isullivan, @TallJimbo I've changed from insert
to sync
. I had to make some changes to the user feedback to print the warning, please take a quick look at the code in this file. And here is an example of what running the command looks like right now, where there are already sub filters 0-2 for the bar filter, and it adds 3 and 4 to bar and 0-4 for baz. Does this seem good?
$ butler register-dcr-subfilters foo 5 bar baz
For filter "bar": registered subfilters [3, 4], subfilters [0, 1, 2] already existed.
For filter "baz": registered subfilters [0, 1, 2, 3, 4].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the example. If "bar" already has subfilters [0, 1, 2] defined, then it should not register subfilters [3, 4] if someone runs $ butler register-dcr-subfilters foo 5 bar baz
. The meaning of each subfilter depends on the number of subfilters, so adding [3, 4] in this case will change the meaning of the existing subfilters [0, 1, 2] and will break any data products in the repository that depend on them.
If a given band already has subfilters defined, it should be skipped entirely, and either raise an error like before or print a warning that it is being skipped. The output could look like:
$ butler register-dcr-subfilters foo 5 bar baz
For filter "bar": Not registering subfilters for "bar": subfilters [0, 1, 2] already existed.
For filter "baz": registered subfilters [0, 1, 2, 3, 4].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's my misunderstanding then, and it means both that we might not be able to use syncDimensionData
here, and there's a potentially much bigger problem: once you've set the number of subfilters for a band in a repo (even a long-lived, shared repo), you can never change it.
I was instead assuming that if you had more subfilters than you wanted for a band, you could ignore them in a particular processing run with e.g. -d subfilter IN (0, 1)
(and then be super careful about comparing across collections, as the subfilter interpretation might differ). And so adding new subfilters as needed is totally fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n8pease That sounds fine to me. In the future we should work out a way to un-register or over-write registered subfilters for a collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of me wonders why we can't always register N subfilters where N is larger than anyone could ever need and then pick an M<=N for any processing run. Then you wouldn't need to ask people what N should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timj The problem is that the number subfilters defines their bandwidth. E.g. if there are 3 subfilters in g-band and the full band is 400-550nm, then that defines the wavelength ranges included in each subfilter as 400-450nm, 450-500nm, and 500-550nm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a shame. I was hoping you could say "For this processing I'm defining N to be 3" and have the processing split the band up appropriately. It seems that if we can't define it this way then once the number of subfilters are defined then changing them at any later time would break all previous processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's let @n8pease get this merged and then come back to this question. I don't think the situation we're in now is tenable, but I also suspect that there isn't a lot of urgency for making DCR coadds in big shared repos right now anyway, and hence we don't have to resolve it right away.
for filterName in filter_names: | ||
for sub in range(num_subfilters): | ||
record.append({"band": filterName, "subfilter": sub}) | ||
records[filterName].append(sub) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually see that we're using records
anywhere below.
And if we can just get rid of it, maybe we can rename that record
list to records
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's a remnant from trying a different way to generate the result string. Thanks for catching, will remove.
8e86bec
to
3e1cd1e
Compare
3e1cd1e
to
221882f
Compare
it has been replace by the butler CLI subcommand register-dcr-subfilters
No description provided.