Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-27685 Add butler make gen3 dcr subfilters subcommand #439

Merged
merged 2 commits into from
Dec 14, 2020

Conversation

n8pease
Copy link
Contributor

@n8pease n8pease commented Dec 8, 2020

No description provided.

short_help="Add subfilters for chaotic modeling.")
@repo_argument(required=True)
@num_subfilters_argument()
@filter_names_argument()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are band names and not filter names wouldn't it be less confusing to call this band_names_argument?

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, and one question for @isullivan on whether we should tweak the behavior slightly.

callback=split_commas,
required=True,
nargs=-1,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't think there's any chance these arguments would be reused in any other butler or pipetask script, at least not without substantial changes, so if moving them into the file that uses them is consistent with how you'd like to organize things, feel free.

Copy link
Contributor Author

@n8pease n8pease Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I've kept the not-ad-hoc argument definitions in the arguments.py file(s). Even tho it's not likely to get reused it still seems more organized (and easy to search for later) than keeping it in commands.py.

butler.registry.insertDimensionData("subfilter", *record)
except IntegrityError as err:
raise click.ClickException(f"Subfilters for at least one filter of {filter_names} "
f"are already defined in {repo}.") from err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isullivan, would it be more convenient if registering the same subfilter multiple times was a harmless no-op insteadof an error? If so, we can just change this from insertDimensionData to syncDimensionData (and pass it one record at a time), and we'll get that behavior instead (with no need to try/except).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be fine, as long as it prints a warning if the subfilter is already defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isullivan, @TallJimbo I've changed from insert to sync. I had to make some changes to the user feedback to print the warning, please take a quick look at the code in this file. And here is an example of what running the command looks like right now, where there are already sub filters 0-2 for the bar filter, and it adds 3 and 4 to bar and 0-4 for baz. Does this seem good?

$ butler register-dcr-subfilters foo 5 bar baz
For filter "bar": registered subfilters [3, 4], subfilters [0, 1, 2] already existed.
For filter "baz": registered subfilters [0, 1, 2, 3, 4].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the example. If "bar" already has subfilters [0, 1, 2] defined, then it should not register subfilters [3, 4] if someone runs $ butler register-dcr-subfilters foo 5 bar baz. The meaning of each subfilter depends on the number of subfilters, so adding [3, 4] in this case will change the meaning of the existing subfilters [0, 1, 2] and will break any data products in the repository that depend on them.
If a given band already has subfilters defined, it should be skipped entirely, and either raise an error like before or print a warning that it is being skipped. The output could look like:

$ butler register-dcr-subfilters foo 5 bar baz
For filter "bar": Not registering subfilters for "bar": subfilters [0, 1, 2] already existed.
For filter "baz": registered subfilters [0, 1, 2, 3, 4].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's my misunderstanding then, and it means both that we might not be able to use syncDimensionData here, and there's a potentially much bigger problem: once you've set the number of subfilters for a band in a repo (even a long-lived, shared repo), you can never change it.

I was instead assuming that if you had more subfilters than you wanted for a band, you could ignore them in a particular processing run with e.g. -d subfilter IN (0, 1) (and then be super careful about comparing across collections, as the subfilter interpretation might differ). And so adding new subfilters as needed is totally fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n8pease That sounds fine to me. In the future we should work out a way to un-register or over-write registered subfilters for a collection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me wonders why we can't always register N subfilters where N is larger than anyone could ever need and then pick an M<=N for any processing run. Then you wouldn't need to ask people what N should be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timj The problem is that the number subfilters defines their bandwidth. E.g. if there are 3 subfilters in g-band and the full band is 400-550nm, then that defines the wavelength ranges included in each subfilter as 400-450nm, 450-500nm, and 500-550nm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a shame. I was hoping you could say "For this processing I'm defining N to be 3" and have the processing split the band up appropriately. It seems that if we can't define it this way then once the number of subfilters are defined then changing them at any later time would break all previous processing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's let @n8pease get this merged and then come back to this question. I don't think the situation we're in now is tenable, but I also suspect that there isn't a lot of urgency for making DCR coadds in big shared repos right now anyway, and hence we don't have to resolve it right away.

for filterName in filter_names:
for sub in range(num_subfilters):
record.append({"band": filterName, "subfilter": sub})
records[filterName].append(sub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually see that we're using records anywhere below.

And if we can just get rid of it, maybe we can rename that record list to records?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's a remnant from trying a different way to generate the result string. Thanks for catching, will remove.

@n8pease n8pease force-pushed the tickets/DM-27685 branch 4 times, most recently from 8e86bec to 3e1cd1e Compare December 11, 2020 20:26
it has been replace by the butler CLI subcommand register-dcr-subfilters
@n8pease n8pease merged commit 9d14b4c into master Dec 14, 2020
@n8pease n8pease deleted the tickets/DM-27685 branch December 14, 2020 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants