Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-28036 Support URI in pipelines #167

Merged
merged 4 commits into from Feb 15, 2021
Merged

DM-28036 Support URI in pipelines #167

merged 4 commits into from Feb 15, 2021

Conversation

natelust
Copy link
Contributor

Support loading and saving files with a URI. This change necessitates
The label specifier on the URI path change to # from :.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of nitpicky naming/typing comments, but nothing serious, except maybe for a naming inconsistency problem that is bigger than just this ticket (or just you).

@@ -168,7 +170,7 @@ def fromFile(cls, filename: str) -> Pipeline:
A path that points to a pipeline defined in yaml format. This
filename may also supply additional labels to be used in
subsetting the loaded Pipeline. These labels are separated from
the path by a colon, and may be specified as a comma separated
the path by a hash, and may be specified as a comma separated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just write a literal # (in double-backticks) instead of "a hash" here, given the prevalence of "what do you call the # symbol?" social-media quizzes.


Parameters
----------
uri: `str`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most other middleware methods that take a URI actually take a ButlerURI instance instead of or in addition to a str. And I bet ButlerURI's constructor will happily accept a ButlerURI instance, so this may just be a matter of updating the annotations and docs (but please don't take my word for it).

split = re.split("[:](?!\\/\\/)", fileSpecifer)
if len(split) > 1:
warnings.warn("The fileSpecifier seems to use the legacy : to separate labels, this is "
"deprecated and will be removed in June 2021, please use # instead.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user that sees this warning is unlikely to know what "fileSpecifier" means, because that's sort of specific to this code context. Maybe "The pipeline file {fileSpecifier!r} seems to use ..." instead?

You should probably also make this a DeprecationWarning or FutureWarning, rather than the default UserWarning (I don't recall the exact syntax, or which of those the dev guide recommends).

@@ -484,6 +526,9 @@ def _addConfigImpl(self, label: str, newConfig: pipelineIR.ConfigIR):
def toFile(self, filename: str):
self._pipelineIR.to_file(filename)

def toUri(self, uri: str):
self._pipelineIR.to_uri(uri)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to return something? (Same for the preexisting toFile.)

If not, is it too late to rename these to something that more strongly suggests an in-place operation.

Finally, MyPy at least is going to complain if you don't add -> None to methods that don't return anything while annotating the arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to change the new method name, but technically I think I would have to go through a process to change the other, which does not seem worth it as at some point I would say we should just drop the old one.

@@ -484,6 +526,9 @@ def _addConfigImpl(self, label: str, newConfig: pipelineIR.ConfigIR):
def toFile(self, filename: str):
self._pipelineIR.to_file(filename)

def toUri(self, uri: str):
self._pipelineIR.to_uri(uri)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-existing line below: typing.Generator requires three type parameters, not just one (the other two are None in this case). FWIW, this is one of the reasons I prefer Iterator when the user shouldn't care how the lazy iterator is implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typing "guidelines" I have read have been "Be liberal in what you accept, be specific with what you return" so I try to adhere to that as much as possible. It does not hurt to have more info about what is being returned, but it is a pain to get more specific later (especially in a case where it suddenly does matter)


Note
----
This method is deprecated, please use from_uri
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a deprecated decorator or warning, then?


Parameters
----------
uri: `str`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with Pipeline.fromUri, ideally this would probably take ButlerURI instances as well as str (and probably already actually does).

----------
uri: `str`
Location of document to use in creating a `PipelineIR` object.
Returns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a newline before this heading, I think.

pipelineIR : `PipelineIR`
The loaded pipeline
"""
loadedUri = ButlerURI(uri)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nitpicky, but I'll just note that you're using snake_case for method names here, camelCase for similar method names in Pipeline, camelCase with Biglittle acronyms for "Uri" (while @timj used BigBig for "URI"), and camelCase with BigBig acronyms for "IR". You could at least use snake_case for variable names, too, now, and avoid at least some of that inconsistency without any API changes.

Ok, so maybe it's not that nitpicky, but also not something we can solve just on this ticket: I am in particular worried about us adding fromUri, fromURI, or from_uri (or similar) to ton of new things over the next few months with absolutely no consistency as we ButlerURI All The Things, and that actually is a real problem for users.

And if we are going to rebel against the dev guide's existing mandate that acronyms should Biglittle (Psf, Wcs, even Lsst) - a rebellion I would support, but don't want to lead - we really should RFC it, not just ignore it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a bit of a quirk of how things evolved over time. The snake vs camel came from pipelineIR when it was going to be more of a stand alone thing, and I thought we were moving toward that for new stuff. Then later this got more integrated with Pipeline.py, but there the convention was already camelCase, and so it was not ok to mix the two. So I am not quite sure what we want to do here. pipelineIR is basically all internal, so we could change this around w/o a lot of disruption. If we go this way I would prefer to do the other interface changes on a different ticket to do that. I am so conditioned to camelCase that I keep falling back to it on variable names. I will address any new ones here.

As far as URI vs Uri I tried to emulate the methods that Tim added in Quantum with the mixed case (and I think I missed that with one fromURI) So I guess the question remains, what would you prefer to see done? Maybe we could come up with a plan from there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timj, do you want to make a call on how we capitalize and separate words in methods involving "URI" going forward?

Once we've settled where we're going, I think I'm basically happy for this PR to choose that or leave existing methods the same as @natelust thinks is best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are at the confluence of competing naming forces. I'm fine if pipe_base switches to xxx_uri naming convention. I've always found xxxUri to be a bit odd but I realize that's the old convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For files that are already camelCase, would you prefer xxxUri or xxxURI? Or would you two be happy if I migrated some interfaces to snake case on this ticket, and others at a different time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with incremental migration if it doesn't have much impact or take a lot of extra time. xxxURI should be used for classes that are related to ButlerURI otherwise things get odd but you are going to say they are already odd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more about method names, but if you two are fine with starting migration to snake cake I am fine with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if there isn't an obvious best way to stick with an old style, better to jump all the way to snake case as the new style than land on something awkwardly in-between.

yaml.dump(self.to_primitives(), f, sort_keys=False)
self.to_uri(filename)

def to_uri(self, uri: Union[ButlerURI, str]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this with "write" in the name somewhere ("write_to_uri"?); the name right now feels like it's supposed to somehow transform the object into a URI and then return the URI (which obviously doesn't make sense, but it's still confusing).

Should also accept a ButlerURI instance as well as a string.

"""Split appart a filename path from label subsets
def _parse_file_specifier(uri: Union[str, ButlerURI]
) -> Tuple[ButlerURI, Optional[LabelSpecifier]]:
"""Split appart a uri and any possible label subsets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

appart -> apart

filename: str
filename, labelSubset = split[0], split[1]
if isinstance(uri, str):
# This is to support legacy pipelines during transition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding an example of what this matches to the comment would be nice; I've already forgotten.

Support loading and saving files with a URI. This change necessitates
The label specifier on the URI path change to # from :.
@natelust natelust merged commit 560fb7b into master Feb 15, 2021
@natelust natelust deleted the tickets/DM-28036 branch February 15, 2021 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants