Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent functions and method docstring style #2322

Closed
philippjfr opened this issue Feb 9, 2018 · 36 comments
Closed

Consistent functions and method docstring style #2322

philippjfr opened this issue Feb 9, 2018 · 36 comments
Labels
type: discussion type: docs Related to the documentation and examples

Comments

@philippjfr
Copy link
Member

philippjfr commented Feb 9, 2018

HoloViews is built on param which has the great advantage of providing a neat way of documenting the parameters of classes. While this is a great way to structure class level parameters, it does not provide anything that documents methods and functions. The current state of the docstrings on methods and functions in HoloViews is therefore what I'd call highly inconsistent and therefore quite frustrating for users.

While a large paragraph explaining what the method does is often sensible, docstrings are far more commonly used to quickly figure out what that one particular argument was called, what options it supports and what type it expects. I therefore think there is a pretty desperate need for us to standardize on some standard docstring format at least for functions and methods.

I'd be strongly in favor of moving to simple NumPy-style docstrings with sections for Parameters and Return values:

Some description of the function or method.

Parameters
----------
x : type
    Description of parameter `x`.
y
    Description of parameter `y` (with type not specified)

Returns
-------
err_code : int
    Non-zero value indicates error code, or zero on success.
err_msg : str or None
    Human readable error message, or None on success.

Readability of this is infinitely better than having to scan a large paragraph for the argument that I care about. After 1.10 is released I'd like to start a systematic review of all docstrings (many of which are hopelessly outdated) and upgrade them to a consistent style. I have no strong preference for any specific format but think that explicitly listing the arguments and return values will result in a much improved user experience and help us consider and address current API inconsistencies.

@philippjfr philippjfr added type: docs Related to the documentation and examples type: discussion labels Feb 9, 2018
@jlstevens
Copy link
Contributor

I support the goal of standardizing docstrings to make them consistent (and describing the arguments) though I will note:

  1. I want a format that is succinct. I hope there is a format that is slightly more succinct than that Numpy style.
  2. This is more important for user code than developer code.
  3. Going through all the docstrings will be a big job.

I agree this is a discussion item we should return to after 1.10 is released.

@philippjfr
Copy link
Member Author

This is more important for user code than developer code.

Yes, I'd suggest we do it incrementally starting with some of the core classes like Dimensioned, LabelledData, NdMapping, HoloMap, DynamicMap, Element and Dataset.

@philippjfr
Copy link
Member Author

Could we decide on a format now so I can make sure any new docstrings that get written conform to the standard? @jlstevens You say you don't like the verbosity of NumPy style docstrings, could you make a suggestion on how to shorten it, otherwise that is my strong vote.

@jbednar Any opinions?

@ceball
Copy link
Member

ceball commented Sep 12, 2018 via email

@philippjfr
Copy link
Member Author

Thanks @ceball, somehow missed your issue even though I was pinged in it. I strongly agree that supporting existing tooling is a strong argument to go with a standard format and I have a strong preference for numpydoc.

@jbednar
Copy link
Member

jbednar commented Sep 12, 2018

How will we handle the fact that huge groups of all HoloViews Elements accept the same argument, without duplicating that info across every Element's docstring (and thus making it impossible to change the wording over time)?

@philippjfr
Copy link
Member Author

philippjfr commented Sep 12, 2018

My primary concern is method and function level docstrings. Class-level docstrings need improvements too but not as urgently as methods and functions, which are currently, imo, unusable. The class level docstrings are already covered reasonably well because parameters have docstrings and they can also refer back to the docs for a baseclass, e.g. to cover the supported data formats and explain interfaces in detail.

@ceball
Copy link
Member

ceball commented Sep 12, 2018 via email

@jbednar
Copy link
Member

jbednar commented Sep 12, 2018

Do inheritance for docstrings

I'm probably not thinking very clearly about this specific situation, but what I'm concerned about is that inheritance is all or nothing. For Parameters that's handled by associating the docstring with each parameter, so that as parameters are gradually built up via inheritance, all of that can be provided for hv.help. But for positional arguments, e.g. of the constructor, we can't handle that by class-level docstring inheritance, because each class needs its own documentation, even though all the classes accept the same first positional argument, and large groups of them treat that data the same. Can that be handled by docstring inheritance of the constructors (not the class)? Yes, if we don't use those constructor docstrings already for subclass-specific stuff, in which case we can use numpydoc docstrings in a few key base class constructors?

@philippjfr
Copy link
Member Author

Not quite following, constructors (i.e. __init__) generally don't have a docstring, the class level docstring is usually used instead.

@ceball
Copy link
Member

ceball commented Sep 12, 2018 via email

@jbednar
Copy link
Member

jbednar commented Sep 12, 2018

Not quite following, constructors (i.e. init) generally don't have a docstring, the class level docstring is usually used instead.

That's my point. The constructor arguments are shared across many classes, but each class has its own class docstring, and so I don't see a way to inherit the documentation of positional parameters of constructors. Chris was joking about inheritance for docstrings, but I was just talking about inheritance of docstrings, which can work for methods but not classes, and if we don't generally document the constructor, then it won't work. So I'm still not sure what's proposed be solved by just using numpydoc.

@philippjfr
Copy link
Member Author

philippjfr commented Sep 12, 2018

So I'm still not sure what's proposed be solved by just using numpydoc.

As I mentioned above I'm primarily concerned about method and function docstrings which are currently huge walls of text which make it incredibly difficult to figure out what each argument means. Class level docstrings also need improvement but unlike methods they generally have only one argument that needs explaining (i.e. data), because everything else is a parameter.

@ceball
Copy link
Member

ceball commented Sep 12, 2018 via email

@jbednar
Copy link
Member

jbednar commented Sep 12, 2018

Ah; sorry! I didn't read far enough back to see that indeed, this issue is just about improving existing walls of text.

That said, I don't think we should make any decision about docstring formats without also addressing the issues about positional constructor arguments that users have raised (e.g. #2925 and #2947), because it would be very confusing to have multiple competing conventions for our docstrings, each addressing specific issues.

So if we use numpydoc format to avoid wall-of-text function and method docstrings, then we should also presumably be using numpydoc format to document positional parameters for our constructors, right? And then I don't know how to avoid duplicating all that numpydoc-format documentation for the positional arguments across all our classes.

@philippjfr
Copy link
Member Author

then we should also presumably be using numpydoc format to document positional parameters for our constructors, right?

Sure I'm not necessarily against that, it's just that in the majority of cases it'd be something like this:

data: array or dataframe or dict
    ...
**params
    Dictionary of parameters for the class

where the parameters aren't documented a second time.

@jlstevens
Copy link
Contributor

I don't have much to say other than that the numpy style looks fairly readable to me, at least from a user perspective. From a developer perspective, I'll need to find a way to get emacs to fold the docstrings away: applying this format consistently will hugely increase the number of lines taken up by docstrings which will be distracting when trying to read and write code.

@jbednar
Copy link
Member

jbednar commented Sep 13, 2018

Personally, I'm ok with docstrings dominating code files; that's what people should be reading first (what is this code meant to do?) anyway. :-)

Parameters

x : type
Description of parameter x.
y
Description of parameter y (with type not specified)

This seems like it will cause untold confusion. For some reason, numpydoc calls function arguments "Parameters" and not arguments. Won't this make users think it's about our Param-based Parameters that are listed in hv.help(), particularly when ParameterizedFunctions blur the distinction between functions and objects (which can have param.Parameters)?

data: array or dataframe or dict
...

It doesn't seem to me that array or dataframe or dict will work as documentation for the data parameter. First, it doesn't have to be any of those things, right? Can't it also be a list of lists, a tuple of lists, etc.? Second, only some of those types of objects are accepted, right? E.g. only certain dicts would work, and presumably we should have some descriptions of what's required. So while the words array or dataframe or dict can well be copied to every class, it doesn't seem like that would be feasible for the actual documentation that is needed for users to figure out what can be passed for the data argument.

Returns

err_code : int
Non-zero value indicates error code, or zero on success.
err_msg : str or None
Human readable error message, or None on success.

Given that one of Jean-Luc's next tasks is to add an output decorator to Param, should we not see if that will also let us declare the return type of a method or function using Param? Putting that information in the docstring allows docs to be made but doesn't allow any other automated processing of it like would be needed for boxflow.

@ceball
Copy link
Member

ceball commented Sep 13, 2018

This seems like it will cause untold confusion. For some reason, numpydoc calls function arguments "Parameters" and not arguments. Won't this make users think it's about our Param-based Parameters that are listed in hv.help(), particularly when ParameterizedFunctions blur the distinction between functions and objects (which can have param.Parameters)?

I think it's not a numpy thing, but a python thing:

Parameters are defined by the names that appear in a function definition, whereas arguments are the values actually passed to a function when calling it

(https://docs.python.org/2.7/faq/programming.html#faq-argument-vs-parameter)

I've noticed people in general gradually changing how they speak about stuff, but it's been happening slowly. Anyway, however it might have been in the past, only the situation now matters really. I.e. when we say things like "the parameters of a parameterized function", we are going to cause confusion :(

@jbednar
Copy link
Member

jbednar commented Sep 13, 2018

Not just a Python thing: https://codeburst.io/parameters-arguments-in-javascript-eb1d8bd0ef04

So maybe the confusion is inescapable.

@philippjfr
Copy link
Member Author

It doesn't seem to me that array or dataframe or dict will work as documentation for the data parameter.

My example was not meant to be exhaustive it was just meant to highlight that class level descriptions of the constructor would generally only describe the data attribute while the **params would describe themselves.

@philippjfr
Copy link
Member Author

Given that one of Jean-Luc's next tasks is to add an output decorator to Param, should we not see if that will also let us declare the return type of a method or function using Param?

I see this as a very specific mechanism to declare pipelines and data flow not for general return type annotations and documentation. Param is too limited to describe the return types in enough detail and I definitely do not want to litter all our code with these annotations.

@jbednar
Copy link
Member

jbednar commented Sep 13, 2018

My example was not meant to be exhaustive it was just meant to highlight that class level descriptions of the constructor would generally only describe the data attribute while the **params would describe themselves.

I'm still not hearing any answer to the underlying problem: how will we document the data argument, given that it is shared across many classes, not identical for every class, and very complicated to describe?

@jbednar
Copy link
Member

jbednar commented Sep 13, 2018

Param is too limited to describe the return types in enough detail and I definitely do not want to litter all our code with these annotations.

I don't mean that to get documentation we'd always have to be able to express the return types fully in Param. I mean that, (a), if return types are specified, then that information can be used (where present), (b) a similar mechanism can be used to make human-readable docstrings describing the return type, where the annotation is not present and/or not sufficient to describe it, and (c) it would be odd to have a completely separate, independent convention for documenting return types that is not aware of the annotations for return types.

It seems just like a Parameter -- typically we try to add a very specific type programmatically, e.g. Param.Integer(bounds=(0,10)), and if we do, then very little docstring is needed (nothing about types or values, just about intention). But we can still handle the general case, using param.Parameter, in which case one needs to put very detailed information in the docstring about what's supported.

Similarly, for return values, one could add annotations when it's very simple what's returned (True, False), but human-written docs can be added instead or as well, which together form the documentation for the return type, some of which is generated (when the information is declared), and the rest of which comes from typing it in.

I'm not arguing strongly that it has to be done this way, but I think it's something more in keeping with how Param works than adopting a convention with specially formatted big docstrings. And I'm not seeing how the code would get littered with the annotations any more than the code would get littered with the same information put in a docstring.

@philippjfr
Copy link
Member Author

philippjfr commented Sep 13, 2018

I'm still not hearing any answer to the underlying problem: how will we document the data argument, given that it is shared across many classes, not identical for every class, and very complicated to describe?

Every class will have to describe exactly what it supports and refer back to the Dataset docstring to explain each format in detail. The supported data formats are very stable at this point so I'm not that worried about a fair amount duplication in this case.

@ceball
Copy link
Member

ceball commented Sep 13, 2018

Not just a Python thing

I just meant that everyone (including the official docs) in the python world used to talk only about "arguments", but that's changed. At least, that's how I remember it (I might be wrong).

@jlstevens
Copy link
Contributor

jlstevens commented Sep 13, 2018

I agree with Jim on these two points:

For some reason, numpydoc calls function arguments "Parameters" and not arguments. Won't this make users think it's about our Param-based Parameters that are listed in hv.help(),....

This would bother me as well.

And I'm not seeing how the code would get littered with the annotations any more than the code would get littered with the same information put in a docstring.

I also agree with this in principle though I don't yet see how this would work in practice.

Specially formatted strings in code are a nightmare to manage. That said, I don't want to be arguing that we should have betters docs/docstrings. As a result, I'm not objecting to this proposal even though I am not particularly enthusiastic about it (as a developer, I would find that having docstrings dominate a file just makes it harder to work with the code).

@philippjfr
Copy link
Member Author

I also agree that we should call arguments Arguments not Parameters.

@wmayner
Copy link

wmayner commented Oct 25, 2018

Just my two cents as a relatively new user who's unfamiliar with the codebase:

I've definitely suffered because the documentation is really tough to use. At several points I considered abandoning Holoviews completely, despite all the amazing functionality it offers, just because of this one weakness.

Also I just wanted to throw Google style docstrings out there as an alternative to NumPy style. These can be parsed by Sphinx with an extension.

@philippjfr
Copy link
Member Author

I've definitely suffered because the documentation is really tough to use.

I really sympathize, and kudos for sticking with it. This needs to be the main priority as soon as we push 1.11.0 out the door and if possible we should at least tackle some of the core classes and methods before then.

@philippjfr
Copy link
Member Author

I don't mean that to get documentation we'd always have to be able to express the return types fully in Param. I mean that, (a), if return types are specified, then that information can be used (where present), (b) a similar mechanism can be used to make human-readable docstrings describing the return type, where the annotation is not present and/or not sufficient to describe it, and (c) it would be odd to have a completely separate, independent convention for documenting return types that is not aware of the annotations for return types.

Dynamically generated docstrings sound good in theory but I see no reasonable way to make them interleave sensible with actual text written by a developer and to ensure that they are presented in a readable format. I'd prefer to refer the user to some more exhaustive reference than automatically interpolate some potentially complex (list of) type(s) that may have been declared as part of the output. The output should be readable and so I'd much prefer on coming up with conventions to refer users to actual docs when discussing accepted data arguments and so on.

I also do not want to annotate return types using some special param only syntax and litter all or even a significant fraction of our code with it and if we only do it in some cases it is as good as useless for consistent documentation. We would then also have to have a separate decorator to annotate the input types. Basically I'm pretty strongly convinced that annotating input and return types with param is a horrible way to auto-generate docs.

It sounds like you're envisioning as the param decorators as some general type system that we will apply to all our methods and functions, which I also think is a really bad idea. We're not going to put param decorators around every public function and method in holoviews, which is what would be needed to generate a consistent set of documentation.

My understanding was that param output declaration would be very useful for declaring pipelines on things like operations but not that they would become our own hand-rolled typing system, which we're going to use throughout the codebase.

So I would strongly argue we adopt either a NumPy or Google docstring style very soon, rather than waiting on param decorators, which imo are not well suited to the task anyway.

@jbednar
Copy link
Member

jbednar commented Oct 26, 2018

annotating input and return types with param is a horrible way to auto-generate docs

Here you're talking about methods? Seems like there is a continuum where the tradeoffs differ. If you have an arbitrary method with several complex positional arguments, there's no point to having some separate way to annotate the return type, because you're going to have a long docstring already and it's annoying to go to some special place and syntax to add one more bit of information. Plus such a method can never be automated in any meaningful way; it's for a human to call.

But the other end of the extreme is what I would have thought is a common case, with a method that only at most accepts parameters, with no positional or special keyword arguments. Such a method is worth annotating the return type with Param, because that one last bit of information will let it be handled automatically by things like boxflow -- on top of the parameter declarations, the output type is just one tiny bit more information needed, at which point it can be used to chain pipelines while doing full validation of outputs to inputs. So for __call__(**params), I'd argue that annotating the output type is worthwhile and adds very important affordances for validation, automation, and so on; much bang for the buck!

So you and I are probably thinking of different scenarios when you're imagining using these tools; I'm not imagining hand-rolling a complete typing system to cover arbitrary methods, because there would be no use for such a thing, but I do expect that simple parameterized pipeline stages can be captured well and have important usages. For HoloViews you are the one who has the set of methods in mind and knows whether most methods that need documentation fall in one or the other of those categories (i.e. lots of positional arguments vs none or only a standard widely shared positional argument), so it's your call to make.

@philippjfr
Copy link
Member Author

Okay, I'm glad we were mostly talking past each other here then. Yes, I'm only talking about methods and functions here, I was hoping that was made clear by the opening paragraph of this issue.

So for call(**params), I'd argue that annotating the output type is worthwhile and adds very important affordances for validation, automation, and so on; much bang for the buck!

This really only applies to operations, of which there are only a handful and where I'd be very happy to do this. What I've been talking about in this issue is every single other public method and function in holoviews, of which there are 100x more than there are operations.

@philippjfr
Copy link
Member Author

philippjfr commented Oct 26, 2018

The point I do agree with is that the constructors for all the core datastructures will need to be given some thought, but that should be fairly straightforward as in most cases the positional arguments are simply: data, kdims and vdims. So we only need some description of what kind of data the object accepts, i.e. columnar, gridded, or geometry, and then point to the appropriate documentation for those types.

@jbednar
Copy link
Member

jbednar commented Oct 26, 2018

Sounds good.

every single other public method and function in holoviews, of which there are 100x more than there are operations

And yet operations are mainly what I use, perhaps because the rest aren't well documented. :-)

@philippjfr
Copy link
Member Author

Since we've not made a decision and started the conversion I don't think we need this issue anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: discussion type: docs Related to the documentation and examples
Projects
None yet
Development

No branches or pull requests

5 participants