
Conversation

@nateraw (Contributor) commented Jan 26, 2023

Playing with updating some docstrings. Will likely do this across multiple PRs. For now, just create_model.

Wanted to make sure @rwightman doesn't oppose any of the style changes here.

Here's everything I've changed here:

  • Turned each arg description into a complete sentence, starting with a capital letter and ending with a period.
  • Put the arg descriptions on their own new line (a style used across HF repos; I personally find this much easier to read in my Python interactive session, which is why I opted to do it here as well).
  • Used the <name>: (<type>, *optional*, defaults to ...) syntax/style, which is consistent across all HF docs.
  • Added an example code snippet right in the docstring, which is nice both for when folks call help(create_model) in Python and for when it gets rendered in the docs.
  • Used HF doc-specific helper tags where applicable (in this case <Tip>...</Tip>, which highlights sections nicely).
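A hedged sketch of what those points look like together (the signature and wording here are illustrative, not the actual PR diff; `timm.list_models()` is a real helper, but its mention in the tip is my own example):

```python
def create_model(model_name: str, pretrained: bool = False):
    """Create a model.

    Args:
        model_name (`str`):
            Name of the model architecture to instantiate.
        pretrained (`bool`, *optional*, defaults to `False`):
            If set to `True`, load pretrained ImageNet-1k weights.

    <Tip>

    Available model names can be listed with `timm.list_models()`.

    </Tip>

    Example:

    >>> import timm
    >>> model = timm.create_model("resnet18")
    """
```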

The rendered docstring for this can be seen live here.

@HuggingFaceDocBuilderDev commented Jan 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@rwightman (Collaborator) commented Jan 26, 2023

Thanks for the example. I'm definitely open to improving docstrings, but it's going to differ from Transformers and follow Google style a bit more closely.

  • Complete sentences are OK, but they should be succinct

  • I want to keep everything on one line when it comfortably fits in a 120-char line; no need for a CR unless there is a really long var name. Use a 4-space indent on new lines when it needs to wrap

  • Now that I'm adding Python type hints to more and more fns as I update parts, I feel the redundancy is undesirable. If we're updating everything, I'd like to avoid redundancy, as it's error-prone and wasted space (it's already out of sync in places, and type hints are more concise than the underspecified docstring equivalents). It looks like the Google guide is doing this now, and others are too (see below). So instead of adding/updating typing in the docstrings, I'd like to remove types from the docstrings and add type hints to the actual arg defs where they don't exist. Defaults are similar here: better one source of truth than many.

Google:

  [Args:](https://google.github.io/styleguide/pyguide.html#doc-function-args)
  List each parameter by name. A description should follow the name, and be separated by a colon followed by 
  either a space or newline. If the description is too long to fit on a single 80-character line, use a hanging indent
  of 2 or 4 spaces more than the parameter name (be consistent with the rest of the docstrings in the file). The
  description should include required type(s) if the code does not contain a corresponding type annotation.
  If a function accepts *foo (variable length argument lists) and/or **bar (arbitrary keyword arguments), 
  they should be listed as *foo and **bar.

Numpy:

    `PEP 484`_ type annotations are supported. If attribute, parameter, and
    return types are annotated according to `PEP 484`_, they do not need to be
    included in the docstring:
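Applied to the function under discussion, the single-source-of-truth approach might look like this (a sketch only; the real create_model has more parameters, and the descriptions here are abbreviated):

```python
def create_model(
        model_name: str,
        pretrained: bool = False,
        checkpoint_path: str = '',
):
    """Create a model.

    Types and defaults live only in the signature, so the docstring
    can't drift out of sync with them.

    Args:
        model_name: Name of model to instantiate.
        pretrained: If set to True, load pretrained ImageNet-1k weights.
        checkpoint_path: Path of checkpoint to load after model init.
    """
```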

FYI the HF use of 'optional' in docstrings is objectively confusing AF to anyone who's been programming Python for a while

  • Tip tags are cool

  • Example snippets are welcome, but I feel only worth their space for high traffic functions and not as a rule

@nateraw (Contributor, Author) commented Jan 26, 2023

If a function accepts *foo (variable length argument lists) and/or **bar (arbitrary keyword arguments),
they should be listed as *foo and **bar.

Will have to check if this will do wonky things with the doc builder; I'm not mad at this, though. It could actually make this docstring in particular a lot easier to read (as long as the rendered version isn't looking for a closing *).

I want to keep everything on one line when it comfortably fits in a 120-char line; no need for a CR unless there is a really long var name. Use a 4-space indent on new lines when it needs to wrap

Fine by me!

FYI the HF use of 'optional' in docstrings is objectively confusing AF to anyone who's been programming Python for a while

Yeah, I was just following the HF pattern... imo I agree with you. Let's do without it here.

@nateraw (Contributor, Author) commented Jan 26, 2023

Example snippets are welcome, but I feel only worth their space for high traffic functions and not as a rule

Definitely 100% agree

@nateraw (Contributor, Author) commented Jan 31, 2023

@rwightman feel free to take a second look, I think I made the changes you requested.

The only issue I see is that in the docstring for the kwargs it's making one of the * bold, but it's not really that noticeable. I think this is the syntax you were suggesting; please let me know if I misunderstood.

@rwightman (Collaborator)
@nateraw I was going to allocate a bit more time to see how others might be handling the kwargs. I feel the old approach was a bit clearer, but non-standard, and the current approach looks a bit off... hmmm

@nateraw (Contributor, Author) commented Feb 1, 2023

Agreed. I'll think of some options too.

@rwightman (Collaborator)
@nateraw circling back to this after wrapping up the epic hub model push & multi-weight conversions...

On the kwargs thing, I feel there are 3 realistic options: 1 was my original choice, 2 is your change, and I've seen 3 out there...

1

    Args:
        model_name: Name of model to instantiate.
        pretrained: If set to `True`, load pretrained ImageNet-1k weights.
        pretrained_cfg: Pass in an external pretrained_cfg for model.
        pretrained_cfg_overlay: Replace key-values in base pretrained_cfg with these.
        checkpoint_path: Path of checkpoint to load _after_ the model is initialized.
        scriptable: Set layer config so that model is jit scriptable (not working for all models yet).
        exportable: Set layer config so that model is traceable / ONNX exportable (not fully impl/obeyed yet).
        no_jit: Set layer config so that model doesn't utilize jit scripted layers (so far activations only).

    Keyword Args:
        drop_rate (float): Classifier dropout rate for training.
        drop_path_rate (float): Stochastic depth drop rate for training.
        global_pool (str): Classifier global pooling type.

2

    Args:
        model_name: Name of model to instantiate.
        pretrained: If set to `True`, load pretrained ImageNet-1k weights.
        pretrained_cfg: Pass in an external pretrained_cfg for model.
        pretrained_cfg_overlay: Replace key-values in base pretrained_cfg with these.
        checkpoint_path: Path of checkpoint to load _after_ the model is initialized.
        scriptable: Set layer config so that model is jit scriptable (not working for all models yet).
        exportable: Set layer config so that model is traceable / ONNX exportable (not fully impl/obeyed yet).
        no_jit: Set layer config so that model doesn't utilize jit scripted layers (so far activations only).
        **drop_rate (float): Classifier dropout rate for training.
        **drop_path_rate (float): Stochastic depth drop rate for training.
        **global_pool (str): Classifier global pooling type.

3

    Args:
        model_name: Name of model to instantiate.
        pretrained: If set to `True`, load pretrained ImageNet-1k weights.
        pretrained_cfg: Pass in an external pretrained_cfg for model.
        pretrained_cfg_overlay: Replace key-values in base pretrained_cfg with these.
        checkpoint_path: Path of checkpoint to load _after_ the model is initialized.
        scriptable: Set layer config so that model is jit scriptable (not working for all models yet).
        exportable: Set layer config so that model is traceable / ONNX exportable (not fully impl/obeyed yet).
        no_jit: Set layer config so that model doesn't utilize jit scripted layers (so far activations only).
        **kwargs:
          - drop_rate (float): Classifier dropout rate for training.
          - drop_path_rate (float): Stochastic depth drop rate for training.
          - global_pool (str): Classifier global pooling type.

For kwargs I feel the Keyword Args: section makes the most sense; I originally chose it as it has existing use:

  • sphinx Napoleon extension supports it
  • pylint supports it
  • mkdocs supports it

For anything that doesn't have support, the parsing addition is minor compared to fully supporting 3, and 2 doesn't have the separation of args vs kwargs that I'd like.
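For the sphinx path specifically, the knobs governing how that section renders would live in conf.py; a sketch (the option names are real sphinx.ext.napoleon settings, the values shown are just an illustration of one reasonable setup):

```python
# conf.py (sketch) -- Napoleon settings relevant to a "Keyword Args:" section.
extensions = ["sphinx.ext.napoleon"]

napoleon_google_docstring = True  # parse Google-style sections, including "Keyword Args:"
napoleon_use_keyword = True       # emit keyword args as :keyword: fields rather than a plain block
```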

@rwightman rwightman merged commit e0ec0f7 into huggingface:main May 10, 2023