
Docstring Guidelines

Svetlana Karslioglu edited this page May 22, 2024 · 4 revisions

The PyTorch project follows the Google style for docstrings together with the conventions outlined in PEP-257. We strive to document all public methods and functions to ensure clarity and ease of use. To enforce these conventions, PyTorch uses the ruff linter, which applies the pydocstyle rules to docstrings. For a full list of the error codes that the pydocstyle checks return, see the ruff documentation. Some errors are intentionally ignored because of the nature of the PyTorch code.

Tooling

The PyTorch project uses a combination of tools to build and generate documentation from docstrings. The primary tool is Sphinx, a documentation generator that converts reStructuredText files into HTML. In addition, the autosummary and autodoc Sphinx extensions pull the docstrings into the documentation. All generated documentation is placed in the generated/ directory on the site. Here is a sample .rst file:

torch.xpu
===================================
.. automodule:: torch.xpu
.. currentmodule:: torch.xpu

.. autosummary::
    :toctree: generated
    :nosignatures:

    StreamContext
    current_device
    current_stream
    device
    ...

After creating your .rst files, run the lintrunner tool to fix all formatting errors.

Submit a PR with the changes. The pytorchbot will then post a link in the PR to a preview of your documentation.

[Screenshot: python-docs-preview]

For your PR to land, the following jobs must succeed:

  • all related linters
  • labeler
  • docs_test
  • linux_docs (build-docs-cpp-false, build-docs-python-false, build-docs-functorch-false)

Here is how these jobs appear in your PR:

[Screenshot: pytorch_docs_job]

After your PR is approved, you can merge it by using @pytorchbot merge. For more information about pytorchbot, see pytorchbot commands.

General guidelines

When writing a docstring for any callable object, follow these guidelines:

  • Language: Write all docstrings in English and use American English spelling. A good docstring should be descriptive and concise, providing all the necessary information without being overly verbose.
  • Docstring Format: PyTorch uses the Google style for docstrings and the PEP-257 docstring conventions.
  • Syntax: Use standard .rst syntax inside docstrings, including Sphinx directives and roles. For example, you can use roles such as :math:, :ref:, :attr:, :class:, :func:, and so on.
  • Docstring Type Formatting: Review the special guidelines for docstring type formatting in the CONTRIBUTING.md.
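
As a small illustration of the syntax guideline, the sketch below uses a few Sphinx roles inside a Google-style docstring. The function scale and its parameters are hypothetical and exist only for this example. A raw string is used because the :math: expression contains backslashes.

```python
def scale(values, factor=2.0):
    r"""Scales every element of a sequence by a constant factor.

    Computes :math:`y_i = \text{factor} \cdot x_i` for each element of
    :attr:`values` and returns the results as a new :class:`list`.

    Args:
        values (list[float]): The numbers to scale.
        factor (float, optional): The multiplier. Default: ``2.0``.

    Returns:
        list[float]: The scaled numbers.
    """
    # Hypothetical helper used only to carry the docstring above.
    return [factor * v for v in values]
```

Sphinx renders the :math: role as a formula and turns :class: into a cross-reference when the target can be resolved.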

Raw docstrings vs regular strings

In Python, docstrings can be defined using either regular strings or raw strings. In the PyTorch codebase, you will find both usages. Prefer regular strings in most cases, and use raw strings only where extensive math syntax is required.

Regular Strings

Regular strings are defined using triple quotes """. In a regular string, backslashes \ are treated as escape characters. This means that sequences like \n, \t, \\, and others have special meanings. For example, \n represents a newline, and \\ represents a single backslash. Regular strings are suitable for most docstrings. However, if your docstring includes a lot of backslashes, such as in LaTeX math expressions, you might find it easier to use a raw string instead.

Raw Strings

Raw strings are defined using r""". The r prefix before the opening quote denotes a raw string. In a raw string, backslashes \ are treated as literal characters, not as escape characters. This is useful when you want to include backslashes in your string without causing an escape sequence. For example, in a raw string, \n is just two characters - a backslash and an n. Raw strings are often used for docstrings that include LaTeX math expressions, regular expressions, or other content that includes a lot of backslashes.
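
The difference is easy to see in a short experiment. The following sketch uses a LaTeX-like fragment chosen for illustration; in the regular string, the sequence \f is silently interpreted as a form-feed character, which corrupts the math expression.

```python
regular = "\frac{a}{b}"   # "\f" becomes a single form-feed character
raw = r"\frac{a}{b}"      # the backslash is kept as a literal character

# A newline escape is one character; in a raw string it is two.
print(len("\n"), len(r"\n"))

# Only the raw string still starts with a real backslash.
print(raw.startswith("\\frac"))
print(regular.startswith("\\frac"))
```

The same applies to sequences such as \t in \text or \b in \beta, which is why docstrings with :math: content are usually written as raw strings.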

Including content into the generated documentation set

In the PyTorch codebase, you'll find both autodoc and autosummary conventions being used to publish the documentation. While both autosummary and autodoc are powerful tools for generating documentation, there are several reasons why autosummary might be preferred over autodoc in certain situations.

Autosummary generates concise summary tables for modules, classes, and functions and places the pages it generates in the generated directory of the documentation build. This makes it easier for users to get an overview of the API and navigate to the specific sections they are interested in. Autodoc, on the other hand, renders the documentation for all functions in a class on a single page, which is often overwhelming and hard to read. In most cases, you'll find that autosummary is a better way of organizing API documentation for PyTorch.
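
For orientation, here is a minimal sketch of the Sphinx configuration that such a setup relies on. It is not PyTorch's actual conf.py. The extension and option names are standard Sphinx ones, but the selection and values are assumptions made for this example, including the use of the napoleon extension to parse Google-style sections.

```python
# conf.py (illustrative sketch, not PyTorch's actual configuration)

extensions = [
    "sphinx.ext.autodoc",      # pulls docstrings from Python objects
    "sphinx.ext.autosummary",  # builds summary tables and stub pages
    "sphinx.ext.napoleon",     # parses Google-style docstring sections
]

# Generate the stub pages that ``:toctree: generated`` points to.
autosummary_generate = True

# Assumption for this sketch: only Google-style sections are in use.
napoleon_google_docstring = True
napoleon_numpy_docstring = False
```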

Documenting a module

In Python, a module refers to any .py file that contains Python definitions and statements. Modules are used to logically organize Python code, and they provide a way to reuse code across different projects.

When documenting a module, follow these guidelines:

  • Short description: On the first line of the module's .py file, add a one sentence description followed by a blank line. For example:

    """This package introduces support for the XPU backend, specifically tailored for Intel GPU optimization."""
    
  • Long description: A more detailed description of what the module does. For example:

    This package is lazily initialized, so you can always import it, and use
    :func:`is_available()` to determine if your system supports XPU.
    
  • Classes and functions descriptions: Follow the guidelines on how to add docstrings for classes and functions to describe all public classes and functions in a module. See more in Documenting a Class and Documenting a function.
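
The short-description rule (one summary line followed by a blank line) can be checked mechanically. The sketch below is one possible check built on the standard library. The helper split_docstring and the sample text are hypothetical and not part of PyTorch.

```python
import inspect


def split_docstring(doc):
    """Splits a docstring into its summary line and the remaining body.

    Args:
        doc (str): The docstring text, for example ``module.__doc__``.

    Returns:
        tuple[str, str]: The summary line and the long description.

    Raises:
        ValueError: If the summary is not followed by a blank line.
    """
    # cleandoc removes common indentation and surrounding blank lines.
    lines = inspect.cleandoc(doc).splitlines()
    summary = lines[0] if lines else ""
    if len(lines) > 1 and lines[1].strip():
        raise ValueError("summary line must be followed by a blank line")
    return summary, "\n".join(lines[2:])


# Hypothetical module docstring modeled on the example above.
module_doc = """This package introduces support for a hypothetical backend.

This package is lazily initialized, so you can always import it, and use
:func:`is_available()` to determine if your system supports the backend.
"""

summary, body = split_docstring(module_doc)
print(summary)
```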

To include the content of a module, create a new .rst file (or use an existing one) named after your module. For example, to document a module called torch.xpu, we have an xpu.rst file with the following content:

torch.xpu
===================================
.. automodule:: torch.xpu
.. currentmodule:: torch.xpu

.. autosummary::
    :toctree: generated
    :nosignatures:

    StreamContext
    current_device
    current_stream
    device
    device_count
    device_of
    empty_cache
    ...

Autosummary is the preferred way of publishing docstrings. In this example, autosummary creates a comprehensive, easy-to-navigate table with links to the descriptions of the methods and classes.

Documenting a class

When documenting a class, the docstring should include the following sections:

  • Short Description: A brief summary of the class's purpose in one sentence followed by a blank line.
  • Long Description: A more detailed explanation of what the class does and how it works.
  • (Optional) Shape: If your class or function is designed to work with data of a certain shape, outputs the data in a specific shape, or modifies data's shape, document its shape so that users know how to format their data.
  • Attributes (or Variables): A list of the class's attributes and their types.
  • Example: An example of how to use the class. One example per class.

Here is an example of a well-documented class in PyTorch:

import torch
from torch import nn


# The docstring is a raw string because it contains LaTeX backslashes.
class Linear(nn.Module):
    r"""
    A module for applying a linear transformation to the input data.

    This module supports input of any dimension and applies a specified linear transformation.
    The transformation is defined by the formula :math:`y = xA^T + b`.

    Shape:
        - Input: :math:`(*, H_{in})` where :math:`*` means any number of
          dimensions including none and :math:`H_{in} = \text{in\_features}`.
        - Output: :math:`(*, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

    Attributes:
        in_features (int): Size of each input sample.
        out_features (int): Size of each output sample.
        bias (Tensor): The learnable bias of the module of shape
            :math:`(H_{out})`. If ``bias`` is set to ``False``, the layer
            will not learn an additive bias. Default: ``True``.

    Example:
        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    def __init__(self, in_features, out_features, bias=True):
        ...

Typically, you would include a class in the module description with autosummary as described above.

Documenting a function

When documenting a function, the docstring should include the following sections:

  • Short Description: A brief summary of the function's purpose in one sentence followed by a blank line. The first word of the summary should be a verb in imperative mood. For example: "Adds". A period should conclude the short description.
  • Long Description: A more detailed description of what the function does.
  • Args: A description of the function's arguments, including the type of each argument in parentheses. For example: x (Tensor).
  • Returns: A description of the return value and its type. Add a colon after the type and before the description. For example, Tensor: The sum of x and y.
  • Raises: Any exceptions that the function may raise. For example, TypeError: x and y are not the same type.
  • Examples: Examples of how to use the function. Each function should have at least one example but fewer than five. Write the examples in doctest format, which means that each statement starts with >>>. For example:
>>> x = torch.tensor([1, 2])
>>> y = torch.tensor([3, 4])
>>> add(x, y)
tensor([4, 6])

Here is an example of a well-documented function in PyTorch:

def add(x, y):
    """
    Adds two tensors.

    This function takes two tensors as input and returns a new tensor that
    is the element-wise sum of the input tensors.

    Args:
        x (Tensor): The first operand.
        y (Tensor): The second operand.

    Returns:
        Tensor: The sum of x and y.

    Raises:
        TypeError: If x and y are not both Tensors.

    Example:
        >>> x = torch.tensor([1, 2])
        >>> y = torch.tensor([3, 4])
        >>> add(x, y)
        tensor([4, 6])
    """
    return x + y

Typically, you would include a function in the module description with autosummary as described above.
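
Because the examples are written in doctest format, they can be executed to confirm that the documented output is still accurate. The sketch below shows one way to do this with the standard-library doctest module. A plain Python function stands in for tensor code so that the snippet runs without PyTorch, and the helper run_examples is hypothetical.

```python
import doctest


def add(x, y):
    """Adds two numbers.

    Args:
        x (float): The first operand.
        y (float): The second operand.

    Returns:
        float: The sum of x and y.

    Example:
        >>> add(1, 2)
        3
    """
    return x + y


def run_examples(func):
    """Runs the doctest examples found in the docstring of ``func``.

    Args:
        func (Callable): The function whose docstring is checked.

    Returns:
        tuple[int, int]: The number of failed and attempted examples.
    """
    parser = doctest.DocTestParser()
    # The globals dict makes the function visible to its own examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    results = doctest.DocTestRunner(verbose=False).run(test)
    return results.failed, results.attempted


failed, attempted = run_examples(add)
print(f"{attempted} example(s) run, {failed} failed")
```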

Documenting deprecated methods

When a function, method, or class is deprecated, it's important to clearly document this in the docstring. This helps users understand that they should avoid using the deprecated callable, and it provides information about alternatives.

Here are some guidelines for documenting a deprecated callable:

  • Indicate Deprecation: At the beginning of the docstring, after the short description, clearly state that the callable is deprecated. Use the .. deprecated:: Sphinx directive to start the deprecation message.
  • Version: Mention the version in which the callable was deprecated.
  • Reason: If applicable, briefly explain why the callable was deprecated.
  • Alternative: Recommend an alternative function, method, or class that users should use instead of the deprecated callable.
  • Removal Version: If known, indicate in which future version the deprecated callable will be removed.

Here is an example:

def old_function(x, y):
    """
    Adds two numbers.

    .. deprecated:: 1.4.0
       :func:`old_function` is deprecated, use :func:`new_function` instead.
       It will be removed in version 2.0.0.

    Args:
        x (float): The first number.
        y (float): The second number.

    Returns:
        float: The sum of x and y.
    """
    return x + y
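
The .. deprecated:: directive only affects the rendered documentation. In practice it is often paired with a runtime warning so that callers also see the notice when the code runs. The following is a minimal sketch of that pairing using the standard-library warnings module. The function names mirror the example above, and the choice of DeprecationWarning as the category is an assumption, not a PyTorch requirement.

```python
import warnings


def new_function(x, y):
    """Adds two numbers.

    Args:
        x (float): The first number.
        y (float): The second number.

    Returns:
        float: The sum of x and y.
    """
    return x + y


def old_function(x, y):
    """Adds two numbers.

    .. deprecated:: 1.4.0
       Use :func:`new_function` instead.

    Args:
        x (float): The first number.
        y (float): The second number.

    Returns:
        float: The sum of x and y.
    """
    # stacklevel=2 attributes the warning to the caller's line.
    warnings.warn(
        "old_function is deprecated, use new_function instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_function(x, y)
```

Delegating to the replacement keeps the two implementations from drifting apart while the deprecated name still exists.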

See also

The following resources provide additional information and examples on how to write and generate documentation for Python projects.

  • PEP 257 - Docstring Conventions: The official Python Enhancement Proposal which provides conventions for Python docstrings.
  • Sphinx: A tool that makes it easy to create intelligent and beautiful documentation.
  • Sphinx Directives: Instructions for Sphinx on how to handle specific elements of the documentation, such as code blocks, images, or sections.
  • Autosummary: A Sphinx extension that generates function/method/attribute summary lists.
  • Autodoc: A Sphinx extension that automatically extracts documentation from your Python modules.
  • Google Style Python Docstrings: Examples and guidelines for writing Python docstrings in Google style.
  • reStructuredText Primer: A guide to the markup syntax used by Sphinx and autosummary.