# Docstrings

## Introduction

>If the implementation is difficult to explain, it is a bad idea. If the implementation is easy to explain, it may be a good idea.

_[The Zen of Python](https://www.python.org/dev/peps/pep-0020/)_

The Zen of Python is a collection of 19 short aphorisms by Tim Peters. Although it is best known as an Easter egg (run `import this` in the interpreter), it captures principles that Python developers take seriously. In the lines quoted above, Tim Peters stresses that an implementation should be easy to explain and easy to read.

One good way to find out how difficult (or easy) your code is to explain is to write a docstring for it. Docstrings are used to document code. They help you explain your code to the user or to another developer.

Thus, when writing code, you should consider how easy it would be for the user or another developer to understand it. If it is difficult to explain, the implementation is probably a bad idea. If it is easy to explain, the implementation may be a good idea.

Another way to document your code is to write standalone documentation for your program (we will cover that later in this module).

Apart from documentation, you should consider the types of your variables. For example, if your function works with a list of numbers, you can state this in both the docstring and the function signature (through type hints).
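
As a minimal sketch of this idea, the hypothetical `mean` function below (not part of the original notebook) states the expected types twice: once as type hints in the signature and once in the docstring. The `list[float]` hint assumes Python 3.9 or later.

```python
def mean(numbers: list[float]) -> float:
    """Return the arithmetic mean of a list of numbers.

    Args:
        numbers (list[float]): the values to average; must not be empty.

    Returns:
        float: the arithmetic mean of the values.
    """
    return sum(numbers) / len(numbers)


# The hints are not enforced at runtime; they document the intended usage.
average = mean([1.0, 2.0, 3.0])
print(average)  # 2.0
```

The type hints are also stored on the function itself, in `mean.__annotations__`, so tools can read them without parsing the docstring.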

> Code tells you how, while comments tell you why.

By now, the code you have been working with should have explanatory comments. Commenting has many purposes:
- Describing sections of your code.
- Explaining algorithms that would otherwise be difficult to follow.
- Tagging (probably one of the most important uses), which conventionally marks a section of the code as incomplete or in need of attention. Typical tags are BUG, FIXME and TODO.

In [10]:
my_list = [1, 2, 3]
# TODO: Check the length of this list
length = 3
# FIXME: Use the len() function
length = len(my_list)


Docstrings present another way to add comments; however, they are more concrete and targeted to functions, methods, classes, modules or packages (as we will see later).

Docstrings can be checked using the `__doc__` attribute or using the help() built-in function.

In [9]:
import pandas as pd
# help(pd)
print(pd.__doc__)


pandas - a powerful data analysis and manipulation library for Python

**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way toward this goal.

Main Features
-------------
Here are just a few of the things that pandas does well:

  - Easy handling of missing data in floating point as well as non-floating
    point data.
  - Size mutability: columns can be inserted and deleted from DataFrame and
    higher dimensional objects
  - Automatic and explicit data alignment: objects can be explicitly aligned
    to a set of labels, or the user can simply ignore the labels and

In this case, we saw the docstring of the pandas module; however, we can also check the docstrings of its methods.

In [7]:
print(pd.DataFrame.from_dict.__doc__)


        Construct DataFrame from dict of array-like or dicts.

        Creates DataFrame object from dictionary by columns or by index
        allowing dtype specification.

        Parameters
        ----------
        data : dict
            Of the form {field : array-like} or {field : dict}.
        orient : {'columns', 'index', 'tight'}, default 'columns'
            The "orientation" of the data. If the keys of the passed dict
            should be the columns of the resulting DataFrame, pass 'columns'
            (default). Otherwise if the keys should be rows, pass 'index'.
            If 'tight', assume a dict with keys ['index', 'columns', 'data',
            'index_names', 'column_names'].

            .. versionadded:: 1.4.0
               'tight' as an allowed value for the ``orient`` argument

        dtype : dtype, default None
            Data type to force after DataFrame construction, otherwise infer.
        columns : list, default None
            Column labels to u

A docstring is created by writing a string literal that describes the object's functionality as the first statement in its definition. By convention, the string is enclosed in three single (''' docstring ''') or three double (""" docstring """) quotation marks.

In [4]:
def say_hi(name):
    # This function says hi to the user
    print(f"Hello {name}")

help(say_hi)

Help on function say_hi in module __main__:

say_hi(name)



In [8]:
def say_hi(name):
    """ This function says hi to the user """
    print(f"Hello {name}")

help(say_hi)

Help on function say_hi in module __main__:

say_hi(name)
    This function says hi to the user



As you may have observed, __a regular comment does not work.__
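
The same difference can be checked through the `__doc__` attribute. In the sketch below, the function names `with_comment` and `with_docstring` are hypothetical and chosen only for this comparison.

```python
def with_comment(name):
    # This function says hi to the user
    return f"Hello {name}"


def with_docstring(name):
    """This function says hi to the user"""
    return f"Hello {name}"


# A comment is discarded by the interpreter, so no docstring is stored.
print(with_comment.__doc__)    # None
# A string literal placed first in the body is stored as the docstring.
print(with_docstring.__doc__)  # This function says hi to the user
```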

The conventions for docstrings can be found in [PEP 257](https://www.python.org/dev/peps/pep-0257/). However, we recommend visiting the link only after you have learnt how to write docstrings, since it contains only the rules to follow when writing them.

## Docstring Classification

Docstrings can be classified into two groups: one-line docstrings (similar to the one we saw in the `say_hi` function) or multiline docstrings (which are more descriptive).

### Multiline Docstrings

The structure of a multiline docstring is as follows:
- A one-line summary.
- An empty line.
- An elaborate description.

In [None]:
def say_hi(name):
    """
    This function says hi to the user

    The purpose of this function is to demonstrate how to document
    a function following the convention established in PEP257.
    It actually does not do much, and I am writing this to fill
    the docstring... Lorem ipsum dolor sit amet.
    """
    print("Hello {}".format(name))

help(say_hi)
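
The three-part structure can also be inspected programmatically. The sketch below assumes the standard-library `inspect.getdoc` function, which returns the docstring with its indentation and surrounding blank lines removed, and then splits the result into the summary, the empty line and the description.

```python
import inspect


def say_hi(name):
    """
    This function says hi to the user

    The purpose of this function is to demonstrate how to document
    a function following the convention established in PEP257.
    """
    print("Hello {}".format(name))


# getdoc() strips the common indentation and the leading blank line.
doc = inspect.getdoc(say_hi)
summary, empty_line, *description = doc.splitlines()

print(summary)           # This function says hi to the user
print(repr(empty_line))  # ''
print(len(description))  # 2
```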

## Docstring for Classes

Thus far, we have only explored docstrings for functions. However, as mentioned, we can also utilise docstrings for classes.

For classes, they follow the same principle as those for functions, with a few exceptions:
- The docstring should be the first thing in the class definition.
- Each method should have a docstring, provided that the method is not private.
- There is no clear consensus on whether the `__init__` method should have a docstring. However, many frameworks describe the constructor in the class docstring and have the `__init__` docstring refer back to it.

In [None]:
class Date:
    '''
    This class is used to represent a date.

    Attributes:
        year (int): the year of the date.
        month (int): the month of the date.
        day (int): the day of the date.
    '''
    def __init__(self, year: int, month: int, day: int):
        '''
        See help(Date) for accurate signature
        '''
        self.year = year
        self.month = month
        self.day = day

    def __str__(self):
        '''
        This function is used to return the string representation of the date.

        Returns:
            str: the string representation of the date.
        '''
        return "{0}-{1}-{2}".format(self.year, self.month, self.day)

    def __repr__(self):
        '''
        This function is used to return the string representation of the date.

        Returns:
            str: the string representation of the date.
        '''
        return "{0}-{1}-{2}".format(self.year, self.month, self.day)

    def __eq__(self, other):
        '''
        This function is used to compare the date with other dates.

        Args:
            other (Date): the other date to be compared with.

        Returns:
            bool: True if the date is equal to the other date; False otherwise.
        '''
        return self.year == other.year and self.month == other.month and \
            self.day == other.day

    def __lt__(self, other):
        '''
        This function is used to compare the date with other dates.

        Args:
            other (Date): the other date to be compared with.

        Returns:
            bool: True if the date is less than the other date; False otherwise.
        '''
        if self.year < other.year:
            return True
        elif self.year == other.year:
            if self.month < other.month:
                return True
            elif self.month == other.month:
                if self.day < other.day:
                    return True
        return False


    @staticmethod
    def is_date_valid(year, month, day):
        '''
        This function is used to check if the date is valid.

        Args:
            year (int): the year of the date.
            month (int): the month of the date.
            day (int): the day of the date.

        Returns:
            bool: True if the date is valid; False otherwise.
        '''
        return year >= 0 and month >= 1 and month <= 12 and \
            day >= 1 and day <= 31

    @classmethod
    def from_string(cls, date_as_string):
        '''
        This function is used to create a date from a string.

        Args:
            date_as_string (str): the string representation of the date.

        Returns:
            Date: the date created from the string.
        '''
        year, month, day = map(int, date_as_string.split('-'))
        return cls(year, month, day)

In [None]:
help(Date)

Although there seems to be a considerable amount of work involved, it will pay off in the future.

## Docstrings in Modules and Packages

Docstrings can also be included at the beginning of a module or inside a package containing multiple modules. The principle for both is the same. Therefore, here, we will only show the syntax for the module-level docstring.

To see this, download two scripts that we can import.

In [11]:
!wget "https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/spreadsheet_printer.py" "https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/Date.py"

--2023-10-21 16:15:54--  https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/spreadsheet_printer.py
Resolving aicore-files.s3.amazonaws.com (aicore-files.s3.amazonaws.com)... 52.216.221.73, 52.217.104.196, 16.182.71.81, ...
Connecting to aicore-files.s3.amazonaws.com (aicore-files.s3.amazonaws.com)|52.216.221.73|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1602 (1.6K) [text/x-python-script]
Saving to: ‘spreadsheet_printer.py’


2023-10-21 16:15:55 (227 MB/s) - ‘spreadsheet_printer.py’ saved [1602/1602]

--2023-10-21 16:15:55--  https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/Date.py
Reusing existing connection to aicore-files.s3.amazonaws.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 3173 (3.1K) [text/x-python-script]
Saving to: ‘Date.py’


2023-10-21 16:15:55 (96.9 MB/s) - ‘Date.py’ saved [3173/3173]

FINISHED --2023-10-21 16:15:55--
Total wall clock time: 0.7s
Downloaded: 2 files, 4.7K in 0s (120

In [None]:
import Date
help(Date)

Later in this module, we will learn how to document a package. Note that the package docstring goes at the top of the package's `__init__.py` file.
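
As a rough sketch of where these docstrings live, the example below builds a throwaway package in a temporary directory and reads both docstrings back. The package name `demo_docstring_pkg` and the module name `greetings` are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this illustration.
PACKAGE_NAME = "demo_docstring_pkg"

with tempfile.TemporaryDirectory() as tmp_dir:
    package_dir = Path(tmp_dir) / PACKAGE_NAME
    package_dir.mkdir()

    # The package docstring is the first statement of __init__.py.
    (package_dir / "__init__.py").write_text(
        '"""Utilities for the docstring demonstration."""\n'
    )
    # A module docstring is the first statement of the module file.
    (package_dir / "greetings.py").write_text(
        '"""Functions that greet the user."""\n\n'
        "def say_hi(name):\n"
        '    """Print a greeting for the given name."""\n'
        '    print(f"Hello {name}")\n'
    )

    # Make the temporary directory importable for the duration of the demo.
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        package = importlib.import_module(PACKAGE_NAME)
        module = importlib.import_module(f"{PACKAGE_NAME}.greetings")
        package_doc = package.__doc__
        module_doc = module.__doc__
    finally:
        sys.path.remove(tmp_dir)

print(package_doc)  # Utilities for the docstring demonstration.
print(module_doc)   # Functions that greet the user.
```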

## Docstring for the CLI

Occasionally, you may wish to run your program from the command line. In such cases, you will pass arguments to your program. The `spreadsheet_printer.py` program is a simple program that takes a file name as an argument and prints the contents of that file to the screen. You can examine its help text from the command line by typing `python spreadsheet_printer.py -h`, and you will find that a description of the program's role is provided.
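
One common way to produce such help text is to pass the module docstring to `argparse`. The sketch below is an assumption about how a script of this kind could be written, not the actual contents of `spreadsheet_printer.py`; the program name `example_printer.py` and the `--print-cols` option are hypothetical. It is meant to be saved as a script, because `__doc__` refers to the docstring at the top of the file (inside a notebook cell, `__doc__` does not hold the cell's leading string).

```python
"""Print the header columns of a spreadsheet file."""
import argparse


def build_parser():
    """Build a parser whose -h output reuses the module docstring."""
    parser = argparse.ArgumentParser(
        prog="example_printer.py",  # hypothetical script name
        description=__doc__,        # the module docstring shown by -h
    )
    parser.add_argument("file_loc", help="the file location of the spreadsheet")
    parser.add_argument(
        "--print-cols",
        action="store_true",
        help="print the columns to the console",
    )
    return parser


# format_help() returns the same text that the -h flag would print.
help_text = build_parser().format_help()
print(help_text)
```

Reusing the docstring this way keeps the description in one place, so `help()` and `-h` cannot drift apart.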

## Docstring Formats

You may have noticed that the docstrings written in this notebook use a different format from some of the examples shown, such as the pandas docstrings.

There are specific docstring formats that most users are familiar with. Furthermore, these formats enable docstring parsers, such as Sphinx, to create documentation automatically. Some of the most common formats are:

- [Google](https://google.github.io/styleguide/pyguide.html)
- [Sphinx or reStructuredText](http://sphinx-doc.org/markup/desc.html)
- [Numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html)
- [Epytext](https://epytext.readthedocs.io/en/latest/format.html)

As an example, let us look at how a function in the `spreadsheet_printer` module can be documented using each of the above formats.

In [15]:
### Google

"""Gets and prints the spreadsheet's header columns

Args:
    file_loc (str): the file location of the spreadsheet
    print_cols (bool): a flag used to print the columns to the console
        (default is False)

Returns:
    list: a list of strings representing the header columns
"""

"Gets and prints the spreadsheet's header columns\n\nArgs:\n    file_loc (str): the file location of the spreadsheet\n    print_cols (bool): a flag used to print the columns to the console\n        (default is False)\n\nReturns:\n    list: a list of strings representing the header columns\n"

In [16]:
### Sphinx

"""Gets and prints the spreadsheet's header columns

:param file_loc: the file location of the spreadsheet
:type file_loc: str
:param print_cols: a flag used to print the columns to the console
    (default is False)
:type print_cols: bool
:returns: a list of strings representing the header columns
:rtype: list
"""


"Gets and prints the spreadsheet's header columns\n\n:param file_loc: the file location of the spreadsheet\n:type file_loc: str\n:param print_cols: a flag used to print the columns to the console\n    (default is False)\n:type print_cols: bool\n:returns: a list of strings representing the header columns\n:rtype: list\n"

In [None]:
### NumPy

"""Gets and prints the spreadsheet's header columns

Parameters
----------
file_loc : str
    The file location of the spreadsheet
print_cols : bool, optional
    A flag used to print the columns to the console (default is False)

Returns
-------
list
    a list of strings representing the header columns
"""


In [17]:
### Epytext
"""Gets and prints the spreadsheet's header columns

@type file_loc: str
@param file_loc: the file location of the spreadsheet
@type print_cols: bool
@param print_cols: a flag used to print the columns to the console
    (default is False)
@rtype: list
@returns: a list of strings representing the header columns
"""


"Gets and prints the spreadsheet's header columns\n\n@type file_loc: str\n@param file_loc: the file location of the spreadsheet\n@type print_cols: bool\n@param print_cols: a flag used to print the columns to the console\n    (default is False)\n@rtype: list\n@returns: a list of strings representing the header columns\n"

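The cells above show each docstring as a bare string. As a sketch of how one of them attaches to real code, the Google-style version is placed inside a function below. The function name `get_spreadsheet_cols` and its body are assumptions made for illustration (the spreadsheet is assumed to be a CSV file whose first row is the header); they are not taken from `spreadsheet_printer.py`.

```python
import csv
import tempfile
from pathlib import Path


def get_spreadsheet_cols(file_loc, print_cols=False):
    """Gets and prints the spreadsheet's header columns

    Args:
        file_loc (str): the file location of the spreadsheet
        print_cols (bool): a flag used to print the columns to the console
            (default is False)

    Returns:
        list: a list of strings representing the header columns
    """
    # Assumption: the spreadsheet is a CSV file whose first row is the header.
    with open(file_loc, newline="") as csv_file:
        header = next(csv.reader(csv_file), [])
    if print_cols:
        print("\n".join(header))
    return header


# Usage with a small temporary CSV file created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_file = Path(tmp_dir) / "sample.csv"
    sample_file.write_text("name,age\nAda,36\n")
    columns = get_spreadsheet_cols(str(sample_file))

print(columns)  # ['name', 'age']
```
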
## Summary

- Comments are a great way to document code, but you need more than comments to create good documentation.
- Thus, we need to rely on docstrings.
- A docstring is a string that describes the purpose of a function, method, class, script, module, or package.