
[DOC] Completing estimator class docstrings #1148

Closed
fkiraly opened this issue Jul 17, 2021 · 7 comments
Labels
documentation Documentation & tutorials

Comments


fkiraly commented Jul 17, 2021

Every estimator class should have a complete docstring.
This should be worked on class by class, and feel free to complete only individual rubrics if it is unclear what to fill in for the others.

A good estimator docstring should include rubrics:

  • one-line description (top), starting capitalized and ending with `.`
  • description paragraph - what is the algorithm?
  • Components block - only if there are estimator components. The list of components should be identical to the constructor arguments that are estimators (inheriting from BaseClassifier, BaseForecaster, etc.).
  • Parameters block - individual parameters listed as `param_name : type` plus an explanation; the explanation should include the value/structure convention if the expectation is more specific than the type alone, e.g., `n : int, integer between 0 and 42`.
    The list of Parameters should be identical to the constructor arguments that are not estimators.
  • Attributes block - these are the most important attributes of object instances which are not parameters or components. It should include attributes that correspond to the "fitted model".
  • Notes - details, formulae, academic references
  • Example - self-contained example on sktime internal toy data that runs
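As a quick orientation, a minimal skeleton covering these rubrics might look like the following. The class and parameter names here are hypothetical, purely for illustration, not an actual sktime estimator:

```python
class MyEstimator:
    """One-line description of the estimator, capitalized, ending with a period.

    Description paragraph: what does the algorithm do, at a high level?

    Parameters
    ----------
    n : int, default=10
        Integer between 0 and 42, controlling some aspect of the algorithm.

    Attributes
    ----------
    model_ : object
        The fitted model, populated by `fit`.

    Notes
    -----
    Details, formulae, and academic references go here.

    Example
    -------
    >>> est = MyEstimator()
    """

    def __init__(self, n=10):
        self.n = n
```

A Components block would be added only if some constructor arguments were themselves estimators.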

For formatting, we use the NumPy docstring style, though note that the rubrics are slightly different (because we are dealing with algorithms/estimators).
Also look at the extension template for the algorithm scitype for a "fill-in template" that algorithm implementers are using (or should be using).

Here's an example of a good class docstring:

class BOSSEnsemble(BaseClassifier):
    """Ensemble of bag of Symbolic Fourier Approximation Symbols (BOSS).

    Implementation of BOSS Ensemble from Schäfer (2015). [1]_

    Overview: Input "n" series of length "m", and BOSS performs a grid search over
    a set of parameter values, evaluating each with LOOCV. By default, it then
    retains all ensemble members within 92% of the best for use in the ensemble.
    There are three primary parameters:
        - alpha: alphabet size
        - w: window length
        - l: word length.

    For any combination, a single BOSS slides a window length "w" along the
    series. The w length window is shortened to an "l" length word through
    taking a Fourier transform and keeping the first l/2 complex coefficients.
    These "l" coefficients are then discretized into alpha possible values,
    to form a word of length "l". A histogram of words for each
    series is formed and stored.

    Fit involves finding "n" histograms.

    Predict uses 1 nearest neighbor with a bespoke BOSS distance function.

    Parameters
    ----------
    threshold : float, default=0.92
        Threshold used to determine which classifiers to retain. All classifiers
        within percentage `threshold` of the best one are retained.
    max_ensemble_size : int or None, default=500
        Maximum number of classifiers to retain. Will limit number of retained
        classifiers even if more than `max_ensemble_size` are within threshold.
    max_win_len_prop : int or float, default=1
        Maximum window length as a proportion of the series length.
    min_window : int, default=10
        Minimum window size.
    n_jobs : int, default=1
        The number of jobs to run in parallel for both `fit` and `predict`.
        ``-1`` means using all processors.
    random_state : int or None, default=None
        Seed for the random number generator.

    Attributes
    ----------
    n_classes : int
        Number of classes. Extracted from the data.
    n_instances : int
        Number of instances. Extracted from the data.
    n_estimators : int
        The final number of classifiers used. Will be <= `max_ensemble_size` if
        `max_ensemble_size` has been specified.
    series_length : int
        Length of all series (assumed equal).
    classifiers : list
        List of DecisionTree classifiers.

    See Also
    --------
    IndividualBOSS, ContractableBOSS

    Notes
    -----
    For the Java version, see
    `TSML <https://github.com/uea-machine-learning/tsml/blob/master/src/
    main/java/tsml/classifiers/dictionary_based/BOSS.java>`_.

    References
    ----------
    .. [1] Patrick Schäfer, "The BOSS is concerned with time series classification
       in the presence of noise", Data Mining and Knowledge Discovery, 29(6): 2015
       https://link.springer.com/article/10.1007/s10618-014-0377-7

    Example
    -------
    >>> from sktime.classification.dictionary_based import BOSSEnsemble
    >>> from sktime.datasets import load_italy_power_demand
    >>> X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
    >>> X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
    >>> clf = BOSSEnsemble()
    >>> clf.fit(X_train, y_train)
    BOSSEnsemble(...)
    >>> y_pred = clf.predict(X_test)
    """

mloning commented Jul 17, 2021

Running `pydocstyle sktime/ --config=setup.cfg | grep ./ | cut -d ':' -f1 | uniq` from a Unix command line should give you a list of all files that have incomplete docstrings.
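To see what the filtering half of that pipeline does, here is a sketch that feeds it simulated pydocstyle-style output (`path:line: code message`, one line per violation) via `printf`; the file names are made up, since the real command needs an sktime checkout with pydocstyle installed:

```shell
# Simulated pydocstyle output: one "path:line: CODE message" line per violation.
# grep ./ keeps only lines containing a path separator, cut takes the path
# (field 1 before ':'), and uniq collapses adjacent duplicates, leaving one
# entry per offending file.
printf 'sktime/a.py:1: D100 Missing docstring\nsktime/a.py:9: D102 Missing docstring\nsktime/b.py:3: D103 Missing docstring\n' \
    | grep ./ | cut -d ':' -f1 | uniq
```

Note that `uniq` only merges adjacent duplicates, which works here because pydocstyle reports violations grouped by file.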

@fkiraly fkiraly changed the title Completing estimator class docstrings [DOC] Completing estimator class docstrings Jul 17, 2021

RNKuhns commented Jul 17, 2021

@mloning and @fkiraly, I've got an update to the docstring above that fixes some formatting issues that would not render well in Sphinx (e.g. some of the formatting of the Parameters and Attributes sections). Note that this moves the reference to the paper to the References section, as specified by the NumPy docstring format. I also moved the reference to the Java version to the See Also section, but still need to figure out how to make the link work correctly there.

This also cleans up some typos, capitalization issues, and other minor things.

class BOSSEnsemble(BaseClassifier):
    """Ensemble of bag of Symbolic Fourier Approximation Symbols (BOSS).

    Implementation of BOSS Ensemble from Schäfer (2015). [1]_

    Overview: Input "n" series of length "m", and BOSS performs a grid search over
    a set of parameter values, evaluating each with LOOCV. By default, it then
    retains all ensemble members within 92% of the best for use in the ensemble.
    There are three primary parameters:
        - alpha: alphabet size
        - w: window length
        - l: word length.

    For any combination, a single BOSS slides a window length "w" along the
    series. The w length window is shortened to an "l" length word through
    taking a Fourier transform and keeping the first l/2 complex coefficients.
    These "l" coefficients are then discretized into alpha possible values,
    to form a word of length "l". A histogram of words for each
    series is formed and stored.

    Fit involves finding "n" histograms.

    Predict uses 1 nearest neighbor with a bespoke BOSS distance function.

    Parameters
    ----------
    threshold : float, default=0.92
        Threshold used to determine which classifiers to retain. All classifiers
        within percentage `threshold` of the best one are retained.
    max_ensemble_size : int or None, default=500
        Maximum number of classifiers to retain. Will limit number of retained
        classifiers even if more than `max_ensemble_size` are within threshold.
    max_win_len_prop : int or float, default=1
        Maximum window length as a proportion of the series length.
    min_window : int, default=10
        Minimum window size.
    n_jobs : int, default=1
        The number of jobs to run in parallel for both `fit` and `predict`.
        ``-1`` means using all processors.
    random_state : int or None, default=None
        Seed for the random number generator.

    Attributes
    ----------
    n_classes : int
        Number of classes. Extracted from the data.
    n_instances : int
        Number of instances. Extracted from the data.
    n_estimators : int
        The final number of classifiers used. Will be <= `max_ensemble_size` if
        `max_ensemble_size` has been specified.
    series_length : int
        Length of all series (assumed equal).
    classifiers : list
        List of DecisionTree classifiers.

    See Also
    --------
    :py:class:`IndividualBOSS`, :py:class:`ContractableBOSS`

    For the Java version, see
    `TSML <https://github.com/uea-machine-learning/tsml/blob/master/src/
    main/java/tsml/classifiers/dictionary_based/BOSS.java>`_.

    References
    ----------
    .. [1] Patrick Schäfer, "The BOSS is concerned with time series classification
       in the presence of noise", Data Mining and Knowledge Discovery, 29(6): 2015
       https://link.springer.com/article/10.1007/s10618-014-0377-7

    Example
    -------
    >>> from sktime.classification.dictionary_based import BOSSEnsemble
    >>> from sktime.datasets import load_italy_power_demand
    >>> X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
    >>> X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
    >>> clf = BOSSEnsemble()
    >>> clf.fit(X_train, y_train)
    BOSSEnsemble(...)
    >>> y_pred = clf.predict(X_test)
    """


fkiraly commented Jul 18, 2021

Looks good, @RNKuhns, thanks!
I cooked up mine just very quickly to help people in the sprint.

Would you mind:

  • adding the example
  • editing/overwriting my original post, so the "thing at the top" is the perfect docstring?


RNKuhns commented Jul 18, 2021

@fkiraly -- no problem. I've added the example (didn't copy that over correctly in my post) and updated your post at the top.


RNKuhns commented Jul 18, 2021

I was able to join the SciPy documentation sprint and figure out how to specify the link. I've updated both posts to include the correct link usage in "See Also".


RNKuhns commented Jul 20, 2021

I've made another minor tweak to the doc -- the references to related classifiers in See Also will now also work.


mloning commented Dec 4, 2021

I think we should transfer this issue into the relevant developer guide section and close it.

@fkiraly fkiraly closed this as completed Apr 18, 2022