[DOC] Completing estimator class docstrings #1148
@mloning and @fkiraly, I've got an update to the docstring above that fixes some formatting issues that won't render well in Sphinx (e.g. some of the formatting of the parameters and attributes sections). Note that this moves the reference to the paper to the References section, as specified in the NumPy docstring format. Moved the reference to the Java version to the See Also section, but still need to figure out how to make the link work correctly there. This also cleans up some typos, capitalization issues, and other minor things.

class BOSSEnsemble(BaseClassifier):
"""Ensemble of bag of Symbolic Fourier Approximation Symbols (BOSS).
Implementation of BOSS Ensemble from Schäfer (2015). [1]_
Overview: Input "n" series of length "m" and BOSS performs a grid search over
a set of parameter values, evaluating each with LOOCV. By default, it then
retains all ensemble members within 92% of the best for use in the ensemble.
There are three primary parameters:
- alpha: alphabet size
- w: window length
- l: word length.
For any combination, a single BOSS slides a window of length "w" along the
series. The "w"-length window is shortened to an "l"-length word by
taking a Fourier transform and keeping the first l/2 complex coefficients.
These "l" coefficients are then discretized into "alpha" possible values,
to form a word of length "l". A histogram of words for each
series is formed and stored.
Fit involves finding "n" histograms.
Predict uses 1 nearest neighbor with a bespoke BOSS distance function.
Parameters
----------
threshold : float, default=0.92
Threshold used to determine which classifiers to retain. All classifiers
within percentage `threshold` of the best one are retained.
max_ensemble_size : int or None, default=500
Maximum number of classifiers to retain. Will limit number of retained
classifiers even if more than `max_ensemble_size` are within threshold.
max_win_len_prop : int or float, default=1
Maximum window length as a proportion of the series length.
min_window : int, default=10
Minimum window size.
n_jobs : int, default=1
The number of jobs to run in parallel for both `fit` and `predict`.
``-1`` means using all processors.
random_state : int or None, default=None
Integer seed for random number generation.
Attributes
----------
n_classes : int
Number of classes. Extracted from the data.
n_instances : int
Number of instances. Extracted from the data.
n_estimators : int
The final number of classifiers used. Will be <= `max_ensemble_size` if
`max_ensemble_size` has been specified.
series_length : int
Length of all series (assumed equal).
classifiers : list
List of DecisionTree classifiers.
See Also
--------
:py:class:`IndividualBOSS`, :py:class:`ContractableBOSS`
For the Java version, see
`TSML <https://github.com/uea-machine-learning/tsml/blob/master/src/
main/java/tsml/classifiers/dictionary_based/BOSS.java>`_.
References
----------
.. [1] Patrick Schäfer, "The BOSS is concerned with time series classification
in the presence of noise", Data Mining and Knowledge Discovery, 29(6): 2015
https://link.springer.com/article/10.1007/s10618-014-0377-7
Example
-------
>>> from sktime.classification.dictionary_based import BOSSEnsemble
>>> from sktime.datasets import load_italy_power_demand
>>> X_train, y_train = load_italy_power_demand(split="train", return_X_y=True)
>>> X_test, y_test = load_italy_power_demand(split="test", return_X_y=True)
>>> clf = BOSSEnsemble()
>>> clf.fit(X_train, y_train)
BOSSEnsemble(...)
>>> y_pred = clf.predict(X_test)
""" |
Looks good, @RNKuhns, thanks! Would you mind:
@fkiraly -- no problem. I've added the example (didn't copy that over correctly in my post) and updated your post at the top.
Was able to attend the SciPy documentation sprint and figure out how to specify the link. I've updated both posts to include the correct link usage in "See Also".
I've made another minor tweak to the doc -- the references to related classifiers in See Also will now also work. |
I think we should transfer this issue into the relevant developer guide section and close it. |
Every estimator class should have a complete docstring.
This should be worked on one-by-one, and feel free to complete only individual rubrics if it's unclear what to fill in for the others.
A good estimator docstring should include the following rubrics:

- Components block - only if there are estimator components. The list of components should be identical with constructor arguments that are estimators (inheriting from BaseClassifier, BaseForecaster, etc.).
- Parameters block - individual parameters listed with param_name : type, explanation. The explanation should include a value/structure convention if the expectation is more specific than just stating the type, e.g., n : int, integer between 0 and 42. The list of Parameters should be identical with constructor arguments that are not estimators.
- Attributes block - these are the most important attributes of object instances which are not parameters or components. It should include attributes that correspond to the "fitted model".
- Notes - details, formulae, academic references.
- Example - self-contained example on sktime internal toy data that runs.

For formatting, we use the numpy style, though note that the rubrics are slightly different (because we are dealing with algorithms/estimators).
Also look at the extension templates for the algorithm scitype for a "fill-in template" that algorithm implementers are using (or should be using).
Here's an example of a good class docstring:
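The BOSSEnsemble docstring quoted earlier in this thread is such an example. As a compact skeleton of just the rubrics, here is a sketch using a hypothetical `MyClassifier` with made-up parameters (in sktime the class would inherit from BaseClassifier):

```python
class MyClassifier:  # hypothetical; in sktime this would inherit from BaseClassifier
    """Hypothetical classifier illustrating the docstring rubrics.

    Extended summary: a short description of the algorithm goes here.

    Parameters
    ----------
    n_estimators : int, default=100
        Integer greater than 0; number of ensemble members.
    base_estimator : estimator, default=None
        A component: a constructor argument that is itself an estimator
        (listed under the Components rubric, separate from plain parameters).

    Attributes
    ----------
    n_classes_ : int
        Number of classes, extracted from the data in ``fit``.

    Notes
    -----
    Details, formulae, and academic references go here.

    Example
    -------
    >>> clf = MyClassifier()
    """

    def __init__(self, n_estimators=100, base_estimator=None):
        self.n_estimators = n_estimators
        self.base_estimator = base_estimator
```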