Link API docs to user guide and other examples #5816

Open
rabernat opened this issue Sep 24, 2021 · 3 comments

@rabernat
Contributor

Noting down a comment by @danjonesocean on Twitter: https://twitter.com/DanJonesOcean/status/1441392596362874882

In general, having more examples on each xarray page (like the one below) would be good. Then they would come up quickly in function searches:

http://xarray.pydata.org/en/stable/generated/xarray.Dataset.merge.html#xarray.Dataset.merge

Our API docs are generated from the function docstrings, and these are usually the first thing users hit when they search for functions. However, these docstrings uniformly lack examples, often leaving users stuck.

I see two ways to mitigate this:

  • Add examples directly to the docstrings (suggested by @jklymak)
  • Cross reference other examples from the user guide or other tutorials
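
As a rough illustration of the first option, here is a minimal sketch of a numpydoc-style docstring with an Examples and a See Also section. `merge_mappings` is a hypothetical stand-in written for this sketch, not an xarray function. The standard-library `doctest` module is used to check that the documented output matches what the code returns.

```python
import doctest


def merge_mappings(left, right):
    """Merge two mappings, with values from ``right`` taking precedence.

    Hypothetical helper used only to illustrate the docstring layout.

    Parameters
    ----------
    left, right : dict
        Mappings to combine.

    Returns
    -------
    dict
        A new dict; neither input is modified.

    See Also
    --------
    dict.update : In-place equivalent.

    Examples
    --------
    >>> merge_mappings({"a": 1}, {"b": 2})
    {'a': 1, 'b': 2}

    Values from ``right`` win on conflicts:

    >>> merge_mappings({"a": 1}, {"a": 3})
    {'a': 3}
    """
    return {**left, **right}


# Parse the docstring and run its examples, so a stale example shows up
# as a failure instead of silently misleading readers.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    merge_mappings.__doc__,
    {"merge_mappings": merge_mappings},  # names visible to the examples
    "merge_mappings",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(f"attempted={results.attempted} failed={results.failed}")
```

Keeping the examples short and runnable like this means they can double as tests.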
@jklymak
Contributor

jklymak commented Sep 24, 2021

I think doing both is always appreciated: short "how to" with a common pattern or two and then a "see also".

@raybellwaves
Contributor

See https://github.com/pydata/xarray/blob/main/xarray/core/dataset.py#L2022

for how to link the doc string to other parts of the docs
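
The linked line is not reproduced here. As a hedged sketch of the general idea, a docstring can point at a user guide section with the Sphinx `:ref:` role inside a See Also section. The function name and the label `hypothetical.userguide-label` below are assumptions made up for illustration; a real label has to exist in the project's documentation sources.

```python
import inspect


def merge_sketch(other):
    """Merge ``other`` into this object (illustration only).

    See Also
    --------
    :ref:`hypothetical.userguide-label`
        Hypothetical user guide section with longer worked examples.
    """
    raise NotImplementedError("docstring illustration only")


# Sphinx resolves the role when the docs are built; here we only show
# the cleaned-up docstring text that the docs build would start from.
doc = inspect.cleandoc(merge_sketch.__doc__)
print(doc)
```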

@keewis
Collaborator

keewis commented Oct 10, 2021

I agree, this would be really helpful. There's lots to do, though: a small script reports about 170 functions / methods without example sections (and 6 numpy wrappers), and all others could use reviews:

Analysis script:
import itertools

import numpy as np
import xarray as xr


def public_api(obj, base_name):
    api = dir(obj)
    public_api = tuple(name for name in api if not name.startswith("_"))
    public_api_objects = {name: getattr(obj, name) for name in public_api}

    # uppercase indicates classes
    public_functions = {
        f"{base_name}.{name}": obj
        for name, obj in public_api_objects.items()
        if not name[0].isupper() and callable(obj)
    }

    return public_functions


def is_dict_method(fqn):
    *_, name = fqn.split(".")

    return hasattr(dict, name)


def has_section(docstring, name):
    lines = docstring.split("\n")
    marker = "-" * len(name)
    # pair each line with the one after it; avoid shadowing the builtin `next`
    for current, following in itertools.zip_longest(lines, lines[1:]):
        if following is None or following.strip() != marker:
            continue

        if current.strip() == name:
            return True

    return False


def is_numpy_wrapper(fqn, docstring):
    *_, name = fqn.split(".")

    segment = f"Refer to `numpy.{name}` for full documentation."
    shortened_segment = f"Refer to `numpy.{name[:4]}` for full documentation."
    if segment in docstring or shortened_segment in docstring:
        return True

    numpy_func = getattr(np, name, None)
    if numpy_func is None:
        return False

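    numpy_docstring = numpy_func.__doc__
    # guard against a missing docstring: `None in docstring` raises TypeError
    return numpy_docstring is not None and numpy_docstring in docstring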


def format_names(names):
    return "\n".join(f"  - {name}" for name in names)


namespaces = {
    "xarray": xr,
    "xarray.DataArray": xr.DataArray,
    "xarray.Dataset": xr.Dataset,
}

funcs = dict(
    itertools.chain.from_iterable(
        public_api(namespace, base_name=name).items()
        for name, namespace in namespaces.items()
    )
)

docstrings = {name: func.__doc__ for name, func in funcs.items()}

without_docstring = tuple(
    name
    for name, docstring in docstrings.items()
    if docstring is None or not docstring.strip()
)

filtered_docstrings = tuple(
    (name, docstring)
    for name, docstring in docstrings.items()
    if (
        docstring is not None
        and not has_section(docstring, name="Examples")
        and not is_dict_method(name)
    )
)
without_examples_xarray = tuple(
    name
    for name, docstring in filtered_docstrings
    if not is_numpy_wrapper(name, docstring)
)
without_examples_numpy = tuple(
    name for name, docstring in filtered_docstrings if is_numpy_wrapper(name, docstring)
)


print(
    f"functions without examples ({len(without_examples_xarray)}):",
    format_names(without_examples_xarray),
    sep="\n",
)
print()
print(
    f"numpy wrappers without examples ({len(without_examples_numpy)}):",
    format_names(without_examples_numpy),
    sep="\n",
)
print()
print(
    f"functions without docstring ({len(without_docstring)}):",
    format_names(without_docstring),
    sep="\n",
)
Analysis report on current main (with coarse filtering of advanced / deprecated API):
functions without examples (170):
  - xarray.decode_cf
  - xarray.get_options
  - xarray.infer_freq
  - xarray.load_dataarray
  - xarray.load_dataset
  - xarray.open_dataarray
  - xarray.open_dataset
  - xarray.open_mfdataset
  - xarray.open_zarr
  - xarray.polyval
  - xarray.unify_chunks
  - xarray.DataArray.all
  - xarray.DataArray.any
  - xarray.DataArray.as_numpy
  - xarray.DataArray.assign_attrs
  - xarray.DataArray.astype
  - xarray.DataArray.bfill
  - xarray.DataArray.broadcast_equals
  - xarray.DataArray.chunk
  - xarray.DataArray.close
  - xarray.DataArray.combine_first
  - xarray.DataArray.compute
  - xarray.DataArray.conj
  - xarray.DataArray.count
  - xarray.DataArray.cumprod
  - xarray.DataArray.cumsum
  - xarray.DataArray.curvefit
  - xarray.DataArray.drop
  - xarray.DataArray.drop_duplicates
  - xarray.DataArray.drop_isel
  - xarray.DataArray.drop_sel
  - xarray.DataArray.drop_vars
  - xarray.DataArray.dropna
  - xarray.DataArray.equals
  - xarray.DataArray.expand_dims
  - xarray.DataArray.ffill
  - xarray.DataArray.fillna
  - xarray.DataArray.from_cdms2
  - xarray.DataArray.from_dict
  - xarray.DataArray.from_iris
  - xarray.DataArray.from_series
  - xarray.DataArray.get_axis_num
  - xarray.DataArray.get_index
  - xarray.DataArray.groupby_bins
  - xarray.DataArray.head
  - xarray.DataArray.identical
  - xarray.DataArray.interp_like
  - xarray.DataArray.load
  - xarray.DataArray.max
  - xarray.DataArray.mean
  - xarray.DataArray.median
  - xarray.DataArray.min
  - xarray.DataArray.persist
  - xarray.DataArray.plot
  - xarray.DataArray.polyfit
  - xarray.DataArray.prod
  - xarray.DataArray.reduce
  - xarray.DataArray.reindex_like
  - xarray.DataArray.rename
  - xarray.DataArray.reorder_levels
  - xarray.DataArray.reset_coords
  - xarray.DataArray.reset_index
  - xarray.DataArray.rolling_exp
  - xarray.DataArray.searchsorted
  - xarray.DataArray.set_close
  - xarray.DataArray.squeeze
  - xarray.DataArray.std
  - xarray.DataArray.str
  - xarray.DataArray.sum
  - xarray.DataArray.tail
  - xarray.DataArray.thin
  - xarray.DataArray.to_cdms2
  - xarray.DataArray.to_dataframe
  - xarray.DataArray.to_dataset
  - xarray.DataArray.to_dict
  - xarray.DataArray.to_index
  - xarray.DataArray.to_iris
  - xarray.DataArray.to_masked_array
  - xarray.DataArray.to_netcdf
  - xarray.DataArray.to_numpy
  - xarray.DataArray.to_pandas
  - xarray.DataArray.to_series
  - xarray.DataArray.transpose
  - xarray.DataArray.unify_chunks
  - xarray.DataArray.var
  - xarray.DataArray.weighted
  - xarray.Dataset.all
  - xarray.Dataset.any
  - xarray.Dataset.apply
  - xarray.Dataset.argmax
  - xarray.Dataset.argmin
  - xarray.Dataset.as_numpy
  - xarray.Dataset.assign_attrs
  - xarray.Dataset.astype
  - xarray.Dataset.bfill
  - xarray.Dataset.broadcast_equals
  - xarray.Dataset.broadcast_like
  - xarray.Dataset.chunk
  - xarray.Dataset.close
  - xarray.Dataset.combine_first
  - xarray.Dataset.compute
  - xarray.Dataset.conj
  - xarray.Dataset.count
  - xarray.Dataset.cumprod
  - xarray.Dataset.cumsum
  - xarray.Dataset.curvefit
  - xarray.Dataset.differentiate
  - xarray.Dataset.drop
  - xarray.Dataset.drop_dims
  - xarray.Dataset.drop_vars
  - xarray.Dataset.dropna
  - xarray.Dataset.dump_to_store
  - xarray.Dataset.equals
  - xarray.Dataset.expand_dims
  - xarray.Dataset.ffill
  - xarray.Dataset.from_dataframe
  - xarray.Dataset.from_dict
  - xarray.Dataset.get_index
  - xarray.Dataset.groupby_bins
  - xarray.Dataset.head
  - xarray.Dataset.identical
  - xarray.Dataset.info
  - xarray.Dataset.interp_like
  - xarray.Dataset.isel
  - xarray.Dataset.load
  - xarray.Dataset.load_store
  - xarray.Dataset.max
  - xarray.Dataset.mean
  - xarray.Dataset.median
  - xarray.Dataset.merge
  - xarray.Dataset.min
  - xarray.Dataset.persist
  - xarray.Dataset.plot
  - xarray.Dataset.polyfit
  - xarray.Dataset.prod
  - xarray.Dataset.rank
  - xarray.Dataset.reduce
  - xarray.Dataset.reindex_like
  - xarray.Dataset.rename
  - xarray.Dataset.rename_dims
  - xarray.Dataset.rename_vars
  - xarray.Dataset.reorder_levels
  - xarray.Dataset.reset_coords
  - xarray.Dataset.reset_index
  - xarray.Dataset.rolling_exp
  - xarray.Dataset.sel
  - xarray.Dataset.set_close
  - xarray.Dataset.set_coords
  - xarray.Dataset.squeeze
  - xarray.Dataset.stack
  - xarray.Dataset.std
  - xarray.Dataset.sum
  - xarray.Dataset.tail
  - xarray.Dataset.thin
  - xarray.Dataset.to_array
  - xarray.Dataset.to_dask_dataframe
  - xarray.Dataset.to_dataframe
  - xarray.Dataset.to_dict
  - xarray.Dataset.to_netcdf
  - xarray.Dataset.to_pandas
  - xarray.Dataset.to_zarr
  - xarray.Dataset.transpose
  - xarray.Dataset.unify_chunks
  - xarray.Dataset.unstack
  - xarray.Dataset.var
  - xarray.Dataset.weighted

numpy wrappers without examples (6):
  - xarray.DataArray.argsort
  - xarray.DataArray.clip
  - xarray.DataArray.conjugate
  - xarray.Dataset.argsort
  - xarray.Dataset.clip
  - xarray.Dataset.conjugate

functions without docstring (1):
  - xarray.DataArray.dt

For some of those it might be difficult to write examples (e.g. the I/O methods / functions), and modifying the docstrings of the numpy wrappers would require some refactoring.

Most of the remaining ones are pretty easy to figure out and write examples for, however. If we lower the barrier to entry as much as possible, maybe these can be good first PRs with high impact?

I'm not really sure how to do that, though... I'd try to group them by difficulty and create sub-issues with a checklist for each group (so they're easy to find), and maybe also explain what a good example looks like or link to an appropriate explanation. The latter might be a good thing to add to the contributor's guide, too.
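
One possible way to draft those checklists is sketched below. The grouping rule is an assumption made for this sketch: names whose last component starts with an I/O-like or conversion-like prefix are marked as harder, everything else as easier. It is a crude heuristic, not an agreed classification, and the short `names` list is just a sample taken from the report above.

```python
from collections import defaultdict

# Sample of names from the report; the full list would come from the script.
names = [
    "xarray.open_dataset",
    "xarray.DataArray.head",
    "xarray.DataArray.to_netcdf",
    "xarray.Dataset.merge",
    "xarray.Dataset.to_zarr",
]

# Assumption: these prefixes hint at I/O or conversion methods, which are
# likely harder to write self-contained examples for.
HARDER_PREFIXES = ("open_", "load_", "to_", "from_")


def difficulty(fqn):
    """Classify a fully qualified name using the prefix heuristic."""
    method = fqn.rsplit(".", 1)[-1]
    return "harder (I/O or conversion)" if method.startswith(HARDER_PREFIXES) else "easier"


def checklists(names):
    """Return task-list text with one checklist per difficulty group."""
    groups = defaultdict(list)
    for fqn in names:
        groups[difficulty(fqn)].append(fqn)

    lines = []
    for group in sorted(groups):
        lines.append(f"{group}:")
        lines.extend(f"- [ ] {fqn}" for fqn in sorted(groups[group]))
    return "\n".join(lines)


report = checklists(names)
print(report)
```

The `- [ ]` items render as task lists on GitHub, so each group could be pasted into its own sub-issue and ticked off as PRs land.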
