[DataPipe] Update docstring for functional form of DataPipes #100446

Closed
wants to merge 11 commits into from
66 changes: 66 additions & 0 deletions test/test_datapipe.py
@@ -5,6 +5,7 @@
 import os
 import os.path
 import pickle
+import pydoc
 import random
 import sys
 import tempfile
@@ -831,6 +832,43 @@ def _fn3(x):
with self.assertRaises((pickle.PicklingError, AttributeError)):
pickle.dumps(datapipe)

+    def test_docstring(self):
+        """
+        Ensure functional form of IterDataPipe has the correct docstring from
+        the class form.
+
+        Regression test for https://github.com/pytorch/data/issues/792.
+        """
+        input_dp = dp.iter.IterableWrapper(range(10))
+
+        for dp_funcname in [
+            "batch",
+            "collate",
+            "concat",
+            "demux",
+            "filter",
+            "fork",
+            "map",
+            "mux",
+            "read_from_stream",
+            # "sampler",
Contributor Author

Noticed that SamplerIterDataPipe is missing the functional form at

class SamplerIterDataPipe(IterDataPipe[T_co]):
    r"""
    Generates sample elements using the provided ``Sampler`` (defaults to :class:`SequentialSampler`).
    Args:
        datapipe: IterDataPipe to sample from
        sampler: Sampler class to generate sample elements from input DataPipe.
            Default is :class:`SequentialSampler` for IterDataPipe
    """

Should we add the @functional_datapipe('sample') decorator in a separate PR?

Contributor

I think we left it out for a reason; we can add the functional form for sampler later if there is demand for it.

+            "shuffle",
+            "unbatch",
+            "zip",
+        ]:
+            if sys.version_info >= (3, 9):
+                docstring = pydoc.render_doc(
+                    thing=getattr(input_dp, dp_funcname), forceload=True
+                )
+            elif sys.version_info < (3, 9):
+                # pydoc works differently on Python 3.8, see
+                # https://docs.python.org/3/whatsnew/3.9.html#pydoc
+                docstring = getattr(input_dp, dp_funcname).__doc__
+
+            assert f"(functional name: ``{dp_funcname}``)" in docstring
Contributor Author

@weiji14 Can you have a look at the failing CI tests? They seem relevant

Yes, I see the error, though it seems to occur only for certain combinations of the build matrix, which is weird. Traceback at https://github.com/pytorch/pytorch/actions/runs/4929875273/jobs/8810511389#step:13:628:

==================================== RERUNS ====================================
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
  File "test_datapipe.py", line 860, in test_docstring
    assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
  File "test_datapipe.py", line 860, in test_docstring
    assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
=================================== FAILURES ===================================
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
  File "test_datapipe.py", line 860, in test_docstring
    assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
- generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_datapipe/test_datapipe-17e8ddb9c33a3903.xml -
=========================== short test summary info ============================
FAILED [0.0028s] test_datapipe.py::TestFunctionalIterDataPipe::test_docstring - AssertionError
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
=============== 1 failed, 22 passed, 7 skipped, 2 rerun in 0.45s ===============
Got exit code 1, retrying (retries left=2)
Test results will be stored in test-reports/python-pytest/test_datapipe/test_datapipe-0b2a1e96f86b787b.xml

Logs aren't very helpful, let me try and track down the issue.

Contributor Author

Merging in the changes from #100503 to see if it helps. I can't seem to reproduce this locally on my setup, and I'm not quite sure what the shard/num_shards config means in the linux-bionic-py3_8-clang9-build CI build matrix at https://github.com/pytorch/pytorch/blob/daed3bf8f9d10367ae3a34d8ea6ca2b594f9afe2/.github/workflows/pull.yml#L124C1-L139

Contributor Author

I have tried using pydoc.render_doc(..., forceload=True) at c21eaa5, but somehow the docstring still isn't updated on some build matrix combinations. I'm getting output like this from https://github.com/pytorch/pytorch/actions/runs/4955538159/jobs/8877794743?pr=100446#step:14:1087:

  __________________ TestFunctionalIterDataPipe.test_docstring ___________________
  Traceback (most recent call last):
    File "test_datapipe.py", line 863, in test_docstring
      assert f"(functional name: ``{dp_funcname}``)" in docstring
  AssertionError
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "test_datapipe.py", line 868, in test_docstring
      raise ValueError(dp_funcname, "IterDataPipe docstring incorrect")
  ValueError: ('batch', 'IterDataPipe docstring incorrect')
  ----------------------------- Captured stdout call -----------------------------
  ***Begin docstring for batch
  Python Library Documentation: partial in module torch.utils.data.datapipes.iter.grouping object
  
  class p�pa�ar�rt�ti�ia�al�l(builtins.object)
   |  partial(func, *args, **keywords) - new function with partial application
   |  of the given arguments and keywords.
   |
   |  Methods defined here:
   |
   |  _�__�_c�ca�al�ll�l_�__�_(self, /, *args, **kwargs)
   |      Call self as a function.
   |
   |  _�__�_d�de�el�la�at�tt�tr�r_�__�_(self, name, /)
   |      Implement delattr(self, name).
   |
   |  _�__�_g�ge�et�ta�at�tt�tr�ri�ib�bu�ut�te�e_�__�_(self, name, /)
   |      Return getattr(self, name).
   |
   |  _�__�_r�re�ed�du�uc�ce�e_�__�_(...)
   |      Helper for pickle.
   |
   |  _�__�_r�re�ep�pr�r_�__�_(self, /)
   |      Return repr(self).
   |
   |  _�__�_s�se�et�ta�at�tt�tr�r_�__�_(self, name, value, /)
   |      Implement setattr(self, name, value).
   |
   |  _�__�_s�se�et�ts�st�ta�at�te�e_�__�_(...)
   |
   |  ----------------------------------------------------------------------
   |  Static methods defined here:
   |
   |  _�__�_n�ne�ew�w_�__�_(*args, **kwargs) from builtins.type
   |      Create and return a new object.  See help(type) for accurate signature.
   |
   |  ----------------------------------------------------------------------
   |  Data descriptors defined here:
   |
   |  _�__�_d�di�ic�ct�t_�__�_
   |
   |  a�ar�rg�gs�s
   |      tuple of arguments to future partial calls
   |
   |  f�fu�un�nc�c
   |      function object to use in future partial calls
   |
   |  k�ke�ey�yw�wo�or�rd�ds�s
   |      dictionary of keyword arguments to future partial calls
  
  ***End docstring for batch
  __________________ TestFunctionalIterDataPipe.test_docstring ___________________
  Traceback (most recent call last):
    File "test_datapipe.py", line 863, in test_docstring
      assert f"(functional name: ``{dp_funcname}``)" in docstring
  AssertionError
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "test_datapipe.py", line 868, in test_docstring
      raise ValueError(dp_funcname, "IterDataPipe docstring incorrect")
  ValueError: ('batch', 'IterDataPipe docstring incorrect')

Not sure why the docstring output is repeated in some parts 😕 I could xfail those tests for now and investigate them later?
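
One possible reading of the doubled letters in the captured output: pydoc's default text renderer marks names as bold by overstriking, that is, it writes each character, a backspace, and the same character again. When the backspaces are not interpreted, the names look repeated. The sketch below is not code from this PR; it only assumes the standard library and renders `functools.partial`, the class that appears in the log, with the default renderer and with `pydoc.plaintext`.

```python
import functools
import pydoc

# Default renderer (pydoc.text): bold names are written as "c\bc" pairs.
bold_text = pydoc.render_doc(functools.partial)

# Plain-text renderer: same layout, but bold text is left unchanged.
plain_text = pydoc.render_doc(functools.partial, renderer=pydoc.plaintext)

# pydoc.plain() removes overstrike pairs from text that is already rendered.
stripped_text = pydoc.plain(bold_text)

print("backspaces in default output:", "\b" in bold_text)
print("backspaces in plaintext output:", "\b" in plain_text)
print("backspaces after pydoc.plain():", "\b" in stripped_text)
```

If the test only needs substring checks, passing `renderer=pydoc.plaintext` would keep overstrike characters out of the compared string. Whether that also addresses the CI failure is not established here, since the log shows the documentation of `partial` itself rather than the DataPipe docstring.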

+            assert "Args:" in docstring
+            assert "Example:" in docstring or "Examples:" in docstring
+
     def test_iterable_wrapper_datapipe(self):

         input_ls = list(range(10))
@@ -1894,6 +1932,34 @@ def _fn1(x):
with self.assertRaises((pickle.PicklingError, AttributeError)):
pickle.dumps(datapipe)

+    def test_docstring(self):
+        """
+        Ensure functional form of MapDataPipe has the correct docstring from
+        the class form.
+
+        Regression test for https://github.com/pytorch/data/issues/792.
+        """
+        input_dp = dp.map.SequenceWrapper(range(10))
+
+        for dp_funcname in [
+            "batch",
+            "concat",
+            "map",
+            "shuffle",
+            "zip",
+        ]:
+            if sys.version_info >= (3, 9):
+                docstring = pydoc.render_doc(
+                    thing=getattr(input_dp, dp_funcname), forceload=True
+                )
+            elif sys.version_info < (3, 9):
+                # pydoc works differently on Python 3.8, see
+                # https://docs.python.org/3/whatsnew/3.9.html#pydoc
+                docstring = getattr(input_dp, dp_funcname).__doc__
+            assert f"(functional name: ``{dp_funcname}``)" in docstring
+            assert "Args:" in docstring
+            assert "Example:" in docstring or "Examples:" in docstring
+
     def test_sequence_wrapper_datapipe(self):
         seq = list(range(10))
         input_dp = dp.map.SequenceWrapper(seq)
18 changes: 15 additions & 3 deletions torch/utils/data/datapipes/datapipe.py
@@ -121,7 +121,9 @@ def __getattr__(self, attribute_name):
             if attribute_name in _iter_deprecated_functional_names:
                 kwargs = _iter_deprecated_functional_names[attribute_name]
                 _deprecation_warning(**kwargs)
-            function = functools.partial(IterDataPipe.functions[attribute_name], self)
+            f = IterDataPipe.functions[attribute_name]
+            function = functools.partial(f, self)
+            functools.update_wrapper(wrapper=function, wrapped=f, assigned=("__doc__",))
             return function
         else:
             raise AttributeError("'{0}' object has no attribute '{1}".format(self.__class__.__name__, attribute_name))
@@ -144,7 +146,12 @@ def class_function(cls, enable_df_api_tracing, source_dp, *args, **kwargs):

             return result_pipe

-        function = functools.partial(class_function, cls_to_register, enable_df_api_tracing)
+        function = functools.partial(
+            class_function, cls_to_register, enable_df_api_tracing
+        )
+        functools.update_wrapper(
+            wrapper=function, wrapped=cls_to_register, assigned=("__doc__",)
+        )
         cls.functions[function_name] = function

     def __getstate__(self):
@@ -253,7 +260,9 @@ def __getattr__(self, attribute_name):
             if attribute_name in _map_deprecated_functional_names:
                 kwargs = _map_deprecated_functional_names[attribute_name]
                 _deprecation_warning(**kwargs)
-            function = functools.partial(MapDataPipe.functions[attribute_name], self)
+            f = MapDataPipe.functions[attribute_name]
+            function = functools.partial(f, self)
+            functools.update_wrapper(wrapper=function, wrapped=f, assigned=("__doc__",))
             return function
         else:
             raise AttributeError("'{0}' object has no attribute '{1}".format(self.__class__.__name__, attribute_name))
@@ -272,6 +281,9 @@ def class_function(cls, source_dp, *args, **kwargs):
             return result_pipe

         function = functools.partial(class_function, cls_to_register)
+        functools.update_wrapper(
+            wrapper=function, wrapped=cls_to_register, assigned=("__doc__",)
+        )
         cls.functions[function_name] = function

     def __getstate__(self):
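
The effect of the `functools.update_wrapper(..., assigned=("__doc__",))` calls in this diff can be tried outside PyTorch. What follows is a minimal sketch, not the library's implementation: `ToyPipe`, `ToyBatcher`, and `toy_functional` are hypothetical stand-ins for a DataPipe base class, a registered DataPipe class, and the `functional_datapipe` decorator. It shows the class docstring being copied first onto the registered partial and then onto the bound partial returned by `__getattr__`.

```python
import functools


class ToyPipe:
    """Hypothetical stand-in for a DataPipe base class (not PyTorch code)."""

    functions = {}  # functional name -> partial that builds the registered class

    def __init__(self, items):
        self.items = list(items)

    @classmethod
    def register(cls, function_name, cls_to_register):
        def class_function(klass, source, *args, **kwargs):
            return klass(source, *args, **kwargs)

        function = functools.partial(class_function, cls_to_register)
        # Copy only __doc__ from the registered class onto the partial.
        functools.update_wrapper(
            wrapper=function, wrapped=cls_to_register, assigned=("__doc__",)
        )
        cls.functions[function_name] = function

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name in ToyPipe.functions:
            f = ToyPipe.functions[name]
            function = functools.partial(f, self)
            # Carry the docstring over to the bound functional form as well.
            functools.update_wrapper(wrapper=function, wrapped=f, assigned=("__doc__",))
            return function
        raise AttributeError(name)


def toy_functional(name):
    """Hypothetical decorator that registers a class under a functional name."""
    def decorator(cls):
        ToyPipe.register(name, cls)
        return cls
    return decorator


@toy_functional("batch")
class ToyBatcher(ToyPipe):
    """Groups items into lists of ``batch_size`` (functional name: ``batch``).

    Args:
        batch_size: number of items per batch
    """

    def __init__(self, source, batch_size):
        items = source.items
        super().__init__(
            items[i:i + batch_size] for i in range(0, len(items), batch_size)
        )


pipe = ToyPipe(range(5))
batched = pipe.batch(batch_size=2)
print(batched.items)       # [[0, 1], [2, 3], [4]]
print(pipe.batch.__doc__)  # the ToyBatcher class docstring
```

Without the two `update_wrapper` calls, `pipe.batch.__doc__` would fall back to the generic docstring of `functools.partial`, which is the situation the regression test in this PR checks for.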