New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataPipe] Update docstring for functional form of DataPipes #100446
Conversation
|
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/100446
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 0c6a6c8: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, changes look good. Is there a way for us to test this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried to see if the inline doc shows up in PyCharm with this change:
dp = IterableWrapper(range(10)).shuffle()
And it doesn't seem to. Am I missing something? Perhaps changes to datapipe.fyi
is necessary? I believe it is currently rendering whatever is in datapipe.fyi
.
Not sure if this is related, but it is worth noting this works even before this change:
dp = IterableWrapper(range(10)).shuffle()
print(dp.__doc__) # This prints the right string even before this change
Let me know if none of the above is relevant to this PR.
How do I regenerate the |
Re-installing PyTorch locally ( Edit: in the process of looking into this, I think I found a relatively easy way to add this to |
Ok, let me try that. It's taking a bit of time to initialize the third party submodules. |
Hmm, I'm wondering if some change is needed on the torchdata repo too as mentioned in #100503 (comment)? I've regenerated the Or, it could also be that I'm missing some step in the compiling/install process. Tried using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weiji14 I have only checked the doc string that was shown in an IDE (which becomes correct with #100503), I didn't try
help(dp)
.Does this PR allows
help(dp.batch)
to render correctly (i.e. equivalent tohelp(Batcher)
? If that is the case, we can land both PRs.
Yes, I think this PR makes help(dp.batch)
equivalent to help(Batcher)
. Have added a unit test in 452f254 to check for this (though it uses pydoc.render
following https://stackoverflow.com/questions/7123660/is-there-an-option-to-print-the-output-of-help). From what I can tell, the .pyi
stub file updates at #100503 is only for the type hints according to https://www.jetbrains.com/help/pycharm/stubs.html?
"map", | ||
"mux", | ||
"read_from_stream", | ||
# "sampler", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed that SamplerIterDataPipe
is missing the functional form at
pytorch/torch/utils/data/datapipes/iter/combinatorics.py
Lines 17 to 25 in 8f6951c
class SamplerIterDataPipe(IterDataPipe[T_co]): | |
r""" | |
Generates sample elements using the provided ``Sampler`` (defaults to :class:`SequentialSampler`). | |
Args: | |
datapipe: IterDataPipe to sample from | |
sampler: Sampler class to generate sample elements from input DataPipe. | |
Default is :class:`SequentialSampler` for IterDataPipe | |
""" |
Should add the @functional_datapipe('sample')
decorator in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we left it out for a reason, we can add the functional form for sampler
later if there is demand for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for adding this!
"map", | ||
"mux", | ||
"read_from_stream", | ||
# "sampler", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we left it out for a reason, we can add the functional form for sampler
later if there is demand for me
@pytorchbot merge -r |
@pytorchbot successfully started a rebase job. Check the current status here |
Successfully rebased |
c6ecebf
to
e51b19b
Compare
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 2 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weiji14 Can you have a look at the failing CI tests? They seem relevant
"zip", | ||
]: | ||
docstring = pydoc.render_doc(thing=getattr(input_dp, dp_funcname)) | ||
assert f"(functional name: ``{dp_funcname}``)" in docstring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weiji14 Can you have a look at the failing CI tests? They seem relevant
Yes I see the error, though it seems to be only for certain combinations of the build matrix which is weird. Traceback at https://github.com/pytorch/pytorch/actions/runs/4929875273/jobs/8810511389#step:13:628:
==================================== RERUNS ====================================
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
File "test_datapipe.py", line 860, in test_docstring
assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
File "test_datapipe.py", line 860, in test_docstring
assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
=================================== FAILURES ===================================
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
File "test_datapipe.py", line 860, in test_docstring
assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
- generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_datapipe/test_datapipe-17e8ddb9c33a3903.xml -
=========================== short test summary info ============================
FAILED [0.0028s] test_datapipe.py::TestFunctionalIterDataPipe::test_docstring - AssertionError
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
=============== 1 failed, 22 passed, 7 skipped, 2 rerun in 0.45s ===============
Got exit code 1, retrying (retries left=2)
Test results will be stored in test-reports/python-pytest/test_datapipe/test_datapipe-0b2a1e96f86b787b.xml
Logs aren't very helpful, let me try and track down the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging in the changes from #100503 to see if it helps. I can't seem to reproduce this locally on my setup, and not quite sure what the shard/num_shards config means in the linux-bionic-py3_8-clang9-build
CI build matrix at https://github.com/pytorch/pytorch/blob/daed3bf8f9d10367ae3a34d8ea6ca2b594f9afe2/.github/workflows/pull.yml#L124C1-L139
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have tried using pydoc.render(..., forceload=True)
at c21eaa5, but somehow the docstring still isn't updated on some build matrix combinations. Getting an output like so from https://github.com/pytorch/pytorch/actions/runs/4955538159/jobs/8877794743?pr=100446#step:14:1087:
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
File "test_datapipe.py", line 863, in test_docstring
assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_datapipe.py", line 868, in test_docstring
raise ValueError(dp_funcname, "IterDataPipe docstring incorrect")
ValueError: ('batch', 'IterDataPipe docstring incorrect')
----------------------------- Captured stdout call -----------------------------
***Begin docstring for batch
Python Library Documentation: partial in module torch.utils.data.datapipes.iter.grouping object
class p�pa�ar�rt�ti�ia�al�l(builtins.object)
| partial(func, *args, **keywords) - new function with partial application
| of the given arguments and keywords.
|
| Methods defined here:
|
| _�__�_c�ca�al�ll�l_�__�_(self, /, *args, **kwargs)
| Call self as a function.
|
| _�__�_d�de�el�la�at�tt�tr�r_�__�_(self, name, /)
| Implement delattr(self, name).
|
| _�__�_g�ge�et�ta�at�tt�tr�ri�ib�bu�ut�te�e_�__�_(self, name, /)
| Return getattr(self, name).
|
| _�__�_r�re�ed�du�uc�ce�e_�__�_(...)
| Helper for pickle.
|
| _�__�_r�re�ep�pr�r_�__�_(self, /)
| Return repr(self).
|
| _�__�_s�se�et�ta�at�tt�tr�r_�__�_(self, name, value, /)
| Implement setattr(self, name, value).
|
| _�__�_s�se�et�ts�st�ta�at�te�e_�__�_(...)
|
| ----------------------------------------------------------------------
| Static methods defined here:
|
| _�__�_n�ne�ew�w_�__�_(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| _�__�_d�di�ic�ct�t_�__�_
|
| a�ar�rg�gs�s
| tuple of arguments to future partial calls
|
| f�fu�un�nc�c
| function object to use in future partial calls
|
| k�ke�ey�yw�wo�or�rd�ds�s
| dictionary of keyword arguments to future partial calls
***End docstring for batch
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
File "test_datapipe.py", line 863, in test_docstring
assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_datapipe.py", line 868, in test_docstring
raise ValueError(dp_funcname, "IterDataPipe docstring incorrect")
ValueError: ('batch', 'IterDataPipe docstring incorrect')
Not sure why the docstring output is repeated in some parts 😕 I could xfail those tests for now and investigate them later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weiji14 It looks like it is failing on Python 3.8, can you try to run it in a local Linux environment with 3.8? I think that should help you reproduce the issue.
Thanks for the tip! Apparently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
@pytorchbot merge -r |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Copy the docstring from IterDataPipe classes to their functional form. Xref pytorch/data#792.
Ensure that the functional form of IterDataPipe and MapDataPipe contains the proper docstring copied over from the class form.
UnZipperMapDataPipe is only in https://github.com/pytorch/data/blob/v0.6.0/torchdata/datapipes/map/util/unzipper.py#L16-L17 and not pytorch/pytorch repo.
Force reload the module from disk to see if it helps with the test.
Same as 52be9e1 but for MapDataPipe
Successfully rebased |
ee5272e
to
0c6a6c8
Compare
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Copy the docstring from AsyncIterDataPipe classes to their functional form. Xref pytorch/data#792 and pytorch/pytorch#100446. Added a unit test to ensure the docstring is copied correctly.
* ✨ Implement functional_datapipe decorator to inject new methods Let AsyncIterDataPipe classes have a functional form! The AsyncIterDataPipe class now has a private _functions dictionary to store functions injected from the functional_datapipe decorator. Tested this on MapperAsyncIterDataPipe which now has the `.map()` functional form. Included a doctest, added a new section in the API docs under 'Helper Functions'. Also added more type hints to aiter.py. * 🧑💻 Update docstring for functional form of AsyncIterDataPipe Copy the docstring from AsyncIterDataPipe classes to their functional form. Xref pytorch/data#792 and pytorch/pytorch#100446. Added a unit test to ensure the docstring is copied correctly. * ✅ Test accessing valid or invalid methods on DataPipe Checking that the __getattr__ method of an AsyncIterDataPipe works to access functions added by the functional_datapipe decorator, and raises an AttributeError when the function doesn't exist. Also documented how the functional_datapipe injects these methods into the private _functins variable to make things work. * 📝 Minor docstring updates to the decorator class and test Adding code wraps and intersphinx links, minor rewording.
Copy the docstring from IterDataPipe and MapDataPipe classes to their functional form. Done using
functools.update_wrapper
, xref https://stackoverflow.com/questions/6394511/python-functools-wraps-equivalent-for-classes.See also parallel change to
.pyi
stub files at #100503Fixes pytorch/data#792 and weiji14/zen3geo#69.