Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataPipe] Update docstring for functional form of DataPipes #100446

Closed
wants to merge 11 commits into from

Conversation

weiji14
Copy link
Contributor

@weiji14 weiji14 commented May 2, 2023

Copy the docstring from IterDataPipe and MapDataPipe classes to their functional form. Done using functools.update_wrapper, xref https://stackoverflow.com/questions/6394511/python-functools-wraps-equivalent-for-classes.

See also parallel change to .pyi stub files at #100503

Fixes pytorch/data#792 and weiji14/zen3geo#69.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented May 2, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: weiji14 / name: Wei Ji (603172a)

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label May 2, 2023
@pytorch-bot
Copy link

pytorch-bot bot commented May 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/100446

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0c6a6c8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@weiji14 weiji14 marked this pull request as ready for review May 2, 2023 03:19
@weiji14 weiji14 requested review from NivekT and ejguan as code owners May 2, 2023 03:19
Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, changes look good. Is there a way for us to test this change?

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried to see if the inline doc shows up in PyCharm with this change:

dp = IterableWrapper(range(10)).shuffle()

And it doesn't seem to. Am I missing something? Perhaps changes to datapipe.fyi is necessary? I believe it is currently rendering whatever is in datapipe.fyi.

Screenshot 2023-05-02 at 2 24 49 PM

Not sure if this is related, but it is worth noting this works even before this change:

dp = IterableWrapper(range(10)).shuffle()
print(dp.__doc__)  # This prints the right string even before this change

Let me know if none of the above is relevant to this PR.

@weiji14
Copy link
Contributor Author

weiji14 commented May 2, 2023

And it doesn't seem to. Am I missing something? Perhaps changes to datapipe.fyi is necessary? I believe it is currently rendering whatever is in datapipe.fyi.

How do I regenerate the torch/utils/data/datapipes/datapipe.pyi.in file? To be honest, I haven't quite figured out how to check this locally yet, it's my first time working on the pytorch codebase 😅

@NivekT
Copy link
Contributor

NivekT commented May 2, 2023

Re-installing PyTorch locally (python setup.py develop) will regenerates that file. Note that file is generated from datapipe.pyi.in and gen_pyi.py in torch.utils.data.datapipes.

Edit: in the process of looking into this, I think I found a relatively easy way to add this to datapipe.pyi. I will open a separate PR.

@weiji14
Copy link
Contributor Author

weiji14 commented May 2, 2023

Re-installing PyTorch locally (python setup.py develop) will regenerates that file. Note that file is generated from datapipe.pyi.in and gen_pyi.py in torch.utils.data.datapipes.

Ok, let me try that. It's taking a bit of time to initialize the third party submodules.

@weiji14 weiji14 changed the title Update docstring for functional form of IterDataPipes [DataPipe] Update docstring for functional form of IterDataPipes May 2, 2023
@weiji14
Copy link
Contributor Author

weiji14 commented May 2, 2023

Hmm, I'm wondering if some change is needed on the torchdata repo too as mentioned in #100503 (comment)? I've regenerated the torch/utils/data/datapipes/datapipe.pyi file using the patched torch/utils/data/datapipes/gen_pyi.py script in #100503 which matches https://gist.github.com/NivekT/95095f14da85a837a0727a19a5ba367c, but it seems like help(dp.batch) still returns the partial docstring?

Or, it could also be that I'm missing some step in the compiling/install process. Tried using python setup.py install and confirmed that torch.__version__ is something like 2.1.0a0+gitd878904, but still the docstring is the old one.

@NivekT
Copy link
Contributor

NivekT commented May 3, 2023

@weiji14 I have only checked the doc string that was shown in an IDE (which becomes correct with #100503), I didn't try help(dp).

Does this PR allows help(dp.batch) to render correctly (i.e. equivalent to help(Batcher)? If that is the case, we can land both PRs.

Copy link
Contributor Author

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiji14 I have only checked the doc string that was shown in an IDE (which becomes correct with #100503), I didn't try help(dp).

Does this PR allows help(dp.batch) to render correctly (i.e. equivalent to help(Batcher)? If that is the case, we can land both PRs.

Yes, I think this PR makes help(dp.batch) equivalent to help(Batcher). Have added a unit test in 452f254 to check for this (though it uses pydoc.render following https://stackoverflow.com/questions/7123660/is-there-an-option-to-print-the-output-of-help). From what I can tell, the .pyi stub file updates at #100503 is only for the type hints according to https://www.jetbrains.com/help/pycharm/stubs.html?

"map",
"mux",
"read_from_stream",
# "sampler",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that SamplerIterDataPipe is missing the functional form at

class SamplerIterDataPipe(IterDataPipe[T_co]):
r"""
Generates sample elements using the provided ``Sampler`` (defaults to :class:`SequentialSampler`).
Args:
datapipe: IterDataPipe to sample from
sampler: Sampler class to generate sample elements from input DataPipe.
Default is :class:`SequentialSampler` for IterDataPipe
"""

Should add the @functional_datapipe('sample') decorator in a separate PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we left it out for a reason, we can add the functional form for sampler later if there is demand for me

@weiji14 weiji14 changed the title [DataPipe] Update docstring for functional form of IterDataPipes [DataPipe] Update docstring for functional form of DataPipes May 3, 2023
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for adding this!

"map",
"mux",
"read_from_stream",
# "sampler",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we left it out for a reason, we can add the functional form for sampler later if there is demand for me

@NivekT
Copy link
Contributor

NivekT commented May 9, 2023

@pytorchbot merge -r

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 9, 2023
@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased datapipe/functional_docstring onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout datapipe/functional_docstring && git pull --rebase)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiji14 Can you have a look at the failing CI tests? They seem relevant

@NivekT NivekT added the topic: improvements topic category label May 9, 2023
"zip",
]:
docstring = pydoc.render_doc(thing=getattr(input_dp, dp_funcname))
assert f"(functional name: ``{dp_funcname}``)" in docstring
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiji14 Can you have a look at the failing CI tests? They seem relevant

Yes I see the error, though it seems to be only for certain combinations of the build matrix which is weird. Traceback at https://github.com/pytorch/pytorch/actions/runs/4929875273/jobs/8810511389#step:13:628:

==================================== RERUNS ====================================
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
  File "test_datapipe.py", line 860, in test_docstring
    assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
  File "test_datapipe.py", line 860, in test_docstring
    assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
=================================== FAILURES ===================================
__________________ TestFunctionalIterDataPipe.test_docstring ___________________
Traceback (most recent call last):
  File "test_datapipe.py", line 860, in test_docstring
    assert f"(functional name: ``{dp_funcname}``)" in docstring
AssertionError
- generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_datapipe/test_datapipe-17e8ddb9c33a3903.xml -
=========================== short test summary info ============================
FAILED [0.0028s] test_datapipe.py::TestFunctionalIterDataPipe::test_docstring - AssertionError
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
=============== 1 failed, 22 passed, 7 skipped, 2 rerun in 0.45s ===============
Got exit code 1, retrying (retries left=2)
Test results will be stored in test-reports/python-pytest/test_datapipe/test_datapipe-0b2a1e96f86b787b.xml

Logs aren't very helpful, let me try and track down the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging in the changes from #100503 to see if it helps. I can't seem to reproduce this locally on my setup, and not quite sure what the shard/num_shards config means in the linux-bionic-py3_8-clang9-build CI build matrix at https://github.com/pytorch/pytorch/blob/daed3bf8f9d10367ae3a34d8ea6ca2b594f9afe2/.github/workflows/pull.yml#L124C1-L139

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have tried using pydoc.render(..., forceload=True) at c21eaa5, but somehow the docstring still isn't updated on some build matrix combinations. Getting an output like so from https://github.com/pytorch/pytorch/actions/runs/4955538159/jobs/8877794743?pr=100446#step:14:1087:

  __________________ TestFunctionalIterDataPipe.test_docstring ___________________
  Traceback (most recent call last):
    File "test_datapipe.py", line 863, in test_docstring
      assert f"(functional name: ``{dp_funcname}``)" in docstring
  AssertionError
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "test_datapipe.py", line 868, in test_docstring
      raise ValueError(dp_funcname, "IterDataPipe docstring incorrect")
  ValueError: ('batch', 'IterDataPipe docstring incorrect')
  ----------------------------- Captured stdout call -----------------------------
  ***Begin docstring for batch
  Python Library Documentation: partial in module torch.utils.data.datapipes.iter.grouping object
  
  class p�pa�ar�rt�ti�ia�al�l(builtins.object)
   |  partial(func, *args, **keywords) - new function with partial application
   |  of the given arguments and keywords.
   |
   |  Methods defined here:
   |
   |  _�__�_c�ca�al�ll�l_�__�_(self, /, *args, **kwargs)
   |      Call self as a function.
   |
   |  _�__�_d�de�el�la�at�tt�tr�r_�__�_(self, name, /)
   |      Implement delattr(self, name).
   |
   |  _�__�_g�ge�et�ta�at�tt�tr�ri�ib�bu�ut�te�e_�__�_(self, name, /)
   |      Return getattr(self, name).
   |
   |  _�__�_r�re�ed�du�uc�ce�e_�__�_(...)
   |      Helper for pickle.
   |
   |  _�__�_r�re�ep�pr�r_�__�_(self, /)
   |      Return repr(self).
   |
   |  _�__�_s�se�et�ta�at�tt�tr�r_�__�_(self, name, value, /)
   |      Implement setattr(self, name, value).
   |
   |  _�__�_s�se�et�ts�st�ta�at�te�e_�__�_(...)
   |
   |  ----------------------------------------------------------------------
   |  Static methods defined here:
   |
   |  _�__�_n�ne�ew�w_�__�_(*args, **kwargs) from builtins.type
   |      Create and return a new object.  See help(type) for accurate signature.
   |
   |  ----------------------------------------------------------------------
   |  Data descriptors defined here:
   |
   |  _�__�_d�di�ic�ct�t_�__�_
   |
   |  a�ar�rg�gs�s
   |      tuple of arguments to future partial calls
   |
   |  f�fu�un�nc�c
   |      function object to use in future partial calls
   |
   |  k�ke�ey�yw�wo�or�rd�ds�s
   |      dictionary of keyword arguments to future partial calls
  
  ***End docstring for batch
  __________________ TestFunctionalIterDataPipe.test_docstring ___________________
  Traceback (most recent call last):
    File "test_datapipe.py", line 863, in test_docstring
      assert f"(functional name: ``{dp_funcname}``)" in docstring
  AssertionError
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "test_datapipe.py", line 868, in test_docstring
      raise ValueError(dp_funcname, "IterDataPipe docstring incorrect")
  ValueError: ('batch', 'IterDataPipe docstring incorrect')

Not sure why the docstring output is repeated in some parts 😕 I could xfail those tests for now and investigate them later?

@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 10, 2023
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiji14 It looks like it is failing on Python 3.8, can you try to run it in a local Linux environment with 3.8? I think that should help you reproduce the issue.

@weiji14
Copy link
Contributor Author

weiji14 commented May 18, 2023

@weiji14 It looks like it is failing on Python 3.8, can you try to run it in a local Linux environment with 3.8? I think that should help you reproduce the issue.

Thanks for the tip! Apparently pydoc.render_doc works differently on Python 3.8 and Python 3.9+, see https://docs.python.org/3/whatsnew/3.9.html#pydoc and python/cpython#84438. The output of help(dp.batch) is still the same though (just that help() renders things to stdout). I've made a commit at 52be9e1 that should make the unit test work for both Python 3.8 and Python 3.9+ 🤞

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@NivekT
Copy link
Contributor

NivekT commented May 18, 2023

@pytorchbot merge -r

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased datapipe/functional_docstring onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout datapipe/functional_docstring && git pull --rebase)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@weiji14 weiji14 deleted the datapipe/functional_docstring branch May 18, 2023 20:30
weiji14 added a commit to weiji14/bambooflow that referenced this pull request Aug 6, 2023
Copy the docstring from AsyncIterDataPipe classes to their functional form. Xref pytorch/data#792 and pytorch/pytorch#100446. Added a unit test to ensure the docstring is copied correctly.
weiji14 added a commit to weiji14/bambooflow that referenced this pull request Aug 6, 2023
* ✨ Implement functional_datapipe decorator to inject new methods

Let AsyncIterDataPipe classes have a functional form! The AsyncIterDataPipe class now has a private _functions dictionary to store functions injected from the functional_datapipe decorator. Tested this on MapperAsyncIterDataPipe which now has the `.map()` functional form. Included a doctest, added a new section in the API docs under 'Helper Functions'. Also added more type hints to aiter.py.

* 🧑‍💻 Update docstring for functional form of AsyncIterDataPipe

Copy the docstring from AsyncIterDataPipe classes to their functional form. Xref pytorch/data#792 and pytorch/pytorch#100446. Added a unit test to ensure the docstring is copied correctly.

* ✅ Test accessing valid or invalid methods on DataPipe

Checking that the __getattr__ method of an AsyncIterDataPipe works to access functions added by the functional_datapipe decorator, and raises an AttributeError when the function doesn't exist. Also documented how the functional_datapipe injects these methods into the private _functins variable to make things work.

* 📝 Minor docstring updates to the decorator class and test

Adding code wraps and intersphinx links, minor rewording.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: dataloader release notes category topic: improvements topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing docstring of IterDataPipes in functional form
6 participants