[DataLoader] Switch to guaranteed determinism & add option to non_deterministic #53532

Closed
wants to merge 3 commits into from

@ejguan (Contributor) commented Mar 8, 2021

Stack from ghstack:

Extend the `non_deterministic` decorator so that it accepts an optional function argument, which determines whether a given instance of the decorated class is deterministic or not.

1. Without an argument
```py
@non_deterministic
class ABC(IterDataPipe):
    def __init__(self, datapipe, ...):
        ...
```
2. With a function argument
   For example, GreedyJoin is deterministic when it has only one datapipe as input, but non-deterministic with multiple datapipes.
```py
@non_deterministic(lambda datapipes: len(datapipes) > 1)
class GreedyJoinIterDataPipe(IterDataPipe):
    def __init__(self, datapipes, ...):
        ...
```

The test is updated [here](https://github.com/facebookexternal/torchdata/pull/15).
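
To make the two decorator forms concrete, here is a minimal, self-contained sketch of how such a decorator could behave. It is not the code from this PR: the `IterDataPipe` stand-in, the `_guaranteed_determinism` flag, its default value, the `set_guaranteed_determinism` helper, and the choice to wrap `__init__` are all assumptions made for illustration.

```python
import functools

# Hypothetical global switch: when True, constructing a non-deterministic
# DataPipe raises. The default chosen here is arbitrary.
_guaranteed_determinism = False


def set_guaranteed_determinism(flag):
    """Hypothetical helper that toggles the global switch."""
    global _guaranteed_determinism
    _guaranteed_determinism = bool(flag)


class IterDataPipe:
    """Stand-in base class so the sketch runs without PyTorch."""


def _guard(cls, is_non_deterministic):
    """Wrap cls.__init__ so the check runs on every construction."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def guarded_init(self, *args, **kwargs):
        if _guaranteed_determinism and is_non_deterministic(*args, **kwargs):
            raise TypeError(
                f"{cls.__name__} is non-deterministic for these arguments"
            )
        original_init(self, *args, **kwargs)

    cls.__init__ = guarded_init
    return cls


def non_deterministic(arg):
    """Support both @non_deterministic and @non_deterministic(fn)."""
    if isinstance(arg, type):
        # Bare form: the decorator received the class itself,
        # so every instance counts as non-deterministic.
        return _guard(arg, lambda *args, **kwargs: True)
    if callable(arg):
        # Argument form: fn receives the constructor arguments and
        # returns True when that instance would be non-deterministic.
        return lambda cls: _guard(cls, arg)
    raise TypeError("non_deterministic expects a class or a callable")


@non_deterministic
class ABC(IterDataPipe):
    def __init__(self, datapipe):
        self.datapipe = datapipe


@non_deterministic(lambda datapipes: len(datapipes) > 1)
class GreedyJoinIterDataPipe(IterDataPipe):
    def __init__(self, datapipes):
        self.datapipes = datapipes
```

With the switch on, `ABC(...)` is always rejected in this sketch, while `GreedyJoinIterDataPipe` is rejected only when it is given more than one input datapipe. Checking `isinstance(arg, type)` before `callable(arg)` matters because a class is itself callable.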

Differential Revision: D26888825

@facebook-github-bot (Contributor) commented Mar 8, 2021

💊 CI failures summary and remediations

As of commit 01b6133 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

ejguan added a commit that referenced this pull request Mar 8, 2021
…erministic

ghstack-source-id: 4378080
Pull Request resolved: #53532
@ejguan ejguan requested a review from VitalyFedyunin March 8, 2021 19:03
ejguan added a commit that referenced this pull request Mar 9, 2021
…erministic

ghstack-source-id: 1928c1a
Pull Request resolved: #53532
@codecov bot commented Mar 9, 2021

Codecov Report

Merging #53532 (01b6133) into gh/ejguan/42/base (bb21aea) will decrease coverage by 0.00%.
The diff coverage is 29.41%.

@@                  Coverage Diff                  @@
##           gh/ejguan/42/base   #53532      +/-   ##
=====================================================
- Coverage              77.34%   77.34%   -0.01%     
=====================================================
  Files                   1887     1887              
  Lines                 184826   184845      +19     
=====================================================
+ Hits                  142958   142962       +4     
- Misses                 41868    41883      +15     

ejguan added a commit that referenced this pull request Mar 15, 2021
…erministic

ghstack-source-id: 56daffd
Pull Request resolved: #53532
@facebook-github-bot (Contributor)

@ejguan merged this pull request in e87ab2a.

3 participants