Skip to content

Conversation

@000Justin000
Copy link

Summary: This diffs add a check in the fetcher, that if the dataset to be fetched has a function "getitems" then use it for fetching a batch of elements, as oppose to one by one. This is benefical for io bounded usage.

Differential Revision: D39145980

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label Aug 30, 2022
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Aug 30, 2022

🔗 Helpful links

❌ 2 New Failures

As of commit fa85b156cd (more details on the Dr. CI page):

Expand to see more
  • 2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-bionic-cuda11.6-py3.10-gcc7-bazel-test / build-and-test (1/1)

Step: "Build" (full log | diagnosis details)

2022-09-06T22:56:07.1668020Z ##[error]Process completed with exit code 2.
2022-09-06T22:56:07.1592145Z Non-cacheable calls                   0
2022-09-06T22:56:07.1592505Z Non-compilation calls                 1
2022-09-06T22:56:07.1592810Z Unsupported compiler calls            0
2022-09-06T22:56:07.1593104Z Average cache write               0.000 s
2022-09-06T22:56:07.1593378Z Average cache read miss           0.000 s
2022-09-06T22:56:07.1593666Z Average cache read hit            0.000 s
2022-09-06T22:56:07.1593987Z Failed distributed compilations       0
2022-09-06T22:56:07.1594754Z Cache location                  S3, bucket: Bucket(name=ossci-compiler-cache-circleci-v2, base_url=http://ossci-compiler-cache-circleci-v2.s3.amazonaws.com/)
2022-09-06T22:56:07.1618381Z + echo ::endgroup::
2022-09-06T22:56:07.1619250Z ##[endgroup]
2022-09-06T22:56:07.1668020Z ##[error]Process completed with exit code 2.
2022-09-06T22:56:07.1705113Z Prepare all required actions
2022-09-06T22:56:07.1730989Z ##[group]Run ./.github/actions/chown-workspace
2022-09-06T22:56:07.1731343Z env:
2022-09-06T22:56:07.1731669Z   GIT_DEFAULT_BRANCH: master
2022-09-06T22:56:07.1731999Z ##[endgroup]
2022-09-06T22:56:07.1751289Z ##[group]Run docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
2022-09-06T22:56:07.1751844Z �[36;1mdocker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .�[0m
2022-09-06T22:56:07.1767248Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2022-09-06T22:56:07.1767580Z env:
2022-09-06T22:56:07.1767840Z   GIT_DEFAULT_BRANCH: master

🕵️‍♀️ 1 failure not recognized by patterns:

The following CI failures may be due to changes from the PR
Job Step
GitHub Actions Lint / Test tools Unknown

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D39145980

@VitalyFedyunin
Copy link
Contributor

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again.

Details for Dev Infra team Raised by workflow job

@000Justin000
Copy link
Author

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again.

Details for Dev Infra team Raised by workflow job

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D39145980

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D39145980

@VitalyFedyunin
Copy link
Contributor

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

@000Justin000
Copy link
Author

@pytorchbot merge -g

@VitalyFedyunin
Copy link
Contributor

@pytorchbot test

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 7, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'test' (choose from 'merge', 'revert', 'rebase', 'label')

usage: @pytorchbot [-h] {merge,revert,rebase,label} ...

Try @pytorchbot --help for more info.

@VitalyFedyunin
Copy link
Contributor

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased export-D39145980 onto refs/remotes/origin/master, please pull locally before adding more changes (for example, via git checkout export-D39145980 && git pull --rebase)

@pytorch-bot
Copy link

pytorch-bot bot commented Sep 7, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84301

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 9 Pending

As of commit da3fcaa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary:
Pull Request resolved: pytorch#84301

This diffs add a check in the fetcher, that if the dataset to be fetched has a function "getitems" then use it for fetching a batch of elements, as oppose to one by one. This is benefical for io bounded usage.

Reviewed By: VitalyFedyunin

Differential Revision: D39145980

fbshipit-source-id: b63e0de28bc9bf9a659fc4619eba43e81ea20f69
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D39145980

@VitalyFedyunin
Copy link
Contributor

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the green (-g) flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2022

Hey @000Justin000.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Sep 9, 2022
Summary:
This diffs add a check in the fetcher, that if the dataset to be fetched has a function "getitems" then use it for fetching a batch of elements, as oppose to one by one. This is benefical for io bounded usage.

Pull Request resolved: #84301
Approved by: https://github.com/VitalyFedyunin

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/335033f7182bf421d203d5eeaad598fa1102933f

Original Phabricator Test Plan:

Reviewed By: VitalyFedyunin

Differential Revision: D39145980

Pulled By: 000Justin000

fbshipit-source-id: f148b0337faa156314487e71e465cf80737b570e
pytorchmergebot pushed a commit that referenced this pull request Sep 15, 2022
The [fastNLP](https://github.com/fastnlp/fastNLP/blob/v0.6.0/fastNLP/core/batch.py#L51) model uses DataSetGetter to fetch data from the dataset. The following code breaks because of #84301:

```
from fastNLP.io.pipe.qa import CMRC2018BertPipe
input_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".data", "cmrc2018-sim")
data_bundle = CMRC2018BertPipe().process_from_file(paths=input_dir)
data_bundle.rename_field('chars', 'words')
data_bundle.get_dataset('dev')
dataset = DataSetGetter(dataset, as_numpy)
dataiter = torch.utils.data.DataLoader(dataset=dataset)
for batch in dataiter:
    # data-processing...
```

This is because for the `DataSetGetter` class, the following condition holds:
```
# hasattr(dataset_getter, '__getitems__') == True
# dataset_getter.__getitems__ == None
```

This PR adds an additional check to make sure `__getitems__` is only called when it is not None.

This error was found by the torchbench nightly CI, original error stack trace:
```
ERROR: test_fastNLP_Bert_train_cuda (__main__.TestBenchmark)
----------------------------------------------------------------------
components._impl.workers.subprocess_rpc.ChildTraceException: Traceback (most recent call last):
  File "/home/circleci/project/components/_impl/workers/subprocess_rpc.py", line 470, in _run_block
    exec(  # noqa: P204
  File "<subprocess-worker>", line 35, in <module>
  File "<subprocess-worker>", line 12, in _run_in_worker_f
  File "/home/circleci/project/torchbenchmark/util/model.py", line 16, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 93, in __init__
    self.example_inputs = self._prefetch(example_inputs)
  File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 133, in _prefetch
    for batch_x, batch_y in example_inputs:
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/fastNLP/core/batch.py", line 266, in __iter__
    for indices, batch_x, batch_y in self.dataiter:
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 719, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch
    data = self.dataset.__getitems__(possibly_batched_index)
TypeError: 'NoneType' object is not callable
```

Full error log: https://app.circleci.com/pipelines/github/pytorch/benchmark/5143/workflows/0676f36d-0ab4-42bd-adb4-90e6b0df76d1/jobs/5293
Pull Request resolved: #85099
Approved by: https://github.com/ejguan
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
The [fastNLP](https://github.com/fastnlp/fastNLP/blob/v0.6.0/fastNLP/core/batch.py#L51) model uses DataSetGetter to fetch data from the dataset. The following code breaks because of #84301:

```
from fastNLP.io.pipe.qa import CMRC2018BertPipe
input_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".data", "cmrc2018-sim")
data_bundle = CMRC2018BertPipe().process_from_file(paths=input_dir)
data_bundle.rename_field('chars', 'words')
data_bundle.get_dataset('dev')
dataset = DataSetGetter(dataset, as_numpy)
dataiter = torch.utils.data.DataLoader(dataset=dataset)
for batch in dataiter:
    # data-processing...
```

This is because for the `DataSetGetter` class, the following condition holds:
```
# hasattr(dataset_getter, '__getitems__') == True
# dataset_getter.__getitems__ == None
```

This PR adds an additional check to make sure `__getitems__` is only called when it is not None.

This error was found by the torchbench nightly CI, original error stack trace:
```
ERROR: test_fastNLP_Bert_train_cuda (__main__.TestBenchmark)
----------------------------------------------------------------------
components._impl.workers.subprocess_rpc.ChildTraceException: Traceback (most recent call last):
  File "/home/circleci/project/components/_impl/workers/subprocess_rpc.py", line 470, in _run_block
    exec(  # noqa: P204
  File "<subprocess-worker>", line 35, in <module>
  File "<subprocess-worker>", line 12, in _run_in_worker_f
  File "/home/circleci/project/torchbenchmark/util/model.py", line 16, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 93, in __init__
    self.example_inputs = self._prefetch(example_inputs)
  File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 133, in _prefetch
    for batch_x, batch_y in example_inputs:
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/fastNLP/core/batch.py", line 266, in __iter__
    for indices, batch_x, batch_y in self.dataiter:
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 719, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch
    data = self.dataset.__getitems__(possibly_batched_index)
TypeError: 'NoneType' object is not callable
```

Full error log: https://app.circleci.com/pipelines/github/pytorch/benchmark/5143/workflows/0676f36d-0ab4-42bd-adb4-90e6b0df76d1/jobs/5293
Pull Request resolved: #85099
Approved by: https://github.com/ejguan
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants