asyncio increase throughput (pytorch change) #84301
Conversation
🔗 Helpful links

❌ 2 New Failures as of commit fa85b156cd (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
This pull request was exported from Phabricator. Differential Revision: D39145980

@pytorchbot merge -g

@pytorchbot successfully started a merge job. Check the current status here.

Merge failed. Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again.

@pytorchbot merge -g

@pytorchbot successfully started a merge job. Check the current status here.

Merge failed. Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again.
Force-pushed 4d2e1f6 to ce8b343.
This pull request was exported from Phabricator. Differential Revision: D39145980
Force-pushed ce8b343 to fa85b15.
@pytorchbot merge -g

@pytorchbot successfully started a merge job. Check the current status here.

Merge failed. Reason: The following mandatory check(s) failed. Dig deeper by viewing the failures on hud.

@pytorchbot merge -g

@pytorchbot test

❌ 🤖 pytorchbot command failed.

@pytorchbot rebase

@pytorchbot successfully started a merge job. Check the current status here.

Merge failed. Reason: The following mandatory check(s) failed. Dig deeper by viewing the failures on hud.

@pytorchbot successfully started a rebase job. Check the current status here.

Successfully rebased.
Force-pushed fa85b15 to 6282886.
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84301. Note: links to docs will display an error until the docs builds have been completed.

✅ No Failures, 9 Pending as of commit da3fcaa. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary: Pull Request resolved: pytorch#84301. This diff adds a check in the fetcher: if the dataset to be fetched has a `__getitems__` method, use it to fetch a batch of elements at once, as opposed to one by one. This is beneficial for I/O-bound usage.

Reviewed By: VitalyFedyunin

Differential Revision: D39145980

fbshipit-source-id: b63e0de28bc9bf9a659fc4619eba43e81ea20f69
Force-pushed 6282886 to da3fcaa.
This pull request was exported from Phabricator. Differential Revision: D39145980

@pytorchbot merge -g

@pytorchbot successfully started a merge job. Check the current status here.

Hey @000Justin000.
Summary: This diff adds a check in the fetcher: if the dataset to be fetched has a `__getitems__` method, use it to fetch a batch of elements at once, as opposed to one by one. This is beneficial for I/O-bound usage.

Pull Request resolved: #84301

Approved by: https://github.com/VitalyFedyunin

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/335033f7182bf421d203d5eeaad598fa1102933f

Original Phabricator Test Plan:

Reviewed By: VitalyFedyunin

Differential Revision: D39145980

Pulled By: 000Justin000

fbshipit-source-id: f148b0337faa156314487e71e465cf80737b570e
The [fastNLP](https://github.com/fastnlp/fastNLP/blob/v0.6.0/fastNLP/core/batch.py#L51) model uses DataSetGetter to fetch data from the dataset. The following code breaks because of #84301:

```
from fastNLP.io.pipe.qa import CMRC2018BertPipe

input_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".data", "cmrc2018-sim")
data_bundle = CMRC2018BertPipe().process_from_file(paths=input_dir)
data_bundle.rename_field('chars', 'words')
data_bundle.get_dataset('dev')
dataset = DataSetGetter(dataset, as_numpy)
dataiter = torch.utils.data.DataLoader(dataset=dataset)
for batch in dataiter:
    # data-processing...
```

This is because for the `DataSetGetter` class, the following condition holds:

```
# hasattr(dataset_getter, '__getitems__') == True
# dataset_getter.__getitems__ == None
```

This PR adds an additional check to make sure `__getitems__` is only called when it is not None. This error was found by the torchbench nightly CI; original error stack trace:

```
ERROR: test_fastNLP_Bert_train_cuda (__main__.TestBenchmark)
----------------------------------------------------------------------
components._impl.workers.subprocess_rpc.ChildTraceException: Traceback (most recent call last):
  File "/home/circleci/project/components/_impl/workers/subprocess_rpc.py", line 470, in _run_block
    exec(  # noqa: P204
  File "<subprocess-worker>", line 35, in <module>
  File "<subprocess-worker>", line 12, in _run_in_worker_f
  File "/home/circleci/project/torchbenchmark/util/model.py", line 16, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 93, in __init__
    self.example_inputs = self._prefetch(example_inputs)
  File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 133, in _prefetch
    for batch_x, batch_y in example_inputs:
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/fastNLP/core/batch.py", line 266, in __iter__
    for indices, batch_x, batch_y in self.dataiter:
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 719, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch
    data = self.dataset.__getitems__(possibly_batched_index)
TypeError: 'NoneType' object is not callable
```

Full error log: https://app.circleci.com/pipelines/github/pytorch/benchmark/5143/workflows/0676f36d-0ab4-42bd-adb4-90e6b0df76d1/jobs/5293

Pull Request resolved: #85099

Approved by: https://github.com/ejguan
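The `hasattr`-but-`None` pitfall behind this traceback is easy to reproduce in isolation. Below is a minimal, self-contained sketch: `DataSetGetterLike` is a made-up stand-in for fastNLP's `DataSetGetter`, and the guard shown is one way to implement the "only call it when it is not None" check this follow-up PR describes, not the exact upstream code.

```python
class DataSetGetterLike:
    """Mimics the DataSetGetter situation: the attribute exists on the
    class but is set to None, so hasattr() alone is a misleading test."""

    __getitems__ = None

    def __getitem__(self, index):
        return index


ds = DataSetGetterLike()

# hasattr is True even though the attribute is None ...
print(hasattr(ds, "__getitems__"))  # True
# ... so calling ds.__getitems__(...) unconditionally raises
# TypeError: 'NoneType' object is not callable, as in the traceback.

# A guarded fetch: only take the batched path when the hook is callable.
indices = [0, 1, 2]
fetch_fn = getattr(ds, "__getitems__", None)
if callable(fetch_fn):
    data = fetch_fn(indices)
else:
    data = [ds[i] for i in indices]
print(data)  # [0, 1, 2]
```

The key point is that `hasattr` only tests attribute *existence*, so a class that deliberately sets the hook to `None` passes the old check and then blows up at call time; checking truthiness (or callability) of the attribute closes that gap.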
Summary: This diff adds a check in the fetcher: if the dataset to be fetched has a `__getitems__` method, use it to fetch a batch of elements at once, as opposed to one by one. This is beneficial for I/O-bound usage.
Differential Revision: D39145980