[WIP][DataLoader] Prototype of SamplerIterableDataset #49363

Closed
wants to merge 7 commits into from

ejguan
Contributor

@ejguan ejguan commented Dec 14, 2020

Stack from ghstack:

Differential Revision: D25623637

@facebook-github-bot
Contributor

facebook-github-bot commented Dec 14, 2020

💊 CI failures summary and remediations

As of commit 7283772 (more details on the Dr. CI page):


  • 6/6 failures possibly* introduced in this PR
    • 1/6 non-CircleCI failure(s)

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/5)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 728377242f Update on "[WIP][DataLoader] Prototype of SamplerIterableDataset"
+ git reset --hard 728377242f0cdbd0c58f8e2966b3710081eff9b2
HEAD is now at 728377242f Update on "[WIP][DataLoader] Prototype of SamplerIterableDataset"
+ git merge --allow-unrelated-histories --no-edit --no-ff 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7
CONFLICT (modify/delete): torch/testing/_internal/distributed/pipe_with_ddp_test.py deleted in HEAD and modified in 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7. Version 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7 of torch/testing/_internal/distributed/pipe_with_ddp_test.py left in tree.
Removing torch/distributed/_pipeline/__init__.py
Auto-merging test/run_test.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (2/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 19 02:30:27 RuntimeError: test_nn failed! Received signal: SIGIOT
Dec 19 02:30:27   test_MaxPool1d_indices_cuda_float64 (__main__.TestNNDeviceTypeCUDA) ... ok (0.055s)
Dec 19 02:30:27   test_MaxPool2d_indices_cuda_bfloat16 (__main__.TestNNDeviceTypeCUDA) ... ok (0.055s)
Dec 19 02:30:27   test_MaxPool2d_indices_cuda_float16 (__main__.TestNNDeviceTypeCUDA) ... ok (0.055s)
Dec 19 02:30:27   test_MaxPool2d_indices_cuda_float32 (__main__.TestNNDeviceTypeCUDA) ... terminate called after throwing an instance of 'std::out_of_range'
Dec 19 02:30:27   what():  vector::_M_range_check: __n (which is 18446744073709551614) >= this->size() (which is 2)
Dec 19 02:30:27 Traceback (most recent call last):
Dec 19 02:30:27   File "test/run_test.py", line 906, in <module>
Dec 19 02:30:27     main()
Dec 19 02:30:27   File "test/run_test.py", line 889, in main
Dec 19 02:30:27     raise RuntimeError(err_message)
Dec 19 02:30:27 RuntimeError: test_nn failed! Received signal: SIGIOT
Dec 19 02:30:28 + cleanup
Dec 19 02:30:28 + retcode=1
Dec 19 02:30:28 + set +x
Dec 19 02:30:28 =================== sccache compilation log ===================
Dec 19 02:30:28 =========== If your build fails, please take a look at the log above for possible reasons ===========
Dec 19 02:30:28 Compile requests                      0
Dec 19 02:30:28 Compile requests executed             0
Dec 19 02:30:28 Cache hits                            0
Dec 19 02:30:28 Cache misses                          0
Dec 19 02:30:28 Cache timeouts                        0

See CircleCI build pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (3/5)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 728377242f Update on "[WIP][DataLoader] Prototype of SamplerIterableDataset"
+ git reset --hard 728377242f0cdbd0c58f8e2966b3710081eff9b2
HEAD is now at 728377242f Update on "[WIP][DataLoader] Prototype of SamplerIterableDataset"
+ git merge --allow-unrelated-histories --no-edit --no-ff 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7
CONFLICT (modify/delete): torch/testing/_internal/distributed/pipe_with_ddp_test.py deleted in HEAD and modified in 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7. Version 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7 of torch/testing/_internal/distributed/pipe_with_ddp_test.py left in tree.
Removing torch/distributed/_pipeline/__init__.py
Auto-merging test/run_test.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (4/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 19 03:07:42 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 19 03:07:42 At:
Dec 19 03:07:42   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 19 03:07:42   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 19 03:07:42 
Dec 19 03:07:42 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 19 03:07:42 
Dec 19 03:07:42 At:
Dec 19 03:07:42   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 19 03:07:42   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 19 03:07:42 
Dec 19 03:07:42 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 19 03:07:42 
Dec 19 03:07:42 At:
Dec 19 03:07:42   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Dec 19 03:07:42   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Dec 19 03:07:42 
Dec 19 03:07:42 [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Dec 19 03:07:42 [W tensorpipe_agent.cpp:547] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Dec 19 03:07:43 ok (2.551s)
Dec 19 03:07:45   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown)
Dec 19 03:07:45 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker3: EOF: end of file (this is expected to happen during shutdown)

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (5/5)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 728377242f Update on "[WIP][DataLoader] Prototype of SamplerIterableDataset"
+ git reset --hard 728377242f0cdbd0c58f8e2966b3710081eff9b2
HEAD is now at 728377242f Update on "[WIP][DataLoader] Prototype of SamplerIterableDataset"
+ git merge --allow-unrelated-histories --no-edit --no-ff 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7
CONFLICT (modify/delete): torch/testing/_internal/distributed/pipe_with_ddp_test.py deleted in HEAD and modified in 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7. Version 3659560fba91dc5b6f0d1d9beaf8d04ff0acf4d7 of torch/testing/_internal/distributed/pipe_with_ddp_test.py left in tree.
Removing torch/distributed/_pipeline/__init__.py
Auto-merging test/run_test.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test1

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

This comment has been revised 29 times.

@ejguan
Contributor Author

ejguan commented Dec 14, 2020

I think Sampler can be eliminated, since containers can be placed at any position in the sequence. We can shuffle or batch before fetching data, which amounts to sampling over indices, or after fetching, for example to do bucket batching based on the data.
The same would hold for a Map dataset, so there is no need to add a sampler any more.

However, if we decide to add SamplerDataset as a wrapper for the original Sampler or a user's custom sampler, I would support adding this wrapper class.

In general, for our new implementation, I would not recommend using SamplerDataset.
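To make the two orderings above concrete, here is a minimal sketch in plain Python. It is not the code in this PR: the function names (`shuffled_indices`, `batch`, `fetch`, `bucket_batch`) are hypothetical, and plain generators stand in for what would be `IterableDataset` subclasses in an actual implementation.

```python
import random
from typing import Any, Iterable, Iterator, List, Sequence


def shuffled_indices(n: int, seed: int) -> Iterator[int]:
    """Yield 0..n-1 in a seeded random order (sampling over indices)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    yield from order


def batch(source: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Group consecutive items into lists of at most batch_size."""
    chunk: List[Any] = []
    for item in source:
        chunk.append(item)
        if len(chunk) == batch_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, batch
        yield chunk


def fetch(data: Sequence[Any], index_batches: Iterable[List[int]]) -> Iterator[List[Any]]:
    """Turn batches of indices into batches of data."""
    for indices in index_batches:
        yield [data[i] for i in indices]


def bucket_batch(source: Iterable[str], batch_size: int, pool_size: int) -> Iterator[List[str]]:
    """After fetching: sort each pool of items by length, then batch within it."""
    for pool in batch(source, pool_size):
        yield from batch(sorted(pool, key=len), batch_size)


data = ["a", "bbbb", "cc", "ddddd", "eee", "f"]

# Ordering 1: shuffle and batch the indices first, fetch afterwards.
before_fetch = list(fetch(data, batch(shuffled_indices(len(data), seed=0), 2)))

# Ordering 2: fetch first, then batch items of similar length together.
after_fetch = list(bucket_batch(iter(data), batch_size=2, pool_size=6))
```

In both cases the shuffling and batching steps are ordinary links in the chain, which is why a separate sampler argument is not strictly needed in this sketch.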

ejguan added a commit that referenced this pull request Dec 14, 2020
ghstack-source-id: e2099d04d07f6da90f1a7a5da039a24f12f4d56a
Pull Request resolved: #49363
ejguan added a commit that referenced this pull request Dec 16, 2020
ghstack-source-id: 6a35fffc2a859bc9bb0792ab3f3979778d4a4ec1
Pull Request resolved: #49363
@VitalyFedyunin
Contributor

I suggest we add one more example showing how to implement a sampler as a pure IterDataset, without the old Sampler class.

@ejguan
Contributor Author

ejguan commented Dec 16, 2020

> I suggest we add one more example showing how to implement a sampler as a pure IterDataset, without the old Sampler class.

That makes sense. I will update our RFC.
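As a rough illustration of what such an example might look like (this is not the example from the RFC), the sketch below writes a random sampler as a plain iterable class with no dependency on the old Sampler class. The class names `RandomIndexIterable` and `FetchIterable` are hypothetical, and in a real implementation both would presumably subclass `torch.utils.data.IterableDataset`; plain Python classes are used here so the snippet runs on its own.

```python
import random
from typing import Any, Iterator, Sequence


class RandomIndexIterable:
    """Yields a permutation of range(length) on every pass (hypothetical name)."""

    def __init__(self, length: int, seed: int = 0) -> None:
        self.length = length
        self.seed = seed
        self.epoch = 0

    def __iter__(self) -> Iterator[int]:
        # Seed the RNG from (seed + epoch) so every pass is reproducible,
        # then advance the epoch counter for the next pass.
        rng = random.Random(self.seed + self.epoch)
        self.epoch += 1
        order = list(range(self.length))
        rng.shuffle(order)
        return iter(order)

    def __len__(self) -> int:
        return self.length


class FetchIterable:
    """Maps an iterable of indices onto an indexable data source (hypothetical name)."""

    def __init__(self, data: Sequence[Any], indices: RandomIndexIterable) -> None:
        self.data = data
        self.indices = indices

    def __iter__(self) -> Iterator[Any]:
        for i in self.indices:
            yield self.data[i]

    def __len__(self) -> int:
        return len(self.indices)


data = ["a", "b", "c", "d", "e"]
dataset = FetchIterable(data, RandomIndexIterable(len(data), seed=1))
one_epoch = list(dataset)  # every element of data exactly once, in shuffled order
```

The index source is just another iterable in the chain, so replacing it with a weighted or sequential variant would not require touching `FetchIterable`.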

ejguan added a commit that referenced this pull request Dec 16, 2020
ghstack-source-id: dd31d19d561347b5556c9d0e031dea5969217030
Pull Request resolved: #49363
ejguan added a commit that referenced this pull request Dec 17, 2020
ghstack-source-id: 5e696324f49a2da4194cce9cfa456478fd7495f0
Pull Request resolved: #49363
ejguan added a commit that referenced this pull request Dec 19, 2020
ghstack-source-id: bcb27cbd8925489d4230336151631c51cd0907b5
Pull Request resolved: #49363
@VitalyFedyunin
Contributor

Ready to land

@facebook-github-bot
Contributor

@ejguan merged this pull request in 7ed140a.

@facebook-github-bot facebook-github-bot deleted the gh/ejguan/14/head branch December 24, 2020 15:21
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
Summary: Pull Request resolved: pytorch#49363

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25623637

Pulled By: ejguan

fbshipit-source-id: 9155d27d1fc91996b74110795cc73f1da0eedd44
3 participants