ShawnZhong
Contributor

@ShawnZhong ShawnZhong commented Jun 9, 2020

Fix #39572

Add `generator=` kwarg for DataLoader & random samplers

cc: @ssnl, @deeppatel4557, @albanD, @mitar
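
For illustration, a minimal sketch of how the new kwarg might be used, assuming the `generator=` argument is accepted by both `DataLoader` and `RandomSampler` as this PR describes. The toy dataset and the seed value are made up for the example.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Toy dataset of ten integers; purely illustrative.
dataset = TensorDataset(torch.arange(10))

# A dedicated generator keeps shuffling independent of the global RNG state.
g = torch.Generator()
g.manual_seed(0)

# Option 1: hand the generator to the loader and let it shuffle.
loader = DataLoader(dataset, batch_size=5, shuffle=True, generator=g)
loader_order = [int(x) for (batch,) in loader for x in batch]

# Option 2: hand a generator to a random sampler directly.
sampler = RandomSampler(dataset, generator=torch.Generator().manual_seed(0))
sampler_order = list(sampler)

print(loader_order)
print(sampler_order)
```

Both orders are permutations of the dataset indices. They are not expected to match each other, since the loader may consume generator state before sampling.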

@ShawnZhong ShawnZhong changed the title Add generator= kwarg for DataLoader & random samplers [WIP] Add generator= kwarg for DataLoader & random samplers Jun 9, 2020

dr-ci bot commented Jun 9, 2020

💊 CI failures summary and remediations

As of commit d95ff17 (more details on the Dr. CI page):


  • 9/9 failures possibly* introduced in this PR
    • 2/9 non-CircleCI failure(s)

🕵️ 7 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cpu_test2 (1/7)

Step: "Test"

RuntimeError: test_autograd failed!
  File "test_autograd.py", line 6451, in <module> 
    run_tests() 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 278, in run_tests 
    import xmlrunner 
ModuleNotFoundError: No module named 'xmlrunner' 
Traceback (most recent call last): 
  File "run_test.py", line 711, in <module> 
    main() 
  File "run_test.py", line 704, in main 
    raise RuntimeError(message) 
RuntimeError: test_autograd failed! 
 
(base) circleci@PACKER-5ECD3249 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

See CircleCI build pytorch_windows_vs2019_py36_cpu_test1 (2/7)

Step: "Test"

CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
    Tell CMake where to find the compiler by setting either the environment 
    variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to 
    the compiler, or to the compiler name if it is in the PATH. 
   
   
  -- Configuring incomplete, errors occurred! 
  See also "C:/Users/circleci/AppData/Local/Temp/pip-install-m450xtk9/ninja/_cmake_test_compile/build/CMakeFiles/CMakeOutput.log". 
  See also "C:/Users/circleci/AppData/Local/Temp/pip-install-m450xtk9/ninja/_cmake_test_compile/build/CMakeFiles/CMakeError.log". 
  Not searching for unused variables given on the command line. 
  -- The C compiler identification is unknown 
  CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE): 
    The CMAKE_C_COMPILER: 
   
      cl 
   
    is not a full path and was not found in the PATH. 
   
    To use the JOM generator with Visual C++, cmake must be run from a shell 
    that can use the compiler cl from the command line.  This environment is 
    unable to invoke the cl compiler.  To fix this problem, run cmake from the 
    Visual Studio Command Prompt (vcvarsall.bat). 

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test1 (3/7)

Step: "Test"

CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
    Tell CMake where to find the compiler by setting either the environment 
    variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to 
    the compiler, or to the compiler name if it is in the PATH. 
   
   
  -- Configuring incomplete, errors occurred! 
  See also "C:/Users/circleci/AppData/Local/Temp/pip-install-9z5tud79/ninja/_cmake_test_compile/build/CMakeFiles/CMakeOutput.log". 
  See also "C:/Users/circleci/AppData/Local/Temp/pip-install-9z5tud79/ninja/_cmake_test_compile/build/CMakeFiles/CMakeError.log". 
  Not searching for unused variables given on the command line. 
  -- The C compiler identification is unknown 
  CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE): 
    The CMAKE_C_COMPILER: 
   
      cl 
   
    is not a full path and was not found in the PATH. 
   
    To use the JOM generator with Visual C++, cmake must be run from a shell 
    that can use the compiler cl from the command line.  This environment is 
    unable to invoke the cl compiler.  To fix this problem, run cmake from the 
    Visual Studio Command Prompt (vcvarsall.bat). 

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (4/7)

Step: "Test"

RuntimeError: test_autograd failed!
  File "test_autograd.py", line 6451, in <module> 
    run_tests() 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 278, in run_tests 
    import xmlrunner 
ModuleNotFoundError: No module named 'xmlrunner' 
Traceback (most recent call last): 
  File "run_test.py", line 711, in <module> 
    main() 
  File "run_test.py", line 704, in main 
    raise RuntimeError(message) 
RuntimeError: test_autograd failed! 
 
(base) circleci@PACKER-5ECD3242 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_on_cpu_test2 (5/7)

Step: "Test"

RuntimeError: test_autograd failed!
  File "test_autograd.py", line 6451, in <module> 
    run_tests() 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 278, in run_tests 
    import xmlrunner 
ModuleNotFoundError: No module named 'xmlrunner' 
Traceback (most recent call last): 
  File "run_test.py", line 711, in <module> 
    main() 
  File "run_test.py", line 704, in main 
    raise RuntimeError(message) 
RuntimeError: test_autograd failed! 
 
(base) circleci@PACKER-5ECD3249 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_on_cpu_test1 (6/7)

Step: "Test"

CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE):
    Tell CMake where to find the compiler by setting either the environment 
    variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to 
    the compiler, or to the compiler name if it is in the PATH. 
   
   
  -- Configuring incomplete, errors occurred! 
  See also "C:/Users/circleci/AppData/Local/Temp/pip-install-qhf9i0pf/ninja/_cmake_test_compile/build/CMakeFiles/CMakeOutput.log". 
  See also "C:/Users/circleci/AppData/Local/Temp/pip-install-qhf9i0pf/ninja/_cmake_test_compile/build/CMakeFiles/CMakeError.log". 
  Not searching for unused variables given on the command line. 
  -- The C compiler identification is unknown 
  CMake Error at CMakeLists.txt:3 (ENABLE_LANGUAGE): 
    The CMAKE_C_COMPILER: 
   
      cl 
   
    is not a full path and was not found in the PATH. 
   
    To use the JOM generator with Visual C++, cmake must be run from a shell 
    that can use the compiler cl from the command line.  This environment is 
    unable to invoke the cl compiler.  To fix this problem, run cmake from the 
    Visual Studio Command Prompt (vcvarsall.bat). 

See CircleCI build pytorch_macos_10_13_py3_test (7/7)

Step: "Test"

Jun 11 21:32:27 RuntimeError: test_cuda failed!
Jun 11 21:32:27   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/librosa/util/utils.py", line 15, in <module> 
Jun 11 21:32:27     from .decorators import deprecated 
Jun 11 21:32:27   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/librosa/util/decorators.py", line 9, in <module> 
Jun 11 21:32:27     from numba.decorators import jit as optional_jit 
Jun 11 21:32:27 ModuleNotFoundError: No module named 'numba.decorators' 
Jun 11 21:32:27 Traceback (most recent call last): 
Jun 11 21:32:27   File "test/run_test.py", line 711, in <module> 
Jun 11 21:32:27     main() 
Jun 11 21:32:27   File "test/run_test.py", line 704, in main 
Jun 11 21:32:27     raise RuntimeError(message) 
Jun 11 21:32:27 RuntimeError: test_cuda failed! 
Jun 11 21:32:27 + cleanup 
Jun 11 21:32:27 + retcode=1 
Jun 11 21:32:27 + set +x 

ci.pytorch.org: 2 failed


This comment has been revised 26 times.

@ShawnZhong ShawnZhong changed the title [WIP] Add generator= kwarg for DataLoader & random samplers Add generator= kwarg for DataLoader & random samplers Jun 9, 2020
@ShawnZhong ShawnZhong marked this pull request as ready for review June 9, 2020 19:50
@ShawnZhong ShawnZhong requested a review from apaszke as a code owner June 9, 2020 19:50
Collaborator

@ssnl ssnl left a comment

The generator for DataLoader should also be used for base_seed in the iterators.
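
A rough sketch of what this suggestion could look like, not the actual patch: the base seed is drawn from the supplied generator instead of the global RNG, and each worker seed is derived as `base_seed + i`, the scheme implied by the test snippet later in this thread. The helper names `draw_base_seed` and `worker_seeds` are hypothetical.

```python
import torch

def draw_base_seed(generator=None):
    # Draw one random int64 value; generator=None falls back to the global RNG.
    return torch.empty((), dtype=torch.int64).random_(generator=generator).item()

def worker_seeds(base_seed, num_workers):
    # Assumed scheme: worker i is seeded with base_seed + i.
    return [base_seed + i for i in range(num_workers)]

# Two generators seeded identically should give identical worker seeds
# when run on the same device and PyTorch version.
seeds_a = worker_seeds(draw_base_seed(torch.Generator().manual_seed(42)), 4)
seeds_b = worker_seeds(draw_base_seed(torch.Generator().manual_seed(42)), 4)
print(seeds_a == seeds_b)
```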

@albanD albanD self-requested a review June 9, 2020 20:36
@albanD albanD added the triaged and module: nn labels Jun 9, 2020
@ShawnZhong ShawnZhong marked this pull request as draft June 10, 2020 21:02
@ShawnZhong ShawnZhong changed the title Add generator= kwarg for DataLoader & random samplers [WIP] Add generator= kwarg for DataLoader & random samplers Jun 10, 2020
@ShawnZhong ShawnZhong marked this pull request as ready for review June 11, 2020 17:42
@ShawnZhong ShawnZhong changed the title [WIP] Add generator= kwarg for DataLoader & random samplers Add generator= kwarg for DataLoader & random samplers Jun 11, 2020
@ShawnZhong ShawnZhong requested a review from ssnl June 11, 2020 17:43
Collaborator

@ssnl ssnl left a comment

can we test the base_seed reproducibility by doing something similar to test_worker_seed?

@ShawnZhong ShawnZhong marked this pull request as draft June 12, 2020 02:35
dataset = SynchronizedSeedDataset(num_workers, batch_size, num_workers)
dataloader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers, generator=torch.Generator().manual_seed(42))
actual_seeds = set(int(batch) for batch in dataloader)
expected_seeds = set(6909045637428952499 + i for i in range(num_workers))
Collaborator

Don't do this. Seeding is only guaranteed to be consistent on the same device & version. Just do two runs and compare the results.
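
One way the "two runs and compare" pattern might look, as a sketch rather than the test that was merged. It checks shuffle order in the main process as a simpler stand-in for the worker-seed check discussed here, and it hard-codes no expected values. The helper `run_once` is hypothetical.

```python
import torch
from torch.utils.data import DataLoader

def run_once(seed):
    # Use a fresh generator per run so the two runs share no RNG state.
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(list(range(20)), batch_size=4, shuffle=True, generator=g)
    return [batch.tolist() for batch in loader]

# Compare two runs against each other instead of against fixed numbers,
# which could change across devices or PyTorch versions.
first = run_once(42)
second = run_once(42)
print(first == second)
```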

Contributor Author

👌

Collaborator

@ssnl ssnl left a comment

merge when green. thanks!

Collaborator

albanD commented Jun 12, 2020

Windows CI is having a big hiccup because of some pip issues. Don't worry about those :)

Collaborator

@albanD albanD left a comment

Thanks for the PR.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@albanD is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ShawnZhong ShawnZhong marked this pull request as ready for review June 12, 2020 16:46
Collaborator

ssnl commented Jun 15, 2020

xwang233 pushed a commit to xwang233/pytorch that referenced this pull request Jun 20, 2020
Summary:
Fix pytorch#39572

Add `generator=` kwarg for DataLoader & random samplers

cc: SsnL, deeppatel4557, albanD, mitar
Pull Request resolved: pytorch#39737

Differential Revision: D22019132

Pulled By: albanD

fbshipit-source-id: 835e08b86c5396bc0b0e41057661306b15394d6e
Labels
module: nn, open source, triaged
Development

Successfully merging this pull request may close these issues.

Required some argument in dataloader for setting randomstate
5 participants