Skip to content

Conversation

droberts195
Copy link
Contributor

Changes:

  • Adds build setup instructions for PyTorch 1.7.1 on Windows.
  • Some minor tweaks to the build system to account for the
    way Windows imports DLLs using import libraries, and the
    extra dependencies of the PyTorch DLLs on Windows.
  • WinSock2 is needed for ntohl on Windows.
  • Reinstate the --version functionality to the new program.

This needs another iteration before the feature branch is merged to master. The outstanding issues relate to the commands and flags used within the PyTorch build:

  • The PyTorch build system uses /Z7 instead of /Zi for debug symbols. This causes the DLLs to be huge, for example torch_cpu.dll is 78MB. It's crazy for us to ship that when most of it is debug symbols and our customers aren't going to be doing debugging. So we need to find a way to use /Zi even if there is some drawback to it, or if /Zi is completely out-of-the-question then turn off debug symbol generation altogether for this library.
  • Some compilations report errors like cl : Command line warning D9025 : overriding '/O2' with '/Od' - we need to find which ones and confirm it makes sense.
  • The warning Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT - Failed suggests that the build system is making decisions about which CPU instructions to use based on the capabilities of the build machine. Great for someone who's going to build and run on their own laptop, but not great for a controlled redistribution as part of a product. This ties in with the point below about exactly which compiler options are used during the build, as this will reveal flags that may be limiting which runtime machines will work.
  • The build uses incremental linking. This is great for developers who are iteratively debugging PyTorch itself, but it may be better for us to switch it off. Some of the linking steps take ages when building PyTorch. This may be because LTO is in use (unclear if it is - see below), or it may be due to building from scratch while setting up for incremental linking.
  • The above are just specific examples, but the Ninja build system used by PyTorch doesn't print the command line arguments being used for each file. We need to figure out how to get it to print them and audit what else is being used that might not be ideal for us.
  • Since the build time is so long there may be a benefit in only building the libraries we're going to ship. Need to look at the build system to figure out how easy this would be.

So there's still lots to do, but the changes of this PR prove that it's possible for us to build a Windows program that uses PyTorch.

- Adds build setup instructions for PyTorch 1.7.1 on Windows.
- Some minor tweaks to the build system to account for the
  way Windows imports DLLs using import libraries, and the
  extra dependencies of the PyTorch DLLs on Windows.
- WinSock2 is needed for ntohl on Windows.
- Reinstate the --version functionality to the new program.
@droberts195
Copy link
Contributor Author

The PyTorch build system uses /Z7 instead of /Zi for debug symbols

From https://github.com/pytorch/pytorch/blob/c458558334e4aef5107bb60807c76ea9a7078727/CMakeLists.txt#L312-L318:

# /Z7 override option
# When generating debug symbols, CMake default to use the flag /Zi.
# However, it is not compatible with sccache. So we rewrite it off.
# But some users don't use sccache; this override is for them.
cmake_dependent_option(
  MSVC_Z7_OVERRIDE "Work around sccache bug by replacing /Zi and /ZI with /Z7 when using MSVC (if you are not using sccache, you can turn this OFF)" ON
  "MSVC" OFF)

We fall into the "don't use sccache" category, so should use MSVC_Z7_OVERRIDE=OFF on the next iteration.

@davidkyle
Copy link
Member

PyTorch doesn't print the command line arguments being used for each file. We need to figure out how to get it to print them and audit what else is being used that might not be ideal for us.

Configure cmake with -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON then when building with cmake --build . --target install or just make you will see the full output.

Also if there is a link error cmake will create a file link.txt containing the link command

https://sidvind.com/wiki/CMake/Verbose_output

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@droberts195 droberts195 merged commit 3e23336 into elastic:feature/pytorch-inference Jan 21, 2021
@droberts195 droberts195 deleted the torch_windows_libs branch January 21, 2021 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants