[ML] Build setup instructions and tweaks for Torch on Windows #1677

droberts195 · 2021-01-19T11:22:09Z

Changes:

Adds build setup instructions for PyTorch 1.7.1 on Windows.
Some minor tweaks to the build system to account for the
way Windows imports DLLs using import libraries, and the
extra dependencies of the PyTorch DLLs on Windows.
WinSock2 is needed for ntohl on Windows.
Reinstate the --version functionality to the new program.

This needs another iteration before the feature branch is merged to master. The outstanding issues relate to the commands and flags used within the PyTorch build:

The PyTorch build system uses /Z7 instead of /Zi for debug symbols. This causes the DLLs to be huge, for example torch_cpu.dll is 78MB. It's crazy for us to ship that when most of it is debug symbols and our customers aren't going to be doing debugging. So we need to find a way to use /Zi even if there is some drawback to it, or if /Zi is completely out-of-the-question then turn off debug symbol generation altogether for this library.
Some compilations report errors like cl : Command line warning D9025 : overriding '/O2' with '/Od' - we need to find which ones and confirm it makes sense.
The warning Performing Test COMPILER_SUPPORTS_NO_AVX256_SPLIT - Failed suggests that the build system is making decisions about which CPU instructions to use based on the capabilities of the build machine. Great for someone who's going to build and run on their own laptop, but not great for a controlled redistribution as part of a product. This ties in with the point below about exactly which compiler options are used during the build, as this will reveal flags that may be limiting which runtime machines will work.
The build uses incremental linking. This is great for developers who are iteratively debugging PyTorch itself, but it may be better for us to switch it off. Some of the linking steps take ages when building PyTorch. This may be because LTO is in use (unclear if it is - see below), or it may be due to building from scratch while setting up for incremental linking.
The above are just specific examples, but the Ninja build system used by PyTorch doesn't print the command line arguments being used for each file. We need to figure out how to get it to print them and audit what else is being used that might not be ideal for us.
Since the build time is so long there may be a benefit in only building the libraries we're going to ship. Need to look at the build system to figure out how easy this would be.

So there's still lots to do, but the changes of this PR prove that it's possible for us to build a Windows program that uses PyTorch.

- Adds build setup instructions for PyTorch 1.7.1 on Windows. - Some minor tweaks to the build system to account for the way Windows imports DLLs using import libraries, and the extra dependencies of the PyTorch DLLs on Windows. - WinSock2 is needed for ntohl on Windows. - Reinstate the --version functionality to the new program.

droberts195 · 2021-01-19T11:45:15Z

The PyTorch build system uses /Z7 instead of /Zi for debug symbols

From https://github.com/pytorch/pytorch/blob/c458558334e4aef5107bb60807c76ea9a7078727/CMakeLists.txt#L312-L318:

# /Z7 override option
# When generating debug symbols, CMake default to use the flag /Zi.
# However, it is not compatible with sccache. So we rewrite it off.
# But some users don't use sccache; this override is for them.
cmake_dependent_option(
  MSVC_Z7_OVERRIDE "Work around sccache bug by replacing /Zi and /ZI with /Z7 when using MSVC (if you are not using sccache, you can turn this OFF)" ON
  "MSVC" OFF)

We fall into the "don't use sccache" category, so should use MSVC_Z7_OVERRIDE=OFF on the next iteration.

davidkyle · 2021-01-20T11:25:16Z

PyTorch doesn't print the command line arguments being used for each file. We need to figure out how to get it to print them and audit what else is being used that might not be ideal for us.

Configure cmake with -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON then when building with cmake --build . --target install or just make you will see the full output.

Also if there is a link error cmake will create a file link.txt containing the link command

https://sidvind.com/wiki/CMake/Verbose_output

davidkyle

LGTM

droberts195 added :ml >build labels Jan 19, 2021

davidkyle mentioned this pull request Jan 20, 2021

[ML] Add LibTorch to the CI build images #1679

Closed

17 tasks

davidkyle approved these changes Jan 21, 2021

View reviewed changes

droberts195 merged commit 3e23336 into elastic:feature/pytorch-inference Jan 21, 2021

droberts195 deleted the torch_windows_libs branch January 21, 2021 09:40

davidkyle added 3rd party models and removed :ml labels Jan 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ML] Build setup instructions and tweaks for Torch on Windows #1677

[ML] Build setup instructions and tweaks for Torch on Windows #1677

Uh oh!

droberts195 commented Jan 19, 2021

Uh oh!

droberts195 commented Jan 19, 2021

Uh oh!

davidkyle commented Jan 20, 2021

Uh oh!

davidkyle left a comment

Uh oh!

Uh oh!

[ML] Build setup instructions and tweaks for Torch on Windows #1677

[ML] Build setup instructions and tweaks for Torch on Windows #1677

Uh oh!

Conversation

droberts195 commented Jan 19, 2021

Uh oh!

droberts195 commented Jan 19, 2021

Uh oh!

davidkyle commented Jan 20, 2021

Uh oh!

davidkyle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!