[ML] PyTorch Inference #1638
Conversation
The PyTorch repo is no longer cloned into the 3rd_party directory. Note: CI is not building this change; that is beyond the scope of this PR, but the CI change can be made in parallel.
3rd_party/3rd_party.sh
According to pytorch/pytorch#13130 (comment), if you build the binaries yourself then everything still works on aarch64. So I think the assumption should be that by the time we've got this working for all platforms, aarch64 will also work. If it proves to be completely impossible to get working then maybe we can exclude that platform, but that would create headaches for documentation, support and Cloud, and would make the corresponding Java code much more complex, as it would then also have to account for the hardware architecture. So even if it's a month of work to build for aarch64, it's probably less work company-wide than having a chunk of functionality that isn't available.
As noted in the description, aarch64 support is beta in the 1.7 release, which came out as this was being written. I removed the conditional as it is a no-op until the libraries are installed on the platform anyway.
bin/pytorch_inference/Main.cc
`EMPTY, EMPTY` makes this call a no-op. Other comments that have been pasted from the other programs make it sound like logging could go to a named pipe, so I think there should be a TODO here saying that log redirection is still left to do.
I added a log pipe argument to the program and now log to the pipe. `evaluate.py` reads the log pipe output, but in the main thread for now. I also removed the `--xxxIsPipe` arguments as pipes are the only option right now.
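For illustration only, here is a minimal sketch of routing log output to a named pipe. The argument handling and the JSON line format are assumptions, not the PR's actual implementation:

```cpp
// Minimal sketch: write log lines to a named pipe. The usage and the
// JSON format are illustrative assumptions, not the real program.
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <log pipe path>\n";
        return 1;
    }
    std::string logPipePath{argv[1]};

    // Opening a FIFO for writing blocks until a reader (for example
    // evaluate.py) opens the other end.
    std::ofstream logStream{logPipePath};
    if (!logStream.is_open()) {
        std::cerr << "Could not open log pipe " << logPipePath << '\n';
        return 1;
    }
    logStream << R"({"level":"INFO","message":"pytorch_inference started"})"
              << std::endl;
    return 0;
}
```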
> I added a log pipe argument to the program and now log to the pipe. evaluate.py reads the log pipe output but in the main thread for now.

Thanks.

> I also removed the --xxxIsPipe arguments as pipes are the only option right now
One thing to note about this is that it will add complexity in the production Java code in order to simplify a dev Python script - see https://github.com/elastic/elasticsearch/blob/d613a1a1d12890825b982c1a7a4fd6e17f20b325/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/process/ProcessPipes.java#L110-L133
Theoretically on-disk files could have been used for the inputs if someone put the appropriate binary data in them. We sometimes use this technique for running autodetect through valgrind.
I would keep the arguments consistent between all the processes that connect to Java using named pipes.
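As a sketch of what that consistency might look like, here is a hypothetical option declaration following the `--xxxIsPipe` convention discussed above, using boost::program_options as the other ml-cpp binaries do. The exact option set is an assumption, not this PR's final interface:

```cpp
// Illustrative only: declare pipe arguments matching the --xxxIsPipe
// convention of the other processes that connect to Java.
#include <boost/program_options.hpp>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    namespace po = boost::program_options;
    po::options_description desc{"pytorch_inference options"};
    desc.add_options()
        ("input", po::value<std::string>(), "source of inference requests")
        ("inputIsPipe", "'input' is a named pipe")
        ("output", po::value<std::string>(), "destination for results")
        ("outputIsPipe", "'output' is a named pipe")
        ("restore", po::value<std::string>(), "source of the TorchScript model")
        ("restoreIsPipe", "'restore' is a named pipe")
        ("logPipe", po::value<std::string>(), "named pipe for log output");

    po::variables_map vm;
    po::store(po::parse_command_line(argc, argv, desc), vm);
    po::notify(vm);

    if (vm.count("logPipe")) {
        std::cout << "logging to " << vm["logPipe"].as<std::string>() << '\n';
    }
    return 0;
}
```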
I took out the `--xxxIsPipe` arguments because that is not the intended usage, but I can see the sense in keeping the arguments consistent with the other apps. It also made me realise the C++ app can be launched by the Python script, making testing a lot easier as you just have to run the Python now.
build-setup/pytorch/setup_pytorch.md
This raises the question of whether this PR is aiming for a dev-only prototype or the beginnings of a productionisation of the functionality.
If the intention is that it's the beginnings of productionisation then it shouldn't be "if you wish"; it should be a compulsory setup step, and the instructions should be in `build-setup/macos.md`, so that people who are setting up from scratch just have one set of instructions to follow. I know since you're specifically working on PyTorch it seems nicer to you to have that bit split out into a separate file, but in the future this will just be part of our required toolchain, and people set up one machine at a time, not one build component at a time, so it's better for the future to have all the instructions for a particular platform in one file.
The language was intentional: this is an optional step for everyone, which is the reason the top level makefile does not build the pytorch binary. In the future, once this code is in production, I can still see the value in having this step be optional; if you just want to build anomaly detection, why should you have to install Anaconda, clone the PyTorch repo, etc.? The use of a separate file in a subfolder follows the pattern set by `clion` and `strptime`, and the initial setup steps are the same for each platform.
It's only an optional step while this is a dev-only prototype. If we are ever going to ship this then it should be in the top level per-OS files.
`clion` is truly optional even for shipped code. It's not installed on any of the CI machines. `strptime` has a separate directory because the patches are too complex to inline in the top level build setup instructions. But the instructions for using the files in that directory are in the top level build setup instructions here.

You could say "the initial setup steps are the same for each platform" for Boost. So instead of having a Boost section in `macos.md`, `linux.md` and `windows.md` we could say in each of these files "now build Boost using the instructions in `boost/boost.md`" and direct people to another file that has separate sections for macOS, Linux and Windows. But I am not sure that would really make life easier for someone who had been tasked with setting up a particular type of build server. They'd have to look at two files instead of one, or N files instead of one if we did this for every component.
So the instructions could stay as-is for this PR, while we're not actually building the new program on every build, but eventually they need to move to be consistent with the other instructions.
build-setup/pytorch/setup_pytorch.md
People setting up build machines shouldn't be making decisions about how to do it. The instructions should tell them exactly which steps to follow. So for a controlled build it sounds like this step should be "You must install Anaconda". I am not sure how much it matters which version of Anaconda is used if we're effectively just using it to run the build script - presumably it doesn't play that much of a role in building the C++ libraries. But if I am wrong and it does affect the functionality of the C++ libraries that get built then the instructions should specify a specific version and link to the specific file to install from https://repo.anaconda.com/archive/ so that somebody setting up an official build server gets the prescribed version even if a more recent one has been released. That's only necessary if it affects the artifacts we'll redistribute though.
There's no need to mention "Additionally a C++14 compiler is required." The build instructions in `build-setup/macos.md` will already have ensured this. (For macOS it just means the Anaconda section should go after the Xcode section.)
The same can be achieved by `pip` installing the dependencies with Python 3.6 or above and installing CMake, but I didn't want to create a maintenance issue by describing the more complex option. You don't have to install Anaconda, but it comes with the build tools invoked by `setup.py`.
There will always be different ways to build things. Consider the instructions for building Boost on Linux. Suppose instead they said "become an expert in how to build Boost, then build it in the most appropriate way". That would be a complete nightmare for somebody who just wanted to get to the point where they could type `make -j 8` at the top level of the ml-cpp repo and get a working build.
You are the expert in how to build PyTorch, so you know several ways to do it that work. But for everyone else it's easiest if you choose one way that works and then write down a set of steps that can be mindlessly followed that get you to a working library.
If other experts read the build setup instructions and decide they know a better way to build one of the required components then that's fine, but the instructions should make it as simple and mechanical as possible for people who don't want to spend the time becoming experts in how to set up the tool chain to get something that works. Imagine that every sentence of the setup instructions needs to unambiguously specify one line of a shell script that automates it (which is pretty much what is required for the Dockerfiles).
build-setup/pytorch/setup_pytorch.md
I don't think the `MACOSX_DEPLOYMENT_TARGET=10.9` bit is worthwhile. We don't build our other code for 10.9. Currently it's 10.11 for 7.x and 10.14 for 8.x, but the default for Apple clang is to support the build platform and higher, and that's what we do for our other tools (i.e., for what we ship to customers we build 7.x on 10.11 and 8.x on 10.14).
From `setup_pytorch.md`:

```
## Build macOS
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
```
There's an assumption here that CMake is available, but it's not on a newly installed Mac, so instructions for installing CMake should be in these instructions. Probably before the `conda install` step, in case installing the Python `cmake` module requires that the `cmake` binary be on the path.
Happy to merge this to the feature branch and iterate
The application connects to 4 named pipes and accepts a TorchScript model on the 'restore' pipe. Input is a series of tokens from the BERT vocabulary and the output is a JSON document containing the results tensor. Logging is written to the 4th pipe. The script evaluate.py will start the app, connect the named pipes and run an example from the examples directory. This is the easiest way to test and develop the app, serving as a proxy for the Java/ES side until that is complete. The app will not be built by a call to the top level makefile, so the dependencies do not have to be installed yet.
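A rough sketch of the four-pipe wiring described above; the struct and function are hypothetical, and in the real app the pipe paths come from command-line arguments:

```cpp
// Hypothetical sketch of the four named-pipe connections. On POSIX
// systems a FIFO can be opened like an ordinary file; opening blocks
// until the other end (Java, or evaluate.py) connects.
#include <fstream>
#include <string>

struct PipeStreams {
    std::ifstream restore; // TorchScript model bytes
    std::ifstream input;   // inference requests (token IDs)
    std::ofstream output;  // JSON results tensor
    std::ofstream log;     // log lines
};

PipeStreams connectPipes(const std::string& restorePath,
                         const std::string& inputPath,
                         const std::string& outputPath,
                         const std::string& logPath) {
    return PipeStreams{std::ifstream{restorePath, std::ios::binary},
                       std::ifstream{inputPath, std::ios::binary},
                       std::ofstream{outputPath},
                       std::ofstream{logPath}};
}
```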
This application reads a TorchScript model from a named pipe; the input is encoded tokens from the BERT dictionary. The script `evaluate.py` starts the C++ app, connects the named pipes and runs a language modelling example where the response is the classification of the tokens in the input sentence. This is the easiest way to test the app and serves as a proxy for the Java/ES side while this is in development.

### Input
LibTorch requires an istream supporting seek and tell to load the model; this cannot be done over a named pipe, so the entire model is first read into a char array. That memory is freed once the model is loaded. The first 4 bytes sent over the restore stream are the size of the model, followed by the model definition. The inference input is a number of integers encoded in network byte order; the output is a JSON document.
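To make the framing concrete, here is a minimal sketch of the receiving side under the assumptions above (a 4-byte network-order size header followed by the serialized model); the helper name is hypothetical, not the actual implementation:

```cpp
// Sketch of the restore-stream handling described above: buffer the
// size-prefixed model in memory, then give LibTorch a seekable stream.
// Assumes the 4-byte size header is in network byte order, matching
// the integer encoding used for the inference inputs.
#include <torch/script.h>

#include <arpa/inet.h> // ntohl
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

torch::jit::script::Module loadModelFromPipe(std::istream& restoreStream) {
    // First 4 bytes: the size of the model definition.
    std::uint32_t rawSize{0};
    restoreStream.read(reinterpret_cast<char*>(&rawSize), sizeof(rawSize));
    std::uint32_t modelSize{ntohl(rawSize)};

    // A named pipe cannot seek, so read the whole model into memory.
    std::string buffer(modelSize, '\0');
    restoreStream.read(&buffer[0], modelSize);

    // std::istringstream supports seek and tell, which torch::jit::load
    // needs; the buffer is freed when the stream goes out of scope.
    std::istringstream seekable{std::move(buffer)};
    return torch::jit::load(seekable);
}
```

The token integers arriving on the input pipe could then be decoded with the same `ntohl` conversion.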
### LibTorch is a dependency

LibTorch has to be built externally and the headers added in order to build this app. `setup_pytorch.md` provides instructions for doing so using Anaconda. The `3rd_party.sh` script will copy the dependencies to the install location.

### Building
I have not added this application to `bin/Makefile`, so it will not break builds where the library does not exist. If you want to build it, follow the instructions in `setup_pytorch.md`, then run `3rd_party/3rd_party.sh --add` and `cd bin/pytorch_inference && make`.

There is no ARM build of PyTorch. UPDATE: ARM support is in beta with the 1.7 release.