
FFI bindings for the libtorch library


Welcome to the libtorch-ffi wiki!

What is this?

Experimental work on next-generation FFI bindings to the C++ libtorch library, in preparation for release 0.0.2, which targets the PyTorch 1.0 backend.

The PyTorch project provides prebuilt binaries of the C++ libtorch library on its official page, as well as a Debian package for Ubuntu. By using these reliable binaries, we can get Haskell programs running in various environments quickly. (Compiling C++ libtorch for CUDA takes a long time.) Imagine running hasktorch on Colaboratory by typing a single command.

Development of the PyTorch project moves very fast, and its API changes frequently, so it is difficult to maintain a Haskell API for it by hand.

Approach

Support almost all of the libtorch API automatically

There is a plan for Declarations.yaml to become the single, externally visible API. See this issue.

We use the generated Declarations.yaml spec instead of header parsing for code generation. Declarations.yaml is located at hasktorch/deps/pytorch/build/aten/src/ATen/Declarations.yaml. The file is generated by building the libtorch binary or by running hasktorch/deps/get-deps.sh. It covers the Native, TH, and NN functions, but not the methods of C++ classes; the code for those methods is generated from hasktorch/spec/cppclass/*.yaml.

The dataflow is below.

spec/Declarations.yaml (pytorch) -+-> codegen (a program of this repo) -> ffi (FFI bindings of this repo)
spec/cppclass/*.yaml (this repo) -+

How to connect the C++ API of libtorch to Haskell

We use inline-c-cpp to bind the C++ API directly instead of going through a C API. inline-c-cpp generates the C++ code and the Haskell code at compile time, using Template Haskell.
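
As a minimal standalone sketch of the mechanism (not code from this repository), a quasiquoted C++ block inside a Haskell module looks like this:

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

import qualified Language.C.Inline.Cpp as C

-- Set up the C++ context and includes; Template Haskell expands the
-- quasiquote below into a C++ snippet and a matching Haskell foreign import.
C.context C.cppCtx

C.include "<iostream>"

main :: IO ()
main = [C.block| void { std::cout << "hello from C++" << std::endl; } |]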

Technically, the symbols of the generated C++ code are wrapped in extern "C" (see How to mix C and C++), and the generated Haskell code calls them through the FFI.

The original inline-c-cpp does not support C++ namespaces and templates. To support them, we use a modified inline-c-cpp. See this PR.

Mapping of data types

C++ objects live in one of two places: on the heap or on the stack. libtorch functions return stack-allocated objects by value, and when the function holding such a local object returns, the object on the stack is destroyed. For example, in the code below, the tensor a on the stack is destroyed when test() returns.

#include <ATen/ATen.h>

void test(){
  at::Tensor a = at::ones({2, 2}, at::kInt);  // local object on the stack
  at::Tensor b = at::randn({2, 2});
  auto c = a + b.to(at::kInt);
}  // a, b, and c are destroyed here

So this FFI copies the object onto the heap with new, so that it is not destroyed:

at::Tensor* ones_for_haskell(){
  at::Tensor a = at::ones({2, 2}, at::kInt);
  return new at::Tensor(a);  // heap copy; the pointer is handed to Haskell
}

Mapping of argument types

C data is passed to function arguments directly, by value. C++ objects are passed to function arguments through object pointers.

Mapping of return types

At the end of a function call, C data is returned by value, while a C++ object is returned as a pointer to a heap copy allocated with new.
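
As a hedged sketch of both rules together (the inline-c context mapping at::Tensor and the Tensor phantom type are assumed to be set up as in this repository; addScalar is a hypothetical name, not a generated binding):

import Foreign.Ptr (Ptr)
import Foreign.C.Types (CInt)

addScalar :: Ptr Tensor -> CInt -> IO (Ptr Tensor)
addScalar t i = [C.throwBlock| at::Tensor* {
    // the int argument arrives by value, the tensor by pointer;
    // the result is a fresh heap copy made with 'new'
    return new at::Tensor(*$(at::Tensor* t) + $(int i));
  }|]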

Memory Management

We rely on GHC's garbage collector. The generated FFI code has an unmanaged layer (hasktorch/libtorch-ffi/src/Aten/Unmanaged/*) and a managed layer (hasktorch/libtorch-ffi/src/Aten/Managed/*). The unmanaged code uses the Ptr type, which corresponds to a raw C/C++ pointer.

The managed code uses the ForeignPtr type, whose lifetime is managed by GHC.

To convert unmanaged code into managed code, the C++ object has to be an instance of the CppObject type class, and the managed code wraps each unmanaged call in casts from the Castable type class. You can see the details of the casts in hasktorch/libtorch-ffi/src/Aten/Cast.hs.

class CppObject a where
  -- Wrap a raw pointer in a ForeignPtr managed by GHC's garbage collector.
  fromPtr :: Ptr a -> IO (ForeignPtr a)

class Castable a b where
  cast   :: a -> (b -> IO r) -> IO r  -- view 'a' as 'b' for the duration of the callback
  uncast :: b -> (a -> IO r) -> IO r  -- convert 'b' back to 'a' for the duration of the callback

instance (CppObject a) => Castable (ForeignPtr a) (Ptr a) where
  cast x f = withForeignPtr x f
  uncast x f = fromPtr x >>= f

-- Lift an unmanaged call with no arguments into its managed form.
cast0 :: (Castable a ca) => (IO ca) -> IO a
cast0 f = f >>= \ca -> uncast ca return

-- Lift an unmanaged call with one argument: cast the argument, run the
-- call, and uncast the result.
cast1 :: (Castable a ca, Castable y cy)
       => (ca -> IO cy) -> a -> IO y
cast1 f a = cast a $ \ca -> f ca >>= \cy -> uncast cy return
...
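
For example, here is a hedged sketch of how a managed binding wraps an unmanaged one (ones_unmanaged is a stand-in name for a generated unmanaged binding of type IO (Ptr Tensor), and a CppObject Tensor instance is assumed; the exact generated names differ):

-- cast0 runs the unmanaged call and lifts its raw-pointer result into a
-- ForeignPtr that GHC's garbage collector will finalize.
ones_managed :: IO (ForeignPtr Tensor)
ones_managed = cast0 ones_unmanaged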

Support for C++ tuple types

A C++ tuple becomes an instance of CppTuple2, which provides access to each component of the tuple. An example is below.

class CppTuple2 m where
  type A m
  type B m
  get0 :: m -> IO (A m)
  get1 :: m -> IO (B m)

instance CppTuple2 (Ptr (Tensor,Tensor)) where
  type A (Ptr (Tensor,Tensor)) = Ptr Tensor
  type B (Ptr (Tensor,Tensor)) = Ptr Tensor
  get0 v = [C.throwBlock| at::Tensor* { return new at::Tensor(std::get<0>(*$(std::tuple<at::Tensor,at::Tensor>* v)));}|]
  get1 v = [C.throwBlock| at::Tensor* { return new at::Tensor(std::get<1>(*$(std::tuple<at::Tensor,at::Tensor>* v)));}|]
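
Given this instance, a caller can pull both components out of a returned tuple pointer; a small usage sketch (unpackPair is a hypothetical helper, not generated code):

-- Extract heap copies of both tensors from a tuple pointer returned by a
-- generated binding.
unpackPair :: Ptr (Tensor, Tensor) -> IO (Ptr Tensor, Ptr Tensor)
unpackPair p = (,) <$> get0 p <*> get1 p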

Support for C++ operators

C++ operators are mapped to Haskell functions in the same way Python maps them. For example, operator+= is assigned to _iadd_. The details of the mapping are in this code.
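
As a hedged sketch of what such an operator binding looks like (the generated code may differ in detail):

-- C++ operator+= mutates the left-hand tensor in place; a heap copy of the
-- resulting handle is returned to Haskell, matching the return-type mapping above.
_iadd_ :: Ptr Tensor -> Ptr Tensor -> IO (Ptr Tensor)
_iadd_ a b = [C.throwBlock| at::Tensor* {
    return new at::Tensor(*$(at::Tensor* a) += *$(at::Tensor* b));
  }|]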

Error Handling

When a libtorch C++ function fails, it throws a C++ exception. The generated bindings use inline-c-cpp's throwBlock (as in the tuple example above), which catches the C++ exception and rethrows it as a Haskell exception.
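
Because the exception crosses into Haskell, it can be handled with the ordinary Control.Exception machinery; a hedged sketch (someBinding stands for any generated binding built with throwBlock):

import Control.Exception (SomeException, try)

-- A failing libtorch call surfaces as a Haskell exception that 'try' can catch.
safeCall :: IO (Ptr Tensor) -> IO (Either SomeException (Ptr Tensor))
safeCall someBinding = try someBinding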

Operations

Development Environment

  • For now, use stack. (To use cabal v2, update shell.nix and cabal.project.)

CI Environment

To update the generated code:

# Download the libtorch binary and generate 'Declarations.yaml'
> pushd deps
> ./get-deps.sh
> popd
# Generate the FFI code into the output directory
> stack exec codegen-exe
# Check the differences and copy the generated code
> diff -r output/Aten libtorch-ffi/src/Aten
> cp -r output/Aten libtorch-ffi/src/
# Build and test
> stack test libtorch-ffi

Test

Memory leak

See MemorySpec.hs.

Basic tests of ATen through this FFI.

See BasicTest.hs.

Notes

Issues

  • Integrate this FFI into hasktorch/hasktorch.
  • (Resolved) Support libtorch's autograd in this FFI.
    • The at::Tensor class in ATen is not differentiable by default. To get the differentiability that the autograd API provides, we must use tensor factory functions from the torch:: namespace instead of the at:: namespace.
    • For now, this FFI only uses tensor factory functions from the at:: namespace.
    • We will add factory functions from the torch:: namespace.
  • (Resolved) Make a script that uploads pinned libtorch binaries for all environments and configurations (Linux, macOS, and Windows).

FAQ

  • What does a generated function's suffix mean, e.g. the tts of add_tts?
    • C++ supports overloading and Haskell does not, so we add a suffix to keep the Haskell function names from clashing.
  • Is torch::Tensor the same as at::Tensor?
    • Yes.
  • Why not use fficxx?
    • fficxx does not support managed code using ForeignPtr.
  • What are native_functions.yaml and nn.yaml?
    • These files are used to generate Declarations.yaml.
  • What is the difference between the C API and the C++ API?

References

Policy

Please feel free to update this document and add FAQ entries.