@chewxy chewxy released this Aug 19, 2018 · 14 commits to master since this release

Ongoing notes:

  • CUDA: Better CUDA support (IN PROGRESS)
    • ColMajor is supported, but RowMajor remains the default for all the major cuBLAS versions, even when the engine is CUDA. Careful reasoning about the parameters obviates the need for ColMajor by default, which causes more headaches.
    • Transposition is done automatically when data is transported back to the CPU.
    • cuDNN operations supported (IN PROGRESS) (note: these are the ones I use most often, hence they get more attention):
      • Conv2d
      • Dropout
      • Maxpool2d
      • BatchNorm
      • Rectify
    • Other CUDA related optimizations
      • full cuBLAS support
  • New Ops:
    • BatchNorm
    • InvSqrt
    • CUDA enabled ops in ops/nn (preview for how things will start to look in v0.10.0)
  • New Features:
    • Limited shape inference. Working towards a calculus for shapes (first raised in #96 and #97).
  • Optimizations:
    • Basic ops now use engine functions if available; otherwise they fall back to using Apply, which adds a penalty from repeatedly calling functions.
    • Faster VMs (1 of 2 VMs): greedy goroutines grab gigs from a priority queue. This makes code execute faster in general. (This has been moved to a future 0.9.xx version.) Benchmark results:
benchmark                           old ns/op      new ns/op      delta
BenchmarkTapeMachineExecution-8     3129074510     2695304022     -13.86%

benchmark                           old allocs     new allocs     delta
BenchmarkTapeMachineExecution-8     25745          25122          -2.42%

benchmark                           old bytes      new bytes      delta
BenchmarkTapeMachineExecution-8     4804578705     4803784111     -0.02%
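The "engine function if available, otherwise Apply" optimization above can be pictured with an optional-interface check. This is only a minimal sketch: `Engine`, `Adder`, `fastEngine`, `plainEngine`, `apply` and `add` are hypothetical names made up for illustration, not Gorgonia's actual types or functions.

```go
package main

import "fmt"

// Engine is a hypothetical marker interface for an execution engine.
type Engine interface{}

// Adder is a hypothetical optional interface. An engine that implements it
// can add two slices in a single call.
type Adder interface {
	Add(a, b []float64) []float64
}

// fastEngine implements Adder, so the engine path is taken for it.
type fastEngine struct{}

func (fastEngine) Add(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

// plainEngine offers no specialised functions at all.
type plainEngine struct{}

// apply is the generic fallback. It calls fn once per element pair, and that
// repeated function call is the penalty mentioned in the notes.
func apply(a, b []float64, fn func(x, y float64) float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = fn(a[i], b[i])
	}
	return out
}

// add uses the engine's own Add when the engine provides one and falls back
// to apply otherwise. It also reports which path was taken.
func add(e Engine, a, b []float64) ([]float64, string) {
	if adder, ok := e.(Adder); ok {
		return adder.Add(a, b), "engine"
	}
	return apply(a, b, func(x, y float64) float64 { return x + y }), "apply"
}

func main() {
	a, b := []float64{1, 2, 3}, []float64{10, 20, 30}
	for _, e := range []Engine{fastEngine{}, plainEngine{}} {
		sum, path := add(e, a, b)
		fmt.Println(path, sum)
	}
}
```

Both paths give the same result; only the cost differs, since the fallback pays for one function call per element.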
  • Code generation: some of the exported API is now auto-generated.
  • New Solver: @ynqa added the Momentum solver.
  • Breaking API: Solvers now take a slice of ValueGrad instead of Nodes. ValueGrad is an interface, which *Node fulfils. An additional utility function, NodesToValueGrads, has been added to aid with refactoring. This was done for two reasons:
    • Support for the BatchNorm operation, which is a verily impure and highly stateful function. The BatchNorm Op has internal states that need to have their gradients updated as well. But the internal state of BatchNorm isn't really part of the expression graph, and really it shouldn't be. Turns out there was a better API for BatchNorm.
    • In the next version, v0.10.0, we aim to do better package organization for manageability. With this API-breaking change, the solver is now less dependent on the other parts of Gorgonia and can be easily separated.
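To make the reasoning behind the solver change concrete, here is a self-contained sketch of a solver that only needs "something with a value and a gradient". Everything in it is a simplified stand-in: the `ValueGrad` interface below uses plain `[]float64` rather than Gorgonia's own value types, and `Node`, `opState`, `nodesToValueGrads` and `vanillaStep` are hypothetical toys that only mirror the roles described above.

```go
package main

import "fmt"

// ValueGrad is a simplified stand-in for the interface described above:
// anything that exposes a value and a gradient.
type ValueGrad interface {
	Value() []float64
	Grad() []float64
}

// Node is a toy graph node. *Node fulfils ValueGrad.
type Node struct {
	val, grad []float64
}

func (n *Node) Value() []float64 { return n.val }
func (n *Node) Grad() []float64  { return n.grad }

// opState models the internal state of a stateful op (as with BatchNorm).
// It has a value and a gradient but is not a node in the graph.
type opState struct {
	val, grad []float64
}

func (s *opState) Value() []float64 { return s.val }
func (s *opState) Grad() []float64  { return s.grad }

// nodesToValueGrads plays the role of the NodesToValueGrads utility:
// it converts a slice of nodes into a slice of the interface type.
func nodesToValueGrads(nodes []*Node) []ValueGrad {
	out := make([]ValueGrad, len(nodes))
	for i, n := range nodes {
		out[i] = n
	}
	return out
}

// vanillaStep applies one step of plain gradient descent in place.
// It knows nothing about graphs, only about values and gradients.
func vanillaStep(model []ValueGrad, lr float64) {
	for _, m := range model {
		v, g := m.Value(), m.Grad()
		for i := range v {
			v[i] -= lr * g[i]
		}
	}
}

func main() {
	w := &Node{val: []float64{1, 2}, grad: []float64{0.5, -0.5}}
	scale := &opState{val: []float64{1}, grad: []float64{0.25}}

	// Graph nodes and op-internal state are updated by the same solver call.
	model := append(nodesToValueGrads([]*Node{w}), scale)
	vanillaStep(model, 0.5)
	fmt.Println(w.val, scale.val)
}
```

Because the solver depends only on the small interface, state that lives outside the expression graph can be trained alongside ordinary nodes, and the solver code has no dependency on the graph types.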
  • Breaking Semantics: A gorgonia.VM now implements io.Closer. It should be treated as a resource as well as a computation device - the VM must be Close()d in order for the resources acquired by the VM to actually be released. Turns out, automatic resource management is too difficult. Who'd thunk that?
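The new semantics follow the usual Go pattern for anything that implements io.Closer. The sketch below uses a hypothetical `toyVM` with a made-up `buffers` field standing in for whatever a real VM acquires; the cut-down `VM` interface here is an assumption for illustration and is not copied from Gorgonia.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// VM is a cut-down, hypothetical stand-in: something that can run and that
// must be closed afterwards.
type VM interface {
	RunAll() error
	io.Closer
}

// toyVM pretends to hold resources that only Close releases.
type toyVM struct {
	buffers int // number of hypothetical buffers currently held
	closed  bool
}

func newToyVM() *toyVM { return &toyVM{buffers: 4} }

// RunAll refuses to run once the VM has been closed.
func (m *toyVM) RunAll() error {
	if m.closed {
		return errors.New("vm is closed")
	}
	return nil
}

// Close releases everything the VM holds.
func (m *toyVM) Close() error {
	m.buffers = 0
	m.closed = true
	return nil
}

// run executes the VM and always closes it, returning the run error first
// and the close error only if the run itself succeeded.
func run(m VM) (err error) {
	defer func() {
		if cerr := m.Close(); err == nil {
			err = cerr
		}
	}()
	return m.RunAll()
}

func main() {
	m := newToyVM()
	err := run(m)
	fmt.Println(err, m.buffers, m.closed)
}
```

In practice this amounts to calling Close() on the VM, typically via defer, right after creating it, exactly as one would for a file or a network connection.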