
BigDL release 0.3.0

@yiheng released this 08 Nov 03:36 · 6 commits to branch-0.3 since this release

Highlights

  • New protobuf-based model storage format
  • Support model quantization
  • Support sparse tensor and model
  • Easier and broader Tensorflow model load support
  • More layers/operations
  • Apache Spark 2.2 support

New Features

  • Models & Layers & Operations & Loss function
    • Support convlstm3D model
    • Support Variational Auto Encoder
    • Support Unet
    • Support PTB model
    • Add SpatialWithinChannelLRN layer
    • Add 3D-deconv layer
    • Add BifurcateSplitTable layer
    • Add KLD criterion
    • Add Gaussian layer
    • Add Sampler layer
    • Add RNN decoder layer
    • Support NHWC data format in 2D-conv, 2D-pooling layers
    • Support same/valid padding type in 2D-conv and 2D-pooling layers
    • Support dynamic execution flow in Graph
    • Graph nodes can pass nested tensors
    • Layers/Operations can support different numeric types for input and output tensors
    • Start to support operations in BigDL; add the following operations: LogicalNot, LogicalOr, LogicalAnd, 1D Max Pooling, Squeeze, Prod, Sum, Reshape, Identity, ReLU, Equals, Greater, Less, Switch, Merge, Floor, L2Loss, RandomUniform, Rank, MatMul, SoftMax, Conv2d, Add, Assert, Onehot, Assign, Cast, ExpandDims, MaxPool, Realdiv, BiasAdd, Pad, Tile, StridedSlice, Transpose, Negative, AssignGrad, BiasAddGrad, Deconv2D, Conv2DBackFilter, CrossEntropy, MaxPoolGrad, NoOp, ReluGrad, Select, Pow, BroadcastGradientArgs, Control Dependency
    • Start to support sparse layers in BigDL; add the following sparse layers: SparseLinear, SparseJoinTable, DenseToSparse
  • Tensor
    • Support sparse tensor
    • Support scalar (0-D tensor)
    • Tensor supports more numeric types: boolean, short, int, long, string, char, bytestring
    • Tensor doesn't display the full content in toString when there are too many elements
  • API change
    • Expose the evaluate API to Python
    • Add a predictClass API to the model to simplify the code when users want to use the model for classification
    • Change model.test to model.evaluate in Python
    • Refine Recurrent, BiRecurrent and RnnCell API
    • Change Sample.features from ndarray to JTensor/List[JTensor]
    • Change Sample.label from ndarray to JTensor
  • Install & Deploy
    • Support Apache Spark 2.2
    • Add a script to run BigDL on the Google Dataproc platform
    • Refine run-example.sh scripts to run BigDL examples on AWS with built-in Spark
    • Pip install will now automatically install Spark 2.2
    • Add a Dockerfile
  • Model Save/Load
    • New protobuf-based model persistence format to provide a better user experience when saving/loading BigDL models (see the sketch after this list)
    • Support loading more operations from Tensorflow
    • Support reading tensor content from a Tensorflow checkpoint
    • Support loading a subset of a Tensorflow graph
    • Support loading Tensorflow preprocessing graphs (read/parse tfrecord data, image decoders and queues)
    • Automatically convert data in Tensorflow queues to RDDs to feed model training in BigDL
    • Support loading deconv layers from Caffe and Tensorflow
    • Support save/load SpatialCrossLRN torch module
  • Training
    • Allow users to modify the optimization algorithm state when resuming training in Python
    • Allow users to specify optimization algorithms, learning rate and learning rate decay when using BigDL in a Spark ML pipeline
    • Allow users to stop gradients on some layers during backpropagation
    • Allow users to freeze layer parameters during training
    • Add ML pipeline Python API, so users can use BigDL with ML pipeline in Python code
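
The new persistence format is a one-line save/load on the module, and Tensorflow graphs are pulled in through a loader entry point. Below is a minimal Scala sketch; it assumes the saveModule/loadModule and Module.loadTF entry points of the 0.3.0 Scala API, and the file paths and Tensorflow node names ("Placeholder", "output") are illustrative placeholders.

```scala
import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.numeric.NumericFloat

// Build a small model with the container API
val model = Sequential()
  .add(Linear(784, 128))
  .add(ReLU())
  .add(Linear(128, 10))
  .add(LogSoftMax())

// Save it in the new protobuf-based format and load it back
model.saveModule("/tmp/mlp.bigdl")
val restored = Module.loadModule[Float]("/tmp/mlp.bigdl")

// Load a (subset of a) frozen Tensorflow graph by naming its
// input and output nodes; the node names below are placeholders
val tfModel = Module.loadTF[Float]("/tmp/frozen_graph.pb",
  Seq("Placeholder"), Seq("output"))
```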

Enhancement

  1. Support model quantization. Users can speed up model inference by quantizing the model (see the sketch after this list)
  2. Display BigDL models in Tensorboard
  3. Users can easily convert a sequential model to a graph model by invoking the newly added toGraph method (also shown in the sketch below)
  4. Remove unnecessary contiguous check in 3D conv
  5. Support global average pooling
  6. Support regularizer in 3D convolution layer
  7. Add regularizer for ConvLSTMPeephole3D
  8. Throw more meaningful messages in layers and criterions
  9. Migrate GRU/LSTM/RNN/LSTM-Peephole definitions from Sequential to Graph
  10. Switch to pytest for python unit tests
  11. Speed up tanh layer
  12. Speed up sigmoid layer
  13. Speed up recurrent layer
  14. Support batch normalization in recurrent layers
  15. Speed up Python ndarray to Scala tensor conversion
  16. Improve gradient sync performance in distributed training
  17. Speed up tensor dot operation with MKL dot
  18. Speed up copy operations in the recurrent container
  19. Speed up LogSoftMax
  20. Move classes.lst and img_class.lst to the model example folder, so users can find them more easily
  21. Ensure spark.speculation is set to false to get better performance in training
  22. Make it easier to turn on performance data in the distributed training log
  23. Optimize memory usage when broadcasting the model
  24. Support MLlib Vector as a feature for BigDL
  25. Support creating Samples with multiple tensors in Python
  26. Support resizing in BytesToBGRImg
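
Quantization and the Sequential-to-Graph conversion are both single calls on a module. The sketch below is a minimal Scala example assuming the toGraph() and quantize() methods mentioned above; the layer sizes and input shape are arbitrary.

```scala
import com.intel.analytics.bigdl.nn._
import com.intel.analytics.bigdl.numeric.NumericFloat
import com.intel.analytics.bigdl.tensor.Tensor

// A small sequential model
val seqModel = Sequential()
  .add(Linear(100, 50))
  .add(ReLU())
  .add(Linear(50, 10))

// Convert the container-style model into a graph model
val graphModel = seqModel.toGraph()

// Quantize the model to speed up inference
val quantized = graphModel.quantize()

// Run inference on a random mini-batch of 4 samples
val output = quantized.forward(Tensor(4, 100).rand())
```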

Bug Fix

  1. Fix TemporalConv layer not returning its parameter table
  2. Fix some bugs when loading dilated group convolution from caffe
  3. Fix some bugs when loading caffe v1 layers
  4. Fix a bug in TimeDistributed layer
  5. Fix incorrect execution time reported in recurrent layers
  6. Fix a bug when clearing the state of in-place layers
  7. Fix incorrect training data sample count for some inputs
  8. Remove label check in BytesToGreyImg
  9. Fix a bug in ConcatTable when it contains no layers
  10. Fix a bug in MapTable
  11. Fix some typos in the documentation
  12. Use newInstance method to obtain FileSystem