v0.3
First stable release post-consolidation. Separates Flashlight into four parts:
- `flashlight/lib` contains kernels and standalone utilities for sequence losses, beam-search decoding, text processing, and more.
- `flashlight/fl` is the core neural network library using the ArrayFire tensor library.
- `flashlight/app` contains applications of the core library to machine learning across domains.
- `flashlight/ext` contains extensions on top of Flashlight and ArrayFire that are useful across apps.
Major Features
- Automatic mixed precision training (AMP) -- typed tensor and autograd operators
- Framework for building custom memory managers on top of ArrayFire (docs)
- OneDNN as a backend for primitive operations on the CPU
- New dataset abstractions in core (`flashlight/fl/dataset`)
- Application libraries
- Speech recognition (formerly the wav2letter project)
- Language modeling (autoregressive and masked/BERT-style LMs)
- Image classification (ResNet, ViT)
- Object detection (DETR)
- Audio augmentation library (ASR)
- Tools for training models using iterative pseudo-labeling (IPL) (ASR)
- [early] OpenCL support with both ROCm and Intel
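The iterative pseudo-labeling tooling mentioned above follows a simple loop: train on labeled data, use the model to label unlabeled data, then retrain on the union. Below is a toy, self-contained sketch of that loop using a trivial nearest-centroid "model" on scalar features; it only illustrates the IPL idea and is not the actual ASR tooling in `flashlight/app/asr`, which operates on acoustic models.

```python
# Toy sketch of iterative pseudo-labeling (IPL). The "model" is a
# nearest-centroid classifier over 1-D features; real IPL retrains an
# acoustic model on transcripts it generates for unlabeled audio.

def fit_centroids(data):
    # data: list of (x, label); returns the mean x per label.
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    # Assign x to the label with the nearest centroid.
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def ipl(labeled, unlabeled, rounds=3):
    model = fit_centroids(labeled)
    for _ in range(rounds):
        # Pseudo-label the unlabeled pool with the current model...
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        # ...then retrain on labeled + pseudo-labeled data.
        model = fit_centroids(labeled + pseudo)
    return model

labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
unlabeled = [0.5, 1.5, 8.5, 9.5]
model = ipl(labeled, unlabeled)
print(predict(model, 2.0))  # -> "a": nearest to the "a" centroid
```

The key property the sketch captures is that each round's pseudo-labels come from the model produced by the previous round, so label quality and model quality improve together.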
Build Changes/Improvements
- C++17 support -- gcc 7/clang 6 required.
- Support for `vcpkg` via `FL_BUILD_STANDALONE`
- Consolidation of wav2letter and app-based build selection
- CMake 3.10 minimum, better support for shared objects
- First-class support for CUDA and Halide kernels
- Improved support for downloading not-found dependencies (Gloo, KenLM, libsndfile)
- Improved support for dependency management for downstream projects using Flashlight's installed CMake config (`cmake/flashlightConfig.cmake.in`)
- Support for padding in transformer/multihead attention
- SpecAugment for raw waveforms (implemented via a low-pass filter)
- Conformer implementation
- Improved autograd for the indexing operator (supports repeated indices)
- Improved Python bindings build, with support for `setup.py install`
- A lot of docs.
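The installed CMake config mentioned above lets downstream projects consume Flashlight through `find_package`. A minimal consuming `CMakeLists.txt` might look like the following; the exported target name `flashlight::flashlight` and the example project name are assumptions to verify against the `flashlightConfig.cmake` your install actually generates.

```cmake
# Minimal sketch of a downstream project consuming an installed Flashlight.
cmake_minimum_required(VERSION 3.10)
project(myapp LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)  # Flashlight requires C++17

# Locates the installed config (generated from cmake/flashlightConfig.cmake.in).
# If Flashlight is installed to a non-default prefix, pass
# -DCMAKE_PREFIX_PATH=/path/to/flashlight/install when configuring.
find_package(flashlight CONFIG REQUIRED)

add_executable(myapp main.cpp)
# Linking the imported target also propagates include paths and
# compile definitions recorded in the exported config.
target_link_libraries(myapp PRIVATE flashlight::flashlight)
```

Using the imported target rather than hand-written include/link flags is what makes the transitive dependencies (ArrayFire, etc.) resolve automatically.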
Improvements/new features in wav2letter (`flashlight/app/asr`)
- Fixed padding issues in s2s models: pre-training window, encoder attention, encoder-decoder attention
- Refactored the s2s codebase
- Fixes to memory allocations for s2s beam-search decoder (less memory, no OOM issues)
- Fixes to beam-search decoder to support non-empty surround
- Fixes to dataset pipeline + dynamic batching support