v0.3
First stable release post-consolidation. Separates Flashlight into four parts:
- `flashlight/lib` contains kernels and standalone utilities for sequence losses, beam-search decoding, text processing, and more.
- `flashlight/fl` is the core neural network library using the ArrayFire tensor library.
- `flashlight/app` contains applications of the core library to machine learning across domains.
- `flashlight/ext` contains extensions on top of Flashlight and ArrayFire that are useful across apps.
Major Features
- Automatic mixed precision training (AMP) -- typed tensor and autograd operators
- Framework for building custom memory managers on top of ArrayFire (docs)
- OneDNN as a backend for primitive operations on the CPU
- New dataset abstractions in core (`flashlight/fl/dataset`)
- Application libraries
- Speech recognition (formerly the wav2letter project)
- Language modeling (autoregressive and masked/BERT-style LMs)
- Image classification (ResNet, ViT)
- Object detection (DETR)
- Audio augmentation library (ASR)
- Tools for training models using iterative pseudo-labeling (IPL) (ASR)
- [early] OpenCL support with both ROCm and Intel
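The iterative pseudo-labeling tooling mentioned above follows a simple loop: train on labeled data, use the model to label unlabeled data, then retrain on the union. Below is a toy, self-contained sketch of that loop using a trivial nearest-centroid "model" on scalar features; it only illustrates the IPL idea and is not the actual ASR tooling in `flashlight/app/asr`, which operates on acoustic models.

```python
# Toy sketch of iterative pseudo-labeling (IPL). The "model" is a
# nearest-centroid classifier over 1-D features; real IPL retrains an
# acoustic model on transcripts it generates for unlabeled audio.

def fit_centroids(data):
    # data: list of (x, label); returns the mean x per label.
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    # Assign x to the label with the nearest centroid.
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def ipl(labeled, unlabeled, rounds=3):
    model = fit_centroids(labeled)
    for _ in range(rounds):
        # Pseudo-label the unlabeled pool with the current model...
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        # ...then retrain on labeled + pseudo-labeled data.
        model = fit_centroids(labeled + pseudo)
    return model

labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
unlabeled = [0.5, 1.5, 8.5, 9.5]
model = ipl(labeled, unlabeled)
print(predict(model, 2.0))  # -> "a": nearest to the "a" centroid
```

The key property the sketch captures is that each round's pseudo-labels come from the model produced by the previous round, so label quality and model quality improve together.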
Build Changes/Improvements
- C++17 support -- gcc 7/clang 6 required.
- Support for `vcpkg` via `FL_BUILD_STANDALONE`
- Consolidation of wav2letter and app-based build selection
- CMake 3.10 minimum, better support for shared objects
- First-class support for CUDA and Halide kernels
- Improved support for downloading not-found dependencies (Gloo, KenLM, libsndfile)
- Improved support for dependency management for downstream projects using Flashlight's installed CMake config (`cmake/flashlightConfig.cmake.in`)
- Support for padding in transformer/multihead attention
- SpecAugment for raw waveforms (implemented via a low-pass filter)
- Conformer implementation
- Improved autograd for the indexing operator (supports repeated indices)
- Improved Python bindings build, with support for `setup.py install`
- A lot of docs.
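The installed CMake config mentioned above lets downstream projects consume Flashlight through `find_package`. A minimal consuming `CMakeLists.txt` might look like the following; the exported target name `flashlight::flashlight` and the example project name are assumptions to verify against the `flashlightConfig.cmake` your install actually generates.

```cmake
# Minimal sketch of a downstream project consuming an installed Flashlight.
cmake_minimum_required(VERSION 3.10)
project(myapp LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)  # Flashlight requires C++17

# Locates the installed config (generated from cmake/flashlightConfig.cmake.in).
# If Flashlight is installed to a non-default prefix, pass
# -DCMAKE_PREFIX_PATH=/path/to/flashlight/install when configuring.
find_package(flashlight CONFIG REQUIRED)

add_executable(myapp main.cpp)
# Linking the imported target also propagates include paths and
# compile definitions recorded in the exported config.
target_link_libraries(myapp PRIVATE flashlight::flashlight)
```

Using the imported target rather than hand-written include/link flags is what makes the transitive dependencies (ArrayFire, etc.) resolve automatically.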
Improvements/new features in wav2letter (`flashlight/app/asr`)
- Fixed padding issues in s2s models: pre-training window, encoder attention, encoder-decoder attention
- Refactored the s2s codebase
- Fixes to memory allocations for s2s beam-search decoder (less memory, no OOM issues)
- Fixes to beam-search decoder to support non-empty surround
- Fixes to dataset pipeline + dynamic batching support