Add doc/public.md and make more documentation improvements
Showing 7 changed files with 200 additions and 34 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
# Gemmlowp's public entry points | ||
|
||
gemmlowp's public interface is defined in | ||
[public/gemmlowp.h](../public/gemmlowp.h). | ||
|
||
## GemmWithOutputPipeline | ||
|
||
The primary public entry point is `GemmWithOutputPipeline`. | ||
|
||
A usage example is given in | ||
[doc/quantization_example.cc](quantization_example.cc). | ||
|
||
The prototype is: | ||
|
||
``` | ||
template <typename InputScalar, typename OutputScalar, typename BitDepthParams, | ||
MapOrder LhsOrder, MapOrder RhsOrder, MapOrder ResultOrder, | ||
typename OutputPipelineType, typename GemmContextType> | ||
void GemmWithOutputPipeline(GemmContextType* context, | ||
const MatrixMap<const InputScalar, LhsOrder>& lhs, | ||
const MatrixMap<const InputScalar, RhsOrder>& rhs, | ||
MatrixMap<OutputScalar, ResultOrder>* result, | ||
int lhs_offset, int rhs_offset, | ||
const OutputPipelineType& output_pipeline); | ||
``` | ||
|
||
A typical call, taken from the [usage example](quantization_example.cc), looks like this: | ||
|
||
``` | ||
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::uint8_t, | ||
gemmlowp::DefaultL8R8BitDepthParams>( | ||
&gemm_context, uint8_lhs_matrix, uint8_rhs_matrix, | ||
&uint8_result_matrix, lhs_offset, rhs_offset, output_pipeline); | ||
``` | ||
|
||
### Template parameters | ||
|
||
Typically, only the first three template parameters need to be specified; the | ||
rest are deduced automatically from the function arguments: | ||
|
||
* `InputScalar`: The scalar type of the LHS and RHS operands. At the moment, | ||
this must be `std::uint8_t`. | ||
* `OutputScalar`: The scalar type of the result matrix. At the moment, | ||
this must be `std::uint8_t`. | ||
* `BitDepthParams`: Defines the bit format of the input and output matrices | ||
and the required accuracy of the computation. At the moment, the only | ||
non-deprecated valid value is `gemmlowp::DefaultL8R8BitDepthParams`. See | ||
[less-than-8-bit.md](less-than-8-bit.md) for other values, the general idea | ||
behind this parameter, and how it may become more useful in the future. | ||
|
||
The other template parameters, which typically do not need to be specified, are: | ||
|
||
* `LhsOrder`, `RhsOrder`, `ResultOrder`: the storage orders (row-major or | ||
column-major) of the LHS, RHS and result matrices. See | ||
[public/map.h](../public/map.h). For optimal performance, we recommend | ||
`RowMajor`, `ColMajor`, `ColMajor` respectively; see the performance note | ||
on storage orders below. | ||
* `OutputPipelineType`: the actual `std::tuple` type of the output pipeline. | ||
See the explanation of the `output_pipeline` parameter below, and | ||
[output.md](output.md). | ||
* `GemmContextType`: the type of the `context` parameter. At the moment, this | ||
must be `gemmlowp::GemmContext`. | ||
|
||
### Function parameters | ||
|
||
The function parameters taken by `GemmWithOutputPipeline` are: | ||
|
||
* `context`: The `gemmlowp::GemmContext` object holding state and resources to | ||
be used for this gemmlowp call. | ||
* `lhs`, `rhs`: The LHS and RHS operand matrices. Note that these are | ||
`MatrixMap` objects, mapping external buffers as matrices, not owning data. | ||
See [public/map.h](../public/map.h). | ||
* `result`: pointer to the destination `MatrixMap` object, which must be | ||
already constructed, wrapping the external destination buffer with the | ||
wanted destination matrix shape and storage layout. No memory allocation | ||
will be performed by gemmlowp for the destination buffer. See | ||
[public/map.h](../public/map.h). | ||
* `lhs_offset`, `rhs_offset`: constants added to each matrix entry in the | ||
LHS and RHS matrices respectively, as explained in | ||
[low-precision.md](low-precision.md). This is the only part of the | ||
quantization paradigm explained in [quantization.md](quantization.md) that | ||
needs to be implemented as operations on the operands; everything else is | ||
implemented as operations on the result, see `output_pipeline`. | ||
* `output_pipeline`: a `std::tuple` of output stages (see | ||
[public/output_stages.h](../public/output_stages.h)), specifying the output | ||
pipeline (see [output.md](output.md)). This is the part of the quantization | ||
paradigm explained in [quantization.md](quantization.md) that needs to be | ||
implemented as operations on the result matrix. | ||
|
||
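To make the role of `lhs_offset` and `rhs_offset` concrete, here is a minimal, unoptimized reference sketch of the arithmetic described above: each offset is added to every entry of its operand before the products are accumulated in 32 bits. The function name `ReferenceOffsetGemm` and its flat-buffer signature are hypothetical and are not part of gemmlowp; the sketch stops at the raw accumulators and does not model the output pipeline.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, NOT gemmlowp API. Computes
//   acc(r, c) = sum over d of (lhs(r, d) + lhs_offset) * (rhs(d, c) + rhs_offset)
// lhs is rows x depth, stored row-major; rhs is depth x cols, stored
// column-major; the returned accumulators are rows x cols, column-major.
std::vector<std::int32_t> ReferenceOffsetGemm(
    const std::vector<std::uint8_t>& lhs, const std::vector<std::uint8_t>& rhs,
    int rows, int depth, int cols, int lhs_offset, int rhs_offset) {
  std::vector<std::int32_t> acc(static_cast<std::size_t>(rows) * cols, 0);
  for (int c = 0; c < cols; ++c) {
    for (int r = 0; r < rows; ++r) {
      std::int32_t sum = 0;
      for (int d = 0; d < depth; ++d) {
        // Offsets are applied to the operands, entry by entry.
        const std::int32_t l = lhs[r * depth + d] + lhs_offset;
        const std::int32_t v = rhs[c * depth + d] + rhs_offset;
        sum += l * v;
      }
      acc[c * rows + r] = sum;
    }
  }
  return acc;
}
```

In a real call, these accumulators are what the `output_pipeline` stages would then turn into the final `OutputScalar` values.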
### Performance note on storage orders | ||
|
||
gemmlowp supports arbitrary combinations of storage orders for the LHS, RHS and | ||
result matrices. However, not all combinations are equally optimized. | ||
|
||
Because gemmlowp is primarily aimed at neural network inference workloads, | ||
optimization focus is on this particular combination of storage orders: | ||
|
||
* `LhsOrder=RowMajor` | ||
* `RhsOrder=ColMajor` | ||
* `ResultOrder=ColMajor` | ||
|
||
The rationale is that the LHS is typically the constant weights of a neural | ||
network layer (e.g. the weights of a Convolutional layer implemented as a matrix | ||
multiplication), while the RHS and result are neural network activations, | ||
respectively the input and output activations of the layer. | ||
|
||
Because the RHS and result are activations, we want them to share the same | ||
storage order -- so that one layer's output activations can be readily used as | ||
the next layer's input activations. Thus, we focus on `RhsOrder=ResultOrder`. | ||
|
||
We also know from general considerations on matrix multiplication that it is | ||
slightly more efficient to have the direction of accumulation (the "depth" | ||
dimension) be the direction of contiguous storage in memory. That means that it | ||
is always going to be slightly easier and more efficient to have | ||
`LhsOrder=RowMajor` and `RhsOrder=ColMajor`. | ||
|
||
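The point about the depth dimension being contiguous can be checked with a little index arithmetic. The sketch below assumes densely packed matrices (no padding between rows or columns) and uses a hypothetical `Order` enum and helper functions that are not gemmlowp's `MapOrder` or `MatrixMap`; it only computes how far apart in memory two consecutive depth steps are.

```cpp
#include <cstddef>

// Hypothetical stand-in for a storage order tag, NOT gemmlowp's MapOrder.
enum class Order { kRowMajor, kColMajor };

// Linear index of entry (row, col) in a densely packed rows x cols matrix.
inline std::size_t LinearIndex(Order order, int rows, int cols, int row,
                               int col) {
  return order == Order::kRowMajor
             ? static_cast<std::size_t>(row) * cols + col
             : static_cast<std::size_t>(col) * rows + row;
}

// The LHS is rows x depth, so depth runs along its columns.
inline std::size_t LhsDepthStride(Order order, int rows, int depth) {
  return LinearIndex(order, rows, depth, 0, 1) -
         LinearIndex(order, rows, depth, 0, 0);
}

// The RHS is depth x cols, so depth runs along its rows.
inline std::size_t RhsDepthStride(Order order, int depth, int cols) {
  return LinearIndex(order, depth, cols, 1, 0) -
         LinearIndex(order, depth, cols, 0, 0);
}
```

With a row-major LHS and a column-major RHS both strides are 1, so the accumulation loop walks both operands through contiguous memory; with the opposite orders each depth step jumps by a full row or column.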
Putting this together, we arrive at gemmlowp's focus on the above-described | ||
combination of storage orders. | ||
|
||
Using other storage orders will typically mean taking less efficient paths in | ||
the packing and unpacking stages (see [packing.md](packing.md)). The compute | ||
kernel stage ([kernel.md](kernel.md)) is unaffected. | ||
|
||
## GemmWithOutputPipelinePC | ||
|
||
This is a variant where `lhs_offset` and `rhs_offset` may be vectors instead of | ||
scalars. They are then broadcast against the LHS and RHS respectively. | ||
|
||
This is useful for some flavors of neural network inference with "per-channel | ||
quantization", whence the PC suffix. This has been useful in some settings where | ||
a neural network trained in float arithmetic was subsequently quantized. On the | ||
other hand, retraining neural networks for quantized inference tends to remove | ||
the need for per-channel quantization. For that reason, the long-term usefulness | ||
of this entry point is in question. | ||
|
||
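As an illustration of what broadcasting vector offsets could look like, here is a minimal reference sketch. It assumes one LHS offset per LHS row and one RHS offset per RHS column; that broadcasting direction is an assumption made for this example and should be checked against [public/gemmlowp.h](../public/gemmlowp.h). The function `ReferencePerChannelGemm` is hypothetical and not part of gemmlowp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, NOT gemmlowp API. Assumes:
//   lhs_offsets has one entry per LHS row, rhs_offsets one per RHS column.
// lhs is rows x depth, row-major; rhs is depth x cols, column-major; the
// returned int32 accumulators are rows x cols, column-major.
std::vector<std::int32_t> ReferencePerChannelGemm(
    const std::vector<std::uint8_t>& lhs, const std::vector<std::uint8_t>& rhs,
    int rows, int depth, int cols,
    const std::vector<std::int32_t>& lhs_offsets,
    const std::vector<std::int32_t>& rhs_offsets) {
  std::vector<std::int32_t> acc(static_cast<std::size_t>(rows) * cols, 0);
  for (int c = 0; c < cols; ++c) {
    for (int r = 0; r < rows; ++r) {
      std::int32_t sum = 0;
      for (int d = 0; d < depth; ++d) {
        // Each row of the LHS and each column of the RHS gets its own offset.
        const std::int32_t l = lhs[r * depth + d] + lhs_offsets[r];
        const std::int32_t v = rhs[c * depth + d] + rhs_offsets[c];
        sum += l * v;
      }
      acc[c * rows + r] = sum;
    }
  }
  return acc;
}
```

Filling both offset vectors with a single repeated value reduces this to the scalar-offset case handled by `GemmWithOutputPipeline`.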
## Gemm | ||
|
||
This is gemmlowp's original, now legacy and deprecated, entry point. See the | ||
section of [low-precision.md](low-precision.md) on the legacy quantization | ||
paradigm. Avoid in new code. | ||
|
||
## The eight_bit_int_gemm directory | ||
|
||
As explained in the top-level [README.md](../README.md#public-interfaces), this | ||
is entirely deprecated. |