
Arm transform functions: generator changes #98

Closed

Conversation


@mundya mundya commented Aug 9, 2017

Changes to the generators in `meta` to match the changes made to the optimized Arm assembly in #93.

The formatting of the code produced by the generators doesn't match that of `meta/transform_kernels_arm_{32|64}.h`, so to keep this PR simple I've not included the generated headers. #99 contains both these changes and the generated headers.

Note: I've made some minor changes to `test_transform_correctness` to allow for slight differences between the actual and expected values, which occur as a result of changing the order of some floating point operations.

Update ARM to Arm in AUTHORS and CONTRIBUTORS
Extends the NEON emitters to support fused SIMD multiply-accumulate
instructions. These instructions are used in a modified BiasAdd kernel.

`test_transform_correctness` is adapted to allow for minor differences
between the actual and expected values resulting from re-ordering of the
floating point operations.
Interleaving instructions results in a performance improvement on the Cortex-A57.
 - Extends the generators to allow for additional declarations in
   generated functions.
 - Modifies the Requantize kernel to use FMA instructions.
 - `test_transform_correctness` is modified to allow for small
   differences in implementation resulting from changes in floating
   point operation order.
Interleave conversion calls and use prefetching to reduce cache misses
in the Requantize kernel.
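The prefetching idea can be sketched in plain C with the GCC/Clang `__builtin_prefetch` builtin (the real kernel issues Arm prefetch instructions from assembly, and the prefetch distance below is an assumed value, not the one used in the kernel):

```c
#include <stddef.h>

/* Sketch of software prefetching in a streaming loop: request the data
 * a fixed distance ahead of the current element so the load latency is
 * hidden behind the arithmetic on earlier elements. */
void scale_buffer(float *dst, const float *src, size_t n, float scale) {
    const size_t prefetch_ahead = 64; /* elements; tuned per core (assumed) */
    for (size_t i = 0; i < n; ++i) {
        if (i + prefetch_ahead < n)
            __builtin_prefetch(&src[i + prefetch_ahead], /*rw=*/0, /*locality=*/0);
        dst[i] = src[i] * scale;
    }
}
```

The prefetch is a hint only; the loop computes the same results with or without it, which is why it can be added to a kernel without affecting correctness tests.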