The Rust compiler calls memcpy when copying slices whose length is not known at compile time. In some cases this adds considerable overhead.
If you know that most of your copy operations are small, you can use fastcpy to speed up your program.
fastcpy provides a faster implementation of memcpy for slices of up to 32 bytes (64 bytes with AVX).
It is designed to keep the generated assembly small, so its own overhead is low.
For larger slices, it falls back to the standard memcpy.
fastcpy uses a double-copy trick to copy slices of 4 to 32 bytes (64 bytes with AVX).
For example, a slice of length 6 can be copied with two unconditional 4-byte copies, one of the first 4 bytes and one of the last 4 bytes, which overlap in the middle:
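A minimal sketch of this structure in safe Rust, assuming a hypothetical `small_copy` function and `double_copy` helper (neither is fastcpy's actual API). Lengths up to 32 bytes are dispatched to copies whose size is a compile-time constant. Anything longer goes through `copy_from_slice`, which the compiler lowers to a memcpy call. The branch for 1 to 3 bytes is an assumption, since fastcpy's handling of those lengths is not described here, and the 64-byte AVX path is omitted.

```rust
/// Hypothetical helper: copies the first N and the last N bytes of `src`.
/// Requires N <= len <= 2 * N, so the two (possibly overlapping) copies
/// together cover every byte.
#[inline(always)]
fn double_copy<const N: usize>(dst: &mut [u8], src: &[u8]) {
    let len = src.len();
    debug_assert!(N <= len && len <= 2 * N);
    // Both copies have the constant length N, so no length-dependent loop is needed.
    dst[..N].copy_from_slice(&src[..N]);
    dst[len - N..].copy_from_slice(&src[len - N..]);
}

/// Hypothetical small-copy dispatcher (illustration only, not fastcpy's API).
pub fn small_copy(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len(), "slices must have equal length");
    let len = src.len();
    match len {
        0 => {}
        // Assumption: first, middle, and last byte together cover 1..=3 bytes.
        1..=3 => {
            dst[0] = src[0];
            dst[len / 2] = src[len / 2];
            dst[len - 1] = src[len - 1];
        }
        4..=8 => double_copy::<4>(dst, src),
        9..=16 => double_copy::<8>(dst, src),
        17..=32 => double_copy::<16>(dst, src),
        // Larger slices: fall back to the standard memcpy.
        _ => dst.copy_from_slice(src),
    }
}
```

Each size class uses a fixed copy width whose two copies cover every length in that class, so within a class no further branching on the exact length is needed.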
    source:       [1, 2, 3, 4, 5, 6]
    first copy:   [1, 2, 3, 4]
    second copy:        [3, 4, 5, 6]
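The 6-byte case above can be sketched with raw pointers and unaligned 32-bit loads and stores. `copy_4_to_8` is a hypothetical helper written for illustration, not part of fastcpy. For any length from 4 to 8 it copies bytes `0..4` and `len-4..len`, so for length 6 the second copy writes bytes 2 and 3 again with the same values.

```rust
use std::ptr;

/// Hypothetical helper: copies `len` bytes (4 <= len <= 8) from `src` to
/// `dst` with two unconditional, possibly overlapping 4-byte copies.
///
/// # Safety
/// `src` must be valid for reads of `len` bytes, `dst` must be valid for
/// writes of `len` bytes, and the two regions must not overlap.
unsafe fn copy_4_to_8(dst: *mut u8, src: *const u8, len: usize) {
    debug_assert!((4..=8).contains(&len));
    unsafe {
        // First copy: bytes 0..4 (for len 6: [1, 2, 3, 4]).
        let head = ptr::read_unaligned(src as *const u32);
        // Second copy: bytes len-4..len (for len 6: [3, 4, 5, 6]).
        let tail = ptr::read_unaligned(src.add(len - 4) as *const u32);
        ptr::write_unaligned(dst as *mut u32, head);
        ptr::write_unaligned(dst.add(len - 4) as *mut u32, tail);
    }
}
```

For example, calling it with `src = [1, 2, 3, 4, 5, 6]`, a zeroed 6-byte `dst`, and `len = 6` leaves `dst` equal to `src`.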