-
Notifications
You must be signed in to change notification settings - Fork 13.9k
Closed
Description
I tried this code:
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::{__m128i, __m512i, _mm512_add_epi32, _mm512_conflict_epi32, _mm512_cvtepu8_epi32, _mm512_i32gather_epi32, _mm512_i32scatter_epi32, _mm512_popcnt_epi32, _mm512_set1_epi32, _mm_loadu_epi8};
pub unsafe fn histogram_avx512_unsafe(input: &[u8]) -> Vec<u32> {
let mut v = Vec::from([0u32; 256]);
for chunk in input.chunks_exact(16) {
let bytes = _mm_loadu_epi8(chunk.as_ptr().cast());
let vindex = _mm512_cvtepu8_epi32(bytes);
let cmask = _mm512_conflict_epi32(vindex);
let all_one = _mm512_set1_epi32(1);
let counts = _mm512_popcnt_epi32(cmask);
let counts = _mm512_add_epi32(counts, all_one);
let values = _mm512_i32gather_epi32::<1>(vindex, v.as_ptr().cast());
let new_values = _mm512_add_epi32(counts, values);
// we write all the new values here. if I'm right, we can ignore index conflicts after we've gotten the counts,
// due to how scatter is guaranteed to write from the source register's LSB to MSB per Intel's documentation
_mm512_i32scatter_epi32::<1>(v.as_mut_ptr().cast(), vindex, new_values);
}
v
}
#[cfg(test)]
mod tests {
#[test]
fn test() {
use super::*;
let input = [1u8, 2, 3, 3, 3, 6, 7, 8, 9, 10, 10, 12, 13, 14, 15, 16];
let res = unsafe { histogram_avx512_unsafe(&input) };
println!("{:?}", res);
}
}I expected to see this happen: Scatter to write 32-bit integers into memory locations calculated by a base address (the slice address) and an offset vector, scaled by the SCALE factor.
Instead, this happened: Scatter wrote 32-bit integer into memory locations calculated by a base address and the offset vector, scaled by the SCALE factor, but instead of stepping by 32bits, it steps by 8, clobbering neighboring values. A SCALE of 4 instead of 1 shows this and gives correct results.
Intel's description of the operation:
FOR j := 0 to 15
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
Meta
rustc --version --verbose:
binary: rustc
commit-hash: 98aa3624be70462d6a25ed5544333e3df62f4c66
commit-date: 2024-02-08
host: x86_64-pc-windows-msvc
release: 1.78.0-nightly
LLVM version: 17.0.6
Running on Zen4, 7840hs.
Backtrace
N/A no backtrace given, rustc doesn't crash
edit: This affects at least _mm512_i32gather_epi32 as well
Metadata
Metadata
Assignees
Labels
No labels