kernels/optimized/CMakeLists.txt (16 additions, 0 deletions)
@@ -75,6 +75,22 @@ target_link_libraries(
  kernels_util_all_deps
)
target_compile_options(optimized_kernels PUBLIC ${_common_compile_options})

# op_grid_sampler_2d_fp16_hw.cpp uses hardware fp16 NEON intrinsics
# (vcvt_f32_f16 / vld1_f16). Those are part of the ARMv8.2-A FP16 extension
# (-march=armv8.2-a+fp16) and raise SIGILL on chips without it, so the `-march`
# flag is scoped to just that translation unit. The main op_grid_sampler_2d.cpp
# (which hosts the runtime dispatcher via cpuinfo_has_arm_neon_fp16) and the
# fp16 software-convert path stay on plain ARMv8 so they can run on any chip.
if(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64|arm64"
   OR ANDROID_ABI STREQUAL "arm64-v8a"
)
  set_source_files_properties(
    ${EXECUTORCH_ROOT}/kernels/optimized/cpu/op_grid_sampler_2d_fp16_hw.cpp
    PROPERTIES COMPILE_OPTIONS "-march=armv8.2-a+fp16"

Review comment (Member):

Would it be possible to split out the native f16 path? Right now it can SIGILL on ARM hardware without f16 support. If possible, I'd recommend something like this:

  • Move the native f16 impl into a separate source file, and scope the +fp16 `-march` flag to just that file.
  • Add a variant that does the f16<->f32 conversion in software.
  • In the top-level kernel, check hardware support with cpuinfo_has_arm_neon_fp16 and route to the matching implementation.

  )
endif()

# Build a library for _optimized_kernels_srcs
#
# optimized_ops_lib: Register optimized ops kernels into Executorch runtime