Add tiled_reduce function with Kernel and ConvertTo abstractions
Description
This issue establishes the core computational pipeline for multi-vector distances (Chamfer/MaxSim). It introduces the tiled_reduce function alongside the Kernel and ConvertTo abstractions to handle chunked processing and type conversion. To establish a working baseline quickly, the final reduction step will be hardcoded directly within the initial f32 and f16 implementations.
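For reference, the Chamfer/MaxSim score between a query set Q and a document set D is commonly defined with inner-product similarity, as in ColBERT-style late interaction. Tiling splits each inner product over chunks T_1 through T_m of the vector dimensions, which is the per-chunk work a kernel would accumulate. The outer max-then-sum is presumably the "final reduction step" that the description says will initially be hardcoded:

$$
\operatorname{MaxSim}(Q, D) = \sum_{q \in Q} \max_{x \in D} \langle q, x \rangle,
\qquad
\langle q, x \rangle = \sum_{t=1}^{m} \sum_{i \in T_t} q_i \, x_i
$$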
Tasks
- tiled_reduce function to manage the chunking and routing of vector dimensions, ensuring compatibility with the DistanceFunctionMut trait.
- Kernel abstraction to serve as the interface for hardware-specific distance calculations.
- ConvertTo abstraction to manage type conversions within the pipeline.
- f32 and f16 variants using these abstractions, temporarily hardcoding the reduction logic directly inside these implementations to validate the pipeline.
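A minimal Rust sketch of how these pieces could fit together, assuming a scalar inner-product kernel and f32 as the compute type. ConvertTo, Kernel, ScalarDot, TILE, F16Bits and max_sim are hypothetical names and signatures chosen for illustration, not the crate's actual API. F16Bits is a stand-in for f16 stored as raw IEEE 754 binary16 bits, used to keep the sketch dependency-free. The sketch does not implement DistanceFunctionMut, whose signature is not given in this issue, and it shares one generic max-then-sum reduction instead of hardcoding it separately in f32 and f16 variants.

```rust
// Sketch only: every name below is hypothetical, not the crate's real API.

/// Converts a stored element into the type the kernel computes in.
pub trait ConvertTo<T> {
    fn convert_to(&self) -> T;
}

impl ConvertTo<f32> for f32 {
    fn convert_to(&self) -> f32 {
        *self
    }
}

/// Hypothetical stand-in for an f16 element: raw IEEE 754 binary16 bits.
#[derive(Clone, Copy, Debug)]
pub struct F16Bits(pub u16);

impl ConvertTo<f32> for F16Bits {
    fn convert_to(&self) -> f32 {
        let bits = self.0 as u32;
        let sign = (bits & 0x8000) << 16; // move sign bit to position 31
        let exp = (bits >> 10) & 0x1f;
        let mant = bits & 0x3ff;
        match exp {
            // Zero or subnormal: value is mant * 2^-24, exact in f32.
            0 => {
                let mag = mant as f32 / 16_777_216.0;
                if sign != 0 { -mag } else { mag }
            }
            // Infinity or NaN: widen exponent to all ones, keep payload.
            0x1f => f32::from_bits(sign | 0x7f80_0000 | (mant << 13)),
            // Normal: rebias exponent (15 -> 127), widen mantissa (10 -> 23 bits).
            _ => f32::from_bits(sign | ((exp + 112) << 23) | (mant << 13)),
        }
    }
}

/// Hypothetical interface for a (possibly hardware-specific) kernel that
/// folds one chunk of converted values into a running accumulator.
pub trait Kernel {
    fn accumulate(&self, acc: f32, a: &[f32], b: &[f32]) -> f32;
}

/// Portable scalar kernel computing a partial inner product.
pub struct ScalarDot;

impl Kernel for ScalarDot {
    fn accumulate(&self, acc: f32, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).fold(acc, |s, (x, y)| s + x * y)
    }
}

/// Assumed tile width; a real kernel would likely match its SIMD width.
pub const TILE: usize = 8;

/// Walks the dimensions of `a` and `b` in TILE-sized chunks, converts each
/// chunk to f32, and routes it through `kernel`.
pub fn tiled_reduce<T, K>(kernel: &K, a: &[T], b: &[T]) -> f32
where
    T: ConvertTo<f32>,
    K: Kernel,
{
    assert_eq!(a.len(), b.len(), "vectors must have equal dimension");
    let (mut buf_a, mut buf_b) = ([0.0f32; TILE], [0.0f32; TILE]);
    let mut acc = 0.0f32;
    for (ca, cb) in a.chunks(TILE).zip(b.chunks(TILE)) {
        let n = ca.len(); // the last chunk may be shorter than TILE
        for i in 0..n {
            buf_a[i] = ca[i].convert_to();
            buf_b[i] = cb[i].convert_to();
        }
        acc = kernel.accumulate(acc, &buf_a[..n], &buf_b[..n]);
    }
    acc
}

/// MaxSim/Chamfer with the final reduction hardcoded: for each query
/// vector take the max over document vectors, then sum over queries.
/// An empty document set yields negative infinity.
pub fn max_sim<T: ConvertTo<f32>>(query: &[Vec<T>], doc: &[Vec<T>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| tiled_reduce(&ScalarDot, q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

In this sketch, swapping ScalarDot for a SIMD kernel, or F16Bits for another element type, only requires a new Kernel or ConvertTo impl; tiled_reduce and the reduction stay unchanged.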