Add tiled_reduce function with Kernel and ConvertTo abstraction (f16 and f32 impls with hardcoded reduce)

# Add `tiled_reduce` function with `Kernel` and `ConvertTo` abstraction

## Description
This issue establishes the core computational pipeline for multi-vector distances (Chamfer/MaxSim). It introduces the `tiled_reduce` function alongside the `Kernel` and `ConvertTo` abstractions to handle chunked processing and type conversion. To establish a working baseline quickly, the final reduction step will be hardcoded directly within the initial `f32` and `f16` implementations.
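
For reference, the quantities being computed are usually written as follows. This is the conventional late-interaction definition, not something the issue spells out; the symbols $Q = \{q_i\}$ (query vectors), $D = \{d_j\}$ (document vectors), and the per-pair distance $\operatorname{dist}$ are notation assumed here, and the sign or normalization convention used by the codebase may differ.

$$
\operatorname{MaxSim}(Q, D) = \sum_{i} \max_{j} \langle q_i, d_j \rangle,
\qquad
\operatorname{Chamfer}(Q, D) = \sum_{i} \min_{j} \operatorname{dist}(q_i, d_j)
$$

Both have the same shape: an inner per-pair kernel, an inner reduction over document vectors (max or min), and an outer sum over query vectors. The "hardcoded reduce" in this issue presumably refers to fixing those two reductions inside the first implementations.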

## Tasks
- [ ] Implement the `tiled_reduce` function to manage the chunking and routing of vector dimensions, ensuring compatibility with the `DistanceFunctionMut` trait.
- [ ] Define the `Kernel` abstraction to serve as the interface for hardware-specific distance calculations.
- [ ] Define the `ConvertTo` abstraction to manage type conversions seamlessly within the pipeline.
- [ ] Implement the `f32` and `f16` variants using these abstractions, temporarily hardcoding the reduction logic directly inside these implementations to validate the pipeline.
- [ ] Ensure the resulting implementation functions correctly within the standalone API.
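
The tasks above could fit together roughly as in the sketch below. It is illustrative only: the trait shapes, method names (`convert_to`, `score`), the `F16` newtype, the `InnerProduct` kernel, and the `tiled_reduce` signature are all assumptions made for this example, not the API this issue will land. It also assumes tiling over document vectors and a MaxSim-style reduce (max over document vectors, then sum over query vectors), and it leaves out the `DistanceFunctionMut` integration entirely.

```rust
/// Hypothetical conversion trait: turns a storage element into the compute type.
trait ConvertTo<T> {
    fn convert_to(self) -> T;
}

impl ConvertTo<f32> for f32 {
    fn convert_to(self) -> f32 {
        self
    }
}

/// Hypothetical stand-in for a half-precision type: raw IEEE 754 binary16 bits.
#[derive(Clone, Copy)]
struct F16(u16);

impl ConvertTo<f32> for F16 {
    fn convert_to(self) -> f32 {
        let sign = ((self.0 >> 15) & 1) as u32;
        let exp = ((self.0 >> 10) & 0x1f) as u32;
        let frac = (self.0 & 0x3ff) as u32;
        if exp == 0 {
            // Zero or subnormal: value is frac * 2^-24.
            let v = frac as f32 / 16_777_216.0;
            return if sign == 1 { -v } else { v };
        }
        let bits = if exp == 0x1f {
            // Infinity (frac == 0) or NaN (frac != 0).
            (sign << 31) | 0x7f80_0000 | (frac << 13)
        } else {
            // Normal number: rebias the exponent from 15 to 127.
            (sign << 31) | ((exp + 112) << 23) | (frac << 13)
        };
        f32::from_bits(bits)
    }
}

/// Hypothetical kernel interface: scores one query vector against one document vector.
trait Kernel<T> {
    fn score(&self, q: &[T], d: &[T]) -> f32;
}

/// Scalar inner-product kernel; a hardware-specific kernel would implement the same trait.
struct InnerProduct;

impl<T: Copy + ConvertTo<f32>> Kernel<T> for InnerProduct {
    fn score(&self, q: &[T], d: &[T]) -> f32 {
        q.iter()
            .zip(d)
            .map(|(&a, &b)| a.convert_to() * b.convert_to())
            .sum()
    }
}

/// Walks the document vectors in tiles of `tile` vectors and applies a hardcoded
/// reduce: max over document vectors per query vector, then sum over query vectors.
/// `query` and `doc` are flat row-major buffers of `dim`-length vectors.
fn tiled_reduce<T, K>(kernel: &K, query: &[T], doc: &[T], dim: usize, tile: usize) -> f32
where
    T: Copy,
    K: Kernel<T>,
{
    assert!(dim > 0 && tile > 0, "dim and tile must be non-zero");
    assert!(!doc.is_empty(), "doc must contain at least one vector");
    assert!(query.len() % dim == 0 && doc.len() % dim == 0, "buffers must be multiples of dim");

    let mut best = vec![f32::NEG_INFINITY; query.len() / dim];
    for doc_tile in doc.chunks(dim * tile) {
        for (qi, q) in query.chunks_exact(dim).enumerate() {
            for d in doc_tile.chunks_exact(dim) {
                let s = kernel.score(q, d);
                if s > best[qi] {
                    best[qi] = s;
                }
            }
        }
    }
    best.iter().sum()
}
```

Calling `tiled_reduce(&InnerProduct, &query, &doc, dim, tile)` with either `f32` or `F16` buffers goes through the same generic path; only the `ConvertTo` impl differs. Keeping the max/sum reduce inline, as here, matches the "hardcoded reduce" scope of this issue; pulling it out into its own abstraction would be a follow-up.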

Add tiled_reduce function with Kernel and ConvertTo abstraction (f16 and f32 impls with hardcoded reduce) #988
