Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Fixed error in passing sliced arrays via FFI #564

Merged
merged 4 commits into from Nov 1, 2021
Merged

Conversation

jorgecarleitao
Copy link
Owner

This PR closes #538 by re-aligning buffers prior to exporting to the C data interface.

Background

Given two arrays, a and b, on which a is sliced and b has no validity, the computation of equal(a, b) is optimally done by equating the values (bitmap) and cloning the validity of a.

The c data interface allows arrays to be sliced at zero cost via a constant "offset", denoting by how much an array is offsetted by, but it does not allow individual bitmaps to be sliced. Consequently, implementations of the arrow format must "de-offset" the bitmaps to align them.

In the example above, this means that the validity of a cannot be combined with the resulting values bitmap because it has a different offset. Instead, a new validity must be computed that has a zero offset (and thus aligned with the offset of the resulting values bitmap).

This crate currently supports the former compute model because it tracks individual bitmap and buffer offsets. However, doing so forbids us from passing the bitmaps as is via the c data interface, as the c data interface only accepts a single offset per array (and not per buffer).

Solution

This PR de-offsets the bitmaps when they are to be passed to the C data interface, thereby delaying when to de-offset happens: at the c-data interface boundary as opposed to at every compute operation.

Note that this is a relatively small set of use-cases because most of the times the arrays are not sliced.

Alternatives

The alternative is to fallback to what other implementations do: de-offset the buffers on every operation. However, this resulted in a 15% performance drop in some of the kernels and a much more complex implementation of kernels since they all have to remember to de-offset bitmaps.

@jorgecarleitao jorgecarleitao added the bug Something isn't working label Nov 1, 2021
@codecov
Copy link

codecov bot commented Nov 1, 2021

Codecov Report

Merging #564 (e51c918) into main (5fc843d) will decrease coverage by 0.11%.
The diff coverage is 43.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #564      +/-   ##
==========================================
- Coverage   78.92%   78.81%   -0.12%     
==========================================
  Files         382      395      +13     
  Lines       24276    25075     +799     
==========================================
+ Hits        19160    19763     +603     
- Misses       5116     5312     +196     
Impacted Files Coverage Δ
src/array/binary/mod.rs 81.01% <ø> (+2.26%) ⬆️
src/array/boolean/ffi.rs 0.00% <0.00%> (ø)
src/array/boolean/mod.rs 69.38% <ø> (-0.62%) ⬇️
src/array/dictionary/mod.rs 56.89% <ø> (+10.22%) ⬆️
src/array/fixed_size_binary/ffi.rs 0.00% <0.00%> (ø)
src/array/fixed_size_binary/mod.rs 53.73% <ø> (+3.73%) ⬆️
src/array/fixed_size_list/ffi.rs 0.00% <0.00%> (ø)
src/array/fixed_size_list/mod.rs 65.00% <ø> (+5.29%) ⬆️
src/array/map/ffi.rs 0.00% <0.00%> (ø)
src/array/mod.rs 58.77% <ø> (+2.63%) ⬆️
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5fc843d...e51c918. Read the comment docs.

@jorgecarleitao jorgecarleitao merged commit e4c37f9 into main Nov 1, 2021
@jorgecarleitao jorgecarleitao deleted the fix_ffi2 branch November 1, 2021 19:42
@jorgecarleitao jorgecarleitao changed the title Fix FFI of sliced arrays Fixed error in passing sliced arrays via FFI Nov 27, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Buffer Cannot Exceed Existing Length - Pyo3
1 participant