Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Made bitmap not cache null count #563

Merged
merged 1 commit into from
Nov 2, 2021
Merged

Made bitmap not cache null count #563

merged 1 commit into from
Nov 2, 2021

Conversation

jorgecarleitao
Copy link
Owner

This PR removes the cached null_count on Bitmap, making the computation only happen on request.

This makes the struct simpler and removes the need to compute the count even if the count will not be used.

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Oct 31, 2021
@jorgecarleitao
Copy link
Owner Author

what do you think @ritchie46 @houqp @Dandandan ?

@codecov
Copy link

codecov bot commented Oct 31, 2021

Codecov Report

Merging #563 (296838f) into main (5fc843d) will increase coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #563      +/-   ##
==========================================
+ Coverage   78.92%   78.98%   +0.06%     
==========================================
  Files         382      385       +3     
  Lines       24276    24704     +428     
==========================================
+ Hits        19160    19513     +353     
- Misses       5116     5191      +75     
Impacted Files Coverage Δ
src/bitmap/immutable.rs 84.61% <100.00%> (-1.50%) ⬇️
src/io/ipc/read/array/union.rs 70.49% <0.00%> (-4.51%) ⬇️
src/array/union/mod.rs 79.27% <0.00%> (-0.34%) ⬇️
tests/it/array/mod.rs 100.00% <0.00%> (ø)
tests/it/compute/bitwise.rs 100.00% <0.00%> (ø)
benches/bitwise.rs 0.00% <0.00%> (ø)
src/compute/bitwise.rs 100.00% <0.00%> (ø)
src/array/display.rs 53.47% <0.00%> (+2.37%) ⬆️
src/array/utf8/mutable.rs 88.17% <0.00%> (+2.91%) ⬆️
src/datatypes/mod.rs 81.65% <0.00%> (+4.03%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5fc843d...296838f. Read the comment docs.

@ritchie46
Copy link
Collaborator

What do you think about a lazy cache? Same to init_validities(). That we only set the null count at first request. I can imagine that we have long living arrays that are base for many computations.

For instance in Polars, almost every operation starts with a null_count check to see if we can use a fast path.

@jorgecarleitao
Copy link
Owner Author

Doesn't lazy initialization require &mut self?

For instance in Polars, almost every operation starts with a null_count check to see if we can use a fast path.

shouldn't it start with if let Some(bitmap) = array.validity()? Or is it common to have arrays with only null values?

@ritchie46
Copy link
Collaborator

Doesn't lazy initialization require &mut self?

For instance in Polars, almost every operation starts with a null_count check to see if we can use a fast path.

shouldn't it start with if let Some(bitmap) = array.validity()? Or is it common to have arrays with only null values?

The series has potentially multiple chunks, so we check null_count. But indeed, I could add the branch check for the validity and return 0 if it is missing. Thanks, now it should not matter that much.

@ritchie46
Copy link
Collaborator

ritchie46 commented Nov 1, 2021

Just for clarity, I think it is a good enhancement. 👍 :)

Certainly when iterating lists, this saves quite some compute, I think.

@jorgecarleitao jorgecarleitao merged commit f435f1f into main Nov 2, 2021
@jorgecarleitao jorgecarleitao deleted the nulls branch November 2, 2021 16:33
jorgecarleitao added a commit that referenced this pull request Nov 2, 2021
@jorgecarleitao jorgecarleitao restored the nulls branch November 2, 2021 20:10
@jorgecarleitao
Copy link
Owner Author

jorgecarleitao commented Nov 2, 2021

Reverted because it triggered a major regression in polars, so more care is needed here. Thanks to @ritchie46 for pointing it out.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement An improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants