Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle ip adresses in term aggregation #2319

Merged
merged 2 commits into from
Mar 14, 2024
Merged

handle ip adresses in term aggregation #2319

merged 2 commits into from
Mar 14, 2024

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Feb 23, 2024

Stores IpAdresses during the segment term aggregation via u64 representation
and convert to u128(IpV6Adress) via downcast when converting to intermediate results.

Enable Downcasting on ColumnValues
Expose u64 variant for u128 encoded data via open_u64_lenient method.
Remove lifetime in VecColumn, to avoid 'static lifetime requirement coming
from downcast trait.

#1689

@PSeitz PSeitz force-pushed the term_agg_ip branch 2 times, most recently from 564da58 to 7a86ce1 Compare February 23, 2024 04:32
Stores IpAdresses during the segment term aggregation via u64 representation
and convert to u128(IpV6Adress) via downcast when converting to intermediate results.

Enable Downcasting on `ColumnValues`
Expose u64 variant for u128 encoded data via `open_u64_lenient` method.
Remove lifetime in VecColumn, to avoid 'static lifetime requirement coming
from downcast trait.
/// # Notice
/// In case there are new codecs added, check for usages of `CompactSpaceDecompressorU64` and
/// also handle the new codecs.
pub fn open_u128_as_u64(mut bytes: OwnedBytes) -> io::Result<Arc<dyn ColumnValues<u64>>> {
Copy link
Collaborator

@fulmicoton fulmicoton Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename the method? Right now it suggest it is a simple coercion? In reality you are returning the compact space.
Maybe

open_u128_compact_space_as_u64

or something like that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to open_u128_as_compact_u64

@@ -63,7 +63,10 @@ impl BlockwiseLinearEstimator {
if self.block.is_empty() {
return;
}
let line = Line::train(&VecColumn::from(&self.block));
let column = VecColumn::from(std::mem::take(&mut self.block));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this needed? I don't understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extending the ColumnValues trait with another trait (any trait, even an empty one) requires lifetimes to be 'static. This requirement is no problem when working on the values instead of references, so I changed VecColumn to a owned version.

.accessor
.values
.clone()
.downcast_arc::<CompactSpaceU64Accessor>()
Copy link
Collaborator

@fulmicoton fulmicoton Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that not doing the segment collection on u128 is the right way here, but I am surprised downcasting logic is required here. Are there really no better way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem here is with regards to generics.
We need a codec specific method on a specific trait variant ColumnValues<u64>.

We could return ColumnValuesU64Ext instead of ColumnValues<u64>, which would include the method

trait U64ColumnValuesExt: ColumnValues<u64> {
    compact_to_u128(&self, compact: u32) -> u128 {
        unimplemented!() // unimplemented for all expect compact space
    }
}

Having codec specific methods on a common trait is a little bit weird.
Casting is also not great, but having the mechanism may also allow some performance gains by enabling inlining in some cases.

In both cases we will need to know the underlying codec before calling the codec specific method.

@PSeitz PSeitz merged commit b0e6556 into main Mar 14, 2024
4 checks passed
@PSeitz PSeitz deleted the term_agg_ip branch March 14, 2024 08:41
PSeitz added a commit that referenced this pull request Apr 10, 2024
* handle ip adresses in term aggregation

Stores IpAdresses during the segment term aggregation via u64 representation
and convert to u128(IpV6Adress) via downcast when converting to intermediate results.

Enable Downcasting on `ColumnValues`
Expose u64 variant for u128 encoded data via `open_u64_lenient` method.
Remove lifetime in VecColumn, to avoid 'static lifetime requirement coming
from downcast trait.

* rename method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants