Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: replace f32,f64 with ordered_float #119

Merged
merged 1 commit into from
Jul 27, 2023

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Jul 27, 2023

simple solution for #116.

Why I introduce ordred_float diretly

The order: spec said we need to use -NaN < -Infinity < -value < -0 < 0 < value < Infinity < NaN.

I find that it's unapprpriate to implement this order in the custom float directly. If we implement -0 < 0 for the custom float, which means that CustomFloat(-0) < CustomFloat(0) will return true. Then what result of CustomFloat(-0) == CustomFloat(0) should return?:

  1. true => it's confict with CustomFloat(-0) < CustomFloat(0)
  2. flase => why -0 is unequal 0

I think this order is specific for the iceberg partition order. So this order rule should implement as a custom order trait or implement for AnyValue. E.g.

// implement as a custom order trait
impl PartitionOrder for OrderedF32 {
    ...
}

The hash: iceberg has hash specs for float value that: hashLong(doubleToLongBits(double(v))

Maybe we should do something extra here instead of store StructValue in hashmap directly.

I find that this hash is specific for the bucket transform. https://iceberg.apache.org/spec/#appendix-b-32-bit-hash-requirements:~:text=For%20hash%20function%20details%20by%20type%2C%20see%20Appendix%20B. So we also can implemment as specific hash trait rather than general hash trait. E.g.

impl BucketHash for OrderedF32 {
    // convert it to a 32bit hash
    fn hash() -> u32 {
        
    }
}

@ZENOTME ZENOTME requested review from Xuanwo and liurenjie1024 and removed request for Xuanwo July 27, 2023 06:55
@ZENOTME ZENOTME marked this pull request as ready for review July 27, 2023 07:09
@liurenjie1024
Copy link
Contributor

LGTM, in java we can define comparator/hasher to customized logic. In rust, the partition value doesn't have to be StructValue, but a newtype PartitionValue(StructValue), then we can define comparison and hash logic for partition.

@liurenjie1024 liurenjie1024 merged commit 1da1e0c into icelake-io:main Jul 27, 2023
3 checks passed
@ZENOTME ZENOTME deleted the ordered_float branch July 27, 2023 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants