Trait bounds for persistence #1229

gz · 2024-01-05T01:21:08Z

DBData now has an additional constraint: the archived variant of any type that implements DBData must itself satisfy this bound:

<T as ArchivedDBData>::Repr: Ord + PartialOrd<T>

Implement PartialOrd between Option/ArchivedOption. rkyv/rkyv#448 (merged, not released)
Expose rkyv features as features for chrono users. chronotope/chrono#1368 (merged, not released)
PartialOrd fix for ArchivedVec. rkyv/rkyv#462 (merged, not released)
Expose mutually exclusive rkyv features for size. paupino/rust-decimal#637
Implement traits on the ArchivedDecimal. paupino/rust-decimal#639
size-of: https://github.com/gz/size-of/tree/chrono-fixes (waiting for chrono to be released to make it into a PR)
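
To make the bound above concrete, here is a minimal, self-contained sketch of what it asks of a type. It does not use rkyv: `Meters`, `ArchivedMeters`, the `archive` helper and `find_archived` are hypothetical names invented for illustration, and the trait is a simplified stand-in for the real one in dbsp. The idea it tries to show is that an archived key can be ordered against other archived keys (`Ord`) and against an unarchived key (`PartialOrd<T>`), so a lookup does not have to deserialize anything.

```rust
use std::cmp::Ordering;

/// Simplified stand-in for the trait in this PR: every value type names an
/// archived representation that is totally ordered and also comparable with
/// the original, unarchived type.
pub trait ArchivedDBData: Sized {
    type Repr: Ord + PartialOrd<Self>;

    /// Hypothetical helper for this sketch only (not part of dbsp or rkyv).
    fn archive(&self) -> Self::Repr;
}

/// Toy value type (an assumption made for the example).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Meters(pub u64);

/// Toy archived form: always eight little-endian bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ArchivedMeters([u8; 8]);

impl ArchivedMeters {
    fn value(&self) -> u64 {
        u64::from_le_bytes(self.0)
    }
}

// Archived vs. archived: required by the `Ord` half of the bound.
impl PartialOrd for ArchivedMeters {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for ArchivedMeters {
    fn cmp(&self, other: &Self) -> Ordering {
        self.value().cmp(&other.value())
    }
}

// Archived vs. unarchived: required by the `PartialOrd<T>` half of the bound.
impl PartialEq<Meters> for ArchivedMeters {
    fn eq(&self, other: &Meters) -> bool {
        self.value() == other.0
    }
}

impl PartialOrd<Meters> for ArchivedMeters {
    fn partial_cmp(&self, other: &Meters) -> Option<Ordering> {
        self.value().partial_cmp(&other.0)
    }
}

impl ArchivedDBData for Meters {
    type Repr = ArchivedMeters;

    fn archive(&self) -> ArchivedMeters {
        ArchivedMeters(self.0.to_le_bytes())
    }
}

/// Binary search over sorted archived keys using only the cross-type
/// comparison, so no key is converted back to `T`.
pub fn find_archived<T: ArchivedDBData>(sorted: &[T::Repr], needle: &T) -> Option<usize> {
    sorted
        .binary_search_by(|probe| {
            // Incomparable values are treated as `Less`; an arbitrary choice
            // for this sketch, since the toy type is always comparable.
            <T::Repr as PartialOrd<T>>::partial_cmp(probe, needle).unwrap_or(Ordering::Less)
        })
        .ok()
}
```

With this shape, `find_archived(keys.as_slice(), &Meters(42))` searches a sorted slice of archived keys directly, which is presumably the kind of access a storage format would want.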

Ideally we would also have Clone for the Archived types, but I'm not sure that will get merged into rkyv, so for now I removed the Clone bound again until this is resolved:

Implement Clone for ArchivedString. rkyv/rkyv#453

Some other things this changes:

gdelt used ArcStr for interned strings until now, but with persistence this wouldn't work anyway, so I removed it from dbsp and gdelt
we can't store Rust tuples directly anymore since they don't satisfy the PartialOrd constraint, and we can't implement it ourselves due to Rust's trait rules (the orphan rule), so we added Tup{2,3,4} types
we can't use isize for weights, or usize for types that go into an Ord* data structure, anymore since their serialization is platform dependent; all of them are replaced with i64/u64
chrono enabled the size-32 feature of rkyv by default; the PR I sent to them lets us enable the size-64 feature, which means we can serialize/deserialize things correctly on x86-64
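
The last two points come down to byte width. As a rough illustration (plain std, no rkyv; every function name here is made up for this sketch), an i64 weight always encodes to eight bytes, the width of isize/usize depends on the target, and a 32-bit size field cannot hold every value a 64-bit machine can produce:

```rust
/// Encode a weight as eight little-endian bytes, identical on every target.
pub fn encode_weight(weight: i64) -> [u8; 8] {
    weight.to_le_bytes()
}

/// Inverse of `encode_weight`.
pub fn decode_weight(bytes: [u8; 8]) -> i64 {
    i64::from_le_bytes(bytes)
}

/// Bytes occupied by a native `isize` on the current target
/// (4 on a 32-bit target, 8 on a 64-bit one), which is why it makes a
/// poor on-disk type.
pub fn native_weight_width() -> usize {
    std::mem::size_of::<isize>()
}

/// Rough model of a 32-bit archived size: anything above `u32::MAX`
/// has no faithful representation.
pub fn fits_in_32_bit_size(len: u64) -> bool {
    len <= u64::from(u32::MAX)
}
```

This is only a model of the problem, not of how rkyv lays out its archived integers.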

Is this a user-visible change (yes/no): no

blp · 2024-01-05T16:55:43Z

This should allow me to make my storage format faster! Currently it does more deserialization than would be ideal.

I have not looked at this yet.

blp · 2024-01-05T17:00:45Z

@gz Even before I start reading the changes, I can tell how much work this was from the commit message. You had to coordinate with multiple other maintainers and crates, and deal with (presumably) so many compiler trait errors. Thank you so much!

blp · 2024-01-05T17:11:48Z

Just to make sure: ultimately, we will not be using a forked version of rkyv, right? (I assume that's just an intermediate step.)

gz · 2024-01-05T18:45:15Z

Thanks for the kind words <3.. Yea this patch was a bit of a nightmare for sure, glad it's now pretty much ready

ultimately, we will not be using a forked version of rkyv, right?

yes we should be able to get all patches in rkyv .. except for maybe the one that adds Clone to ArchivedString and ArchivedVec but if we need to I feel we can work around that

blp · 2024-01-05T19:04:55Z

yes we should be able to get all patches in rkyv .. except for maybe the one that adds Clone to ArchivedString and ArchivedVec but if we need to I feel we can work around that

That's probably for the best, if needed, because otherwise we can't update dbsp on crates.io.

gz · 2024-01-16T16:53:42Z

@mihaibudiu

so here are the necessary changes for the compiler for my PR:
the high-level change is that some arguments that were tuples before now need to be written as dbsp::utils::Tup2(a, b) instead of (a, b)

these operators are affected:

group/lag.rs changed to

pub fn lag<OV, PF>(
    &self,
    offset: usize,
    project: PF,
) -> Stream<RootCircuit, OrdIndexedZSet<B::Key, Tup2<B::Val, OV>, B::R>>

pub fn lead<OV, PF>(
    &self,
    offset: usize,
    project: PF,
) -> Stream<RootCircuit, OrdIndexedZSet<B::Key, Tup2<B::Val, OV>, B::R>>

then in operator/index.rs

    pub fn index<K, V>(&self) -> Stream<C, OrdIndexedZSet<K, V, CI::R>>
    where
        K: DBData,
        V: DBData,
        CI: BatchReader<Key = Tup2<K, V>, Val = (), Time = ()>,

    pub fn index_generic<CO>(&self) -> Stream<C, CO>
    where
        CI: BatchReader<Key = Tup2<CO::Key, CO::Val>, Val = (), Time = (), R = CO::R>,
        CO: Batch<Time = ()>,

    pub fn index_with<K, V, F>(&self, index_func: F) -> Stream<C, OrdIndexedZSet<K, V, CI::R>>
    where
        CI: BatchReader<Time = (), Val = ()>,
        F: Fn(&CI::Key) -> Tup2<K, V> + Clone + 'static,
        K: DBData,
        V: DBData,

impl<CI, CO> UnaryOperator<CI, CO> for Index<CI, CO>
where
    CO: Batch<Time = ()>,
    CI: BatchReader<Key = Tup2<CO::Key, CO::Val>, Val = (), Time = (), R = CO::R>,
{

impl<CI, CO, F> UnaryOperator<CI, CO> for IndexWith<CI, CO, F>
where
    CO: Batch<Time = ()>,
    CI: BatchReader<Val = (), Time = (), R = CO::R>,
    F: Fn(&CI::Key) -> Tup2<CO::Key, CO::Val> + 'static,

operators/sample.rs:

    pub fn stream_sample_unique_key_vals(
        &self,
        sample_size: &Stream<RootCircuit, usize>,
    ) -> Stream<RootCircuit, OrdZSet<Tup2<B::Key, B::Val>, B::R>>


impl<T> BinaryOperator<T, usize, OrdZSet<Tup2<T::Key, T::Val>, T::R>> for SampleUniqueKeyVals<T>
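
For call sites, the migration might look roughly like the sketch below. It is not the dbsp API: `Tup2` is redefined locally as a stand-in for `dbsp::utils::Tup2`, and `index_with` here is a hypothetical, simplified analogue that works on a slice and a `BTreeMap` instead of streams and `OrdIndexedZSet`. It only illustrates that the closure now returns `Tup2(k, v)` where it used to return `(k, v)`.

```rust
use std::collections::BTreeMap;

/// Local stand-in for `dbsp::utils::Tup2` (an assumption for this sketch).
/// Because the type is owned by the crate that defines it, that crate can
/// attach whatever trait impls persistence needs, which is not possible
/// for the built-in tuple `(A, B)`.
#[derive(Clone, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Tup2<A, B>(pub A, pub B);

/// Hypothetical, simplified analogue of `index_with`: groups records by
/// the key half of the `Tup2` that `index_func` returns.
pub fn index_with<T, K, V, F>(records: &[T], index_func: F) -> BTreeMap<K, Vec<V>>
where
    K: Ord,
    F: Fn(&T) -> Tup2<K, V>,
{
    let mut index: BTreeMap<K, Vec<V>> = BTreeMap::new();
    for record in records {
        // Destructuring works the same way as it did with a plain tuple.
        let Tup2(key, value) = index_func(record);
        index.entry(key).or_insert_with(Vec::new).push(value);
    }
    index
}
```

A caller that previously wrote a closure returning `(key, value)` would return `Tup2(key, value)` instead; the rest of the call stays the same.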


gz · 2024-01-18T02:12:22Z

will merge this so it is out of my sight

blp · 2024-01-18T02:30:34Z

Hurray! 🎉

gz force-pushed the trait-boundsv2 branch from 2872974 to dbb7bc7 Compare January 11, 2024 21:33

ryzhyk approved these changes Jan 11, 2024

gz mentioned this pull request Jan 16, 2024

Persistent batch and trace types. #1250

Merged

gz force-pushed the trait-boundsv2 branch from d74871d to a5c4634 Compare January 16, 2024 05:53

gz force-pushed the trait-boundsv2 branch from a5c4634 to e6a1b27 Compare January 16, 2024 19:11

gz and others added 5 commits January 17, 2024 16:54

Small step for humanity, small step for storage.

9fd5586

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>

Nexmark refactoring.

33d57d6

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>

Use patched 3rd libraries everywhere.

68fdb70

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>

Integrate tuple macro in dbsp.

d3800e0

Also generate dbsp tuples using the macro. Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>

Adapt compiler to use new DBSP APIs

0e3c994

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

gz force-pushed the trait-boundsv2 branch from 6b757b0 to 0e3c994 Compare January 18, 2024 01:01

gz marked this pull request as ready for review January 18, 2024 02:12

gz merged commit db75fa0 into main Jan 18, 2024
5 checks passed

gz deleted the trait-boundsv2 branch January 18, 2024 02:12

gz mentioned this pull request Jan 18, 2024

Revert patched 3rd party libraries #1271

Open
