-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Primary key support #826
Primary key support #826
Conversation
.into_iter() | ||
.chain(input_batches().into_iter()) | ||
.chain(input_batches().into_iter().map(move |batch| { | ||
let mut expected_batches = input_batches().into_iter().chain(input_batches()).chain( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the previous version is more readable
@@ -1,6 +1,7 @@ | |||
//! Reference implementation of batch and trace traits for use in testing. | |||
//! | |||
//! So far, only methods/traits used in tests have been implemented. | |||
#![allow(clippy::type_complexity)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have clippy rules changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, and I also ran it on parts of the code not previously linted.
/// * `KD` - Key type in the input byte stream. Keys will get deserialized into instances of `KD` and then converted to `K`. | ||
/// * `V` - value type of the input collection. | ||
/// * `VD` - Value type in the input byte stream. Values will get deserialized into instances of `VD` and then converted to `K`. | ||
pub fn register_input_map<K, KD, V, VD, R, F>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is R used for weights?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
historical reasons. Inherited it from Frank's code.
/// behaves like a map (i.e., has exactly one key with weight 1 per value) | ||
/// to the catalog. | ||
/// | ||
/// Creates neighborhood and quantiles circuits. Drops the key component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these extra things should be optional. but I said that already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and it's not hard to make it optional. We might add this feature soon. Ideally, there would be a way to enable debugging features like this at runtime, but we don't yet have the machinery for this.
/// | ||
/// Equivalent to `self.map(|(k, v)| (k, v)).stream_sample_keys()`, | ||
/// but is more efficient. | ||
#[allow(clippy::type_complexity)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just make a type alias for OrdZSet<(B::Key, B::Val), B::R>
and reuse it across the various functions
type ZSet<B> = OrdZSet<(B::Key, B::Val), B::R>
where
B: BatchReader<Time = ()>;
let mut builder = | ||
<<OrdZSet<_, _> as Batch>::Builder>::with_capacity((), sample_with_vals.len()); | ||
for key in sample_with_vals.into_iter() { | ||
builder.push((key, HasOne::one())); | ||
} | ||
builder.done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably optimize this since OrdZSet
holds the keys and weights in different vecs, sample_with_vals
is already in the proper form, we'd just need to fill another vec with ones (same goes for stream_unique_key_val_quantiles()
with some tweaks) and then call OrdZSet::from_columns(sample_with_vals, ones)
. If the collection we sample from is consolidated then our values should also be sorted, unique and non-zero (and therefore valid zsets)
crates/dbsp/src/operator/sample.rs
Outdated
trace_handle, | ||
)) | ||
} | ||
|
||
use proptest::{collection::vec, prelude::*}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use should be at the top of the module
r#"{"insert":{"id":1, "s": "1"}} | ||
{"insert":{"id":2, "s": "2"}}"# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use the json macro + .to_string()
for these json literals, could make things nicer (especially the following json literals)
Added a method to register input streams created using `add_input_map` in the catalog. Such a stream represents a table with a unique or primary key and upsert modification semantics. Entries are deleted by key and updated or modified by value (inserting a value whose key already exists in the collection removes the existing value associated with this key). In addition to the input stream, the caller must provide a function to extract keys from values. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@gmail.com>
Added an integration test to test primary key tables end-to-end. Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Thanks, @Kixiron , I applied most of the suggestions. |
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Support for deleting and updating records by primary key (see #772).
This PR implements upsert update semantics for tables with primary keys, specifically:
TODOs:
Is this a user-visible change (yes/no): yes