Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primary key support #826

Merged
merged 9 commits into from
Oct 6, 2023
Merged

Primary key support #826

merged 9 commits into from
Oct 6, 2023

Conversation

ryzhyk
Copy link
Contributor

@ryzhyk ryzhyk commented Oct 2, 2023

Support for deleting and updating records by primary key (see #772).

This PR implements upsert update semantics for tables with primary keys, specifically:

  • An insert into such a table automatically (and quietly) deletes any existing record with the same key.
  • To delete a record, the client must only specify the value of the key, not the entire record.

TODOs:

  • Documentation
  • Support primary keys in JIT.

Is this a user-visible change (yes/no): yes

@ryzhyk ryzhyk changed the title Primary key support [Draft] Primary key support Oct 2, 2023
@ryzhyk ryzhyk changed the title [Draft] Primary key support Primary key support Oct 3, 2023
@ryzhyk ryzhyk added SQL compiler Related to the SQL compiler JIT API Distributed system APIs adapters Issues related to the adapters crate User-facing For PRs that lead to Feldera-user visible changes and removed JIT labels Oct 3, 2023
@ryzhyk ryzhyk added this to the October 10, 2023 milestone Oct 3, 2023
@ryzhyk ryzhyk marked this pull request as ready for review October 3, 2023 20:43
.into_iter()
.chain(input_batches().into_iter())
.chain(input_batches().into_iter().map(move |batch| {
let mut expected_batches = input_batches().into_iter().chain(input_batches()).chain(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the previous version is more readable

@@ -1,6 +1,7 @@
//! Reference implementation of batch and trace traits for use in testing.
//!
//! So far, only methods/traits used in tests have been implemented.
#![allow(clippy::type_complexity)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have clippy rules changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and I also ran it on parts of the code not previously linted.

/// * `KD` - Key type in the input byte stream. Keys will get deserialized into instances of `KD` and then converted to `K`.
/// * `V` - value type of the input collection.
/// * `VD` - Value type in the input byte stream. Values will get deserialized into instances of `VD` and then converted to `K`.
pub fn register_input_map<K, KD, V, VD, R, F>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is R used for weights?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

historical reasons. Inherited it from Frank's code.

/// behaves like a map (i.e., has exactly one key with weight 1 per value)
/// to the catalog.
///
/// Creates neighborhood and quantiles circuits. Drops the key component
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these extra things should be optional. but I said that already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and it's not hard to make it optional. We might add this feature soon. Ideally, there would be a way to enable debugging features like this at runtime, but we don't yet have the machinery for this.

crates/dbsp/src/operator/input.rs Outdated Show resolved Hide resolved
crates/dbsp/src/operator/input.rs Outdated Show resolved Hide resolved
crates/dbsp/src/operator/input.rs Show resolved Hide resolved
///
/// Equivalent to `self.map(|(k, v)| (k, v)).stream_sample_keys()`,
/// but is more efficient.
#[allow(clippy::type_complexity)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also just make a type alias for OrdZSet<(B::Key, B::Val), B::R> and reuse it across the various functions

type ZSet<B> = OrdZSet<(B::Key, B::Val), B::R>
where
    B: BatchReader<Time = ()>;

crates/dbsp/src/operator/sample.rs Outdated Show resolved Hide resolved
crates/dbsp/src/operator/sample.rs Outdated Show resolved Hide resolved
Comment on lines +310 to +317
let mut builder =
<<OrdZSet<_, _> as Batch>::Builder>::with_capacity((), sample_with_vals.len());
for key in sample_with_vals.into_iter() {
builder.push((key, HasOne::one()));
}
builder.done()
Copy link
Contributor

@Kixiron Kixiron Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably optimize this since OrdZSet holds the keys and weights in different vecs, sample_with_vals is already in the proper form, we'd just need to fill another vec with ones (same goes for stream_unique_key_val_quantiles() with some tweaks) and then call OrdZSet::from_columns(sample_with_vals, ones). If the collection we sample from is consolidated then our values should also be sorted, unique and non-zero (and therefore valid zsets)

trace_handle,
))
}

use proptest::{collection::vec, prelude::*};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use should be at the top of the module

Comment on lines +1004 to +1005
r#"{"insert":{"id":1, "s": "1"}}
{"insert":{"id":2, "s": "2"}}"#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the json macro + .to_string() for these json literals, could make things nicer (especially the following json literals)

Leonid Ryzhyk and others added 8 commits October 6, 2023 10:58
Added a method to register input streams created using `add_input_map`
in the catalog.  Such a stream represents a table with a unique or
primary key and upsert modification semantics.  Entries are deleted by
key and updated or modified by value (inserting a value whose key
already exists in the collection removes the existing value associated
with this key).

In addition to the input stream, the caller must provide a function to
extract keys from values.

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@gmail.com>
Added an integration test to test primary key tables end-to-end.

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
@ryzhyk
Copy link
Contributor Author

ryzhyk commented Oct 6, 2023

Thanks, @Kixiron , I applied most of the suggestions.

Signed-off-by: Leonid Ryzhyk <leonid@feldera.com>
@ryzhyk ryzhyk merged commit a462a63 into main Oct 6, 2023
5 checks passed
@ryzhyk ryzhyk deleted the primary_key branch October 6, 2023 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adapters Issues related to the adapters crate API Distributed system APIs SQL compiler Related to the SQL compiler User-facing For PRs that lead to Feldera-user visible changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants