Map semantics for tables with primary keys. #772
Comments
A more detailed summary of proposed JIT input handle API changes. The current API in
pub struct JsonZSetHandle {
handle: CollectionHandle<Row, i32>,
deserialize_fn: DeserializeJsonFn,
vtable: &'static VTable,
updates: Vec<(Row, i32)>,
}
The first thing we probably want to do is generalize this type so it works for all formats, not just JSON. This requires a generalized definition of
Next, we add two more types of handles:
pub struct SetHandle {
handle: CollectionHandle<Row, bool>,
deserialize_fn: DeserializeFn,
vtable: &'static VTable,
updates: Vec<(Row, bool)>,
}
pub struct MapHandle {
handle: CollectionHandle<Row, Option<Row>>,
deserialize_fn: DeserializeFn,
vtable: &'static VTable,
updates: Vec<(Row, Option<Row>)>,
key_func: ????
}
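To make the difference between the three update types concrete, here is a minimal sketch of how a batch of buffered updates could be applied under each semantics. It uses only the standard library. `Row` is modeled as a `String`, and `apply_zset`, `apply_set`, and `apply_map` are hypothetical names, not part of DBSP or the JIT. Reading `Some(value)` as insert-or-overwrite and `None` as delete is an assumption based on the "delete or update records by key" goal of this issue.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Assumption: rows are modeled as strings purely for illustration.
type Row = String;

/// Z-set semantics: each update carries a signed weight that is added to
/// the row's current weight; rows whose weight reaches zero are dropped.
fn apply_zset(state: &mut BTreeMap<Row, i32>, updates: &[(Row, i32)]) {
    for (row, weight) in updates {
        let new_weight = state.get(row).copied().unwrap_or(0) + *weight;
        if new_weight == 0 {
            state.remove(row);
        } else {
            state.insert(row.clone(), new_weight);
        }
    }
}

/// Set semantics: `true` inserts the row, `false` deletes it. Duplicate
/// insertions and deletions of absent rows are no-ops.
fn apply_set(state: &mut BTreeSet<Row>, updates: &[(Row, bool)]) {
    for (row, insert) in updates {
        if *insert {
            state.insert(row.clone());
        } else {
            state.remove(row);
        }
    }
}

/// Map semantics (assumed): `Some(value)` inserts or overwrites the value
/// stored under the key, `None` deletes the key.
fn apply_map(state: &mut BTreeMap<Row, Row>, updates: &[(Row, Option<Row>)]) {
    for (key, value) in updates {
        match value {
            Some(v) => {
                state.insert(key.clone(), v.clone());
            }
            None => {
                state.remove(key);
            }
        }
    }
}
```

Under this model only the map variant can change the value associated with an existing key in a single update, which is what CDC-style updates need.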
|
Moving to the next milestone (we still need to integrate #866 with the SQL compiler) |
@mihaibudiu, some examples of the new IR format from #866:
"Source": {
"layout": {
"Map": [3, 2]
},
"kind": "ZSet",
"table": "T"
}
"Source": {
"layout": { "Set": 1 },
"kind": "ZSet",
"table": "T"
} |
@mihaibudiu, one more TODO for the SQL compiler that I missed: we need to add primary key information to the schema file, e.g.,
{
"name" : "PART",
"fields" : [ {
"name" : "ID",
"case_sensitive" : false,
"columntype" : {
"type" : "BIGINT",
"nullable" : false
}
}, {
"name" : "NAME",
"case_sensitive" : false,
"columntype" : {
"type" : "VARCHAR",
"nullable" : true,
"precision" : -1
}
} ],
"primary_key": ["ID"]
}
The order of column names in the primary key definition must match the layout you generate for the JIT. |
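The ordering requirement can be illustrated with a small sketch that resolves the `primary_key` names against the `fields` list and projects a row onto its key. The function names `key_column_indices` and `project_key` are hypothetical, rows are modeled as slices of strings, and column names are compared exactly (the `case_sensitive` flag is ignored here for brevity).

```rust
/// Resolve primary-key column names to column indices, preserving the order
/// in which the key columns are listed (not the order of `fields`).
/// Returns `None` if a key column is not present in the schema.
fn key_column_indices(fields: &[&str], primary_key: &[&str]) -> Option<Vec<usize>> {
    primary_key
        .iter()
        .map(|key_col| fields.iter().position(|f| f == key_col))
        .collect()
}

/// Project a row onto its key columns, in primary-key order.
/// Assumes every index is within bounds for `row`.
fn project_key(row: &[String], indices: &[usize]) -> Vec<String> {
    indices.iter().map(|&i| row[i].clone()).collect()
}
```

For the `PART` schema above, `key_column_indices(&["ID", "NAME"], &["ID"])` yields `Some(vec![0])`. With a multi-column key, the resulting index order follows the `primary_key` array, so a key layout built from it has the same column order.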
I think #891 completes the SQL compiler side of this issue. |
We currently use set semantics for all input tables, i.e., duplicate insertions and deletions are ignored and all weights are 1.
In many applications, including CDC, we need to delete or update records by key. This is already supported by DBSP via the add_input_map method. What's missing is:
Implementation plan:
- Rename Source -> SourceZSet, SourceMap -> SourceIndexedZSet, or some other choice of names that will distinguish these existing operators from the new operators (see below).
- SourceSet node type backed by Circuit::add_input_set.
- SourceMap node type backed by Circuit::add_input_map.
- SourceIndexedZSet and SourceMap need to define schemas for both keys and values. ([JIT] Set & Map Sources #866)
- JsonZSetHandle. We need to add JsonSetHandle, JsonIndexedZSetHandle and JsonMapHandle. Corresponding APIs in a separate comment below. ([JIT] Set & Map Sources #866)
- Add a register_input_map method to the static Rust API (Primary key support #826).
- struct definition for the primary key. It will be exactly the same as for the value type, but will only contain primary key columns.
- Catalog::register_input_map instead of register_input_set for tables with a primary key. Example (KEY is the key struct name):
- SourceMap node instead of SourceZSet (which means the compiler will also need to create a layout for the key type). ([SQL] support for tables with primary keys in JIT #891)
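As a rough illustration of the key struct item in the plan, here is a sketch for the `PART` table from the schema example in the comments. The names `Part`, `PartKey`, `part_key`, and `index_by_key` are hypothetical and are not what the SQL compiler emits; the sketch does not call `Catalog::register_input_map`, whose signature is not shown in this thread.

```rust
use std::collections::HashMap;

// Value type for the PART table: ID BIGINT NOT NULL, NAME VARCHAR (nullable).
#[derive(Clone, Debug, PartialEq, Eq)]
struct Part {
    id: i64,
    name: Option<String>,
}

// Hypothetical key type: same field definitions as `Part`, restricted to the
// primary key columns ("primary_key": ["ID"]).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PartKey {
    id: i64,
}

// Hypothetical key function: projects a row onto its primary key.
fn part_key(row: &Part) -> PartKey {
    PartKey { id: row.id }
}

// Illustration only: index rows by key so that a later row with the same key
// replaces the earlier one (update by key).
fn index_by_key(rows: Vec<Part>) -> HashMap<PartKey, Part> {
    let mut table = HashMap::new();
    for row in rows {
        table.insert(part_key(&row), row);
    }
    table
}
```

Keeping the key as its own type, rather than reusing the value struct, makes the key function's output usable directly as a map key.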