Map semantics for tables with primary keys. #772

ryzhyk · 2023-09-25T03:10:40Z

We currently use set semantics for all input tables, i.e., duplicate insertions and deletions are ignored and all weights are 1.
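To make the current behavior concrete, here is a minimal sketch of set semantics using only the standard library. `SetTable` and its methods are hypothetical names invented for illustration; they are not part of DBSP or the adapters crate.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an input table under set semantics.
#[derive(Default)]
pub struct SetTable {
    rows: HashSet<String>,
}

impl SetTable {
    /// Inserts a row. Returns false and changes nothing if the row is already present.
    pub fn insert(&mut self, row: &str) -> bool {
        self.rows.insert(row.to_string())
    }

    /// Deletes a row. Returns false and changes nothing if the row is absent.
    pub fn delete(&mut self, row: &str) -> bool {
        self.rows.remove(row)
    }

    /// Weight of a row: 1 if present, 0 otherwise. It never exceeds 1.
    pub fn weight(&self, row: &str) -> i32 {
        if self.rows.contains(row) {
            1
        } else {
            0
        }
    }
}
```

Inserting the same row twice leaves its weight at 1, and deleting a missing row is a no-op. Note that nothing here lets a caller address a record by key, so deleting a record requires supplying the full row.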

In many applications, including CDC, we need to delete or update records by key. This is already supported by DBSP via the add_input_map method. What's missing is:

Implementation plan:

ryzhyk · 2023-09-25T19:48:23Z

A more detailed summary of proposed JIT input handle API changes.

The current API in facade/handle.rs:

pub struct JsonZSetHandle {
    handle: CollectionHandle<Row, i32>,
    deserialize_fn: DeserializeJsonFn,
    vtable: &'static VTable,
    updates: Vec<(Row, i32)>,
}

The first thing we probably want to do is generalize this type so it works for all formats, not just JSON. This requires a generalized definition of DeserializeFn that hides serde_json::Value inside a JSON-specific closure.
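One way to picture this generalization is a boxed closure that maps raw bytes to a row, with all format-specific state captured inside it. The sketch below is an assumption-laden illustration, not the actual JIT API: `Row` is modeled as `Vec<String>`, `InputHandle` and `delimited_deserializer` are hypothetical, and a toy delimiter-separated format stands in for JSON so the example needs only the standard library.

```rust
/// Hypothetical row type; the real JIT `Row` is driven by a layout and vtable.
pub type Row = Vec<String>;

/// Format-agnostic deserializer: raw bytes in, a row or an error message out.
pub type DeserializeFn = Box<dyn Fn(&[u8]) -> Result<Row, String>>;

/// Builds a deserializer for a toy delimiter-separated format.
/// The delimiter and expected column count are captured by the closure,
/// so the handle below never sees any format-specific type.
pub fn delimited_deserializer(delimiter: char, columns: usize) -> DeserializeFn {
    Box::new(move |bytes: &[u8]| -> Result<Row, String> {
        let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
        let row: Row = text
            .trim_end()
            .split(delimiter)
            .map(str::to_string)
            .collect();
        if row.len() == columns {
            Ok(row)
        } else {
            Err(format!("expected {} columns, found {}", columns, row.len()))
        }
    })
}

/// A handle that stores the closure without knowing which format produced it.
pub struct InputHandle {
    deserialize_fn: DeserializeFn,
    updates: Vec<Row>,
}

impl InputHandle {
    pub fn new(deserialize_fn: DeserializeFn) -> Self {
        InputHandle { deserialize_fn, updates: Vec::new() }
    }

    /// Deserializes one record and buffers it; a malformed record is rejected.
    pub fn push(&mut self, bytes: &[u8]) -> Result<(), String> {
        let row = (self.deserialize_fn)(bytes)?;
        self.updates.push(row);
        Ok(())
    }

    /// Rows buffered so far.
    pub fn pending(&self) -> &[Row] {
        &self.updates
    }
}
```

A JSON-specific constructor would follow the same pattern, parsing into `serde_json::Value` inside its closure and returning the same `DeserializeFn` type.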

Next, we add two more types of handles:

pub struct SetHandle {
    handle: CollectionHandle<Row, bool>,
    deserialize_fn: DeserializeFn,
    vtable: &'static VTable,
    updates: Vec<(Row, bool)>,
}

pub struct MapHandle {
    handle: CollectionHandle<Row, Option<Row>>,
    deserialize_fn: DeserializeFn,
    vtable: &'static VTable,
    updates: Vec<(Row, Option<Row>)>,
    key_func: ????
}

key_func is the tricky part here. It is needed to insert new (key, value) pairs into the map. In practice, input data contains only the value, and the key is implicitly defined by a subset of the value's columns; key_func implements this value-to-key mapping. The information needed to generate this function must be included in the SourceMap deserialization demand.
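The value-to-key mapping can be sketched as a projection onto a list of key column indices. Everything below is illustrative and assumed rather than taken from the JIT code: `Row` is modeled as `Vec<String>`, `KeyFn`, `key_func_from_columns`, and `apply_updates` are hypothetical names, and a `HashMap` stands in for the map-backed table. The `(Row, Option<Row>)` update shape mirrors the `updates` field of `MapHandle`, read here as "`Some(value)` upserts, `None` deletes".

```rust
use std::collections::HashMap;

/// Hypothetical row type; the real JIT `Row` is driven by a layout and vtable.
pub type Row = Vec<String>;

/// Maps a full value row to its key row.
pub type KeyFn = Box<dyn Fn(&Row) -> Row>;

/// Builds a key function that projects a value onto the given column indices,
/// preserving the order of `key_columns`. Panics if an index is out of range.
pub fn key_func_from_columns(key_columns: Vec<usize>) -> KeyFn {
    Box::new(move |value: &Row| -> Row {
        key_columns.iter().map(|&i| value[i].clone()).collect()
    })
}

/// Applies buffered updates with map semantics:
/// `Some(value)` inserts or overwrites the entry for `key`, `None` deletes it.
pub fn apply_updates(table: &mut HashMap<Row, Row>, updates: Vec<(Row, Option<Row>)>) {
    for (key, value) in updates {
        match value {
            Some(row) => {
                table.insert(key, row);
            }
            None => {
                table.remove(&key);
            }
        }
    }
}
```

With key column 0, inserting `["1", "bolt"]` and then `["1", "nut"]` leaves a single entry for key `["1"]`, and a `(["1"], None)` update removes it without the caller having to supply the full row.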

ryzhyk · 2023-10-11T04:05:02Z

Moving to the next milestone (we still need to integrate #866 with the SQL compiler).

ryzhyk · 2023-10-11T21:06:24Z

@mihaibudiu, here are some examples of the new IR format from #866:

"Source": {
    "layout": {
        "Map": [3, 2]
    },
    "kind": "ZSet",
    "table": "T"
}

"Source": {
    "layout": { "Set": 1 },
    "kind": "ZSet",
    "table": "T"
}

ryzhyk · 2023-10-11T21:12:47Z

@mihaibudiu, one more TODO for the SQL compiler that I missed: we need to add primary key information to the schema file, e.g.,

{
  "name" : "PART",
  "fields" : [ {
    "name" : "ID",
    "case_sensitive" : false,
    "columntype" : {
      "type" : "BIGINT",
      "nullable" : false
    }
  }, {
    "name" : "NAME",
    "case_sensitive" : false,
    "columntype" : {
      "type" : "VARCHAR",
      "nullable" : true,
      "precision" : -1
    }
  } ],
  "primary_key" : ["ID"]
}

The order of column names in the primary key definition must match the layout you generate for JIT.

mihaibudiu · 2023-10-20T16:59:58Z

I think #891 completes the SQL compiler side of this issue.

ryzhyk added the SQL compiler, JIT, and adapters labels Sep 25, 2023

ryzhyk added this to the v0.1.5 milestone Sep 25, 2023

ryzhyk assigned ryzhyk, mihaibudiu and Kixiron Sep 25, 2023

This was referenced Sep 25, 2023

Support upsert semantics for tables. #455

Closed

Support add_input_set and add_input_map operators. #730

Closed

ryzhyk mentioned this issue Oct 3, 2023

Primary key support #826

Merged

Kixiron mentioned this issue Oct 10, 2023

[JIT] Set & Map Sources #866

Merged

ryzhyk modified the milestones: October 10, 2023, October 24, 2023 Oct 11, 2023

mihaibudiu mentioned this issue Oct 12, 2023

Emit primary key information as metadata from compiler #877

Merged

ryzhyk modified the milestones: October 24, 2023, November 07, 2023 Oct 24, 2023

ryzhyk closed this as completed Oct 26, 2023
