
Provide a way to compute updates to a set of results when sources change #4

mitar opened this issue Jan 7, 2019 · 7 comments
mitar commented Jan 7, 2019

I am doing something similar, but it is not really incremental updates to a materialized view. I want to have a reactive query and send updates of the query results to the client. So I would prefer if the logic that computes updates to the materialized view could be abstracted out: there would be a way to monitor source tables, and when they change I would get information about how a materialized view would be updated, without actually updating any materialized view. I could then update my cache and also send information about the update to the client.

ntqvinh (Owner) commented Jan 7, 2019

Thanks.

You can find the information here: https://link.springer.com/article/10.1134/S0361768816050066 (I don't know whether I am allowed to share it, but you can download it if you have an account). It is about generating trigger functions and triggers in PL/pgSQL, and it is useful for understanding the algorithm, the code of the tool (the generator), and the generated triggers/trigger functions.

You can build the tool and then try it on the provided example to generate the triggers (and trigger functions) for IVM: https://github.com/ntqvinh/PgMvIncrementalUpdate/tree/master/IncSyncUpdate/example.

Best regards,

mitar (Author) commented Jan 7, 2019

Thanks for sharing the link to the paper. I have read it and have a few comments.

In the paper you state:

In PostgreSQL, trigger for statement do not see changed records. Only trigger for each row can see the changed one and processes only one record. If n records of a table are inserted, updated or deleted by one command, triggers for each row will be fired n times.

This is no longer true. Since PostgreSQL 10, statement-level triggers can define "transition relations", which give you access to all changed rows at once. See the documentation:

The REFERENCING option enables collection of transition relations, which are row sets that include all of the rows inserted, deleted, or modified by the current SQL statement. This feature lets the trigger see a global view of what the statement did, not just one row at a time. This option is only allowed for an AFTER trigger that is not a constraint trigger; also, if the trigger is an UPDATE trigger, it must not specify a column_name list. OLD TABLE may only be specified once, and only for a trigger that can fire on UPDATE or DELETE; it creates a transition relation containing the before-images of all rows updated or deleted by the statement. Similarly, NEW TABLE may only be specified once, and only for a trigger that can fire on UPDATE or INSERT; it creates a transition relation containing the after-images of all rows updated or inserted by the statement.
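As an illustration of the feature described above (the table, function, and relation names here are hypothetical, not taken from the tool), a PostgreSQL 10 statement-level trigger using transition relations could look like:

```sql
-- Sketch only: "items" and "items_log" are hypothetical tables.
CREATE OR REPLACE FUNCTION log_items_updates() RETURNS trigger AS $$
BEGIN
  -- The transition relations new_rows/old_rows expose ALL rows touched by
  -- the statement, so a single trigger firing handles the whole batch.
  INSERT INTO items_log (op, item_id)
    SELECT 'update', n.id FROM new_rows AS n;
  RETURN NULL;  -- return value is ignored for AFTER ... FOR EACH STATEMENT
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_after_update
  AFTER UPDATE ON items
  REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
  FOR EACH STATEMENT
  EXECUTE PROCEDURE log_items_updates();
```

Note that `EXECUTE PROCEDURE` is the PostgreSQL 10 spelling; PostgreSQL 11 and later also accept `EXECUTE FUNCTION`.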

Also, in the paper you state that you use DELETE and INSERT instead of UPDATE on changed rows. This is surprising to me because in my benchmarks, while optimizing the current REFRESH MATERIALIZED VIEW CONCURRENTLY, I found that using UPDATE is better than doing DELETE and INSERT. I tested the case where one column value changes: removing the whole row and inserting it back with the new column value was slower than simply modifying that column value with an UPDATE.

ntqvinh (Owner) commented Jan 8, 2019

Many thanks for the information.

We (I and another researcher) wrote the first code in 2002. That code generates triggers in PL/PGSQL for performing incremental updates of MVs based on SPJ queries. In 2007, I started new code from scratch, adding support for aggregations. It took me a long time to publish the paper. At the time the paper was accepted, only for-each-row triggers in PostgreSQL provided access to the changing/changed rows; that feature was in a period of transition, which is how the mistake crept in.

Anyway, it is easy to change the code to support bulk processing.

Concerning REFRESH MATERIALIZED VIEW CONCURRENTLY: as far as I know, it does a full refresh, i.e., it clears the MV table, re-executes the underlying query, and fills the MV table with the result. So if you divide an update into a delete and then an insert, you execute the query twice instead of once.

ntqvinh (Owner) commented Jan 8, 2019

That is for MVs with aggregations. I also suggested some optimizations so that it is not necessary to divide an update into an insert and a delete.

For SPJ MVs, because we require that at least one key of the MV query be included, we can do insert/update/delete directly on the MV according to the changes in the base tables, without dividing an update into a delete followed by an insert.
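For illustration (a minimal sketch, assuming an SPJ MV `mv` that carries the base table's key `id` and a copied column `val`; all names are hypothetical, not the tool's generated code), a row-level trigger can map a base-table UPDATE directly onto the MV:

```sql
-- Because the MV contains the base table's key, the affected MV row is
-- addressable directly -- no DELETE-then-INSERT is needed.
CREATE OR REPLACE FUNCTION base_update_to_mv() RETURNS trigger AS $$
BEGIN
  UPDATE mv SET val = NEW.val WHERE mv.id = NEW.id;
  RETURN NULL;  -- ignored for an AFTER row trigger
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER base_after_update
  AFTER UPDATE ON base
  FOR EACH ROW
  EXECUTE PROCEDURE base_update_to_mv();
```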

mitar (Author) commented Jan 8, 2019

Concerning REFRESH MATERIALIZED VIEW CONCURRENTLY: as far as I know, it does a full refresh, i.e., it clears the MV table, re-executes the underlying query, and fills the MV table with the result. So if you divide an update into a delete and then an insert, you execute the query twice instead of once.

So REFRESH MATERIALIZED VIEW currently runs the query, stores the result in a new table, and then swaps the old table for the new one.

REFRESH MATERIALIZED VIEW CONCURRENTLY runs the query, stores the result in a temporary table, computes the diff between the old and new tables, and then:

  • in the past: DELETEd all differing rows, and INSERTed new and changed rows
  • in my patch (after testing showed it performs faster when only one column changes): DELETEs all rows that are no longer in the new table, UPDATEs all changed rows, and INSERTs all new rows

So in my testing, the second approach of doing an UPDATE for changed rows proved faster than DELETE plus INSERT. It was also better for any triggers on the materialized view, because it was clearer what change was actually happening to the data.
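The three-step diff application described above can be sketched as follows (a sketch only, assuming the MV table `mv` and the freshly computed table `newdata` share a key column `id` and a single payload column `val`; these names are hypothetical, not from the actual patch):

```sql
-- 1. DELETE rows that are no longer in the new result.
DELETE FROM mv
  WHERE NOT EXISTS (SELECT 1 FROM newdata n WHERE n.id = mv.id);

-- 2. UPDATE rows whose payload changed (IS DISTINCT FROM also handles NULLs).
UPDATE mv SET val = n.val
  FROM newdata n
  WHERE n.id = mv.id AND mv.val IS DISTINCT FROM n.val;

-- 3. INSERT rows that are new in this refresh.
INSERT INTO mv (id, val)
  SELECT n.id, n.val FROM newdata n
  WHERE NOT EXISTS (SELECT 1 FROM mv WHERE mv.id = n.id);
```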

Do you plan any updates to this project, for example to take into account the new "transition relations" feature?

ntqvinh (Owner) commented Jan 10, 2019

Dear Mitar,

You have done a very good job. Congratulations.

Yes, we are now working on recursive MVs with CTEs and query rewriting. We will update the code to support statement-level triggers shortly.

All collaborations are welcome.

mitar (Author) commented Jan 10, 2019

Great!

I will first have to get this code to run. :-)
