
Provide a way to compute updates to a set of results when sources change #4

mitar opened this issue Jan 7, 2019 · 7 comments
mitar commented Jan 7, 2019

I am doing something similar, but it is not really incremental updates to a materialized view. I want to have a reactive query and send updates of the query results to the client. So I would prefer if the logic that computes updates to the materialized view could be abstracted out: there would be a way to monitor source tables, and when they change I would get information about how a materialized view would be updated, without actually updating any materialized view. I could then update my cache and also send information about the update to the client.

ntqvinh (Owner) commented Jan 7, 2019

Thanks.

You can find the information here: https://link.springer.com/article/10.1134/S0361768816050066 (I don't know whether I am allowed to share it, but you can download it if you have an account). It is about generating trigger functions and triggers in PL/pgSQL, and it is useful for understanding the algorithm, the code of the tool (the generator), and the generated triggers/trigger functions.

You can build the tool and then try it on the provided example to generate the triggers (and trigger functions) for IVM: https://github.com/ntqvinh/PgMvIncrementalUpdate/tree/master/IncSyncUpdate/example.

Best regards,

mitar (Author) commented Jan 7, 2019

Thanks for sharing the link to the paper. I have read it and have a few comments.

In the paper you state:

In PostgreSQL, trigger for statement do not see changed records. Only trigger for each row can see the changed one and processes only one record. If n records of a table are inserted, updated or deleted by one command, triggers for each row will be fired n times.

This is no longer true. Since PostgreSQL 10, statement-level triggers can define "transition relations", which give you access to all changed rows at once. See the documentation:

The REFERENCING option enables collection of transition relations, which are row sets that include all of the rows inserted, deleted, or modified by the current SQL statement. This feature lets the trigger see a global view of what the statement did, not just one row at a time. This option is only allowed for an AFTER trigger that is not a constraint trigger; also, if the trigger is an UPDATE trigger, it must not specify a column_name list. OLD TABLE may only be specified once, and only for a trigger that can fire on UPDATE or DELETE; it creates a transition relation containing the before-images of all rows updated or deleted by the statement. Similarly, NEW TABLE may only be specified once, and only for a trigger that can fire on UPDATE or INSERT; it creates a transition relation containing the after-images of all rows updated or inserted by the statement.
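As an illustration of the feature described above (the table, function, and relation names here are hypothetical, not taken from the tool), a PostgreSQL 10 statement-level trigger using transition relations could look like:

```sql
-- Sketch only: "items" and "items_log" are hypothetical tables.
CREATE OR REPLACE FUNCTION log_items_updates() RETURNS trigger AS $$
BEGIN
  -- The transition relations new_rows/old_rows expose ALL rows touched by
  -- the statement, so a single trigger firing handles the whole batch.
  INSERT INTO items_log (op, item_id)
    SELECT 'update', n.id FROM new_rows AS n;
  RETURN NULL;  -- return value is ignored for AFTER ... FOR EACH STATEMENT
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_after_update
  AFTER UPDATE ON items
  REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
  FOR EACH STATEMENT
  EXECUTE PROCEDURE log_items_updates();
```

Note that `EXECUTE PROCEDURE` is the PostgreSQL 10 spelling; PostgreSQL 11 and later also accept `EXECUTE FUNCTION`.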

Also, in the paper you state that you use DELETE and INSERT instead of UPDATE on changed rows. This is surprising to me because in my benchmarks, while optimizing the current REFRESH MATERIALIZED VIEW CONCURRENTLY, I found that using UPDATE is better than doing DELETE and INSERT. I tested the case where one column value changes: removing the whole row and inserting it back with the new column value was slower than simply modifying that column value with an UPDATE.

ntqvinh (Owner) commented Jan 8, 2019

Many thanks for the information.

We (I and another researcher) wrote the first code in 2002. That code generates triggers in PL/PGSQL for performing incremental updates of MVs based on SPJ queries. In 2007, I started new code from scratch, adding support for aggregations. It took me a long time to publish the paper. At the time the paper was accepted, only for-each-row triggers in PostgreSQL provided access to the changing/changed rows; that feature was in a period of transition, which is how the mistake crept in.

Anyway, it is easy to change the code to support bulk processing.

Concerning REFRESH MATERIALIZED VIEW CONCURRENTLY: as far as I know, it does a full refresh, i.e., it clears the MV table, re-executes the underlying query, and fills the MV table with the result. So if you divide an update into a delete and then an insert, you execute the query twice instead of once.

ntqvinh (Owner) commented Jan 8, 2019

That is for MVs with aggregations. I also suggested some optimizations so that it is not necessary to divide an update into an insert and a delete.

For SPJ MVs, because we require that at least one key of the MV query be included, we can do insert/update/delete directly on the MV according to the changes in the base tables, without dividing an update into a delete followed by an insert.
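For illustration (a minimal sketch, assuming an SPJ MV `mv` that carries the base table's key `id` and a copied column `val`; all names are hypothetical, not the tool's generated code), a row-level trigger can map a base-table UPDATE directly onto the MV:

```sql
-- Because the MV contains the base table's key, the affected MV row is
-- addressable directly -- no DELETE-then-INSERT is needed.
CREATE OR REPLACE FUNCTION base_update_to_mv() RETURNS trigger AS $$
BEGIN
  UPDATE mv SET val = NEW.val WHERE mv.id = NEW.id;
  RETURN NULL;  -- ignored for an AFTER row trigger
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER base_after_update
  AFTER UPDATE ON base
  FOR EACH ROW
  EXECUTE PROCEDURE base_update_to_mv();
```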

mitar (Author) commented Jan 8, 2019

Concerning REFRESH MATERIALIZED VIEW CONCURRENTLY: as far as I know, it does a full refresh, i.e., it clears the MV table, re-executes the underlying query, and fills the MV table with the result. So if you divide an update into a delete and then an insert, you execute the query twice instead of once.

So REFRESH MATERIALIZED VIEW currently runs the query, stores the result in a new table, and then swaps the old table for the new one.

REFRESH MATERIALIZED VIEW CONCURRENTLY runs the query, stores the result in a temporary table, computes the diff between the old and new tables, and then:

  • in the past: DELETEd all differing rows, and INSERTed new and changed rows
  • in my patch (after testing showed it performs faster when only one column changes): DELETEs all rows that are no longer in the new table, UPDATEs all changed rows, and INSERTs all new rows

So in my testing, the second approach of doing an UPDATE for changed rows proved faster than DELETE plus INSERT. It was also better for any triggers on the materialized view, because it was clearer what change was actually happening to the data.
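The three-step diff application described above can be sketched as follows (a sketch only, assuming the MV table `mv` and the freshly computed table `newdata` share a key column `id` and a single payload column `val`; these names are hypothetical, not from the actual patch):

```sql
-- 1. DELETE rows that are no longer in the new result.
DELETE FROM mv
  WHERE NOT EXISTS (SELECT 1 FROM newdata n WHERE n.id = mv.id);

-- 2. UPDATE rows whose payload changed (IS DISTINCT FROM also handles NULLs).
UPDATE mv SET val = n.val
  FROM newdata n
  WHERE n.id = mv.id AND mv.val IS DISTINCT FROM n.val;

-- 3. INSERT rows that are new in this refresh.
INSERT INTO mv (id, val)
  SELECT n.id, n.val FROM newdata n
  WHERE NOT EXISTS (SELECT 1 FROM mv WHERE mv.id = n.id);
```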

Do you plan any updates to this project, for example to take into account the new "transition relations" feature?

ntqvinh (Owner) commented Jan 10, 2019

Dear Mitar,

You have done a very good job. Congratulations.

Yes, we are now working on recursive MVs with CTEs and query rewriting. We will update the code to support statement-level triggers shortly.

All collaborations are welcome.

mitar (Author) commented Jan 10, 2019

Great!

I will first have to get this code to run. :-)
