Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a column to the base table should invalidate prepared statements for views #16392

Closed
nyh opened this issue Dec 12, 2023 · 13 comments · Fixed by #16897
Closed

Adding a column to the base table should invalidate prepared statements for views #16392

nyh opened this issue Dec 12, 2023 · 13 comments · Fixed by #16897
Labels
area/materialized views Field-Tier2 Important issues as per FieldEngineering request type/bug
Milestone

Comments

@nyh
Copy link
Contributor

nyh commented Dec 12, 2023

In https://github.com/scylladb/scylla-enterprise/issues/3493#issuecomment-1850936534, @tgrabiec noticed that:

"I've noticed that adding a column to the base table does not invalidate prepared statements for the views. So select * query doesn't see the new column. Only after scylla restarts queries start to see it. So it can happen that depending on which host you contact, you may see the new column or not."

@mykaul
Copy link
Contributor

mykaul commented Jan 9, 2024

This sounds quite severe, no? I think it should be prioritized. @nyh - any rough estimates on the complexity of a fix for this?

@mykaul mykaul added the Field-Tier2 Important issues as per FieldEngineering request label Jan 9, 2024
@mykaul mykaul added this to the 6.1 milestone Jan 9, 2024
@tzach
Copy link
Contributor

tzach commented Jan 15, 2024

@nyh I understand the issue is visable when creating a MV with "SELECT *" and using "SELECT *" from MV.
Is it the case?

This remind me of https://stackoverflow.com/questions/3639861/why-is-select-considered-harmful

@mykaul
Copy link
Contributor

mykaul commented Jan 16, 2024

@nyh - ping - do we have any thoughts on this? Just an estimate would be great - assigning it to you (temporarily) for consideration.

@nyh
Copy link
Contributor Author

nyh commented Jan 17, 2024

This sounds quite severe, no? I think it should be prioritized. @nyh - any rough estimates on the complexity of a fix for this?

As I said I just quoted @tgrabiec's statement about this being a bug - I didn't reproduce it myself and I don't know, if this bug exists, how severe it is. I'm not really an expert in prepared statement invalidation and why was important to have it for non-views - I assume the same reasons why we need prepared-statement-invalidation would apply to views (@mykaul maybe you know who the expert on this might be).

@mykaul
Copy link
Contributor

mykaul commented Jan 17, 2024

@eliransin - is the above in your team? Who can look at this?

@mykaul mykaul unassigned nyh Jan 17, 2024
@eliransin
Copy link
Contributor

First step IMO is to reproduce the issue (if even exists):
IIUC this should reproduce it quite easily:

  1. Start a 3 node cluster
  2. Create a materialized view that "select *" the rows
  3. Create a prepared statement on the view that "select *" the row
  4. Write a row
  5. Issue the prepared statement from 3 and validate that we get 1 row and that it is correct
  6. Add a column to the base table
  7. Write another row with the new column
  8. Issue the prepared statement from 3 and validate that we get 2 row and that they have the new value set where it should.

Will make an attempt on this.

@eliransin
Copy link
Contributor

eliransin commented Jan 17, 2024

Reproduced with cql-pytest:

def test_mv_prepared_statement_with_altered_base(scylla_only, cql, test_keyspace):
    with new_test_table(cql, test_keyspace, 'id int PRIMARY KEY, v1 int') as base:
        with new_materialized_view(cql, table=base, select='*', pk='id', where='id IS NOT NULL') as view:
            base_query = cql.prepare(f"SELECT * FROM {base} WHERE id=?")
            view_query = cql.prepare(f"SELECT * FROM {view} WHERE id=?")
            cql.execute(f"INSERT INTO {base} (id,v1) VALUES (0,0)")
            assert cql.execute(base_query,[0]) == cql.execute(view_query,[0])
            cql.execute(f"ALTER TABLE {base} ADD (v2 int)")
            cql.execute(f"INSERT INTO {base} (id,v1,v2) VALUES (1,1,1)")
            assert list(cql.execute(base_query,[1])) == list(cql.execute(view_query,[1]))

This fails with:

>               assert list(cql.execute(base_query,[1])) == list(cql.execute(view_query,[1]))
E               assert [Row(id=1, v1=1, v2=1)] == [Row(id=1, v1=1)]
E                 At index 0 diff: Row(id=1, v1=1, v2=1) != Row(id=1, v1=1)

Working on a fix

eliransin pushed a commit to eliransin/scylla that referenced this issue Jan 17, 2024
This bug was discovered as part of another bug investigation,
if the base table changes, for example a row is added or removed,
the prepared statements of the materialized views doesn't change,
this will result in returning incomplete information until the prepared
statements cache is invalidated for some other reason or until the node
is rebooted.
This patch fixes it by accounting for the fact that a materialized view
select query also depends on it's base table.

Fixes scylladb#16392

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
eliransin pushed a commit to eliransin/scylla that referenced this issue Jan 17, 2024
Issue scylladb#16392 describes a bug where when a base table is altered, it's
materialized views prepared statements are not invalidated which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
@eliransin
Copy link
Contributor

eliransin commented Jan 17, 2024

Fix submitted 🙂

@eliransin
Copy link
Contributor

@nyh had very good questions about why didn't this happen automatically and I think he is right. This fix in #16830 will work. However, it is not the correct point for fixing this so I will try to figure it out using the correct point this time.
@mykaul you asked for an estimation, I think it will take a day or two to debug and lock in on the actual fix.

eliransin pushed a commit to eliransin/scylla that referenced this issue Jan 21, 2024
Issue scylladb#16392 describes a bug where when a base table is altered, it's
materialized views prepared statements are not invalidated which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
eliransin pushed a commit to eliransin/scylla that referenced this issue Jan 21, 2024
When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.

Fixes scylladb#16392

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
eliransin pushed a commit to eliransin/scylla that referenced this issue Jan 21, 2024
Issue scylladb#16392 describes a bug where when a base table is altered, it's
materialized views prepared statements are not invalidated which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
avikivity added a commit that referenced this issue Jan 21, 2024
…nges.' from Eliran Sinvani

When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.

Fixes #16392

This series also adds a test which fails without this fix and passes when the fix is applied.

Closes #16897

* github.com:scylladb/scylladb:
  Add test for mv prepared statements invalidation on base alter
  query processor: treat view changes at least as table changes
@eliransin
Copy link
Contributor

This bug exists since very long ago, so all live versions needs a backport 🙂 (I checked 5.1 and above).
It looks like the fix itself is going to merge cleanly since this code looks untouched on older versions.
The test is not guarantied to merge cleanly since there was a lot of development around test.py during the past months.

@mykaul mykaul modified the milestones: 6.1, 6.0 Jan 23, 2024
@mykaul mykaul added the backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed label Jan 23, 2024
@mykaul mykaul added the backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed label Jan 23, 2024
@mykaul
Copy link
Contributor

mykaul commented Jan 23, 2024

@scylladb/scylla-maint please backport

@nyh
Copy link
Contributor Author

nyh commented Jan 23, 2024

@scylladb/scylla-maint please backport

I'll do it now.

nyh pushed a commit that referenced this issue Jan 23, 2024
…nges.' from Eliran Sinvani

When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.

Fixes #16392

This series also adds a test which fails without this fix and passes when the fix is applied.

Closes #16897

* github.com:scylladb/scylladb:
  Add test for mv prepared statements invalidation on base alter
  query processor: treat view changes at least as table changes

(cherry picked from commit 5810396)
@nyh
Copy link
Contributor Author

nyh commented Jan 23, 2024

Backported to next-5.4 (df18433) - both the code and test backported cleanly (but this took ages to do).
Backported to next-5.2 (351d6d6) - both the code and test backported with only minor conflicts needed to be fixed in the test.

nyh pushed a commit that referenced this issue Jan 23, 2024
…nges.' from Eliran Sinvani

When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.

Fixes #16392

This series also adds a test which fails without this fix and passes when the fix is applied.

Closes #16897

* github.com:scylladb/scylladb:
  Add test for mv prepared statements invalidation on base alter
  query processor: treat view changes at least as table changes

(cherry picked from commit 5810396)
@mykaul mykaul removed backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed labels Jan 28, 2024
dgarcia360 pushed a commit to dgarcia360/scylla that referenced this issue Apr 30, 2024
When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.

Fixes scylladb#16392

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
dgarcia360 pushed a commit to dgarcia360/scylla that referenced this issue Apr 30, 2024
Issue scylladb#16392 describes a bug where when a base table is altered, it's
materialized views prepared statements are not invalidated which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/materialized views Field-Tier2 Important issues as per FieldEngineering request type/bug
Projects
None yet
5 participants