Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(frontend): support background ddl for materialized views #12355

Merged
merged 19 commits into from
Sep 19, 2023

Conversation

kwannoel
Copy link
Contributor

@kwannoel kwannoel commented Sep 15, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Part of #8051

Support running background ddl.

CREATE TABLE t(v1 int);
insert into t select * from generate_series(1, 1000000);
SET BACKGROUND_DDL=true;
-- Should return immediately:
CREATE MATERIALIZED VIEW m as select * from t;

We can support it without persistence first.

I prefer to keep this PR small, and work towards an e2e working background ddl for mviews first. Later we will handle sinks and indexes as well. And separately for persistence: #12167

Implementation details

  • Extend CreateMaterializedViewRequest to take background_ddl as a parameter.
  • On the meta side, we run prepare + build stream job first.
  • Then to avoid blocking the response to frontend,
    we spawn the stream job creation task in a tokio thread.
  • If there's any errors during creation we just emit meta logs for it.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

You should treat this as an experimental feature. It still requires persistence of background ddl portion to be considered complete, as well as more testing. The release note in #12167 will indicate it.

Users can now set the session variable BACKGROUND_DDL to a boolean value. If true, CREATE MATERIALIZED VIEW will execute in the background, and users should expect to immediately get the typical CREATE MATERIALIZED VIEW back as a response.

If false, CREATE MATERIALIZED VIEW will execute as per normal.

As an example:

CREATE TABLE t(v1 int);
insert into t select * from generate_series(1, 1000000);
flush;
SET BACKGROUND_DDL=true;
-- Should return immediately:
CREATE MATERIALIZED VIEW m as select * from t;

You can use SHOW JOBS to monitor its progress:

-- Call show jobs, may have a small delay before progress shows up.
-- This will display the progress of creating the materialized view m.
dev=> show jobs;
  Id  |                   Statement                   | Progress 
------+-----------------------------------------------+----------
 1002 | CREATE MATERIALIZED VIEW m AS SELECT * FROM t | 24.68%
(1 row)

Jobs running in the background can be cancelled:

dev=> cancel jobs 1002;
  Id  
------
 1002
(1 row)

dev=> show jobs;
 Id | Statement | Progress 
----+-----------+----------
(0 rows)

Eventually CREATE SINK and CREATE INDEX will support this as well.
Subsequent PRs for these will include a release note.

Note that if cluster crashes while ddl is still running, it is currently not recoverable. But there's work in progress to support persistence of background ddl.

@kwannoel kwannoel marked this pull request as draft September 15, 2023 15:34
@kwannoel kwannoel changed the title feat(frontend): support background ddl [WIP] feat(frontend): support background ddl for materialized views Sep 18, 2023
@kwannoel kwannoel marked this pull request as ready for review September 18, 2023 07:43
src/meta/src/rpc/service/ddl_service.rs Outdated Show resolved Hide resolved
src/meta/src/rpc/service/ddl_service.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@yezizp2012 yezizp2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
One thing non-related to this PR about recovery, as you said in slack channel: "If user does not set background ddl session variable, we don’t persist backfill", if so then we need a way to distinguish between jobs that need to be recovered and jobs that need to be cleaned.

Copy link
Member

@BugenZhao BugenZhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May mark this PR as "user-facing" and write some release notes for the doc team. :)

src/meta/src/rpc/ddl_controller.rs Outdated Show resolved Hide resolved
@kwannoel kwannoel force-pushed the kwannoel/background-ddl-frontend branch from 9f08abb to 1068125 Compare September 18, 2023 11:01
@kwannoel kwannoel added user-facing-changes Contains changes that are visible to users and removed ci/run-e2e-standalone-tests labels Sep 18, 2023
@kwannoel kwannoel force-pushed the kwannoel/background-ddl-frontend branch from 457d7f3 to 6d7676a Compare September 18, 2023 13:42
@kwannoel

This comment was marked as resolved.

- caused by not cleaning up failed build stream job and register source
@kwannoel
Copy link
Contributor Author

Test failure caused by failing to cleanup stream job on failure. Need to clean up the build_stream_job step as well.

@codecov
Copy link

codecov bot commented Sep 19, 2023

Codecov Report

Merging #12355 (b6863fc) into main (4dadb7c) will decrease coverage by 0.02%.
Report is 1 commits behind head on main.
The diff coverage is 10.67%.

@@            Coverage Diff             @@
##             main   #12355      +/-   ##
==========================================
- Coverage   69.76%   69.74%   -0.02%     
==========================================
  Files        1424     1424              
  Lines      236247   236330      +83     
==========================================
+ Hits       164826   164837      +11     
- Misses      71421    71493      +72     
Flag Coverage Δ
rust 69.74% <10.67%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/frontend/src/catalog/catalog_service.rs 3.57% <0.00%> (-0.09%) ⬇️
src/meta/src/manager/notification.rs 73.06% <ø> (ø)
src/meta/src/rpc/ddl_controller.rs 0.00% <0.00%> (ø)
src/meta/src/rpc/service/ddl_service.rs 0.00% <0.00%> (ø)
src/rpc_client/src/meta_client.rs 4.09% <0.00%> (-0.01%) ⬇️
src/common/src/session_config/mod.rs 29.60% <25.00%> (-0.10%) ⬇️
src/frontend/src/test_utils.rs 72.43% <66.66%> (-0.07%) ⬇️
src/frontend/src/handler/create_mv.rs 93.67% <85.71%> (-0.23%) ⬇️

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
experimental type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants