feat(frontend): Size of Object Functions (`pg_table_size`, `pg_relation_size`, `pg_indexes_size`) [Draft] #9013

erichgess · 2023-04-06T00:23:23Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Draft PR

This Draft PR only implements pg_table_size. The other two functions can be implemented easily using the same code, but I wanted to get feedback on the current implementation while completing them.

This PR implements the functions pg_table_size, pg_relation_size, and pg_indexes_size for literal value arguments. This PR does not implement support for using references (such as a column name) as the argument for these functions.

In this PR, the above functions are computed entirely on the Frontend nodes by using the local Catalog to convert the function argument to a TableID. That TableID is then used to look up the stats for the table, index, or relation within a local copy of the HummockVersionStats, which are collected by the Meta nodes. To compute the size of an object, its total_key_size and total_value_size values are added together and returned by the function.
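As a rough illustration of the size computation described above, here is a minimal sketch. `TableStats`, `StatsSnapshot`, and `object_size` are simplified, hypothetical stand-ins and not the actual RisingWave types; only the field names total_key_size and total_value_size come from the description.

```rust
use std::collections::HashMap;

/// Simplified stand-in (an assumption) for the per-table statistics
/// carried in HummockVersionStats.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct TableStats {
    pub total_key_size: i64,
    pub total_value_size: i64,
}

/// Hypothetical local copy of the stats on a Frontend node, keyed by table ID.
pub type StatsSnapshot = HashMap<u32, TableStats>;

/// Returns key bytes + value bytes for `table_id`,
/// or `None` when no stats entry exists for that ID.
pub fn object_size(stats: &StatsSnapshot, table_id: u32) -> Option<i64> {
    stats
        .get(&table_id)
        .map(|s| s.total_key_size + s.total_value_size)
}
```

Returning `Option` keeps "no stats yet" distinct from "size is zero"; how the real implementation treats a missing entry is not specified here.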

The functions pg_table_size, pg_relation_size, and pg_indexes_size are implemented within the Frontend node as part of the Binder type. These functions can only take a literal integer value or a literal varchar. If an integer is given as the argument, it is treated as an Object ID and the function simply attempts to find an entry in HummockVersionStats with the same ID. If a varchar is given, the function uses the ObjectName parser to convert the value of the varchar into an ObjectName. The ObjectName is then used to look up the TableId so that the associated stats can be found. By using the Parser, any valid format for an object name in PG SQL can be used as the value of the varchar literal (e.g. '"my table"', 'public.foo', and 'public."my table"' are all valid arguments). If the object is found in HummockVersionStats, then the total size of the object (keys + values) is returned.
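The argument handling described above could be sketched roughly as follows. Note that this sketch swaps in a small hand-written name splitter purely to show which forms are accepted; the PR itself deliberately reuses the SQL parser instead. `SizeArg`, `split_object_name`, `Catalog`, and `resolve_object_id` are hypothetical names, and the two-level (schema, object) catalog is an assumption.

```rust
use std::collections::HashMap;

/// The two literal argument forms the functions accept.
pub enum SizeArg {
    /// An integer literal, treated as an object ID.
    Oid(u32),
    /// A varchar literal holding an object name such as `public."my table"`.
    Name(String),
}

/// Splits a possibly schema-qualified, possibly quoted name into its parts.
/// Unquoted parts are folded to lower case; quoted parts are kept verbatim.
pub fn split_object_name(input: &str) -> Result<Vec<String>, String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut was_quoted = false;
    let mut chars = input.trim().chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '"' if in_quotes => {
                if chars.peek() == Some(&'"') {
                    // A doubled quote inside a quoted identifier is a literal quote.
                    current.push('"');
                    chars.next();
                } else {
                    in_quotes = false;
                }
            }
            '"' => {
                if !current.is_empty() || was_quoted {
                    return Err(format!("unexpected quote in {input:?}"));
                }
                in_quotes = true;
                was_quoted = true;
            }
            '.' if !in_quotes => {
                if current.is_empty() {
                    return Err(format!("empty identifier in {input:?}"));
                }
                parts.push(std::mem::take(&mut current));
                was_quoted = false;
            }
            _ if in_quotes => current.push(c),
            _ if was_quoted => {
                return Err(format!("unexpected character after quote in {input:?}"));
            }
            _ => current.extend(c.to_lowercase()),
        }
    }
    if in_quotes {
        return Err(format!("unterminated quoted identifier in {input:?}"));
    }
    if current.is_empty() {
        return Err(format!("empty identifier in {input:?}"));
    }
    parts.push(current);
    Ok(parts)
}

/// Hypothetical catalog: (schema, object) -> object ID.
pub type Catalog = HashMap<(String, String), u32>;

/// Integers pass through as IDs; names are split and looked up in the catalog.
pub fn resolve_object_id(
    catalog: &Catalog,
    default_schema: &str,
    arg: &SizeArg,
) -> Result<u32, String> {
    match arg {
        SizeArg::Oid(id) => Ok(*id),
        SizeArg::Name(name) => {
            let parts = split_object_name(name)?;
            let (schema, object) = match parts.as_slice() {
                [object] => (default_schema.to_string(), object.clone()),
                [schema, object] => (schema.clone(), object.clone()),
                _ => return Err(format!("unsupported object name {name:?}")),
            };
            catalog
                .get(&(schema, object))
                .copied()
                .ok_or_else(|| format!("object {name:?} not found"))
        }
    }
}
```

A hand-written splitter like this is exactly the kind of parallel parser that would need to be kept in sync with the SQL grammar, which is the maintenance cost the PR avoids by calling the real parser.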

A "virtual" table rw_table_stats is implemented, which acts as an interface between the query execution engine and the HummockVersionStats data pushed from the Meta node and that contains the table stats data. Calls to pg_*_size functions get converted to queries on this table and are executed by the query engine in local mode.

In order for the Frontend nodes to have a local copy of the HummockVersionStats, a new Meta node notification event called HummockStats was added, which the Meta nodes use to send table stat updates to Frontend and Compute nodes. These events are generated whenever tables are compacted or an epoch is committed.
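To make the notification flow concrete, here is a minimal sketch of a Frontend-side observer keeping its local stats copy up to date. Everything here is hypothetical: `Notification`, `LocalStats`, and `handle` are illustrative names, and the sketch assumes each HummockStats event carries a full snapshot rather than a delta, which the description does not state.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Simplified stand-in (an assumption) for per-table statistics.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TableStats {
    pub total_key_size: i64,
    pub total_value_size: i64,
}

/// Hypothetical notification payloads pushed by the Meta node.
pub enum Notification {
    /// Assumed here to be a full snapshot of per-table stats.
    HummockStats(HashMap<u32, TableStats>),
    /// Any other event; ignored by this observer.
    Other,
}

/// Local copy of the stats, shared between the observer and query binding.
#[derive(Clone, Default)]
pub struct LocalStats {
    inner: Arc<RwLock<HashMap<u32, TableStats>>>,
}

impl LocalStats {
    /// Replaces the local copy when a stats notification arrives.
    pub fn handle(&self, notification: Notification) {
        if let Notification::HummockStats(stats) = notification {
            *self.inner.write().unwrap() = stats;
        }
    }

    /// Reads the stats for one table, if any are known locally.
    pub fn get(&self, table_id: u32) -> Option<TableStats> {
        self.inner.read().unwrap().get(&table_id).copied()
    }
}
```

Because the copy is only refreshed on compaction or epoch commit, a size read right after an insert may lag behind the data actually written.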

Checklist For Contributors

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
~~I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).~~
I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation


Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

SQL commands, functions, and operators

Release note

This PR adds two Postgres functions: pg_table_size and pg_indexes_size. It also expands the domain of values that the ::regclass operation can be applied to. It now works with object names that include their parent schema (e.g., 'public.test'::regclass), and it also works with integers (when applied to an integer, it simply resolves to the integer value itself, mirroring Postgres).

pg_table_size: this function will return the amount of space, in total, taken up by the specified table. If given the name of an index it will return the total size of that index.
pg_indexes_size: this will return the total space taken up by all indexes on a given table. If given the name of an index, it will return 0. If you want the size of a specific index you can use pg_table_size.
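The behavior of the two functions for tables versus indexes can be modeled with a small sketch. `Object` and `Model` are hypothetical, and treating a missing stats entry or an unknown ID as size 0 is an assumption made for brevity, not a statement about the real implementation.

```rust
use std::collections::HashMap;

/// Hypothetical catalog entry: a table knows the IDs of its indexes.
pub enum Object {
    Table { index_ids: Vec<u32> },
    Index,
}

/// Toy model combining a catalog with per-object sizes in bytes.
pub struct Model {
    pub objects: HashMap<u32, Object>,
    pub sizes: HashMap<u32, i64>,
}

impl Model {
    /// Size of a single object; objects without stats count as 0 (assumption).
    fn size_of(&self, id: u32) -> i64 {
        self.sizes.get(&id).copied().unwrap_or(0)
    }

    /// Size of the named object itself, whether it is a table or an index.
    pub fn pg_table_size(&self, id: u32) -> i64 {
        self.size_of(id)
    }

    /// Sum of the sizes of all indexes on a table; 0 when given an index.
    pub fn pg_indexes_size(&self, id: u32) -> i64 {
        match self.objects.get(&id) {
            Some(Object::Table { index_ids }) => {
                index_ids.iter().map(|i| self.size_of(*i)).sum()
            }
            _ => 0,
        }
    }
}
```

With a table of 51 bytes and one index of 43 bytes, this model gives pg_table_size 51 for the table and 43 for the index, and pg_indexes_size 43 for the table and 0 for the index.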


src/frontend/src/binder/select.rs

src/meta/src/hummock/manager/mod.rs

hzxa21 · 2023-04-06T03:13:20Z

src/frontend/src/binder/select.rs

+                // We use the full parser here because this function needs to accept every legal way
+                // of identifying a table in PG SQL as a valid value for the varchar
+                // literal.  For example: 'foo', 'public.foo', '"my table"', and
+                // '"my schema".foo' must all work as values passed pg_table_size.
+                let mut tokenizer = Tokenizer::new(name);


It looks a little bit weird to me to use the parser in the binder. cc @xiangjinwu Is this a good practice? Any suggestions?

Yes, it is weird, but also somewhat reasonable because we are doing introspection here. PG even has a SQL function parse_ident, but it looks similar to SplitIdentifierString. My suggestion would be to extract this to a utility function outside the binder, and still use the parser inside to avoid the burden of maintaining a duplicate implementation. This does sound like cheating and makes no difference at run time. 😂

I thought about writing something custom for this, but we need to support any valid representation of a table name. That would mean building a parallel parser just for table names passed as varchar literals and, critically, always having to remember that any change to the SQL parser must be reflected in the parallel parser. I couldn't come up with a good reason that would be worth that maintenance load.

Putting into a helper function makes sense to me: what would be the best place to put it?

src/frontend/src/binder/select.rs


erichgess · 2023-04-07T19:20:42Z

src/frontend/src/binder/select.rs

+        let (schema_name, table_name) =
+            Self::resolve_schema_qualified_name(&self.db_name, object_name)?;
+
+        self.bind_table(schema_name.as_deref(), &table_name, None)


Should I bind the table whose size is being looked up? I did this because bind_table was the existing method on Binder that would provide the TableId needed to query rw_table_stats. But binding the table from the pg_table_size argument strikes me as less than safe; instead, I should probably use a function that looks up the Table information but does not bind the table to the query compiler context.

erichgess · 2023-04-28T02:01:35Z

@yezizp2012 @hzxa21 is there a way to have the e2e test script ci/scripts/e2e-test-parallel-in-memory.sh skip a specific test? The tests for table and index size fail when run "in memory" because, since it's in-memory, there's no space taken up by the table and all calls to pg_table_size and pg_indexes_size return 0.

This is assuming that e2e_test/batch/catalog is the best place to put tests for pg_table_size. There doesn't appear to be a more appropriate place.

lmatz · 2023-05-16T08:43:28Z

Hi @erichgess, sorry for the late response.

Once risinglightdb/sqllogictest-rs#177 is implemented, we can skip the test and merge the PR soon.

lmatz · 2023-06-09T13:40:18Z

risinglightdb/sqllogictest-rs#179 is merged,
once there is a new release of sqllogictest-rs, we can modify the test cases accordingly

TennyZhuang · 2023-06-28T01:42:12Z

It seems that a new version of sqllogictest has been released, but this PR has been forgotten.

@erichgess Do you have time to resolve the conflicts? Or I can help you.
@wangrunji0408 @lmatz Can you help push for this PR to be merged? e.g. sqllogictest related issues.

lmatz · 2023-06-28T15:09:10Z

> It seems that a new version of sqllogictest has been released, but this PR has been forgotten.
>
> @erichgess Do you have time to resolve the conflicts? Or I can help you. @wangrunji0408 @lmatz Can you help push for this PR to be merged? e.g. sqllogictest related issues.

I can, but I don't know how to push changes into erichgess:7766-size-of-db-objects, is it even possible?

TennyZhuang · 2023-06-30T04:01:55Z

> I can, but I don't know how to push changes into erichgess:7766-size-of-db-objects, is it even possible?

It's allowed by default. https://stackoverflow.com/questions/63341296/github-pull-request-allow-edits-by-maintainers

lmatz · 2023-06-30T09:14:03Z

dev=> create table t (v1 int);
CREATE_TABLE
dev=> insert into t values (3);
INSERT 0 1
dev=> SELECT pg_table_size('t');
 pg_table_size 
---------------
            51
(1 row)

dev=> SELECT pg_indexes_size('t');
 pg_indexes_size 
-----------------
               0
(1 row)

dev=> create index t_idx on t (v1);
CREATE_INDEX
dev=> flush;
FLUSH
dev=> select pg_indexes_size('t');
 pg_indexes_size 
-----------------
              43
(1 row)

Conflicts solved

lmatz

LGTM(pass the tests)

@hzxa21 @xiangjinwu @xxchan @TennyZhuang

yezizp2012

LGTM!

erichgess added 17 commits March 29, 2023 14:19

Added a HummockVersionStats event to the Meta push notification syste…

e9b9490

…m which will send table stat data to subscribers. Stubs in the Notification observers were added which will eventually handle updating the local Stats state data

Send stats notifications to just Compute and FE nodes.

a111fdb

Add stubs for implementing the virutal rw_table_stats table in rw_cat…

a55c983

…alog.

Add a field to store the HummockVersionStats to the Catalog type and …

9e321f5

…add logic that will update this field when the Frontend receives a HummockVersionStats notification

Frontend reads table stats from the HummockVersionStats value stored …

be1c885

…in Catalog

naming

79e73d7

Delete unused import

51c264c

Storage Nodes should not receive HummockVersionStats notifications.

3353b86

Send HummockVersionStats notification after compaction

c2a777d

WIP: convert pg_table_size into a select query on rw_table_stats

3eca9a5

Removing temporary print statements

78274b0

Convert varchar to an ObjectName to allow for schema names to be used…

663b0f6

… as part of the table name.

Use the SqlParser to parse the table argument of pg_table_size

c3448a2

pg_table_size supports object IDs or object names as input parameters

8b52510

Move helper function to end of impl block

4ad8a0a

Merge branch 'main' into 7766-size-of-db-objects

7f30c4c

Better variable name

9e28491

erichgess mentioned this pull request Apr 6, 2023

implement system administration functions that get the size of table, index, and MV #7766

Closed

4 tasks

hzxa21 reviewed Apr 6, 2023


erichgess changed the title ~~7766 Size of Objects (pg_table_size, pg_relation_size, pg_indexes_size) [Draft]~~ feat(frontend): Size of Object Functions (pg_table_size, pg_relation_size, pg_indexes_size) [Draft] Apr 6, 2023

erichgess added 9 commits April 6, 2023 09:53

Code clean up

83b5e9c

Code clean up

07c36fb

Remove logic to push Table Stats updates to Compute nodes.

1003bd1

Add helper function for finding an object by name (so that pg_indexes…

9218d4f

…_size can get the Table object and pull the list of indexes for a table)

WIP: compute the total size of indexes on a table.

698e315

WIP: working out how to compute the total size of all indexes on a ta…

af12b0e

…ble.

Code clean up

df2054d

Code clean up

5782f71

Code clean up

fb9ea68

erichgess commented Apr 7, 2023


Increase sleep time in e2e test to get parallel e2e tests working

6304609

xxchan mentioned this pull request May 16, 2023

bin: enhance skipif risinglightdb/sqllogictest-rs#177

Closed

wangrunji0408 mentioned this pull request Jun 8, 2023

feat: add label for skipif and onlyif conditions risinglightdb/sqllogictest-rs#179

Merged

TennyZhuang self-requested a review June 28, 2023 01:38

lmatz added 5 commits June 30, 2023 13:28

Merge branch 'main' into 7766-size-of-db-objects

b50e216

Update mod.rs

42860fb

Update select.rs

4e332bd

Update mod.rs

b284a36

Update select.rs

0f60144

lmatz mentioned this pull request Jul 1, 2023

chore: add in-memory label to slt tests #10678

Merged

8 tasks

lmatz added 2 commits July 3, 2023 14:20

Merge branch 'main' into 7766-size-of-db-objects

2798f48

add skipif in slt test file

1de7d9f

lmatz approved these changes Jul 3, 2023


lmatz requested a review from yezizp2012 July 3, 2023 07:06

lmatz added user-facing-changes Contains changes that are visible to users type/feature labels Jul 3, 2023

yezizp2012 approved these changes Jul 3, 2023


lmatz added this pull request to the merge queue Jul 3, 2023

Merged via the queue into risingwavelabs:main with commit 5de8c44 Jul 3, 2023
40 of 41 checks passed

This was referenced Jul 3, 2023

Document pg_table_size, pg_indexes_size risingwavelabs/risingwave-docs#1003

Closed

Tracking: more metrics for join operator #10760

Open

abhyuday26 mentioned this pull request Jul 7, 2023

pg_relation_size() not working #10816

Closed
