-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(frontend): Size of Object Functions (pg_table_size
, pg_relation_size
, pg_indexes_size
) [Draft]
#9013
Conversation
…m which will send table stat data to subscribers. Stubs in the Notification observers were added which will eventually handle updating the local Stats state data
…add logic that will update this field when the Frontend receives a HummockVersionStats notification
… as part of the table name.
src/frontend/src/binder/select.rs
Outdated
// We use the full parser here because this function needs to accept every legal way | ||
// of identifying a table in PG SQL as a valid value for the varchar | ||
// literal. For example: 'foo', 'public.foo', '"my table"', and | ||
// '"my schema".foo' must all work as values passed pg_table_size. | ||
let mut tokenizer = Tokenizer::new(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a littble bit weird to me to use parser in the binder. cc @xiangjinwu Is this a good practice? Any suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is wired, but also somewhat reasonable because we are doing introspection here. PG even have a SQL function parse_ident
but it looks similar to SplitIdentifierString
. My suggestion would be to extract this to a utility function outside binder, and still use parser inside to avoid the burden of maintaining duplicate implementation. This does sound like cheating and makes no difference at run time. 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about writing something custom for this, but we need to support any valid representation of a table which would mean building a parallel parser for just table names that are passed as varchar literals and, critically, having to always remember that any change to the SQL parser would have to be reflected in the parallel parser. I couldn't come up with a good reason that would be worth that maintenance load.
Putting into a helper function makes sense to me: what would be the best place to put it?
pg_table_size
, pg_relation_size
, pg_indexes_size
) [Draft]
…_size can get the Table object and pull the list of indexes for a table)
src/frontend/src/binder/select.rs
Outdated
let (schema_name, table_name) = | ||
Self::resolve_schema_qualified_name(&self.db_name, object_name)?; | ||
|
||
self.bind_table(schema_name.as_deref(), &table_name, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I use bind the table whose size is being looked up? I did this because using bind_table was the existing on Binder that would provide the TableId needed to query rw_table_stats. But binding the table from the pg_table_size argument strikes me as less than safe and, instead, I should use a function that looks up the Table information but does not bind the table to the query compiler context.
@yezizp2012 @hzxa21 is there a way to have the e2e test script This is assuming that |
Hi @erichgess, sorry for the late response. After we implement risinglightdb/sqllogictest-rs#177, then we can skip the test and merge the PR soon. |
risinglightdb/sqllogictest-rs#179 is merged, |
It seems that a new version of sqllogictest has been released, but this PR has been forgotten. @erichgess Do you have time to resolve the conflicts? Or I can help you. |
I can, but I don't know how to push changes into |
It's allowed by default. https://stackoverflow.com/questions/63341296/github-pull-request-allow-edits-by-maintainers |
Conflicts solved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM(pass the tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
5de8c44
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Draft PR
This PR implements the functions
pg_table_size
,pg_relation_size
, andpg_indexes_size
for literal value arguments. This PR does not implement support for using references (such as a column name) as the argument for these functions.In this PR, the above functions are computed entirely on the Frontend nodes by using the local
Catalog
to convert the function argument to aTableID
. ThatTableID
is then used to look up the stats for the table, index, or relation within a local copy of theHummockVersionStats
which are collected by the Meta nodes. To compute value of the size of an object, thetotal_key_size
andtotal_value_size
values for an object are added together and returned by the function.The functions
pg_table_size
,pg_relation_size
, andpg_indexes_size
are implemented within the Frontend node as part of theBinder
type. These functions can only take a literal integer value or a literal varchar. If an integer is given as the argument it is treated as an Object ID and the function simply attempts to find an entry inHummockVersionStats
with the same ID. If a varchar is given then the funtion uses theObjectName
parser to convert the value of the varchar into anObjectName
. TheObjectName
is then used to look up theTableId
so that the associated stats can be found. By using the Parser any valid format for an object name in PG SQL can be used as the value of the varchar literal (e.g.'"my table"'
,'public.foo'
, and'public."my table"'
are all valid arguments). If the object is found inHummockVersionStats
then the total size of the object (keys + values) is returned.A "virtual" table
rw_table_stats
is implemented, which acts as an interface between the query execution engine and theHummockVersionStats
data pushed from the Meta node and that contains the table stats data. Calls topg_*_size
functions get converted to queries on this table and are executed by the query engine in local mode.In order for the Frontend nodes to have a local copy of the
HummockVersionStats
and new Meta node notification event was added calledHummockStats
which the Meta nodes use to send table stat updates to Frontend and Compute nodes. These events are generated whenever tables are compacted or an epoch is committed.Checklist For Contributors
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934)../risedev check
(or alias,./risedev c
)Checklist For Reviewers
Documentation
Click here for Documentation
Types of user-facing changes
Please keep the types that apply to your changes, and remove the others.
Release note
This PR adds two Postgres functions:
pg_table_size
andpg_indexes_size
. It also expands the domain of values that the::regclass
operation can be applied to: it will work with object names that include their parent schema (e.g.,'public.test'::regclass
) it will also work with integers (when applied to an integer it will simply resolve to the integer value itself, mirroring Postgres).pg_table_size
: this function will return the amount of space, in total, taken up by the specified table. If given the name of an index it will return the total size of that index.pg_indexes_size
: this will return the total space taken up by all indexes on a given table. If given the name of an index, it will return0
. If you want the size of a specific index you can usepg_table_size
.