implement system administration functions that get the size of table, index, and MV #7766
Comments
Just curious. Why would it happen? When will a user think it's a must to query metrics through SQL? |
Maybe they only have a terminal. |
We already maintain key-value total size on a per table basis and persist it in meta store (#6390). We can expose it via SQL. Is this an urgent issue? It seems like a candidate for good-first-issue. |
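For reference, the PostgreSQL-style calls requested here would presumably look like the following (a sketch; the exact set of functions was still being discussed in this thread):

```sql
-- size of a table's data (PostgreSQL-compatible signature)
SELECT pg_table_size('my_table');
-- total size of all indexes attached to a table
SELECT pg_indexes_size('my_table');
-- a materialized view's size could be queried the same way
SELECT pg_table_size('my_mv');
```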
Intend to let Cloud users be able to discover sizes by themselves. Last time, @zwang28 had to manually check the Cloud Grafana dashboard (not accessible by users). As Cloud supports showing information contained in the catalog right now, once this feature is implemented, Cloud can directly show the information on the web console instead of Grafana. Not urgent in the sense that it doesn't block anything. |
May I hop on this issue? |
Sure @erichgess! Feel free to raise questions if there are any, thanks a lot |
@lmatz thanks! I may be a few days delayed working on this, as my laptop died this morning. |
While I'm waiting on getting my laptop fixed, I wanted to confirm that these are the function specs:
Can the functionality used by the Prometheus metrics (mentioned in the earlier posts) be used to implement these functions? |
@lmatz Will not be able to get to this issue until the weekend, I hope that's not a problem. |
We already store the physical key count of a table here so I think we can leverage them instead of relying on Prometheus (in fact the Prometheus metrics also come from the version stat we maintained).
No problem. Take your time. Let us know if you have any question. |
Cool, this makes sense to me. I figured the Prom metrics used some existing feature to derive their metric data and that these functions just needed to use those same "primitives", glad to know that's the case. |
Hi @erichgess Any updates? |
Hey, I'm very sorry, I got behind on this. I will take care of it this weekend.
|
@hzxa21 I have two questions. I've been grepping through the code and have not been able to find good examples of the following two operations:
|
These functions run in the frontend, while the requested data is maintained by the meta service. The most related thing is the version stats mentioned above. I am not an expert on this part, please correct me if wrong. @hzxa21 |
My initial design for this was to add a new function that would take the TableId (looked up in the frontend) and then query for the stats. But I wasn't able to find where TableIds are looked up while grepping through the frontend code; this was based upon my initial, though wrong, understanding. I assume that the correct design is to query a Meta node and get the stats for a table, and so I see two possible designs.

As a follow-up question: |
+1. I think we can enhance the hummock snapshot notification to include the version stats. Some more background here: currently we leverage a push-based notification mechanism to propagate meta information from the meta service to the frontend and compute nodes. Each frontend node and compute node has its own observer manager to listen for changes from the meta service. The meta service pushes metadata updates to these observer managers via a streaming RPC. For example, catalogs and hummock snapshots are pushed to all frontend nodes via this mechanism. |
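The push-based flow described above can be sketched in plain Rust, with an mpsc channel standing in for the streaming RPC (all names here are hypothetical stand-ins, not RisingWave's actual API):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for HummockVersionStats: per-table key counts.
#[derive(Clone, Debug)]
struct VersionStats {
    table_key_counts: HashMap<u32, u64>,
}

// Simulate the meta service pushing stat updates to one frontend
// observer over a stream; the observer keeps only the latest snapshot.
fn latest_key_count(table_id: u32) -> Option<u64> {
    let (tx, rx) = mpsc::channel::<VersionStats>();
    let observer = thread::spawn(move || {
        let mut cache: Option<VersionStats> = None;
        // Apply every pushed update to the local cache.
        for update in rx {
            cache = Some(update);
        }
        cache
    });

    // "Meta service" side: push two successive snapshots of the stats.
    let mut counts = HashMap::new();
    counts.insert(table_id, 100);
    tx.send(VersionStats { table_key_counts: counts.clone() }).unwrap();
    counts.insert(table_id, 150);
    tx.send(VersionStats { table_key_counts: counts }).unwrap();
    drop(tx); // close the stream

    observer
        .join()
        .unwrap()
        .and_then(|s| s.table_key_counts.get(&table_id).copied())
}

fn main() {
    println!("{:?}", latest_key_count(1)); // prints Some(150)
}
```

The frontend would then answer size lookups from this cache without a round trip to meta on every query.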
This makes sense to me. So, with this design we would then have the frontend resolve calls to functions like pg_table_size from its local cache. Are there issues with consistency across frontend nodes if the caches for some nodes have missed notifications? The Meta Service doc mentions that the master node must wait for other nodes to |
Though this approach works, I suggest we go a step further and maintain our own system table to show all the information from the hummock stats instead of just the size. To be more specific, I suggest we implement a system table for it. |
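For illustration, if such a system table existed (the table name and columns here are hypothetical, not the ones actually proposed), the size functions could become thin sugar over ordinary queries against it:

```sql
-- hypothetical system table exposing the per-table hummock stats
SELECT table_id, total_key_count, total_key_size, total_value_size
FROM rw_table_stats;

-- pg_table_size('t') could then reduce to a lookup like:
SELECT total_key_size + total_value_size
FROM rw_table_stats
WHERE table_id = 1001;
```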
actually there might be another issue with this approach:

```
postgres=# create table t1(a int);
CREATE TABLE
postgres=# create table t2(a real);
CREATE TABLE
postgres=# create table t3(a varchar);
CREATE TABLE
postgres=# insert into t3 values('t1');
INSERT 0 1
postgres=# insert into t3 values('t2');
INSERT 0 1
postgres=# select pg_table_size(a) from t3;
 pg_table_size
---------------
             0
             0
(2 rows)

postgres=# insert into t1 values (7);
INSERT 0 1
postgres=# insert into t2 values (6.28);
INSERT 0 1
postgres=# select pg_table_size(a) from t3;
 pg_table_size
---------------
          8192
          8192
(2 rows)
```

for use cases like these, we might need to go through the function call framework and do some RPC on the compute node side. |
After reading through the code more carefully, I think my above questions are not relevant. From what I understand,
|
@hzxa21 so we would update the HummockSnapshot message? Something like this:

```
// We will have two epochs after decoupling
message HummockSnapshot {
  // Epoch with checkpoint; we will read durable data with it.
  uint64 committed_epoch = 1;
  // Epoch without checkpoint; we will read real-time data with it. But it may be rolled back.
  uint64 current_epoch = 2;
  HummockVersionStats stats = 3; // Add stats to the Snapshot message
}
```
Or would it be better to add a new event type to the notification stream? Then for frontend nodes I would update the observer to handle it. The Compute node's observer does not have logic to handle this event type yet. |
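For illustration, the event-type alternative might look something like this in the notification protocol (the message shape and field number are hypothetical, not the actual proto):

```protobuf
// Hypothetical sketch: version stats as their own item in the
// meta -> observer notification stream, so only the observers that
// subscribe to this event need to handle it.
message SubscribeResponse {
  oneof info {
    // ... existing notification items (catalog updates, hummock snapshot, ...)
    HummockVersionStats hummock_version_stats = 10; // new event type
  }
}
```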
IMO, adding a new event type seems clearer, since only a subset of observers are interested in this event. |
I'm working through the implementation of this feature and there are a couple of questions I have. For storing the stats data on Frontend nodes, I'm considering which struct should be responsible for owning the HummockVersionStats data that's sent from the Meta node. There seem to be two options for which struct should hold a shared reference to the table stats. For the Compute nodes, I've been reading through the relevant code. |
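Whichever struct ends up owning the stats, both options reduce to the observer holding a write handle and query evaluation holding read handles. A minimal std-only sketch (all names hypothetical, not RisingWave's actual types):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical frontend-side cache of per-table stats pushed from the Meta node.
#[derive(Default)]
struct TableStatsCache {
    // table id -> total size in bytes
    inner: RwLock<HashMap<u32, u64>>,
}

impl TableStatsCache {
    // Called by the observer manager when a stats notification arrives:
    // replace the whole snapshot with the latest one.
    fn apply_update(&self, stats: HashMap<u32, u64>) {
        *self.inner.write().unwrap() = stats;
    }

    // Called when evaluating a size function on the frontend.
    fn table_size(&self, table_id: u32) -> Option<u64> {
        self.inner.read().unwrap().get(&table_id).copied()
    }
}

fn main() {
    // The cache is shared: the observer holds one Arc, query code another.
    let cache = Arc::new(TableStatsCache::default());
    let writer = Arc::clone(&cache);
    writer.apply_update(HashMap::from([(42, 8192)]));
    println!("{:?}", cache.table_size(42)); // prints Some(8192)
}
```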
Agree with you. Either place looks fine to me, and I slightly prefer option 2.

Sounds like you were going to read these stats on Compute Nodes? Currently, we only read these metadata-related things in the Frontend node.

Assuming you follow option 2, then you might need to add a reference to HummockVersionStats into the Binder's context. I didn't go through the details, so feel free to correct me. |
@hzxa21 mentioned that we'd want to push the data to the compute nodes. I'd assumed it was because some queries would go to the compute nodes. If the queries to a system table would never go to a compute node then I don't know why we'd need to push stats data to them, but I would love to learn. Executing only on the Frontend would simplify this task. |
Summarize the discussion on Slack: https://risingwave-community.slack.com/archives/C03BW6YSBPB/p1680634594673689 Supporting general queries via the system table would be hard. What's really hard is the identifier-resolving part (i.e., mapping a string value to a table at evaluation time). So we'd better KISS first: support only varchar literals and evaluate entirely on the Frontend. |
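Concretely, the KISS scope means the literal form is handled on the Frontend, while the column-reference form from the earlier PostgreSQL session stays out of scope for now:

```sql
-- supported first: varchar literal, resolved and evaluated entirely in the frontend
SELECT pg_table_size('t1');

-- deferred: the argument comes from row data, so resolving it would require
-- going through the function call framework on compute nodes
SELECT pg_table_size(a) FROM t3;
```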
Some database tools like Bytebase gather statistics by executing such queries. |
@h3n4l Thanks for mentioning this. I think it's somewhat unavoidable, since supporting such admin features (in a PostgreSQL-compatible way) is quite difficult 🥲. AFAIK CockroachDB hasn't supported this for years either. BTW, would you mind sharing an example of how Bytebase uses such queries, and what kind of features they implement? |
@h3n4l would you happen to know how Bytebase queries that data? I'm curious because my proposal was to break this ticket into two parts:
Depending on how Bytebase works, part 1 may be sufficient for it to be fully functional. |
Hi @xxchan, I strongly agree with this.

For now, Bytebase just displays this simple information, like the following: https://demo.bytebase.com/db/shop-7015#overview Missing this information doesn't matter much, but you know, people always want to align the interfaces (aka what other databases have, they'd better have too). Anyway, RisingWave is a great product, there's no doubt about it. |
Hi @erichgess, if you want to know the technical details, take a look at this part of the code built by Bytebase for PostgreSQL. Part 1 seems to work well, but I'm not sure if there is additional work required to go from part 1 to part 2. |
Yes, it looks like this would require the full implementation from step 2 rather than the simpler implementation from step 1. |
Added a draft PR here: #9013 |
@hzxa21 @lmatz We could have this be an alias for an existing function. Also, do we want to implement the remaining size functions? |
I didn't find a function that gets the size of a single index, though?
It does not conflict with the metrics currently collected by Prometheus and shown on Grafana. I think that is primarily driven by Devs' demands, and it shows a series of data points over time instead of a snapshot. It can be in whatever form is helpful for development and debugging, decided by Devs.
With these functions completed, Cloud will be able to integrate them into the Cloud catalog.