feature: add metrics for chunk and system storage space #360

Open
happybeing opened this issue Jun 8, 2023 · 6 comments

@happybeing
Contributor

The logs used to include the following metrics which I displayed in vdash and think would be useful to have again. So I wonder if they can be added to the metrics module in safenode:

  • space used for chunks
  • space used for registers
  • maximum space for safe data (this was a fixed parameter, later to be set by the user)
  • system storage total
  • system storage free

I can't get these from the local system because there may be multiple safenode processes running, and vdash monitors multiple nodes by displaying any one of any number of available safenode.log files.

@happybeing
Contributor Author

@joshuef I'm replying here to your request for feedback on logfile messages, to keep things in one place. The following are the message strings I currently match for the given metrics. The OP suggests additional metrics that may be useful to have and which we did have at one stage.

PUT: "Wrote record to disk"
GET: "Retrieved record from disk"
REGISTER EDIT: "Editing Register success!" (untested)
ERROR: any logfile message of type 'ERROR'
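
The matching described above could be sketched roughly as follows. This is not vdash's actual code: the type and function names are hypothetical, and the check for an `" ERROR "` token assumes the log level appears as a space-delimited word in each line. Only the three marker strings come from this thread.

```rust
/// Activity categories a dashboard might count (names are hypothetical).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum LogEvent {
    Put,
    Get,
    RegisterEdit,
    Error,
}

/// Classify one log line by substring match.
/// The marker strings are the ones quoted in this thread; the " ERROR "
/// level token is an assumption about the log line layout.
pub fn classify_line(line: &str) -> Option<LogEvent> {
    if line.contains(" ERROR ") {
        Some(LogEvent::Error)
    } else if line.contains("Wrote record to disk") {
        Some(LogEvent::Put)
    } else if line.contains("Retrieved record from disk") {
        Some(LogEvent::Get)
    } else if line.contains("Editing Register success!") {
        Some(LogEvent::RegisterEdit)
    } else {
        None
    }
}

/// Running totals for each category.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ActivityCounts {
    pub puts: u64,
    pub gets: u64,
    pub register_edits: u64,
    pub errors: u64,
}

/// Count the events found in a batch of log lines; unmatched lines are ignored.
pub fn count_events(lines: &[&str]) -> ActivityCounts {
    let mut counts = ActivityCounts::default();
    for line in lines {
        match classify_line(line) {
            Some(LogEvent::Put) => counts.puts += 1,
            Some(LogEvent::Get) => counts.gets += 1,
            Some(LogEvent::RegisterEdit) => counts.register_edits += 1,
            Some(LogEvent::Error) => counts.errors += 1,
            None => {}
        }
    }
    counts
}
```

Substring matching like this breaks silently if a message is reworded, which is one reason stable log markers or a metrics module would be preferable.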

I have a crude mechanism for categorising node status as: Stopped|Connecting|Connected|Disconnected. I don't know if a node could provide a more definitive state message as a periodic output to the logfile (not just on change), but if so that would improve accuracy.

I'm not sure what would be helpful to add in other areas, either for devs (if you use vdash at all?) or for users interested in monitoring the working and performance of their nodes. It would be nice to find some metrics that at least reflect ongoing activity of the different network features related to each node: things like register edits, CRDT and DBC related activity, and of course node earnings!

Of course I'm open to any suggestions for things that your team would like.

@joshuef
Contributor

joshuef commented Jul 12, 2023

> space used for chunks
> space used for registers

Would be hard, as they're all records now (hard as in it means reading from disk). So I'd be inclined to just leave that to a sys-level check of the record_store dir. Not sure how you feel about that?
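
A sys-level check of the record_store dir could look something like the sketch below, which recursively sums file sizes under a directory using only the Rust standard library. The function name is hypothetical and the directory path is whatever the node was configured with. It reports apparent file sizes rather than allocated disk blocks. The standard library has no call for filesystem total/free space, so those two figures are left out here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively sum the sizes in bytes of all regular files under `dir`.
/// Symlinks are neither followed nor counted.
pub fn dir_size_bytes(dir: &Path) -> io::Result<u64> {
    let mut total = 0u64;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `file_type` does not follow symlinks, so a link is neither dir nor file here.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            total += dir_size_bytes(&entry.path())?;
        } else if file_type.is_file() {
            total += entry.metadata()?.len();
        }
    }
    Ok(total)
}
```

Because each safenode process has its own data directory, a monitor would call this once per node directory rather than once per machine.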

We have eg: https://github.com/maidsafe/safe_network/blob/main/sn_node/src/log_markers.rs#L21

Connecting would just be everything before that. Though perhaps we can add an initial message when we start attempting to connect to the first peers (could be done via #518)

> could provide a more definitive state message as a periodic output to the logfile

Hmm, outwith connecting/stopped it would (in theory) just always be connected. Not sure if that's that useful? (Or are you imagining more states?) We have some kbucket logs of peer counts that may be more granular about the state of connectivity: https://github.com/maidsafe/safe_network/blob/main/sn_networking/src/event.rs#L458

Can't say we as a team use vdash as yet. Everything is headless, and thus far we're just looking at grabbing basic stats of nodes to determine any major issues that may be in play.

@happybeing
Contributor Author

happybeing commented Jul 12, 2023

> sys level check of the record_store dir. Not sure how you feel about that?

Yes, whatever is easy and useful.

For state: starting,connecting(ed) etc I'm envisaging it as a proxy for "things seem to be ok, or not". So losing connectivity, being stopped etc. or for anything that might reasonably happen that the operator might want to know about.

So while connected a periodic "all ok" type message could be logged and I would flag it as a problem if this wasn't seen for too long. As well as showing any other states beforehand.
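
On the dashboard side, the "flag it if not seen for too long" idea might be sketched like this. No such periodic message exists in safenode as far as this thread establishes, so the monitor type, its method names, and the silence threshold are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// Whether the node has reported recently enough (hypothetical states).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Health {
    Ok,
    Stale,
}

/// Tracks when the last "all ok" style message was seen in the logfile.
pub struct HeartbeatMonitor {
    last_seen: Option<Instant>,
    max_silence: Duration,
}

impl HeartbeatMonitor {
    /// `max_silence` is the longest gap tolerated before flagging a problem.
    pub fn new(max_silence: Duration) -> Self {
        Self { last_seen: None, max_silence }
    }

    /// Call this whenever the periodic message is parsed from the log.
    pub fn record_heartbeat(&mut self, at: Instant) {
        self.last_seen = Some(at);
    }

    /// Stale if no heartbeat was ever seen, or the last one is too old.
    pub fn health(&self, now: Instant) -> Health {
        match self.last_seen {
            Some(seen) if now.saturating_duration_since(seen) <= self.max_silence => Health::Ok,
            _ => Health::Stale,
        }
    }
}
```

Passing `now` in explicitly keeps the check easy to test and lets a dashboard use timestamps taken from the log itself when replaying an old file.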

I'm really not sure what is best here, but think it is useful to have something in the dashboard that the operator can look at and instantly go, "oh, that's not right".

Showing number of peers sounds good. Any other suggestions welcome as I don't spend much time analysing logs or thinking about the ATM.

I can work with what we have but wanted to see if you thought it worthwhile exposing more general state like info.

Thanks for looking at this.

@joshuef
Contributor

joshuef commented Jul 26, 2023

> For state: starting,connecting(ed) etc I'm envisaging it as a proxy for "things seem to be ok, or not". So losing connectivity, being stopped etc. or for anything that might reasonably happen that the operator might want to know about.

For the mo, I think the kbucket logs are a proxy there. If we lose everything / are in decline, something is up. As we get to know the network, the kbucket may fluctuate a bit, but the peer count really should not be decreasing. Anyone who drops out should be replaced as long as the network is healthy, eg.

@happybeing
Contributor Author

So if I display peer count and a max peer count, maybe red on some condition.

What would you suggest?
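
One possible "red" condition, offered only as a sketch: remember the highest peer count seen and warn when the current count falls below some percentage of it. The type names and the percentage threshold are hypothetical, not something the maintainers have suggested.

```rust
/// Display state for the peer count (hypothetical).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PeerAlert {
    Normal,
    Warning,
}

/// Tracks the current and highest-seen peer counts for one node.
pub struct PeerTracker {
    current: usize,
    max_seen: usize,
    warn_percent: usize,
}

impl PeerTracker {
    /// Warn when the current count drops below `warn_percent` percent of the max seen.
    pub fn new(warn_percent: usize) -> Self {
        Self { current: 0, max_seen: 0, warn_percent }
    }

    /// Feed in the latest peer count parsed from the log.
    pub fn update(&mut self, peers: usize) {
        self.current = peers;
        self.max_seen = self.max_seen.max(peers);
    }

    pub fn current(&self) -> usize {
        self.current
    }

    pub fn max_seen(&self) -> usize {
        self.max_seen
    }

    /// Normal until at least one peer has been seen; afterwards, Warning if
    /// current * 100 < max_seen * warn_percent.
    pub fn alert(&self) -> PeerAlert {
        if self.max_seen > 0 && self.current * 100 < self.max_seen * self.warn_percent {
            PeerAlert::Warning
        } else {
            PeerAlert::Normal
        }
    }
}
```

A fixed percentage tolerates the small fluctuations mentioned above while still catching a sustained decline or a drop to zero.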

@joshuef
Contributor

joshuef commented Dec 13, 2023

There is a new NetworkInfo struct coming from libp2p, which we now log on peer/connection changes. That's a nice summary that may be useful?


2 participants