
Check if nodes participate in SCP for uptime stats #49

Open
bartekn opened this issue Jun 28, 2018 · 2 comments

Comments

@bartekn

bartekn commented Jun 28, 2018

Currently we simply open a connection to the node to check whether it is up. This means that even if something is broken, as long as the node accepts incoming connections, its status will be recorded as up.
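For reference, a connection-only check boils down to something like the minimal Python sketch below. It is an illustration, not the dashboard's actual code, and `is_port_open` is a hypothetical helper. It returns `True` for anything that completes a TCP handshake, which is exactly the weakness described above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    This is all a connection-only uptime check can tell us: some process
    accepted the connection. It says nothing about whether the node is
    actually taking part in SCP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError).
        return False
```

A stellar-core process that is stuck but still listening on its peer port (11625 by default) would pass this check.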

The easiest idea I have is to:

  1. Create a new monitoring stellar-core instance.
  2. Generate its config file from the nodes.js file, adding all nodes to the quorum set. Remove the history entries (except the local one, which will obviously do nothing).
  3. Check the `missing` field of the /quorum endpoint. If a node appears there, it is down.
  4. Regenerate the config file with the new nodes every X hours. If the node list has changed, restart core.
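Steps 3 and 4 could look roughly like the Python sketch below. It assumes the /quorum response nests its data under a `qset` object with a `missing` array of node keys; that layout has changed across stellar-core versions, so it should be checked against the build in use. The helper names and the admin URL (stellar-core's default HTTP port, 11626) are hypothetical choices for illustration.

```python
import json
import urllib.request
from typing import Iterable, Set


def fetch_quorum(admin_url: str = "http://127.0.0.1:11626") -> str:
    # Query the monitoring instance's HTTP admin endpoint (assumed local).
    with urllib.request.urlopen(admin_url + "/quorum", timeout=10) as resp:
        return resp.read().decode("utf-8")


def down_nodes(quorum_json: str) -> Set[str]:
    """Step 3: node keys reported in the `missing` field are treated as down.

    Assumed shape: {"node": ..., "qset": {"missing": [...], ...}}.
    """
    info = json.loads(quorum_json)
    qset = info.get("qset") or {}
    return set(qset.get("missing") or [])


def node_list_changed(old_keys: Iterable[str], new_keys: Iterable[str]) -> bool:
    """Step 4: regenerate the config and restart core only if the key set changed."""
    return set(old_keys) != set(new_keys)
```

`down_nodes(fetch_quorum())` would then give the set of nodes to mark as down on each polling cycle, and comparing sets rather than lists avoids needless restarts when only the order of nodes.js changes.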

Questions:

  • What happens when len(missing) > fail_at? Does the node keep updating the quorum information?
  • Will it still work with, say, 200 nodes in the quorum set? I tested this with the 39 nodes we currently have in the Dashboard, and it has been working fine (for a couple of minutes so far).

CC: @MonsieurNicolas @vogel

@jedmccaleb

I'm hoping someone builds a much better quorum explorer; it was one of the projects called out in the latest Stellar Build Challenge (SBC): https://www.stellar.org/blog/announcing-the-7th-stellar-build-challenge/ If no one steps up soon, we will take this on ourselves.

@MonsieurNicolas

This is not a quorum explorer but a monitor. Still, I agree it would be nice to reuse work from the build challenge.

I think the simplest, most reliable way to do this is the following.

On the quorum front

We only need to follow the main SDF nodes for quorum, but we still need to include the other nodes in order to whitelist them.

With a quorum set like the one below, the /quorum endpoint should return information on all of those validators:

```toml
UNSAFE_QUORUM=true
[QUORUM_SET]
THRESHOLD_PERCENT=100

[QUORUM_SET.SDF]
THRESHOLD_PERCENT=51
VALIDATORS=[
    "$sdf1", "$sdf2", "$sdf3"
]

[QUORUM_SET.OTHERS]
THRESHOLD_PERCENT=1
VALIDATORS=[
    "all other keys from dashboard"
]
```
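The quorum section above could be regenerated from the Dashboard's node list with a small helper like this Python sketch. The function names and the de-duplication rule (an SDF key is never repeated in OTHERS) are assumptions for illustration; only the section names and thresholds come from the config above.

```python
from typing import Sequence


def _toml_list(items: Sequence[str], indent: str = "    ") -> str:
    # Render a TOML array of strings, one entry per line.
    body = ",\n".join(f'{indent}"{item}"' for item in items)
    return "[\n" + body + "\n]"


def quorum_config(sdf_keys: Sequence[str], other_keys: Sequence[str]) -> str:
    """Build the nested quorum set: the top level requires both inner sets,
    the SDF set at 51%, and the OTHERS set at a low 1% threshold so that
    the remaining validators are included."""
    sdf = list(dict.fromkeys(sdf_keys))  # drop duplicates, keep order
    sdf_set = set(sdf)
    # Hypothetical rule: never list an SDF key twice.
    others = [k for k in dict.fromkeys(other_keys) if k not in sdf_set]
    lines = [
        "UNSAFE_QUORUM=true",
        "[QUORUM_SET]",
        "THRESHOLD_PERCENT=100",
        "",
        "[QUORUM_SET.SDF]",
        "THRESHOLD_PERCENT=51",
        "VALIDATORS=" + _toml_list(sdf),
        "",
        "[QUORUM_SET.OTHERS]",
        "THRESHOLD_PERCENT=1",
        "VALIDATORS=" + _toml_list(others),
    ]
    return "\n".join(lines) + "\n"
```

Generating the file rather than editing it by hand fits the earlier plan of regenerating the config every X hours and restarting core only when the node list changes.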

Forcing connections to the validators being monitored

This will cause the /peers endpoint to return information on the monitored nodes. Something like this:

```toml
TARGET_PEER_CONNECTIONS=80
PREFERRED_PEERS = [ "sdf1.stellar.org:12345", "..." ]
```
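The peer section could be generated the same way, for example with the Python sketch below. Two things here are assumptions rather than anything stated in the thread: TARGET_PEER_CONNECTIONS is raised to at least the number of preferred peers so none of the monitored nodes gets crowded out (80 is just the value suggested above), and `unconnected` expects the caller to have already pulled the connected node IDs out of the /peers JSON, whose layout depends on the stellar-core version. The addresses in the example are hypothetical.

```python
from typing import Iterable, List, Set


def peer_config(addresses: Iterable[str], min_connections: int = 80) -> str:
    """Render TARGET_PEER_CONNECTIONS and PREFERRED_PEERS for the monitor."""
    peers: List[str] = list(dict.fromkeys(addresses))  # de-duplicate, keep order
    # Assumption: leave room for every monitored peer.
    target = max(min_connections, len(peers))
    entries = ", ".join(f'"{p}"' for p in peers)
    return f"TARGET_PEER_CONNECTIONS={target}\nPREFERRED_PEERS = [ {entries} ]\n"


def unconnected(monitored: Iterable[str], connected: Iterable[str]) -> Set[str]:
    """Monitored node IDs that do not appear among the peers reported by /peers."""
    return set(monitored) - set(connected)
```

For example, `peer_config(["node-a.example.org:11625", "node-b.example.org:11625"])` keeps the target at 80, while a list of 100 addresses raises it to 100.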
