Exploring the Current Client Protocol Architecture #122

rauljordan · 2018-05-16T05:05:20Z

Hi all,

As more of our sharding client code is being created in our fork, it is critical to understand the design considerations of the current Ethereum nodes baked into go-ethereum. In particular, our notary/proposer clients need to be designed with good event loop management, pluggable services, and solid entry points for p2p functionality built in. As a case study, we will look at lightsync nodes as they are currently implemented in geth, understand their full responsibilities, and figure out the bigger picture behind their architecture.

The key question we will be asking ourselves is: what exactly happens when we start a light client? What are the design considerations that came into play when designing the code that gets the light client to work?

We will cap off this document by determining what aspects of the protocols in geth we can use as part of our sharding clients. We have an opportunity to write clean, straightforward code that does not have a massive number of file dependencies and complicated configs as geth currently does.

Let’s dive in.

Case Study: Light Client Nodes

Ethereum’s light client sync mode allows users to spin up a geth node that only downloads block headers and relies on merkle proofs to verify specific parts of the state tree as needed. Light peers are extremely commonplace and critical components in the Ethereum network today. Their architecture serves as a great starting point for anyone extending or redesigning geth in a secure, concurrent, and performant way.

Unfortunately, the current geth code is very hard to read, has a ton of dependencies across packages, and contains obscure configuration options. This doc will attempt to explain light client sync from start to finish, light node peer-to-peer networking, and other responsibilities of the protocol.

How is a Light Node Triggered?

Launching a geth light node is as easy as:

$ geth --syncmode="light"

Upon the command being executed, the main function within go-ethereum/cmd/geth/main.go runs as follows:

func main() {
  if err := app.Run(os.Args); err != nil {
    fmt.Fprintln(os.Stderr, err)
    os.Exit(1)
  }
}

This triggers the urfave/cli external package’s Run function, which will trigger the geth function a few lines below main().

func geth(ctx *cli.Context) error {
  node := makeFullNode(ctx)
  startNode(ctx, node)
  node.Wait()
  return nil
}

Based on the cli context, this function initializes a node instance, which is a critical entry point. Let’s take a look at how makeFullNode does this.

In go-ethereum/cmd/geth/config.go:

func makeFullNode(ctx *cli.Context) *node.Node {
  stack, cfg := makeConfigNode(ctx)

  utils.RegisterEthService(stack, &cfg.Eth)
  // a bunch of other services are configured below…
  …
  // then it returns the node, which is a var called a “stack”,
  // representing a protocol stack of the node (i.e. p2p services, rpc, etc.).
  return stack
}

Two important functions are at play here:

makeConfigNode uses the cli context to fetch relevant command line flags and returns a node instance along with a configuration object.
utils.RegisterEthService uses those configuration options to add a Service object to the node instance we just created. In this case, the cli context contains the --syncmode="light" flag, which we use to set up a light client protocol instead of a full Ethereum node.

Let's see makeConfigNode in go-ethereum/cmd/geth/config.go:

func makeConfigNode(ctx *cli.Context) (*node.Node, gethConfig) {

  // Load defaults.
  cfg := gethConfig{
    Eth:       eth.DefaultConfig,
    Shh:       whisper.DefaultConfig,
    Node:      defaultNodeConfig(),
    Dashboard: dashboard.DefaultConfig,
  }

  // Load config file.
  if file := ctx.GlobalString(configFileFlag.Name); file != "" {
    if err := loadConfig(file, &cfg); err != nil {
      utils.Fatalf("%v", err)
    }
  }

  // Apply flags.
  utils.SetNodeConfig(ctx, &cfg.Node)
  stack, err := node.New(&cfg.Node)
  if err != nil {
    utils.Fatalf("Failed to create the protocol stack: %v", err)
  }
  utils.SetEthConfig(ctx, stack, &cfg.Eth)
  if ctx.GlobalIsSet(utils.EthStatsURLFlag.Name) {
    cfg.Ethstats.URL = ctx.GlobalString(utils.EthStatsURLFlag.Name)
  }

  utils.SetShhConfig(ctx, stack, &cfg.Shh)
  utils.SetDashboardConfig(ctx, &cfg.Dashboard)

  return stack, cfg

}

Cool, so this function loads default configurations, optionally overlays a config file, and then applies command line flags before creating the node. The defaults in eth.DefaultConfig contain some basic, familiar options we have in the Ethereum network:

var DefaultConfig = Config{
	SyncMode: downloader.FastSync,
	Ethash: ethash.Config{
		CacheDir:       "ethash",
		CachesInMem:    2,
		CachesOnDisk:   3,
		DatasetsInMem:  1,
		DatasetsOnDisk: 2,
	},
	NetworkId:     1,
	LightPeers:    100,
	DatabaseCache: 768,
	TrieCache:     256,
	TrieTimeout:   5 * time.Minute,
	GasPrice:      big.NewInt(18 * params.Shannon),

	TxPool: core.DefaultTxPoolConfig,
	GPO: gasprice.Config{
		Blocks:     20,
		Percentile: 60,
	},
}
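To make the "defaults first, flags second" layering concrete, here is a minimal sketch using only Go's standard flag package. The Config struct, defaultConfig, and applyFlags are hypothetical stand-ins for eth.Config, eth.DefaultConfig, and utils.SetEthConfig; geth itself uses urfave/cli and many more fields.

```go
package main

import (
	"flag"
	"fmt"
)

// Config is a hypothetical, trimmed-down stand-in for eth.Config.
type Config struct {
	SyncMode   string
	NetworkID  uint64
	LightPeers int
}

// defaultConfig plays the role of eth.DefaultConfig.
func defaultConfig() Config {
	return Config{SyncMode: "fast", NetworkID: 1, LightPeers: 100}
}

// applyFlags parses args on top of cfg. Each flag defaults to the value
// already in cfg, so a flag that is not passed leaves the default untouched.
func applyFlags(cfg *Config, args []string) error {
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	syncMode := fs.String("syncmode", cfg.SyncMode, "sync mode: full, fast, or light")
	networkID := fs.Uint64("networkid", cfg.NetworkID, "network identifier")
	if err := fs.Parse(args); err != nil {
		return err
	}
	cfg.SyncMode = *syncMode
	cfg.NetworkID = *networkID
	return nil
}

func main() {
	cfg := defaultConfig()
	if err := applyFlags(&cfg, []string{"--syncmode=light"}); err != nil {
		fmt.Println("bad flags:", err)
		return
	}
	fmt.Printf("%+v\n", cfg) // prints {SyncMode:light NetworkID:1 LightPeers:100}
}
```

Keeping both steps next to each other like this is roughly the "single entry point for configuration" idea argued for later in this thread.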

The utils.SetEthConfig(ctx, stack, &cfg.Eth) line is what will modify the cfg option based on command line flags. In this case, if SyncMode is set to light, then the config is updated to reflect that flag. Then, we go into the actual code that initializes a Light Protocol instance and registers it as the node's ETH service.

In go-ethereum/cmd/utils/flags.go:

// RegisterEthService adds an Ethereum client to the stack.
func RegisterEthService(stack *node.Node, cfg *eth.Config) {
  var err error
  if cfg.SyncMode == downloader.LightSync {
    err = stack.Register(func(ctx *node.ServiceContext) (node.Service, error) {
      return les.New(ctx, cfg)
    })
  } else {
    err = stack.Register(func(ctx *node.ServiceContext) (node.Service, error) {
      fullNode, err := eth.New(ctx, cfg)
      if fullNode != nil && cfg.LightServ > 0 {
        ls, _ := les.NewLesServer(fullNode, cfg)
        fullNode.AddLesServer(ls)
      }
      return fullNode, err
    })
  }
  if err != nil {
    Fatalf("Failed to register the Ethereum service: %v", err)
  }
}

So here, if the config option for the downloader is set to LightSync, which was set in the makeConfigNode function we saw before, we register a Service constructor with the node (referred to as stack in the code above). Nodes keep track of Service instances that all implement useful functions we will come back to later. In this case, the service is a LightEthereum instance that gives us all the functionality we need to run a light client.
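The registration pattern can be sketched in isolation. This is a simplified illustration, not geth's code: Node, Constructor, Build, and the two service types are hypothetical, and the Service interface is cut down to a single method. The point it shows is that Register only records a constructor closure; nothing is built until the node decides to build it.

```go
package main

import "fmt"

// Service is a hypothetical one-method stand-in for node.Service.
type Service interface {
	Name() string
}

type lightService struct{}
type fullService struct{}

func (lightService) Name() string { return "les" }
func (fullService) Name() string  { return "eth" }

// Constructor builds a service lazily, like the closure passed to stack.Register.
type Constructor func() (Service, error)

// Node records constructors; no service exists until Build is called.
type Node struct {
	constructors []Constructor
}

// Register stores the constructor for later use.
func (n *Node) Register(c Constructor) {
	n.constructors = append(n.constructors, c)
}

// Build invokes every registered constructor in registration order.
func (n *Node) Build() ([]Service, error) {
	services := make([]Service, 0, len(n.constructors))
	for _, c := range n.constructors {
		s, err := c()
		if err != nil {
			return nil, err
		}
		services = append(services, s)
	}
	return services, nil
}

// registerEthService picks which constructor to register based on the sync mode.
func registerEthService(n *Node, syncMode string) {
	if syncMode == "light" {
		n.Register(func() (Service, error) { return lightService{}, nil })
		return
	}
	n.Register(func() (Service, error) { return fullService{}, nil })
}

func main() {
	n := &Node{}
	registerEthService(n, "light")
	services, err := n.Build()
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(services[0].Name()) // prints les
}
```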

How Do These Attached Services Start Running?

Here's where everything actually ties together. If you go back to the geth function in go-ethereum/cmd/geth/main.go,

func geth(ctx *cli.Context) error {

  node := makeFullNode(ctx)
  startNode(ctx, node)
  node.Wait()
  return nil

}

the startNode func actually kicks things off.

// startNode boots up the system node and all registered protocols, after which
// it unlocks any requested accounts, and starts the RPC/IPC interfaces and the
// miner.
func startNode(ctx *cli.Context, stack *node.Node) {

  // Start up the node itself
  utils.StartNode(stack)

  // a lot of stuff below is related to wallet opening/closing events and setting up
  // full node mining functionality...
  ...
}

When we look at utils.StartNode in go-ethereum/cmd/utils/cmd.go:

func StartNode(stack *node.Node) {

  if err := stack.Start(); err != nil {
    Fatalf("Error starting protocol stack: %v", err)
  }

  // stuff below handles signal interrupts to stop the service...
  ...
}

...we see the actual code that starts off a node! Let's explore. In go-ethereum/node/node.go, a lot of things happen (simplified for readability):

func (n *Node) Start() error {

  n.lock.Lock()
  defer n.lock.Unlock()

  // Short circuit if the node's already running
  if n.server != nil {
    return ErrNodeRunning
  }
  if err := n.openDataDir(); err != nil {
    return err
  }

  // Initialize the p2p server. This creates the node key and
  // discovery databases.
  n.serverConfig = n.config.P2P
  n.serverConfig.PrivateKey = n.config.NodeKey()
  n.serverConfig.Name = n.config.NodeName()
  n.serverConfig.Logger = n.log

  // setting up more config stuff...
  ...

  // sets up a peer to peer server instance!
  running := &p2p.Server{Config: n.serverConfig}
  n.log.Info("Starting peer-to-peer node", "instance", n.serverConfig.Name)

  services := make(map[reflect.Type]Service)

  // serviceFuncs is an internal slice updated in a node whenever node.Register() is called!
  for _, constructor := range n.serviceFuncs {

    // Create a new context for the particular service
    ctx := &ServiceContext{
      config:         n.config,
      services:       make(map[reflect.Type]Service),
      EventMux:       n.eventmux,
      AccountManager: n.accman,
    }

    // does some stuff for threaded access...
    ...
   
    // Construct and save the service
    service, err := constructor(ctx)

    // sets up the service and adds it to the services map defined above...
    ...

    // updates the services map, keyed by the service's type
    services[kind] = service
  }

  // this calls the .Protocols() method of each attached service (yes, LightEthereum has this defined)
  // and attaches the result to the running p2p server instance.
  for _, service := range services {
    running.Protocols = append(running.Protocols, service.Protocols()...)
  }

  // this starts the p2p server!
  if err := running.Start(); err != nil {
    ...
  }
  // Start each of the services
  for kind, service := range services {
    // Start the next service, stopping all previous upon failure
    if err := service.Start(running); err != nil {
      ...
    }
  }

  // code below starts some RPC stuff and cleans up the node when it exits...

  return nil
}

Aha! So this is the function that iterates over each attached service and runs the .Start() function for each! The LightEthereum instance that was attached as a service to the node implements the Service interface that contains a .Start() function. This is how it all fits together!
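The "start each service, stopping all previous upon failure" step is worth isolating, since we will likely want the same behavior. Below is a minimal sketch under my own assumptions: the Service interface is reduced to Start/Stop without the p2p server argument, recorder is a hypothetical test service, and a slice is used instead of geth's map so that the order is deterministic.

```go
package main

import (
	"errors"
	"fmt"
)

// Service is a reduced, hypothetical version of node.Service.
type Service interface {
	Start() error
	Stop() error
}

// recorder logs its lifecycle calls and can be told to fail on Start.
type recorder struct {
	name string
	fail bool
	log  *[]string
}

func (r *recorder) Start() error {
	if r.fail {
		return errors.New(r.name + ": start failed")
	}
	*r.log = append(*r.log, "start "+r.name)
	return nil
}

func (r *recorder) Stop() error {
	*r.log = append(*r.log, "stop "+r.name)
	return nil
}

// startAll starts services in order. If one fails, it stops the services
// that already started, in reverse order, and returns the failure.
func startAll(services []Service) error {
	started := make([]Service, 0, len(services))
	for _, s := range services {
		if err := s.Start(); err != nil {
			for i := len(started) - 1; i >= 0; i-- {
				started[i].Stop()
			}
			return err
		}
		started = append(started, s)
	}
	return nil
}

func main() {
	var log []string
	services := []Service{
		&recorder{name: "a", log: &log},
		&recorder{name: "b", log: &log},
		&recorder{name: "c", fail: true, log: &log},
	}
	fmt.Println(startAll(services)) // prints c: start failed
	fmt.Println(log)                // prints [start a start b stop b stop a]
}
```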

The Light Ethereum Package

We will be focusing our attention on the go-ethereum/les package in this section, as this is the service that is attached to the running node upon launching a geth instance with the --syncmode="light" flag.

The light client needs to implement the Service interface defined in go-ethereum/node/service.go as follows:

type Service interface {

  // Protocols retrieves the P2P protocols the service wishes to start.
  Protocols() []p2p.Protocol

  // APIs retrieves the list of RPC descriptors the service provides.
  APIs() []rpc.API

  // Start is called after all services have been constructed and the networking
  // layer was also initialized to spawn any goroutines required by the service.
  Start(server *p2p.Server) error

  // Stop terminates all goroutines belonging to the service, blocking until they
  // are all terminated.
  Stop() error
  
}
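To illustrate the Start/Stop contract described in those comments, here is a small sketch of a service whose Start spawns a goroutine and whose Stop blocks until that goroutine has exited. The summer type and its Submit method are hypothetical, and Start omits the *p2p.Server argument to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// summer is a hypothetical service that adds up the numbers it receives.
type summer struct {
	jobs chan int
	quit chan struct{}
	wg   sync.WaitGroup
	sum  int // only touched by the loop goroutine until Stop returns
}

func newSummer() *summer {
	return &summer{jobs: make(chan int), quit: make(chan struct{})}
}

// Start spawns the service's goroutine and returns immediately.
func (s *summer) Start() error {
	s.wg.Add(1)
	go s.loop()
	return nil
}

// loop is the service's event loop: handle a job or exit on quit.
func (s *summer) loop() {
	defer s.wg.Done()
	for {
		select {
		case v := <-s.jobs:
			s.sum += v
		case <-s.quit:
			return
		}
	}
}

// Submit hands a value to the loop; it blocks until the loop receives it.
func (s *summer) Submit(v int) { s.jobs <- v }

// Stop signals the goroutine to exit and blocks until it has terminated.
func (s *summer) Stop() error {
	close(s.quit)
	s.wg.Wait()
	return nil
}

func main() {
	s := newSummer()
	if err := s.Start(); err != nil {
		fmt.Println(err)
		return
	}
	s.Submit(1)
	s.Submit(2)
	s.Submit(3)
	s.Stop()
	fmt.Println(s.sum) // prints 6
}
```

Because Stop waits on the WaitGroup, reading sum afterwards is safe without a mutex.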

The core of the entire light client is written in go-ethereum/les/backend.go. This is where we find the functions required to satisfy this Service interface, alongside the code that initializes an actual LightEthereum instance in a function called New.

func New(ctx *node.ServiceContext, config *eth.Config) (*LightEthereum, error) {
  
  // sets up the chainDB and genesis configuration for the light node...
  chainDb, err := eth.CreateDB(ctx, config, "lightchaindata")
  if err != nil {
    return nil, err
  }
  chainConfig, genesisHash, genesisErr := core.SetupGenesisBlock(chainDb, config.Genesis)
 
  ...

  log.Info("Initialised chain configuration", "config", chainConfig)

  leth := &LightEthereum{
    ...
  }

  // sets up a transaction relayer, a server pool, and info retrieval systems

  leth.relay = NewLesTxRelay(peers, leth.reqDist)
  leth.serverPool = newServerPool(chainDb, quitSync, &leth.wg)
  leth.retriever = newRetrieveManager(peers, leth.reqDist, leth.serverPool)
  
  ...

  // sets up the light tx pool
  leth.txPool = light.NewTxPool(leth.chainConfig, leth.blockchain, leth.relay)

  // sets up a protocol manager: we'll get into this shortly...
  if leth.protocolManager, err = NewProtocolManager(...); err != nil {
    return nil, err
  }

  // sets up the light ethereum APIs for RPC interactions
  leth.ApiBackend = &LesApiBackend{leth, nil}
 
  ...

  return leth, nil

}

Let's see what the light client's .Start() function does and how it sets up the p2p stack:

func (s *LightEthereum) Start(srvr *p2p.Server) error {

  ...

  log.Warn("Light client mode is an experimental feature")
  s.netRPCService = ethapi.NewPublicNetAPI(srvr, s.networkId)

  ...

  s.serverPool.start(srvr, lesTopic(s.blockchain.Genesis().Hash(), protocolVersion))
  ...
  return nil
  
}

Light Protocol Event Loop

The creation of the LightEthereum instance kicks off a bunch of goroutines, but the actual sync and retrieval of state is driven by the ProtocolManager created in the New function.

In go-ethereum/les/handler.go, at the bottom of the NewProtocolManager function, we see code that sets up some event loops:

if lightSync {
  manager.downloader = downloader.New(downloader.LightSync, chainDb, manager.eventMux, nil, blockchain, removePeer)
  manager.peers.notify((*downloaderPeerNotify)(manager))
  manager.fetcher = newLightFetcher(manager)
}

In this case, the instance creates a new downloader and a new light fetcher (via newLightFetcher), which work in tandem with the p2p layer to sync the state and respond to RPC requests that trigger events on peers or respond to incoming messages from peers.

The implementation diverges into a variety of files at this point, but an important aspect of the les package is the usage of on-demand requests, or ODRs. Through the p2p light server, nodes receive requests that are processed via goroutines such as in the example below.

In go-ethereum/les/odr_requests.go:

func (r *TrieRequest) Validate(db ethdb.Database, msg *Msg) error {

  log.Debug("Validating trie proof", "root", r.Id.Root, "key", r.Key)

  switch msg.MsgType {
  case MsgProofsV1:
    proofs := msg.Obj.([]light.NodeList)
    if len(proofs) != 1 {
      return errInvalidEntryCount
    }
    nodeSet := proofs[0].NodeSet()
    // Verify the proof and store if checks out
    if _, err, _ := trie.VerifyProof(r.Id.Root, r.Key, nodeSet); err != nil {
      return fmt.Errorf("merkle proof verification failed: %v", err)
    }
    r.Proof = nodeSet
    return nil

  case MsgProofsV2:
    proofs := msg.Obj.(light.NodeList)
    // Verify the proof and store if checks out
    nodeSet := proofs.NodeSet()
    reads := &readTraceDB{db: nodeSet}
    if _, err, _ := trie.VerifyProof(r.Id.Root, r.Key, reads); err != nil {
      return fmt.Errorf("merkle proof verification failed: %v", err)
    }
    // check if all nodes have been read by VerifyProof
    if len(reads.reads) != nodeSet.KeyCount() {
      return errUselessNodes
    }
    r.Proof = nodeSet
    return nil

  default:
    return errInvalidMessageType
  }

}

The node in question has the capacity to immediately respond to a message received via other peers, which is a critical piece of functionality we will need the more we elaborate on our notary/proposer clients.
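For readers who have not worked with Merkle proofs, the idea behind Validate can be sketched with a plain binary Merkle tree. This is a deliberate simplification and an assumption on my part: geth verifies proofs against a Merkle Patricia trie with Keccak-256 via trie.VerifyProof, whereas the hypothetical verify below uses SHA-256 and a hand-built four-leaf tree.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// step is one sibling hash on the path from a leaf up to the root.
type step struct {
	sibling [32]byte
	left    bool // true if the sibling sits to the left of the running hash
}

var errBadProof = errors.New("merkle proof verification failed")

// hashPair hashes the concatenation of two child hashes.
func hashPair(l, r [32]byte) [32]byte {
	return sha256.Sum256(append(l[:], r[:]...))
}

// verify recomputes the root from the leaf and its proof and compares it
// with the trusted root.
func verify(root [32]byte, leaf []byte, proof []step) error {
	h := sha256.Sum256(leaf)
	for _, s := range proof {
		if s.left {
			h = hashPair(s.sibling, h)
		} else {
			h = hashPair(h, s.sibling)
		}
	}
	if h != root {
		return errBadProof
	}
	return nil
}

// example builds a four-leaf tree over "a".."d" and returns its root
// together with the proof for leaf "c".
func example() ([32]byte, []step) {
	a, b := sha256.Sum256([]byte("a")), sha256.Sum256([]byte("b"))
	c, d := sha256.Sum256([]byte("c")), sha256.Sum256([]byte("d"))
	ab, cd := hashPair(a, b), hashPair(c, d)
	root := hashPair(ab, cd)
	// For "c": d is the right sibling, then the (a,b) node is the left sibling.
	return root, []step{{sibling: d, left: false}, {sibling: ab, left: true}}
}

func main() {
	root, proof := example()
	fmt.Println(verify(root, []byte("c"), proof)) // prints <nil>
	fmt.Println(verify(root, []byte("x"), proof)) // prints merkle proof verification failed
}
```

A light client only needs the trusted root (from a block header) plus the proof nodes sent by a peer, which is why it can skip downloading the full state.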

Key Takeaways

Overall, taking full advantage of Go's concurrency primitives along with mutexes for managing services is a great benefit of working with the geth client. We should maintain the pluggability of Services via a Service-like interface and allow for easy management and testing of relevant code.

What we should avoid, however, is the tightly coupled spaghetti code around configuration options. There is a lot of heterogeneity around configuring structs in the geth client, with packages often following their own approaches compared to others throughout the project. We should aim to constrain all configuration to a single, initial entrypoint and avoid redundancy of .Start() methods. After reading this code, it often feels like the geth team really drove themselves into a corner here. We have the opportunity to keep things simple, DRY, and performant.

We have to leverage the powerful constructs shown above in our notary/proposer implementations to make the most out of Go. Please let me know your thoughts below as to how we can improve upon what the go-ethereum team has done.

Let's go for it.


terencechain · 2018-05-16T17:25:30Z

Great read! It definitely opened my eyes. We should turn your write-up into a Medium article to benefit broader audiences.

Taking Notary as an example, do we think the following is the right path?

makeNotaryNode() gets the notary config by calling makeConfigNotaryNode() and registers notary services by calling registerNotaryService(). Then makeNotaryNode() starts the notary node via startNotaryNode().
Within startNotaryNode(), we iterate over each notary service and call its Start() method. Notary should implement the Service interface under service.go, while creation of the notaryEthereum service instance should kick off goroutines such as the downloader, fetcher, etc.

rauljordan · 2018-05-16T17:47:25Z

I'm not a fan of the complicated config setups that are spread across multiple files, but I'm a fan of having a bunch of services that implement a certain interface be attached to our notary client with each of them having a .Start() func. I don't think we should copy exactly what they did, but instead trim it down as much as possible and keep the good parts that leverage concurrency.

rauljordan · 2018-05-16T21:45:49Z

Hey all,

So thinking more about this as I transition into creating a PR for #109, here is a proposal I would like to make for our sharding clients moving forward. As our code base grows, it's important to think about how we can best leverage concurrency, event management, and simple configuration options that don't cause any headaches to those reading our work. There are elements of the light node design that I'd like to incorporate into our system for spinning up notary/proposer clients. Here are the ideas:

Design

The key idea is that our sharding entry point will spin up a ShardingClient struct, which is analogous to geth spinning up an instance of a Node.

There is a key command line flag called ClientType that specifies whether the client will be a Notary or a Proposer
The main entry point sets up a ShardingClient instance
- attaches configuration options from cli flags
- contains a utility function that registers a ShardingService: either Notary or Proposer
- starts each of these services in a concurrency-safe fashion
Notary and Proposer instances implement a ShardingService interface that defines common methods to both, including, but not limited to:

  type Service interface {

    // Protocols retrieves the P2P protocols the service wishes to start.
    Protocols() []p2p.Protocol
    
    // APIs retrieves the list of RPC descriptors the service provides
    APIs() []rpc.API

    // Start is called after all services have been constructed and the networking
    // layer was also initialized to spawn any goroutines required by the service.
    Start(server *p2p.Server) error

    // Stop terminates all goroutines belonging to the service, blocking until they
    // are all terminated.
    Stop() error
  }

The idea of attaching services this way to the sharding client allows service life-cycle management to be the responsibility of the sharding client itself. Moreover, every single goroutine pertaining to a service can be spun up and contained within its .Start() method.

The .Start() function will open a local shardchaindb file storage and call the .Start() methods of the notaries' and proposers' respective p2p ServerPools and ProtocolManagers
- ServerPool kickstarts an event loop that handles peer discovery, new connections, and disconnections from peers
- ProtocolManager is a struct that handles notaries' and proposers' respective event loops (i.e. interacting with the SMC, the voting process, etc.), their corresponding serverPools, their chaindb, txpools, and message requests/responses from other peers.
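As a rough illustration of the ServerPool-style event loop, here is a sketch in which a single goroutine owns the peer set and every change arrives over a channel. Everything here (peerPool, its channels, Count) is hypothetical and far simpler than the les server pool; there is no real discovery or networking involved.

```go
package main

import "fmt"

// peerPool is a hypothetical pool whose peer set is owned by one goroutine.
type peerPool struct {
	connect    chan string
	disconnect chan string
	query      chan chan int
	quit       chan struct{}
	done       chan struct{}
}

func newPeerPool() *peerPool {
	return &peerPool{
		connect:    make(chan string),
		disconnect: make(chan string),
		query:      make(chan chan int),
		quit:       make(chan struct{}),
		done:       make(chan struct{}),
	}
}

// Start launches the event loop in its own goroutine.
func (p *peerPool) Start() { go p.loop() }

// loop handles one event at a time, so the map needs no mutex.
func (p *peerPool) loop() {
	defer close(p.done)
	peers := make(map[string]struct{})
	for {
		select {
		case id := <-p.connect:
			peers[id] = struct{}{}
		case id := <-p.disconnect:
			delete(peers, id)
		case reply := <-p.query:
			reply <- len(peers)
		case <-p.quit:
			return
		}
	}
}

// Count asks the loop for the current number of peers.
func (p *peerPool) Count() int {
	reply := make(chan int)
	p.query <- reply
	return <-reply
}

// Stop ends the loop and waits for it to exit.
func (p *peerPool) Stop() {
	close(p.quit)
	<-p.done
}

func main() {
	pool := newPeerPool()
	pool.Start()
	pool.connect <- "peer-a"
	pool.connect <- "peer-b"
	pool.disconnect <- "peer-a"
	fmt.Println(pool.Count()) // prints 1
	pool.Stop()
}
```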

A ProtocolManager interface allows for a well-defined set of responsibilities and goroutines executed by notaries and proposers.

The lifecycle of notaries and proposers in the p2p network can be handled via a callback as in the les package that deals with the handshake between peers, and an eternal loop of responding to incoming messages via the ProtocolManager's handleMsg functionality.

Clients React to Each Other Via the ProtocolManager's handleMsg Function

There is a fixed set of messages sharding clients can respond to and send. We can follow the same approach as done in the les package's ProtocolManager.handleMsg function to do this.
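A minimal sketch of what such a dispatch could look like is below. The message codes, the Msg struct, and the string replies are all hypothetical placeholders that I made up for illustration; the real set would be defined by the sharding protocol. The shape (a switch over a fixed set of codes inside a loop that ends on the first error) follows my reading of the les handler.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical message codes for illustration only.
const (
	MsgStatus uint64 = iota
	MsgCollationRequest
	MsgCollationResponse
)

// Msg is a hypothetical decoded message from a peer.
type Msg struct {
	Code    uint64
	Payload string
}

var errInvalidMsgCode = errors.New("invalid message code")

// handleMsg processes a single message and returns the reply to send back.
func handleMsg(m Msg) (string, error) {
	switch m.Code {
	case MsgStatus:
		return "status ok", nil
	case MsgCollationRequest:
		return "collation for " + m.Payload, nil
	case MsgCollationResponse:
		return "stored " + m.Payload, nil
	default:
		return "", errInvalidMsgCode
	}
}

// handle keeps processing messages until the peer's channel closes or a
// message is rejected, at which point the peer would be dropped.
func handle(in <-chan Msg) ([]string, error) {
	var replies []string
	for m := range in {
		reply, err := handleMsg(m)
		if err != nil {
			return replies, err
		}
		replies = append(replies, reply)
	}
	return replies, nil
}

func main() {
	in := make(chan Msg, 3)
	in <- Msg{Code: MsgStatus}
	in <- Msg{Code: MsgCollationRequest, Payload: "shard 1"}
	in <- Msg{Code: 99}
	close(in)

	replies, err := handle(in)
	fmt.Println(replies) // prints [status ok collation for shard 1]
	fmt.Println(err)     // prints invalid message code
}
```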

Overall, I suggest we keep configurations in a single place, without many dependencies across files, and we document everything extensively. Let me know your thoughts.

@prestonvanloon @terenc3t @nisdas @enriquefynn @Magicking

terencechain · 2018-05-16T23:17:39Z

Looks good. In regard to how clients interact with each other, check out this LES flow control writeup: https://github.com/zsfelfoldi/go-ethereum/wiki/Client-Side-Flow-Control-model-for-the-LES-protocol

rosulucian · 2018-12-12T16:27:43Z

Raul, do you know of any other similar docs that could help a new developer peek inside the geth architecture & design?
Great job on this one, though! It really helped me get a broad picture of the geth design (not limited to the light client)

rauljordan added the Discussion label May 16, 2018

rauljordan self-assigned this May 16, 2018

rauljordan added this to To do in Validator Client via automation May 16, 2018

terencechain mentioned this issue May 18, 2018

Implement Proposers Submitting Collations onto SMC #111

Merged

3 tasks

This was referenced May 18, 2018

Revamp the Sharding Client Entry Points / Architecture #126

Closed

Rearchitecting the Sharding Node, its Lifecycle, and Services #127

Merged

rauljordan closed this as completed May 24, 2018

Validator Client automation moved this from To do to Done May 24, 2018
