
feat(system-service-deployer): introduce new system service deployment system [fixes NET-487] #1623

Merged: 27 commits, Jun 20, 2023

Conversation

@kmd-fl (Contributor) commented Jun 5, 2023

  1. New crate system-service-deployer. It installs all system services.
  2. Moved the decider and aqua-ipfs config to the node's config. Added default values for easier running.
  3. Refactored other small things a bit. Maybe I should have done that in a separate commit/PR, sorry.

At the moment, the PR contains some leftovers from testing (like crates with a "-test" suffix) that I haven't removed yet, but I will.

Below is a description of the PR.


Service Configuration

Service configuration for aqua-ipfs and decider is moved to the node's config:

[system_services]
enable = [
    "aqua-ipfs",
    "registry"
]

[system_services.decider]
decider_period_sec = 1000
worker_period_sec = 123

[system_services.aqua_ipfs]
local_api_multiaddr = "/ip4/127.0.0.1/tcp/5001"

UPD: added enabled services

We can additionally list the system services we want deployed. By default, ALL system services are enabled.

Note that, at the moment, newly disabled services aren't removed yet.

The values we use in our network are used as the defaults, so right now we don't need
to change the config in nox-distro.

However, the configuration for trust-graph (Fluence certificates) is distributed within the crate, since
we decided that this configuration is part of trust-graph and not of the node.

The old ENVs are still supported and override the configuration file.
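
For reference, a minimal sketch (not the node's actual config code; struct and field names here are assumptions) of how this [system_services] section could be modelled with serde so that the network defaults are baked in:

use serde::Deserialize;

#[derive(Deserialize, Debug, Default)]
#[serde(default)]
pub struct SystemServicesConfig {
    /// Which system services to deploy; in the real config "all enabled" is the default
    pub enable: Vec<String>,
    pub decider: DeciderConfig,
    pub aqua_ipfs: AquaIpfsConfig,
}

#[derive(Deserialize, Debug)]
#[serde(default)]
pub struct DeciderConfig {
    pub decider_period_sec: u64,
    pub worker_period_sec: u64,
}

impl Default for DeciderConfig {
    fn default() -> Self {
        // Placeholder numbers; the real defaults mirror the values used in our network
        Self { decider_period_sec: 120, worker_period_sec: 900 }
    }
}

#[derive(Deserialize, Debug)]
#[serde(default)]
pub struct AquaIpfsConfig {
    pub local_api_multiaddr: String,
    pub external_api_multiaddr: String,
}

impl Default for AquaIpfsConfig {
    fn default() -> Self {
        // Placeholder multiaddrs; the real defaults are the ones used in our network
        Self {
            local_api_multiaddr: "/ip4/127.0.0.1/tcp/5001".to_string(),
            external_api_multiaddr: "/ip4/127.0.0.1/tcp/5001".to_string(),
        }
    }
}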

System Service Distribution

There's no unified approach for this at the moment, since I don't know how to provide a unified library for it.

I came up with the following approach. These structures are now created in the system service deployer:

// For describing services
struct ServiceDistro {
    modules: HashMap<&'static str, &'static [u8]>,
    config: &'static [u8],
    name: &'static str, // used as the alias
}

// For describing spells
struct SpellDistro {
    name: &'static str, // used as the alias
    air: &'static str, // air script of the spell
    kv: HashMap<&'static str, JValue>, // initial KV of the spell
    trigger_config: TriggerConfig,
}

The situation is slightly more difficult for spells, since we somehow need to provide initial values from the node's config
in a format the spell can understand.

At the moment, for decider, we pass a structure with the config to the decider distro crate; the crate takes the values
and returns us a SpellDistro with spell-compatible KV init data.
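
As an illustration only (the key names here are assumptions, not the decider's actual ones), that conversion boils down to something like this:

use std::collections::HashMap;
use serde_json::{json, Value as JValue};

// Node-config values for decider (see the [system_services.decider] section above)
struct DeciderSettings {
    decider_period_sec: u64,
    worker_period_sec: u64,
}

// Build the spell-compatible initial KV that goes into SpellDistro::kv
fn decider_init_kv(settings: &DeciderSettings) -> HashMap<&'static str, JValue> {
    HashMap::from([
        ("worker_period_sec", json!(settings.worker_period_sec)),
        ("decider_period_sec", json!(settings.decider_period_sec)),
    ])
}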

I also wanted to create a PackageDistro that would unite the spells and services of one package (like with decider),
but it doesn't unify properly because of the need to initialize aqua-ipfs and trust-graph.
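
For the record, the PackageDistro idea looked roughly like the sketch below. It is not part of this PR; the shape is only illustrative and reuses the ServiceDistro and SpellDistro structs above.

// Not implemented in this PR; only illustrates the intended grouping
struct PackageDistro {
    name: &'static str,
    services: Vec<ServiceDistro>,
    spells: Vec<SpellDistro>,
    // This is the part that doesn't unify: aqua-ipfs and trust-graph each need
    // their own, package-specific initialization step after installation.
}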

Running the deployer

System service deployment happens after the main node loop is started. This is required for subscribing
the system spells.

Beyond that, there's no specific reason for this placement. Maybe we should move it into the main node loop, to the initialization phase
where all of the node's subsystems are started. That would allow us to stop the node easily if the system service deployer
fails.

At the moment, if the deployer wasn't able to deploy everything on the first try, it stops trying and does nothing.

I will change it in the following PR.

Deployment process

At the moment, we need to install 3 stand-alone services and 1 spell with an aux service:

  1. aqua-ipfs requires initialization with the local and external API multiaddrs of an IPFS node, provided in the node config; note that these values aren't used much in our spells, so maybe we can remove them
  2. registry
  3. trust-graph requires initialization with the Fluence certificates distributed with the service
  4. decider and its connector require loading the initial values for the spell from the node config; the connector itself doesn't need any initialization

Deployment of every service and spell happens similarly:

  1. Detect if we need to install/update the service
  2. Remove old service/spell
  3. Install/Update
  4. Initialize if needed

In the code, steps 1 and 2 are combined into a single function for each kind: deploy_system_service and deploy_system_spell.
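
A simplified, self-contained sketch of that flow is below. The Deployer trait and its helper methods are stand-ins for the node's real APIs, not the PR's actual signatures; it reuses the ServiceDistro struct shown earlier.

// What deploy_system_service does, step by step (names are illustrative)
enum Existing {
    None,                            // nothing with this alias yet
    Outdated { service_id: String }, // same alias, but the blueprint differs
    UpToDate,
}

trait Deployer {
    fn find_by_alias(&self, distro: &ServiceDistro) -> Existing;
    fn remove_service(&mut self, service_id: &str) -> Result<(), String>;
    fn install(&mut self, distro: &ServiceDistro) -> Result<String, String>;
    fn initialize(&mut self, alias: &str) -> Result<(), String>;

    fn deploy_system_service(&mut self, distro: ServiceDistro) -> Result<(), String> {
        // Steps 1-2: detect whether an update is needed and remove the old instance
        match self.find_by_alias(&distro) {
            Existing::UpToDate => return Ok(()),
            Existing::Outdated { service_id } => {
                // NB: currently installation proceeds regardless of how removal ended
                let _ = self.remove_service(&service_id);
            }
            Existing::None => {}
        }
        // Step 3: install; step 4: initialize if needed (aqua-ipfs, trust-graph)
        self.install(&distro)?;
        self.initialize(distro.name)
    }
}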

Find existing services and spells

The rule is simple: two services/spells are the same if they share the same alias.

For services, if the new blueprint is different from the old one, we install a new service.

For spells, we don't compare blueprints, only scripts and trigger configs. If the script or the config
is different, we update the spell.
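
A tiny sketch of these checks (field names are illustrative; the placeholder TriggerConfig only stands in for the real type, which is assumed to be comparable):

#[derive(PartialEq)]
struct TriggerConfig; // placeholder for the node's real trigger config type

// Services with the same alias: reinstall only if the blueprint changed
fn service_needs_reinstall(existing_blueprint_id: &str, new_blueprint_id: &str) -> bool {
    existing_blueprint_id != new_blueprint_id
}

// Spells with the same alias: blueprints are ignored, only script + trigger config matter
fn spell_needs_update(
    existing_script: &str,
    existing_trigger: &TriggerConfig,
    new_script: &str,
    new_trigger: &TriggerConfig,
) -> bool {
    existing_script != new_script || existing_trigger != new_trigger
}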

Remove old service/spell

If we find an existing service/spell and detect that we need to update it, we first remove the old instance.

For services, we just plainly try to remove the service.

For spells, we first try to unsubscribe them from their triggers; we use the same function as for Spell.remove. If removal fails,
we try to just unsubscribe the spell and update its trigger config to an empty trigger config, to avoid resubscribing on restart.
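
Sketched out (the method names are stand-ins, not the node's real API; an empty TriggerConfig via Default is an assumption here):

#[derive(Default)]
struct TriggerConfig; // placeholder; Default stands for "no triggers"

trait SpellRemover {
    fn remove_spell(&mut self, spell_id: &str) -> Result<(), String>;
    fn unsubscribe(&mut self, spell_id: &str) -> Result<(), String>;
    fn set_trigger_config(&mut self, spell_id: &str, config: TriggerConfig) -> Result<(), String>;

    fn remove_or_disarm(&mut self, spell_id: &str) -> Result<(), String> {
        if self.remove_spell(spell_id).is_ok() {
            return Ok(());
        }
        // Fallback: leave the spell installed but inert, so it won't resubscribe on restart
        self.unsubscribe(spell_id)?;
        self.set_trigger_config(spell_id, TriggerConfig::default())
    }
}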

At the moment, we install a new service/spell regardless of how the removal process ended.

I think we will change it in the following PRs.

Service/Spell Deployment

Service installation doesn't require much:

  1. Add modules of the service to the module repo to get a blueprint
  2. Create a service
  3. Add an alias

For spells:

  1. Create a new spell (using the same function as Spell.install)
  2. Add an alias
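
The service path above, sketched against an assumed trait with stand-in method names (the node's real module/blueprint APIs differ); it reuses the ServiceDistro struct shown earlier:

trait ServiceInstaller {
    fn add_module(&mut self, name: &str, wasm: &[u8]) -> Result<String, String>;
    fn add_blueprint(&mut self, name: &str, module_hashes: Vec<String>) -> Result<String, String>;
    fn create_service(&mut self, blueprint_id: &str) -> Result<String, String>;
    fn add_alias(&mut self, alias: &str, service_id: &str) -> Result<(), String>;

    fn install_service(&mut self, distro: &ServiceDistro) -> Result<String, String> {
        // 1. Add the bundled modules to the module repo and build a blueprint from them
        let mut hashes = Vec::new();
        for (&name, &wasm) in &distro.modules {
            hashes.push(self.add_module(name, wasm)?);
        }
        let blueprint_id = self.add_blueprint(distro.name, hashes)?;
        // 2. Create the service from the blueprint
        let service_id = self.create_service(&blueprint_id)?;
        // 3. Alias the service by its distro name
        self.add_alias(distro.name, &service_id)?;
        Ok(service_id)
    }
}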

System Service Owner

The owner of the system services (aka the controller) is the node itself; the services are controlled by HOST_PEER_ID.
The Management PeerId is used only for assigning aliases.

This is done this way because worker_id and owner_id are considered to be the same entity in the current node implementation.

It would be nice to be able to control the system services with Management PeerId.

Initialization

Initialization happens after installation. It's not a unified process; it requires manual
implementation. For example, aqua-ipfs requires setting the API multiaddrs by calling two functions,
set_local_api_multiaddr and set_external_api_multiaddr; on the other hand, trust-graph
wants us to call set_root with the address of a root node and also call insert_cert for
every certificate from the provided array.
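
Illustration only: in terms of plain service calls, those two initialization paths look roughly like the following. call_service is a stand-in for the deployer's internal call mechanism, and the argument shapes are assumptions, not the services' verified ABI.

use serde_json::{json, Value as JValue};

trait ServiceCaller {
    fn call_service(&mut self, alias: &str, function: &str, args: Vec<JValue>) -> Result<JValue, String>;

    fn init_aqua_ipfs(&mut self, local: &str, external: &str) -> Result<(), String> {
        self.call_service("aqua-ipfs", "set_local_api_multiaddr", vec![json!(local)])?;
        self.call_service("aqua-ipfs", "set_external_api_multiaddr", vec![json!(external)])?;
        Ok(())
    }

    fn init_trust_graph(&mut self, root: &str, certs: &[JValue]) -> Result<(), String> {
        self.call_service("trust-graph", "set_root", vec![json!(root)])?;
        for cert in certs {
            self.call_service("trust-graph", "insert_cert", vec![cert.clone()])?;
        }
        Ok(())
    }
}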

Previously, service initialization was implemented via air scripts provided by the services.
We removed this to simplify the node.

One of the approaches to unify this initialization step is to ask the services to provide
setup or init functions, so the system deployer could do something like:

fn initialize_service(&self, service: ServiceDistro, service_config: Option<ServiceSpecificConfig>) {
    // The service would expose an init function name and a way to build its init data
    let init_data_json = service.init_data(service_config);
    self.call_service(service.name, service.init_function_name, init_data_json);
    // ...
}

Cons:

  • Requires rewriting the existing services
  • Probably, we should avoid service initialization altogether and leave this possibility only to spells

Service Calls

During the deployment process, we use

const SYSTEM_SERVICE_DEPLOYER_TTL: u64 = 60_000;

as the TTL for the service calls.

So, at the moment, if a service call takes too long, the whole initialization will fail. Good to know, right?

Questions

  1. Do we need to stop the node if the system deployer fails?
    • Yes, we do.
  2. Do we want to allow service initialization? Do we want to leave this option to spells only (via init data or KV)?
    • Yes, we want to allow system service initialization.
  3. Do we want to assign HOST_PEER_ID as the worker_id for the system services? We can't assign the Management PeerId.
    • For now, we will use HOST_PEER_ID as the owner of all system services and spells.
  4. Do we want to remove old services? Do we want to remove old spells?
    • Yes, we want to remove old services. We want to try to update old spells.
  5. How do we update a spell without erasing the old state? How do we check whether an update is a breaking change for the old state?
    • We will try to implement it in the next PRs.
  6. Do we want to be able to re-initialize or re-install services/spells on changes to the system service config, which is part of the node's config? Some services cannot be re-initialized, only reinstalled (like aqua-ipfs).
    • We want to try when applicable. Try to re-initialize the spells' KV on each run.

@folex (Member) left a comment:
Well done!

@folex (Member) commented Jun 6, 2023

[system_services_config.decider]

Maybe remove _config part? Just [system_services.decider] would look nicer as a config parameter
