Nearup rewrite #81

chefsale · 2020-07-02T08:01:40Z

Currently nearup is written in Python without any external packages, so a lot of the logic is cumbersome and implemented in a sub-optimal way. Nearup is in the critical path for our network and a lot of validators depend on it.

In the current state this is a liability and we should put effort into making it production ready and reliable.

The main proposal is to rewrite the whole nearup application in Rust, making it strongly typed and compiled, so we can catch issues earlier, at compile time. Rust also provides the crates needed to make this happen:

Shell related:
GitHub API:
- Rust Crates: https://crates.io/search?page=1&per_page=10&q=github%20api
- GraphQL (https://crates.io/search?q=graphql)

The other option was to use Go, but that would introduce another language and ecosystem, which is unnecessary, as it can all be done in Rust.

The general idea is to rewrite nearup and replace our current deployments of devnet, betanet and testnet with nearup everywhere. This would deprecate a lot of the complicated deployment setup we have, and we could just deploy prebaked nearup images (Docker or Packer).

Nearup would be configurable to:

automatically update itself or not
automatically run the latest version of a specified release phase (stable, rc, beta) or a specific version
join betanet, testnet, localnet or another custom network if needed
run in a canary mode, which we could use to test nodes for a specified period of time (this would be a replacement for devnet)
- on every commit, to validate that master is still backwards compatible and working
- to run a regular node on testnet: the node syncs from scratch and keeps up with the head
- to run a validator node on testnet: the node starts from existing state, and we make sure there are no block production failures and that the node continues to be a validator
- to run an RPC node on testnet: make sure there are no failures in RPC calls
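
As a rough illustration, the options listed above could be modelled as a single configuration object. This is only a sketch: `NearupConfig`, `ReleasePhase` and every field name are hypothetical and are not taken from the existing nearup code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class ReleasePhase(Enum):
    """Release phases named in the proposal."""
    STABLE = "stable"
    RC = "rc"
    BETA = "beta"


@dataclass(frozen=True)
class NearupConfig:
    """Hypothetical nearup configuration; all field names are assumptions."""
    # betanet, testnet, localnet or the name of a custom network.
    network: str = "testnet"
    # Whether nearup should update itself automatically.
    auto_update: bool = True
    # Either follow a release phase or pin a specific version string.
    release: Union[ReleasePhase, str] = ReleasePhase.STABLE
    # When set, run in canary mode for this many seconds.
    canary_seconds: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject obviously invalid values as early as possible.
        if not self.network:
            raise ValueError("network must not be empty")
        if self.canary_seconds is not None and self.canary_seconds <= 0:
            raise ValueError("canary_seconds must be positive")

    @property
    def is_canary(self) -> bool:
        return self.canary_seconds is not None
```

For example, `NearupConfig(network="betanet", release="1.2.0", canary_seconds=3600)` would describe a one-hour canary run pinned to a specific version (the version string here is made up).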

As we plan to support a flag to force the next protocol version in neard, this should also be supported in the nearup configuration:
`neard --protocol_version=6` or `neard --next_protocol`

cc: @bowenwang1996 @ailisp @damons @frol

mfornet · 2020-07-02T14:01:04Z

If we write nearup in Rust, how are we going to distribute it? The current one-liner for nearup is really cool, but ofc it relies on users having Python working out of the box.
I guess we could still have a similar one-liner and:

ask if they want to download a pre-compiled binary, or
install nearup from crates.io with cargo install (potentially installing cargo first).

chefsale · 2020-07-02T15:56:30Z

Yes, I think we could do both: a crate plus a cool script which installs it. We could even add it to the distro-specific package managers and support:

apt-get install nearup
dnf install nearup
etc...

That would be nice as well.

ailisp · 2020-07-02T17:00:14Z

I don't suggest we maintain apt and dnf repos: nearup needs to self-update, and with apt/dnf that needs sudo. A binary + a nearup-init.sh sounds good; it's how rustup works (rustup-init.sh/rustup-init.exe + rustup binary).

ailisp · 2020-07-02T17:09:57Z

I agree with a binary + shell script for distribution, but I don't think a rewrite in Rust is necessary:

Python has decent packaging solutions: AppImage, or https://www.pantsbuild.org/index.html, https://buck.build/, https://bazel.build/ suggested by @chefsale.
We don't have to use Rust to avoid bugs. The major bugs of a devops tool like nearup are not type errors but, imo, come from a series of integration activities: downloading, GitHub integration, subprocess control, etc. Even in Rust these need to be covered by the same suite of integration tests, while rewriting in Rust and adding new features there takes more time than in Python, so I don't think it's a good idea.
Python has also proven its ability and rich ecosystem for industrial-strength devops tools (gcloud CLI, AWS CLI, Azure CLI, Ansible, SaltStack and OpenStack are all written in Python), but Rust has not (rustup's logic is simpler than that of these tools, and even than nearup's).

chefsale · 2020-07-02T17:21:16Z

I agree with @ailisp on maybe sticking to Python as well. Happy to go with either solution and happy to hear other people's opinions; obviously there are pros and cons :)

frol · 2020-07-02T17:33:33Z

Even though I dream of having nearup, near-shell, and rainbow cli implemented in Rust, I want to make sure we weigh all the pros and cons of a rewrite.

> we don't have to use Rust to avoid bugs, the major bugs of a devops tool like nearup are not type errors, but imo a series of integration activities

Python is great for happy-path scripting, but handling corner cases (the real world is scary) requires catch-all try-except blocks (a single call to requests.get may throw a myriad of exception types, ranging from low-level native Python exceptions to high-level errors), while Rust makes all of that explicit (you may choose to ignore the errors with .unwrap() / .expect(), but you can easily identify them in later iterations when you are ready to handle them).

I believe that the maintainability [reliability over time after refactorings] of Rust code is much greater than that of Python code. Also, once you get from a PoC to a reliable CLI in Python, the amount of code is the same as or even more than in Rust, I believe.

Still, we already have the implementation in Python, so we should be careful about re-implementation.

/cc @ilblackdragon @nearmax @khorolets

chefsale · 2020-07-02T17:54:11Z

I agree with that as well. Still, we have to take into account that we cannot really reuse much of the current code, so personally I believe it would be easier to start from scratch, either in Python or in Rust.

ailisp · 2020-07-02T17:56:39Z

> Python is great for happy-path scripting, but handling corner cases (the real world is scary) requires catch-all try-except blocks (a single call to requests.get may throw a myriad of exception types, ranging from low-level native Python exceptions to high-level errors), while Rust makes all of that explicit (you may choose to ignore the errors with .unwrap() / .expect(), but you can easily identify them in later iterations when you are ready to handle them).

Unfortunately Python is not Java, and it's hard to find all the exceptions that a library function could raise :( So a robust Python package tends to use only standard library functions or libraries with well-wrapped and documented exception types. requests unfortunately is not one of them, so we have to wrap it and enforce Rust-like error handling in Python:

a module can only raise exceptions defined in that same module; raising anything else is a bug in the module
a higher-level module that calls lower-level module functions must catch the lower-level module's exceptions (and only those exceptions) and
- if it doesn't handle the error, wrap it in an error class defined in the current module
- or handle it.

Leaving an exception unhandled is the implicit equivalent of unwrap in Rust :(
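
A minimal sketch of the rules above, assuming two layers that would normally live in separate modules (they are collapsed into one block here for brevity). It uses the standard library's urllib instead of requests so that it runs on its own, and all names (`DownloadError`, `ReleaseError`, `fetch`, `latest_version`) are hypothetical rather than taken from nearup.

```python
import http.client
import urllib.request


# --- lower-level "download" module (hypothetical) ---

class DownloadError(Exception):
    """The only exception the download layer is allowed to raise."""


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Return the body at `url`, or raise DownloadError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    # URLError and socket timeouts are OSError subclasses; a malformed URL
    # raises ValueError; protocol-level problems raise HTTPException.
    except (OSError, ValueError, http.client.HTTPException) as exc:
        raise DownloadError(f"could not fetch {url}: {exc}") from exc


# --- higher-level "release" module (hypothetical) ---

class ReleaseError(Exception):
    """The only exception the release layer is allowed to raise."""


def latest_version(url: str, fetcher=fetch) -> str:
    """Read a version string from `url`, or raise ReleaseError."""
    try:
        body = fetcher(url)
    except DownloadError as exc:
        # Not handled here, so wrap it in this module's own error class.
        raise ReleaseError("cannot determine the latest version") from exc
    try:
        version = body.decode("utf-8").strip()
    except UnicodeDecodeError as exc:
        raise ReleaseError("version file is not valid UTF-8") from exc
    if not version:
        raise ReleaseError("version file is empty")
    return version
```

The `fetcher` parameter is only there so the upper layer can be tested without network access; callers of `latest_version` need to know about `ReleaseError` and nothing else, which is roughly what a `Result` with a module-specific error type gives you in Rust.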

> I believe that the maintainability [reliability over time after refactorings] of Rust code is much greater than that of Python code. Also, once you get from a PoC to a reliable CLI in Python, the amount of code is the same as or even more than in Rust, I believe.

So this is possibly true, unless in practice some parts never fail. And writing the same amount of (well error-handled) code in Python is faster than writing it in Rust. Especially in this use case, Rust's static checks can't help detect integration errors, so you still need to edit, recompile and test quite a few times.

frol · 2020-07-05T22:13:29Z

After a brief discussion with @ilblackdragon, we identified that before shooting for any major refactoring on the nearup side, we should make sure that the neard (nearcore) CLI is good enough to be run without nearup in the first place. After that, we can draw up the requirements for nearup and decide on the language and packaging strategy.

chefsale · 2020-07-06T09:03:07Z

So, can you provide more context on what would need to be done on the nearcore CLI side? cc: @ilblackdragon @frol

chefsale added enhancement labels Jul 2, 2020

chefsale self-assigned this Jul 2, 2020

frol mentioned this issue Jul 16, 2020

neard init should NOT use the same key-pair for validator-key and test.near access key near/nearcore#2995

Open

chefsale added the T-SRE label Jul 16, 2020

frol mentioned this issue Aug 13, 2020

Introduce --boot-nodes argument to neard init near/nearcore#3156

Closed

chefsale closed this as completed Sep 15, 2020
