Update Script #22

Closed
Sipheren opened this issue Jul 3, 2023 · 21 comments

Comments

@Sipheren

Sipheren commented Jul 3, 2023

Hi,

I tried to use the update script and ran into a few issues; I would appreciate some help if possible.

Firstly, the script wouldn't run because line 17 has an issue:

david@plsnode01:~/pulsechain-validator$ sh update-client.sh 
update-client.sh: 17: Syntax error: "(" unexpected

So I just commented out these lines:

#function sigint() {
#    exit 1
#}
david@plsnode01:~/pulsechain-validator$ sh update-client.sh 
-e ARE YOU SURE YOU WANT TO GO OFFLINE TO STOP, UPDATE AND RESTART PULSECHAIN CLIENTS ON THE VALIDATOR?

-e * it could take 30 - 60 minutes to complete -- depending mostly on bandwidth and server specs *

Hit [Enter] to Continue OR Ctrl+C to Cancelupdate-client.sh: 28: read: arg count
-e 
Step 1: Stop PulseChain clients (Geth and Lighthouse)
[sudo] password for david: 
sudo: node: command not found
-e 
Step 2: Pull updates and rebuild clients

fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
info: using existing install for 'stable-x86_64-unknown-linux-gnu'
info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu unchanged - rustc 1.70.0 (90c541806 2023-05-31)

fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
-e 
Step 3: Starting PulseChain clients
-e 
Process is complete

So I don't think it ended up doing anything.

Thanks
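
A note on the output above: on Ubuntu, `sh` is normally dash rather than bash. dash rejects the `function name()` syntax, its `echo` prints `-e` literally, and its `read` requires a variable name, which matches the line 17 syntax error, the stray `-e` lines and the `read: arg count` message. A minimal sketch of a check, assuming update-client.sh is written for bash (the helper name `running_under_bash` is made up for illustration and is not part of the repo):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the interpreter is bash.
running_under_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

if running_under_bash; then
    echo "bash detected: bash-only syntax in the script will work"
else
    # Reached when started as `sh update-client.sh` on a system where sh is dash.
    echo "not bash: run 'bash update-client.sh' instead" >&2
fi
```

Starting the script as `bash update-client.sh` (or `./update-client.sh`, assuming it has a bash shebang and is executable) should avoid these errors without commenting anything out.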

@gazaz

gazaz commented Jul 3, 2023

Same for me. I didn't get the line 17 syntax error, but I did get the rest of the update errors you had. The script stopped and nothing had been updated.

ARE YOU SURE YOU WANT TO GO OFFLINE TO STOP, UPDATE AND RESTART PULSECHAIN CLIENTS ON THE VALIDATOR?

* it could take 30 - 60 minutes to complete -- depending mostly on bandwidth and server specs *

Hit [Enter] to Continue OR Ctrl+C to Cancel

Step 1: Stop PulseChain clients (Geth and Lighthouse)
[sudo] password for xxxxxxxx:
sudo: node: command not found

Step 2: Pull updates and rebuild clients

fatal: not a git repository (or any of the parent directories): .git
info: using existing install for 'stable-x86_64-unknown-linux-gnu'
info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'

stable-x86_64-unknown-linux-gnu unchanged - rustc 1.69.0 (84c898d65 2023-04-16)

fatal: not a git repository (or any of the parent directories): .git

Step 3: Starting PulseChain clients

Process is complete

@wishbonesr

@gazaz, @Sipheren,
There are actually several problems.
Line 42: sudo -u $NODE_USER $NODE_USER -c ...etc.
Should read: sudo -u $NODE_USER bash -c ...etc.

Additionally, a flaw in the setup script causes this script's git pull to fail: the setup script doesn't copy the hidden .git repository metadata to the destination /opt/geth or /opt/lighthouse.
I would add the following to the [config] section (line 41) of pulsechain-validator-setup.sh:
shopt -s dotglob
One might take for granted that this is commonly enabled by admins, and it might be something added to ~/.bashrc.

Solution:
Rebuild the server from scratch or manually remove and rebuild the geth and lighthouse repositories. Follow the steps in the setup script, but add the shopt -s dotglob.
Don't forget to re-add the symlink - see pulsechain-validator-setup.sh, Line 199
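
The effect of `shopt -s dotglob` can be seen in a small experiment with throwaway directories. This is only a sketch: the directory names are made up, and it uses `cp -R` instead of the setup script's `mv` so the same source can be copied twice. The globbing behaviour is the same for both commands.

```shell
#!/usr/bin/env bash
# Throwaway directories standing in for a cloned repo and two destinations.
work=$(mktemp -d)
mkdir -p "$work/repo/.git" "$work/dest_default" "$work/dest_dotglob"
touch "$work/repo/Makefile"

# Default globbing: `*` skips names that start with a dot, so .git is left behind.
cp -R "$work/repo/"* "$work/dest_default/"

# With dotglob enabled, `*` also matches hidden entries such as .git.
shopt -s dotglob
cp -R "$work/repo/"* "$work/dest_dotglob/"
shopt -u dotglob

echo "default: $(ls -A "$work/dest_default" | tr '\n' ' ')"
echo "dotglob: $(ls -A "$work/dest_dotglob" | tr '\n' ' ')"
```

Only the second destination ends up with a `.git` folder, which is what `git pull` needs later.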

@rhmaxdotorg
Owner

Thanks all for the details!

And of course apologies, I guess the update script wasn't tested as well as I thought after I made some major updates to the setup script a while back. Working on it!

Great insight on the shopt as well! Will update and test scripts again.

update-client.sh has been updated to fix the primary bug, which was the typo:

cf3bba6

Also updated the setup script:

547a6e4

Gimme some time and I'll work on the steps and post them so you don't have to rebuild the clients, ETA today sometime if all goes well.

@Sipheren
Author

Sipheren commented Jul 3, 2023

Awesome, thanks for all the replies.

@gazaz

gazaz commented Jul 4, 2023

Brilliant thanks for working on it !

@rhmaxdotorg
Owner

rhmaxdotorg commented Jul 4, 2023

@wishbonesr thank you so much for the detailed analysis, super helpful.

It looks like because the script does (line 137) sudo mv $GETH_REPO_NAME/* $GETH_DIR (instead of cp -R, which I think it used in most places in the past), it isn't picking up the hidden git files. The script then removes them from the home directory, so there's no chance to copy them over post-setup (which could have been a quick solution for people now).

So I do agree the most straightforward way going forward is to do a quick rebuild, which thankfully should take around the same time as running the update client would have. It's fast because by default the reset-validator.sh script keeps blockchain data, so there's no need to resync everything (however, it may take a few hours or more to sync back what it needs). The only other thing is you'll need to import your keys to lighthouse again, which should only take a few minutes for most folks (who aren't running 100+ validators; if you are, sorry for the 100+ times you need to type in your password, but thank you very, very much for your service :).

Here's what I just did to update my validator.

  1. Do a git pull in the pulsechain-validator scripts directory to upgrade to the latest scripts (or delete the scripts directory and git clone https://github.com/rhmaxdotorg/pulsechain-validator.git to download the scripts folder again then chmod +x *.sh in the directory to make them executable)
  2. Do pico reset-validator.sh and change line 6 from I_KNOW_WHAT_I_AM_DOING=false to I_KNOW_WHAT_I_AM_DOING=true, then ctrl+x to exit, it will ask you to save so say y and Enter to save the changes.
  3. Now ./reset-validator.sh and hit Enter to reset the validator (may take up to 30 seconds)
  4. Run the setup again with ./pulsechain-validator-setup.sh 0xfee-address 11.22.ip.addr (you can find your original setup command in history by doing grep pulsechain-validator-setup.sh ~/.bash_history)
  5. Copy your validator_keys to the node user's home directory and import them (e.g. the first command below assumes they are in your /home/ubuntu directory; if not, modify the command to copy them from wherever they are located).
sudo cp -R ~/validator_keys /home/node
sudo chown -R node:node /home/node/validator_keys

sudo -u node bash
cd ~

/opt/lighthouse/lighthouse/lh account validator import --directory ~/validator_keys --network=pulsechain

After you've entered the wallet password for each validator and the import is complete, you should see that the process completed successfully.

Successfully imported keystore.
Successfully updated validator_definitions.yml.

Successfully imported X validators (0 skipped).

Then start the lighthouse validator client and after the (hopefully brief) syncing completes, you should be back to validating!

sudo systemctl start lighthouse-validator

You can check the status of the clients using this command (hit Enter or the spacebar to scroll down for more status information):

sudo systemctl status geth lighthouse-beacon lighthouse-validator
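
For a shorter yes/no answer per client, something like the following could be used. It is a sketch only: `check_services` and the `CHECK_CMD` variable are made-up names, and `CHECK_CMD` exists purely so the function can be tried on a machine that does not have these services. By default it calls `systemctl is-active --quiet`.

```shell
#!/usr/bin/env bash
# Command used to test one service; override CHECK_CMD to experiment elsewhere.
CHECK_CMD=${CHECK_CMD:-systemctl is-active --quiet}

# Hypothetical helper: prints one line per service, returns 1 if any is down.
check_services() {
    local failed=0 svc
    for svc in "$@"; do
        # $CHECK_CMD is unquoted on purpose so it splits into command and options.
        if $CHECK_CMD "$svc" 2>/dev/null; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            failed=1
        fi
    done
    return "$failed"
}

check_services geth lighthouse-beacon lighthouse-validator || echo "at least one client is down"
```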

So just to be clear: if you want to update your clients, the update script will not work without doing the quick rebuild first. That means you get the updated scripts, reset the validator (it keeps most of the blockchain data automatically) and run the setup script again. Then you will be back validating on the latest and greatest network on earth (with the latest client updates too, as the script pulls and uses the new clients).

After a few hours, the validator was back to Active, 99% effectiveness and earning fees again.

Again, apologies for the issue updating clients, and of course I pushed fixes as soon as I could.
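
Step 2 above can also be done without opening an editor. The sketch below assumes the line in reset-validator.sh reads exactly `I_KNOW_WHAT_I_AM_DOING=false`, as quoted in the steps; the helper name `enable_safety_switch` is made up, and `sed -i` is used in its GNU form (as shipped with Ubuntu). It is demonstrated on a scratch file rather than the real script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flip the safety switch in the given script file.
enable_safety_switch() {
    sed -i 's/^I_KNOW_WHAT_I_AM_DOING=false$/I_KNOW_WHAT_I_AM_DOING=true/' "$1"
}

# Try it on a scratch file instead of the real reset-validator.sh.
scratch=$(mktemp)
printf '%s\n' '#!/bin/bash' 'I_KNOW_WHAT_I_AM_DOING=false' > "$scratch"
enable_safety_switch "$scratch"
grep 'I_KNOW_WHAT_I_AM_DOING' "$scratch"
```

On the validator this would be `enable_safety_switch reset-validator.sh`, run from the scripts directory.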

@gazaz

gazaz commented Jul 4, 2023

Worked a treat thanks, validator updated and back printing PLS :)

Thanks for the hard work.

@rhmaxdotorg
Owner

rhmaxdotorg commented Jul 4, 2023

Also, there have been suggestions of a way to avoid the "quick rebuild" by doing something like this.

  1. Pull the latest update-client.sh script (fixed typo that prevented it from running properly)
  2. Clone the geth and lighthouse repos (may require checking out the specific version you're running, for example v2.2.0)
  3. Copy the hidden files to the appropriate /opt client home directories
  4. Run the update-client.sh script

If anyone decides to tinker with this and verify it works, feel free to let us know the process. However, even with a few hours downtime of the validator, it still seems like the safest/most tested method as of now is the minimal rebuild process described in the prior post.

@Sipheren
Author

Sipheren commented Jul 5, 2023

Thanks for this, I am looking to go through this myself on the weekend. I have run the reset script once before during testing, works fine and doesn't take all that long to be back up and synced.

Also, major thanks for providing this repo and maintaining it in the first place, was wasting a lot of time trying to get everything setup manually or using dockers, this script was a godsend :)

Cheers

@Sipheren
Author

Sipheren commented Jul 5, 2023

EDIT: All good, I just had to re-add the metrics flags and the like to the .service files, reload the daemon and restart the services. :)

Question: after the reset-validator script is run, should the Grafana dashboards all just kick back in, or do I need to remove and re-run that script also?

They are all there and set up how I like, but none seem to be getting any data. I'm guessing the new install of geth doesn't link up to the db or something?

@wishbonesr

In case some folks find themselves unable to update this repo on their node (because they needed to edit the safety switches in the scripts), git will tell you that you need to commit first. Since this is a one-way operation, it's OK to set the HEAD of the clone back to the last clone/pull (this discards your local edits). Do this to avoid having to purge and re-clone via the URL.
Ex. - do this in your local "pulsechain-validator" repository folder.

git reset --hard
git pull
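
What `git reset --hard` does to a local edit can be tried safely in a throwaway repository first. This is a sketch with made-up contents: the temporary repo only stands in for the pulsechain-validator clone, and the demo identity passed with `-c` is a placeholder.

```shell
#!/usr/bin/env bash
# Throwaway repository standing in for the pulsechain-validator clone.
repo=$(mktemp -d)
git -C "$repo" init -q
printf 'I_KNOW_WHAT_I_AM_DOING=false\n' > "$repo/reset-validator.sh"
git -C "$repo" add reset-validator.sh
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Simulate the local edit to the safety switch.
printf 'I_KNOW_WHAT_I_AM_DOING=true\n' > "$repo/reset-validator.sh"
git -C "$repo" status --short    # lists the file as modified

# Discard the edit; the working tree matches the last commit again.
git -C "$repo" reset -q --hard
cat "$repo/reset-validator.sh"
```

If the local edits should be kept rather than discarded, `git stash` before `git pull` is the usual alternative.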

@wishbonesr

wishbonesr commented Jul 6, 2023

@rhmaxdotorg,
To help close this out, I spun up a new instance, and went through all the steps, plus the update. All appears fine, and this issue resolved.
Although the corrections appear to have addressed this issue, just so you know: lighthouse's dependency libsecp256k1 is not compiling on this EC2 instance.
Spun up a VBox VM (same OS/ver), no issues.

EC2 - test instance error during install and/or update:

 make
cargo install --path lighthouse --force --locked \
        --features "jemalloc" \
        --profile "release" \

  Installing lighthouse v2.3.0 (/opt/lighthouse/lighthouse)
    Updating crates.io index
warning: package `hermit-abi v0.3.1` in Cargo.lock is yanked in registry `crates-io`, consider running without --locked
   Compiling libsecp256k1 v0.7.1
error: could not compile `libsecp256k1` (lib)

Caused by:
  process didn't exit successfully: `rustc --crate-name libsecp256k1 --edition=2018 /home/node/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libsecp256k1-0.7.1/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --diagnostic-width=132 --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="hmac"' --cfg 'feature="hmac-drbg"' --cfg 'feature="sha2"' --cfg 'feature="static-context"' --cfg 'feature="std"' --cfg 'feature="typenum"' -C metadata=46bf597dfbc4d6db -C extra-filename=-46bf597dfbc4d6db --out-dir /opt/lighthouse/target/release/deps -L dependency=/opt/lighthouse/target/release/deps --extern arrayref=/opt/lighthouse/target/release/deps/libarrayref-160eae64aec5652b.rmeta --extern base64=/opt/lighthouse/target/release/deps/libbase64-9730bbaf724d0579.rmeta --extern digest=/opt/lighthouse/target/release/deps/libdigest-f552d1e4eba7ee5f.rmeta --extern hmac_drbg=/opt/lighthouse/target/release/deps/libhmac_drbg-db97445cb76fa586.rmeta --extern libsecp256k1_core=/opt/lighthouse/target/release/deps/liblibsecp256k1_core-e3baea4ccd3f1c72.rmeta --extern rand=/opt/lighthouse/target/release/deps/librand-01696e725a8c1af3.rmeta --extern serde=/opt/lighthouse/target/release/deps/libserde-c6f169afd9b20a69.rmeta --extern sha2=/opt/lighthouse/target/release/deps/libsha2-4b6da5a5c83321d4.rmeta --extern typenum=/opt/lighthouse/target/release/deps/libtypenum-8b1200c22ebb4b54.rmeta --cap-lints allow` (signal: 9, SIGKILL: kill)
error: failed to compile `lighthouse v2.3.0 (/opt/lighthouse/lighthouse)`, intermediate artifacts can be found at `/opt/lighthouse/target`
make: *** [Makefile:48: install] Error 101

@rhmaxdotorg
Owner

Good to hear, @wishbonesr!

For the EC2 instance, were you using Ubuntu Linux (22.04) or Amazon Linux (probably the first choice/default)?

I've only tested/supported Ubuntu 22.04 for the script, so it may work on other OSes, but not strictly supported.

However, from googling it looks like sudo apt-get install libsecp256k1-0 would install it via system package manager on Ubuntu, but not sure about other Linux distros.

If you did come across this using Ubuntu 22.04, this package could be added to the APT_PACKAGES list in the script. Let me know if you want to test that while you're seeing the error, doesn't seem like it would hurt to add it anyways, but just trying to keep it as minimal packages as necessary to support the validators.

@wishbonesr

wishbonesr commented Jul 6, 2023

@rhmaxdotorg,
It was Ubuntu 22.04
I'll probably give it another shot this weekend, as I've already term'd the test instance last night.
I have a feeling a tweak of the lighthouse make file will also be required to eliminate libsecp256k1, if installing straight from jammy ports...but yeah, I'll get back with you on that.

Note: I could only afford one validator, and it's already up and running (so no urgency on my part) - just wanted to help out.
If I can replicate the lighthouse compile failure, I'll start another issue for libsecp256k1, so this issue isn't polluted, and can be closed.

@nicogranuja

@wishbonesr

I ran into the same issue when using the t2.micro free tier EC2 instance. The issue went away when I switched to the same instance type I have for my validator, which leads me to believe the compilation error is caused by resource constraints.
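
The `signal: 9, SIGKILL` in the rustc output earlier in the thread is consistent with the kernel killing the compiler for lack of memory, although the thread does not confirm that. A quick way to see how much RAM an instance has before starting a build is sketched below; `mem_total_mib` is a made-up helper name and the sample file is fabricated purely to show the parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total RAM in MiB, read from a meminfo-style file.
mem_total_mib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# On Linux, report the real value.
if [ -r /proc/meminfo ]; then
    echo "Total RAM: $(mem_total_mib) MiB"
fi

# Demonstration with a made-up meminfo line (1 GiB expressed in kB).
sample=$(mktemp)
printf 'MemTotal:        1048576 kB\n' > "$sample"
mem_total_mib "$sample"   # prints 1024
```

After a failed build, `sudo dmesg | grep -i 'out of memory'` may show whether the kernel killed rustc. Limiting parallel jobs with cargo's `CARGO_BUILD_JOBS` environment variable might lower peak memory use, but that is an untested assumption here.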

On the other hand, I was able to test fixing the git repos and rebuilding the clients on a test instance. The steps I took were:

1. Switch to the node user with `sudo -u node bash`, then `cd ~`
   1. `git clone https://gitlab.com/pulsechaincom/lighthouse-pulse`
   2. `git clone https://gitlab.com/pulsechaincom/go-pulse`
2. Copy the `.git` folder: `cp -R {repo}/.git /opt/{geth|lighthouse}`
3. Go to /opt/geth and lighthouse and run `git reset --hard` 
4. Add `data` folder to `.gitignore` for both repos (Since next step removes untracked files, and we effectively fast forwarded current HEAD to main doing the reset hard)
5. Remove the files and folders that are not tracked anymore by running `git clean -f -d` (If we don't remove untracked files at least for GETH there will be duplicate structs, causing a compilation error)
6. Run the update script from this repo
7. After running the update script copy `cp /opt/lighthouse/target/release/lighthouse /opt/lighthouse/lighthouse/lh` (Is this normal after upgrading? Can someone please confirm?)
8. Cleanup the cloned repos

Because of step 7, I have not run these steps on my actual validator. Can someone please confirm whether the new lighthouse binary is not where the service config file expects it to be? Or is there something wrong with this method?

@nicogranuja

nicogranuja commented Jul 7, 2023

OK, I have run the above process on a live validator service and wrote a helper script. Make sure to run it after switching users: sudo -u node bash

The following method avoids the first alternative and saves you from having to re-add all the keys, so it is handy if you have a lot of validators. Also see the closing thoughts for ideas on how to reduce validator downtime during updates by about 99%.

#!/usr/bin/env bash
# Restores the missing .git metadata in /opt/geth and /opt/lighthouse.
# Run as the node user: sudo -u node bash
set -eo pipefail

cd /home/node

# Runs inside a client directory that has just received a fresh .git folder.
cleanup() {
   echo "Cleanup started"

   git reset --hard # Set all tracked files back to match `main`
   echo -e "\ndata/" >> .gitignore # Add the data folder to .gitignore so the next command does not remove it
   echo -e "\nlighthouse/lh" >> .gitignore # Add the symbolic link created by the setup script to .gitignore
   git clean -f -d # Remove untracked files and folders that are no longer in `main`
   git reset --hard # Restore the original .gitignore
}

echo "Cloning go pulse repo..."

git clone https://gitlab.com/pulsechaincom/go-pulse
cp -R go-pulse/.git /opt/geth
pushd /opt/geth
cleanup
popd

echo "Cloning lighthouse repo..."

git clone https://gitlab.com/pulsechaincom/lighthouse-pulse
cp -R lighthouse-pulse/.git /opt/lighthouse
pushd /opt/lighthouse
cleanup
popd

echo "Removing cloned repos..."

rm -rf go-pulse lighthouse-pulse

echo "Done, now run the update script"

After it's done you can pull the latest from repo: https://github.com/rhmaxdotorg/pulsechain-validator and run the update-client.sh script.

Closing thoughts

The fact that the lighthouse binary is created elsewhere gives us the flexibility to update it with barely any downtime. Rust builds are dog slow, taking about 45 minutes to 1 hour, whereas Go builds (go-pulse [geth]) are very fast and don't have this problem. In the future we could leverage make to put the builds in a different location, so that we could keep the lights on until it's time to replace the binary. This would take the update process from 45 minutes to 1 hour of downtime to under 30 seconds: enough time to stop the clients, copy/replace the new binaries and start them again.

Right now pulse is cheap and it might be expensive to be down for 1 hour if you have a lot of validators or if PLS moons, the idea above would solve this.
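
The "replace the binary" step of that idea could be sketched as follows. Nothing here was tested in the thread: `swap_binary` is a made-up helper, the file names are placeholders, and it assumes the new build already sits in a separate location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy new_bin next to dest, then rename it over dest.
# The rename stays within one directory, so dest is never left half-written.
swap_binary() {
    local new_bin=$1 dest=$2
    local tmp="$dest.new.$$"
    cp "$new_bin" "$tmp" && chmod +x "$tmp" && mv -f "$tmp" "$dest"
}

# Demonstration with placeholder files instead of real client binaries.
dir=$(mktemp -d)
printf 'old build\n' > "$dir/lh"
printf 'new build\n' > "$dir/lighthouse-new"
swap_binary "$dir/lighthouse-new" "$dir/lh"
cat "$dir/lh"   # prints: new build
```

On a validator the swap would sit between stopping and starting the clients, so the downtime is roughly the time of a copy plus a restart.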

@rhmaxdotorg
Copy link
Owner

rhmaxdotorg commented Jul 7, 2023

Awesome! Thanks for the script and interesting details in the closing thoughts @nicogranuja!

Just a question on this part:

WARNING

If you setup your validators using the setup script from this repo, the lighthouse built binary location will be expected to be in /opt/lighthouse/lighthouse/lh however, the new binary will not be there and if you cleaned up this binary using the script above, you will need to add this line before restarting clients in update-client.sh line 53

Just to clarify, are you suggesting any code changes to the update script?

Or is adding the sudo -u $NODE_USER bash... line to update-client.sh only necessary if someone uses the helper script you shared?

As the steps of setting up a validator with the setup script and then running the update-client.sh script after a new version is released shouldn't affect the lh link, but there's a few pieces to this and the setup and update script have some differences, so maybe add this before Starting PulseChain clients @ https://github.com/rhmaxdotorg/pulsechain-validator/blob/main/update-client.sh#L54C1-L54C48

sudo -u $NODE_USER ln -s /opt/lighthouse/target/release/lighthouse /opt/lighthouse/lighthouse/lh

Since the script and service file want the latest lighthouse binary to point to /opt/lighthouse/lighthouse/lh.
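
One thing to keep in mind if a line like this is ever added: plain `ln -s` fails with "File exists" when the link is already there. A sketch of a variant that is safe to re-run, using `ln -sfn` and shown with placeholder paths in a temporary directory (`ensure_link` is a made-up helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: create or replace a symlink so re-running is harmless.
ensure_link() {
    local target=$1 link=$2
    ln -sfn "$target" "$link"
}

# Placeholder files standing in for the built binary and the lh link.
dir=$(mktemp -d)
touch "$dir/lighthouse"
ensure_link "$dir/lighthouse" "$dir/lh"
ensure_link "$dir/lighthouse" "$dir/lh"   # the second run succeeds too
readlink "$dir/lh"
```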

I wondered if you tested/saw this or otherwise agree (since you seem to have a better testing environment than me right now :)

Misc

Checking your Lighthouse and Geth versions

$ sudo /opt/lighthouse/lighthouse/lh --version
Lighthouse Lighthouse-Pulse/v2.3.0-de8e0a0
$ /opt/geth/build/bin/geth --version
geth version 3.0.0-pulse-stable-7975e02e

@nicogranuja

@rhmaxdotorg

> Just to clarify, are you suggesting any code changes to the update script?

Yes, I will add more details at the end.

> Or is adding the sudo -u $NODE_USER bash... line to update-client.sh only necessary if someone uses the helper script you shared?

Actually yes. It turns out the symbolic link created by the setup script here: https://github.com/rhmaxdotorg/pulsechain-validator/blob/646a642d7414c3fbebafcb02f5ab4dcc4c338afb/pulsechain-validator-setup.sh#L201C8-L201C8 was being deleted since it is an untracked file. I have updated my comment above to add it to the .gitignore file temporarily so that it is not cleaned up.

> As the steps of setting up a validator with the setup script and then running the update-client.sh script after a new version is released shouldn't affect the lh link, but there's a few pieces to this and the setup and update script have some differences, so maybe add this before Starting PulseChain clients @ https://github.com/rhmaxdotorg/pulsechain-validator/blob/main/update-client.sh#L54C1-L54C48
> sudo -u $NODE_USER ln -s /opt/lighthouse/target/release/lighthouse /opt/lighthouse/lighthouse/lh

Thanks for the Misc section. It confirmed my suspicions: the Makefile for lighthouse (or Rust itself) puts the binary in two places, /opt/lighthouse/target/release/lighthouse and the place used in the setup script, /home/$NODE_USER/.cargo/bin/lighthouse

node:~$ .cargo/bin/lighthouse --version
Lighthouse Lighthouse-Pulse/v2.3.0-de8e0a0
BLS library: blst
SHA256 hardware acceleration: true
Allocator: jemalloc
Profile: release
Specs: mainnet (true), minimal (false), gnosis (false), pulsechain (true)
node:~$ /opt/lighthouse/target/release/lighthouse --version
Lighthouse Lighthouse-Pulse/v2.3.0-de8e0a0
BLS library: blst
SHA256 hardware acceleration: true
Allocator: jemalloc
Profile: release
Specs: mainnet (true), minimal (false), gnosis (false), pulsechain (true)

TL;DR: everything looks good. There is no issue with the setup script's symbolic link approach, as long as we don't clear the untracked file.
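
Matching `--version` output shows that both files report the same version and commit. A byte-for-byte comparison is a slightly stricter check and can be sketched with `cmp -s`; the helper name `same_contents` and the files below are placeholders, not the real binaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether two files have identical contents.
same_contents() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Placeholder files standing in for the two lighthouse binaries.
dir=$(mktemp -d)
printf 'build A\n' > "$dir/one"
cp "$dir/one" "$dir/two"
printf 'build B\n' > "$dir/three"
same_contents "$dir/one" "$dir/two"     # prints: identical
same_contents "$dir/one" "$dir/three"   # prints: different
```

On the node, the two arguments would be /opt/lighthouse/target/release/lighthouse and /home/$NODE_USER/.cargo/bin/lighthouse.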

Proposed changes

If time permits, I will raise a merge request to reduce the downtime while upgrading the clients. The idea is simple: let's leverage the fact that we can "rug" the binaries while they are running and start the build process without stopping the clients. After that, restart all 3 clients and they will start back up using the updated binaries.
I still need to test this, but AFAIK it should work without problems. Please let me know your thoughts.

@rhmaxdotorg
Owner

@nicogranuja excellent, thank you!

Just one more thing to clarify:

Do you think sudo -u $NODE_USER ln -s /opt/lighthouse/target/release/lighthouse /opt/lighthouse/lighthouse/lh needs to be added to update-client.sh or it will work fine as-is?

Again, not sure if you tested this scenario yet and some of these scenarios are harder for me to test than others.

@nicogranuja

> Do you think sudo -u $NODE_USER ln -s /opt/lighthouse/target/release/lighthouse /opt/lighthouse/lighthouse/lh needs to be added to update-client.sh or it will work fine as-is?

No, the current symbolic link should work just fine. I verified that /opt/lighthouse/target/release/lighthouse and the symbolic link's target, /home/$NODE_USER/.cargo/bin/lighthouse, are in fact the same binary, so everything should work the same. The only reason I had to recreate the link is that my git reset --hard and git clean steps removed it from the lighthouse repo. By the way, I updated my scrappy fix script in my comments above to also add this link to the .gitignore, so it can be run safely without having to alter the update-client.sh script.

@rhmaxdotorg
Owner

Gotcha, that makes sense!

Appreciate the details and the script that gives people options to get the clients up to date (if using the old setup script):

  1. "quick rebuild" the validator
  2. "git reset" script you wrote

I'll close this thread since it seems like we've captured a lot of the important notes and feedback, but feel free to ping it or open a new thread if more stuff or ideas come up.
