VC blocks on SlashingDatabase::open when running with NSSM on Windows #2394

Open
remyroy opened this issue Jun 3, 2021 · 7 comments
Labels
bug Something isn't working good first issue Good for newcomers windows

Comments

@remyroy
Contributor

remyroy commented Jun 3, 2021

Description

The VC blocks in SlashingDatabase::open when running under NSSM as a service on Windows. It never returns from that call, so it does not run properly and cannot attest.

Version

Lighthouse v1.4.0-rc.0-f6280aa
BLS Library: blst
Specs: mainnet (true), minimal (false), v0.12.3 (false)

Unstable
Windows 10 (10.0.19043 Build 19043)
rustc 1.52.1 (9bc8c42bb 2021-05-09)
commit f6280aa
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30037 for x64
NSSM 2.24-101-g897c7ad 64-bit 2017-04-26

Present Behaviour

When running with NSSM as a service on Windows, the VC starts and displays the following logs:

Jun 03 09:27:58.374 INFO Lighthouse started                      version: Lighthouse/v1.4.0-rc.0-f6280aa
Jun 03 09:27:58.375 INFO Configured for network                  name: prater
Jun 03 09:27:58.375 INFO Starting validator client               validator_dir: "C:\\ethereum\\var\\lib\\lighthouse\\validator\\validators", beacon_nodes: ["http://localhost:5051/"]
Jun 03 09:27:58.375 INFO HTTP metrics server is disabled
Jun 03 09:27:58.378 INFO Completed validator discovery           new_validators: 0
Jun 03 09:27:59.361 INFO Enabled validator                       voting_pubkey: 0x8fbb8e380977350eac38a66903c09b67f12f7f7794276d3e997b427f0bfb24180ca2deacb6da907a856d770babf268ff
Jun 03 09:28:00.283 INFO Modified key_cache saved successfully
Jun 03 09:28:00.283 INFO Initialized validators                  enabled: 1, disabled: 0

and stops/blocks. When debugging a little further, it blocks when entering SlashingDatabase::open.

Expected Behaviour

VC should run fine even under NSSM as a service on Windows just like it does when it does not run under NSSM.

Steps to reproduce

  1. Download and execute the Microsoft C++ Build Tools installer.
  2. Check the following checkboxes:
     • C++/CLI support for v142 build tools (Latest)
     • MSVC v142 - VS 2019 C++ x64/x86 build tools (Latest)
     • Windows 10 SDK (10.0.19041.0)
  3. Download and execute the Rust installer for Windows.
  4. Download and execute Git for Windows.
  5. Open a PowerShell Prompt as Administrator (Press ⊞ Win+R, type powershell, press Ctrl+⇧ Shift+↵ Enter and click Yes at the User Account Control window)
  6. Copy and paste the following command in your PowerShell Prompt and press ↵ Enter:
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
  7. Once Chocolatey is installed, close your PowerShell Prompt window.
  8. Open a Command Prompt as Administrator (Press ⊞ Win+R, type cmd, press Ctrl+⇧ Shift+↵ Enter and click Yes at the User Account Control window)
  9. Type each of these commands in your Command Prompt window (each line is a different command, you must press ↵ Enter at the end of the line):
choco install nssm
choco install make
choco install cmake --installargs 'ADD_CMAKE_TO_PATH=System'
  10. During the execution of that last command, you will be prompted to run a script. Press Y and press ↵ Enter to run it.
  11. Close your Command Prompt window.
  12. Open a normal Command Prompt (Press ⊞ Win+R, type cmd, press ↵ Enter).
  13. Type each of these commands in your Command Prompt window (each line is a different command, you must press ↵ Enter at the end of the line):
git clone https://github.com/sigp/lighthouse.git
cd lighthouse
git checkout unstable
make
mkdir c:\ethereum\bin
copy %UserProfile%\.cargo\bin\lighthouse.exe c:\ethereum\bin
mkdir C:\ethereum\var\log
mkdir C:\ethereum\var\lib\lighthouse\validator
  14. Generate a valid keystore for the Prater network and import it with something like:
c:\ethereum\bin\lighthouse.exe account_manager validator import --network prater --datadir C:\ethereum\var\lib\lighthouse\validator --directory validator_keys
  15. Run a beacon node for Prater and make it available on http://localhost:5051
  16. Type each of these commands in your Command Prompt window (each line is a different command, you must press ↵ Enter at the end of the line):
nssm install lighthousevalidator C:\ethereum\bin\lighthouse.exe vc --network prater --datadir C:\ethereum\var\lib\lighthouse\validator --beacon-nodes http://localhost:5051
nssm set lighthousevalidator DisplayName "Lighthouse Validator Client (Prater)"
nssm set lighthousevalidator AppRotateFiles 1
nssm set lighthousevalidator AppRotateSeconds 86400
nssm set lighthousevalidator AppRotateBytes 10485760
nssm set lighthousevalidator AppStdout C:\ethereum\var\log\lighthousevalidator-service-stdout.log
nssm set lighthousevalidator AppStderr C:\ethereum\var\log\lighthousevalidator-service-stderr.log
nssm start lighthousevalidator
  17. Notice that the VC is blocked and does not work as intended. Inspect C:\ethereum\var\log\lighthousevalidator-service-stderr.log to see that the last log message is something like:
Jun 03 09:28:00.283 INFO Initialized validators                  enabled: 1, disabled: 0

@remyroy
Contributor Author

remyroy commented Jun 3, 2021

NSSM source code can be found on https://git.nssm.cc/nssm/nssm if that can help.

@michaelsproul
Member

I've experimented with NSSM and I think it's a file permissions issue. I couldn't recreate the exact hang that you got, but I did get this error when I tried starting the service after importing the key as a non-administrator user:

$ nssm start lighthousevalidator
lighthousevalidator: Unexpected status SERVICE_STOPPED in response to START control.

I think I also got a similar error importing the key as admin before starting the service (will have to recheck this on Monday). The flow that definitely worked was:

  • Start service (as admin)
  • Stop service
  • Import key (as admin)
  • Start service

Let me know if that works for you.

The only suspect thing I found in our code was that we call Path::exists, which masks permissions errors. I'll switch it to using Path::metadata so that the permissions error surfaces in open_or_create.

For reference: https://doc.rust-lang.org/std/path/struct.Path.html#method.metadata
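
A minimal sketch of that change, assuming a hypothetical helper named `probe` and a hypothetical `FileState` enum (neither is Lighthouse's actual code). `fs::metadata` returns the underlying `io::Error`, so a permissions failure can be told apart from a missing file, whereas `Path::exists` reports `false` in both cases:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical result type for this sketch: the file is either present or absent.
#[derive(Debug, PartialEq)]
enum FileState {
    Exists,
    Missing,
}

/// Hypothetical helper: like `Path::exists`, but any error other than
/// "not found" (for example, access denied) is returned to the caller
/// instead of being silently mapped to `false`.
fn probe(path: &Path) -> io::Result<FileState> {
    match fs::metadata(path) {
        Ok(_) => Ok(FileState::Exists),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(FileState::Missing),
        // Permission errors and other I/O failures surface here.
        Err(e) => Err(e),
    }
}
```

A caller in the style of open_or_create could then create the database on `Missing`, open it on `Exists`, and report the error otherwise.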

@remyroy
Contributor Author

remyroy commented Jun 4, 2021

It really seems like a file permission issue. For some reason the permissions on C:\ethereum\var\lib\lighthouse\validator\validators\slashing_protection.sqlite were not what I expected:

icacls.exe C:\ethereum\var\lib\lighthouse\validator\validators\slashing_protection.sqlite
C:\ethereum\var\lib\lighthouse\validator\validators\slashing_protection.sqlite OWNER RIGHTS:(R,W,D,WDAC,WO)

[Screenshot: slashing_protection_perms]

By adding the SYSTEM account (the account under which services normally run) with full control on the slashing_protection.sqlite file, I got the NSSM service to start correctly.
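
For anyone scripting that workaround, here is a rough sketch of building the corresponding `icacls` invocation from Rust. The function name `grant_system_full_control` is hypothetical, and the sketch assumes that `icacls <file> /grant SYSTEM:F` is an acceptable way to give SYSTEM full control in your setup. It only builds the command; nothing runs until the caller invokes `.status()` on it.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) an `icacls` command that
/// grants the SYSTEM account full control (`F`) on the given file.
fn grant_system_full_control(path: &Path) -> Command {
    let mut cmd = Command::new("icacls");
    cmd.arg(path).arg("/grant").arg("SYSTEM:F");
    cmd
}
```

On Windows, `grant_system_full_control(path).status()` would execute it from an account that is allowed to change the file's ACL.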

I just tested importing my validator keystore again with lighthouse.exe account_manager validator import --network prater --datadir C:\ethereum\var\lib\lighthouse\validator --directory validator_keys as a normal user. It is that process that creates the slashing_protection.sqlite file with these unexpected permissions, which makes the VC block if you run it under a different account than the one that called the account_manager validator import command.

It would be nice if the VC would error out with a message instead of blocking if it does not have permission to access the slashing_protection.sqlite file on Windows. I'm not sure the default OWNER RIGHTS permission on the slashing_protection.sqlite file is needed. I think those permissions could be relaxed.
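
To illustrate the kind of early failure meant here, a minimal sketch of a pre-flight check, assuming a hypothetical function `check_db_access` (not part of Lighthouse). It tries to open the existing file for reading and writing and turns any failure into a descriptive message. Whether this catches the exact condition that makes the VC block under NSSM is an assumption, not something verified here.

```rust
use std::fs::OpenOptions;
use std::path::Path;

/// Hypothetical pre-flight check: try to open an existing database file
/// read-write and return a readable error message if the OS refuses.
fn check_db_access(path: &Path) -> Result<(), String> {
    OpenOptions::new()
        .read(true)
        .write(true)
        .open(path)
        .map(|_| ()) // The handle is dropped immediately; only success matters.
        .map_err(|e| {
            format!(
                "Unable to open slashing protection database {}: {}",
                path.display(),
                e
            )
        })
}
```

Calling this before opening the database would let the process log the message and exit rather than wait indefinitely. Note that a missing file is also reported as an error by this sketch.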

@paulhauner
Member

> It would be nice if the VC would error out with a message instead of blocking if it does not have permission to access the slashing_protection.sqlite file on Windows.

It looks like we have a solution to this over in #2436 🎉

@remyroy
Contributor Author

remyroy commented Jul 8, 2021

Seems great! Don't forget to relax the permissions on the slashing_protection.sqlite file. There is no need for them to be as restricted as they are now when created on Windows.

@michaelsproul
Member

Yeah, let's leave this issue open as a way to track the permissions changes for Windows.

bors bot pushed a commit that referenced this issue Jul 9, 2021
## Issue Addressed

Related to #2430, #2394

## Proposed Changes

As per #2430 (comment), ensure that the `ProductionValidatorClient::new` error raises a log and shuts down the VC. Also, I implemented `spawn_ignoring_error`, as per @michaelsproul's suggestion in #2436 (comment).

I got unlucky and CI picked up a [new rustsec vuln](https://rustsec.org/advisories/RUSTSEC-2021-0072). To fix this, I had to update the following crates:

- `tokio`
- `web3`
- `tokio-compat-02`

## Additional Info

NA
@xanatos

xanatos commented Apr 25, 2022

I'll add that even the logger sets the log files as owner-only:

.restrict_permissions(true)

This is quite annoying if you want to run lighthouse as a service, and then use another user to look at the logs.


4 participants