Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add datacenter node setup scripts #5517

Merged
merged 11 commits into from Aug 22, 2019

Conversation

@TristanDebrunner
Copy link
Contributor

commented Aug 14, 2019

Problem

We need to consistently setup new nodes for the datacenter

Summary of Changes

Add bash scripts that do all the setup for us. Also add helper scripts that do parts of the setup if something breaks.

@TristanDebrunner TristanDebrunner requested a review from t-nelson Aug 14, 2019

@TristanDebrunner TristanDebrunner force-pushed the TristanDebrunner:dc-node-setup branch from 3d353c0 to 9a65d8b Aug 14, 2019

@t-nelson
Copy link
Contributor

left a comment

Looks good! Just a few (probably legacy) nits

net/datacenter-node-install/scripts/setup-cuda.sh Outdated Show resolved Hide resolved
net/datacenter-node-install/scripts/set-hostname.sh Outdated Show resolved Hide resolved
net/datacenter-node-install/scripts/setup-grub.sh Outdated Show resolved Hide resolved

@TristanDebrunner TristanDebrunner force-pushed the TristanDebrunner:dc-node-setup branch from 9a65d8b to c1b3933 Aug 14, 2019

@danpaul000 danpaul000 self-requested a review Aug 19, 2019

# Setup kernel constants
cat > /etc/sysctl.d/20-solana-node.conf <<EOF
# Solana networking requirements

This comment has been minimized.

Copy link
@danpaul000

danpaul000 Aug 19, 2019

Contributor

Are these ever subject to change and therefore should they be popped out to a config file?

This comment has been minimized.

@t-nelson

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2019

If @TristanDebrunner doesn't beat me to it, I can take the external script path fixes over after I finish the allocation script

@danpaul000

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2019

Can we also get a README.md that provides some top level instructions on which scripts to run, in which order, and under what circumstances? Do these get run once ever, or all the time, or somewhere in between?

@t-nelson

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2019

I believe they are a run once after OS install on nodes we host, type thing

  1. setup-dc-node-1.sh
  2. reboot
  3. setup-dc-node-2.sh

Assuming everything JustWorks(TM), that is... I can add a simple README.md for now and let @TristanDebrunner add any details he's aware of when he has time.

@danpaul000

This comment has been minimized.

Copy link
Contributor

commented Aug 19, 2019

t-nelson added 5 commits Aug 21, 2019
@codecov

This comment has been minimized.

Copy link

commented Aug 21, 2019

Codecov Report

Merging #5517 into master will decrease coverage by 6.9%.
The diff coverage is 53.8%.

@@           Coverage Diff            @@
##           master   #5517     +/-   ##
========================================
- Coverage    77.7%   70.7%     -7%     
========================================
  Files         213     217      +4     
  Lines       40972   44844   +3872     
========================================
- Hits        31853   31740    -113     
- Misses       9119   13104   +3985

Pull request has been modified.

@t-nelson t-nelson requested a review from danpaul000 Aug 21, 2019

@@ -0,0 +1,80 @@
#!/usr/bin/env bash

if [[ "$EUID" -ne 0 ]]; then

This comment has been minimized.

Copy link
@t-nelson

t-nelson Aug 21, 2019

Contributor

Something I noticed while doing the cleanup round...

Depending on how the user takes root, some of the variables we make assumptions about in this script may have unexpected values. Affected are USERNAME (Both the normal shell global and the result of whoami used here) and HOME.

This comment has been minimized.

Copy link
@t-nelson

t-nelson Aug 21, 2019

Contributor

The most obvious solution I see here is to enforce that the scripts are called with root taken via sudo rather than from a root shell (ala. sudo su --). Then we'll have the SUDO_* envvars to detect the enforcement and fill out all of our script-local variables.

How does that sound to you @danpaul000 ?

This comment has been minimized.

Copy link
@danpaul000

danpaul000 Aug 22, 2019

Contributor

Yeah, I think it's reasonable to ask the user to just call sudo this_script.sh, as that is what the instruction is in the README, so now a future user has no excuse. :)

This comment has been minimized.

Copy link
@t-nelson

t-nelson Aug 22, 2019

Contributor

I'll put the checks in place and make the appropriate changes tomorrow

This comment has been minimized.

Copy link
@t-nelson

t-nelson Aug 22, 2019

Contributor

Addressed in 85b7b72

net/datacenter-node-install/setup-dc-node-1.sh Outdated Show resolved Hide resolved

# Allow admin user to log in
BASE_SSH_DIR="${HOME}/.ssh"
mkdir "$BASE_SSH_DIR"

This comment has been minimized.

Copy link
@danpaul000

danpaul000 Aug 21, 2019

Contributor

Won't $HOME be root's homedir in this case? Looks like you want $USERNAME's homedir, no?
Can use
su - ${USERNAME} -c 'mkdir "$BASE_SSH_DIR"' and others, so you don't need to chown after every file/dir creation

This comment has been minimized.

Copy link
@t-nelson

t-nelson Aug 21, 2019

Contributor

I just did the obvious thing to communicate the intent of the change. There are bigger problems, see #5517 (comment)


set -xe

"$HERE"/setup-cuda.sh

This comment has been minimized.

Copy link
@danpaul000

danpaul000 Aug 21, 2019

Contributor

Do we always want to run this? Our CPU nodes don't need cuda. Can we check for the presence of certain cards before running this?

To make things complicated, there ARE graphics cards in the CPU nodes, but cheapo ones for local console access only.

This comment has been minimized.

Copy link
@t-nelson

t-nelson Aug 21, 2019

Contributor

My gut says, "it won't hurt," but I think this is another one we'd need an answer from @TristanDebrunner on.

We could parse the output of lspci to figure out if/what cards are installed

@@ -0,0 +1,21 @@
#!/usr/bin/env bash

This comment has been minimized.

Copy link
@danpaul000

danpaul000 Aug 21, 2019

Contributor

What is this script doing? Looks like we call all those scripts already, and then randomize the password? Why?

This comment has been minimized.

Copy link
@t-nelson

t-nelson Aug 21, 2019

Contributor

@TristanDebrunner would have the best answer here, but I believe this is a separate workflow (for setting up nodes we send to validators) than what we used to setup the nodes destined for our colo rack.

t-nelson added 2 commits Aug 21, 2019
eb
@danpaul000
Copy link
Contributor

left a comment

+1 with shellcheck appeased

t-nelson added 2 commits Aug 22, 2019

@t-nelson t-nelson added the automerge label Aug 22, 2019

@solana-grimes solana-grimes merged commit 51cf559 into solana-labs:master Aug 22, 2019

10 checks passed

Summary 1 rule matches and 7 potential rules
Details
buildkite/solana Build #10668 passed (29 minutes, 49 seconds)
Details
buildkite/solana/bench Passed (3 seconds)
Details
buildkite/solana/checks Passed (1 minute, 18 seconds)
Details
buildkite/solana/coverage Passed (6 seconds)
Details
buildkite/solana/pipeline-upload Passed (2 seconds)
Details
buildkite/solana/shellcheck Passed (23 seconds)
Details
buildkite/solana/stable Passed (28 minutes, 19 seconds)
Details
buildkite/solana/stable-perf Passed (3 seconds)
Details
ci-gate Pull Request accepted for CI pipeline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.