Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for claiming nodes as part of installation. #10084

Merged
merged 13 commits into from Mar 8, 2021
Merged
42 changes: 35 additions & 7 deletions claim/netdata-claim.sh.in
Expand Up @@ -85,15 +85,22 @@ ERROR_MESSAGES[17]="Service Unavailable"

# Exit code: 18 - Agent unique id is not generated yet.

NETDATA_RUNNING=1

get_config_value() {
conf_file="${1}"
section="${2}"
key_name="${3}"
config_result=$(@sbindir_POST@/netdatacli 2>/dev/null read-config "$conf_file|$section|$key_name"; exit $?)
# shellcheck disable=SC2181
if [ "$?" != "0" ]; then
echo >&2 "cli failed, assume netdata is not running and query the on-disk config"
config_result=$(@sbindir_POST@/netdata 2>/dev/null -W get2 "$conf_file" "$section" "$key_name" unknown_default)
if [ "${NETDATA_RUNNING}" -eq 1 ]; then
config_result=$(@sbindir_POST@/netdatacli 2>/dev/null read-config "$conf_file|$section|$key_name"; exit $?)
result="$?"
if [ "${result}" -ne 0 ]; then
echo >&2 "Unable to communicate with Netdata daemon, querying config from disk instead."
NETDATA_RUNNING=0
fi
fi
if [ "${NETDATA_RUNNING}" -eq 0 ]; then
config_result=$(@sbindir_POST@/netdata 2>/dev/null -W get2 "$conf_file" "$section" "$key_name" unknown_default)
fi
echo "$config_result"
}
Expand Down Expand Up @@ -141,13 +148,33 @@ NETDATA_USER=$(get_config_value netdata global "run as user")
[ -z "$EUID" ] && EUID="$(id -u)"


gen_id() {
local id

id="$(uuidgen)"

if [ "${id}" = "8a795b0c-2311-11e6-8563-000c295076a6" ] || [ "${id}" = "4aed1458-1c3e-11e6-a53f-000c290fc8f5" ]; then
gen_id
else
echo "${id}"
fi
}

# get the MACHINE_GUID by default
if [ -r "${MACHINE_GUID_FILE}" ]; then
ID="$(cat "${MACHINE_GUID_FILE}")"
MGUID=$ID
else
echo >&2 "netdata.public.unique.id is not generated yet or not readable. Please run agent at least once before attempting to claim. Agent generates this file on first startup. If the ID is generated already make sure you have rights to read it (Filename: ${MACHINE_GUID_FILE})."
elif [ -f "${MACHINE_GUID_FILE}" ]; then
echo >&2 "netdata.public.unique.id is not readable. Please make sure you have rights to read it (Filename: ${MACHINE_GUID_FILE})."
exit 18
else
if mkdir -p "${MACHINE_GUID_FILE%/*}" && /bin/echo -n "$(gen_id)" > "${MACHINE_GUID_FILE}"; then
ID="$(cat "${MACHINE_GUID_FILE}")"
MGUID=$ID
else
echo >&2 "Failed to write new machine GUID. Please make sure you have rights to write to ${MACHINE_GUID_FILE}."
exit 18
fi
fi

# get token from file
Expand All @@ -174,6 +201,7 @@ do
-noproxy) NOPROXY=yes ;;
-noreload) RELOAD=0 ;;
-user=*) NETDATA_USER=${arg:6} ;;
-daemon-not-running) NETDATA_RUNNING=0 ;;
*) echo >&2 "Unknown argument ${arg}"
exit 1 ;;
esac
Expand Down
1 change: 1 addition & 0 deletions packaging/docker/run.sh
Expand Up @@ -25,6 +25,7 @@ if [ -n "${NETDATA_CLAIM_URL}" ] && [ -n "${NETDATA_CLAIM_TOKEN}" ] && [ ! -f /v
-url "${NETDATA_CLAIM_URL}" \
${NETDATA_CLAIM_ROOMS:+-rooms "${NETDATA_CLAIM_ROOMS}"} \
${NETDATA_CLAIM_PROXY:+-proxy "${NETDATA_CLAIM_PROXY}"}
-daemon-not-running
Copy link
Member

@ilyam8 ilyam8 Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. That condition is not correct

https://github.com/Ferroin/netdata/blob/b8ff8fbed3a2c1e670af64a668552069b3071289/packaging/docker/run.sh#L23-L24

i dont know what is /var/lib/netdata/claim.d/claimed_id. Perhaps some old dir?

  1. There is no NETDATA_CLAIM_ROOMS check. Is it ok to have no rooms. Does it work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. No, just me using the wrong name.
  2. AIUI from discussions about how claiming works that I0ve had with a couple of people, we do not need to specify rooms. Not doing so claims the node, but does not add it to any of the rooms in the space, but it can be manually added later through Netdata Cloud.

fi

exec /usr/sbin/netdata -u "${DOCKER_USR}" -D -s /host -p "${NETDATA_LISTENER_PORT}" -W set web "web files group" root -W set web "web files owner" root "$@"
127 changes: 85 additions & 42 deletions packaging/installer/kickstart-static64.sh
Expand Up @@ -12,6 +12,10 @@
# --local-files Use a manually provided tarball for the installation
# --allow-duplicate-install do not bail if we detect a duplicate install
# --reinstall if an existing install would be updated, reinstall instead
# --claim-token specify a token to use for claiming the newly installed instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I don't like here is different syntax for every script. we call claiming as:
-token=hCdOx-dvi9Wf2A4RhUCAwI6XlJWddCoZAepekiG14ZDBZYAXfMZx0M3SrGeVPOLmzWa6UkaO9Ml7zceA-H6dPEr8Iqdeek3K-UYV3YOje35QQ2-oEpxK4aFv2ri2nHkJF5LqSKg -rooms=d31094b5-c7dd-409e-8dff-c5235e930dc1 -url=https://app.netdata.cloud
e.g. -token=XXX but kickstart wants --claim-token XXX with space

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I strongly suggest we change the claiming script, not what I have here. The = syntax is actually more complicated to parse in shell script than not using it, the claim- prefix makes the meaning of the options in the installers unambiguous (assume you have no prior knowledge, what would you assume proxy to be for in an installation script?), and the double-dash lead-in is for consistency with the rest of the script, which we cannot change without breaking existing tooling users have made around working with these scripts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would have to go in contact with Cloud guys as well:

  • the Cloud UI generates the command for copy&pasting
  • they would also have to handle the "if your agent version is older than this use this else another thing"

Although by default I trust your assessment of the version without = being better, changing it in Claiming script is too complicated (due to Cloud thing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long-term though I think the goal is to not have users touch the claiming script except in specific exceptional cases. If we were to add a command for netdatacli to allow claiming of a running agent and merge this PR, we could pivot to not even mentioning the claiming script, and just point users at the installer options, the Docker environment variables, and the netdatacli command for claiming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Ferroin on this one. I think that the claiming script should change not to require the = syntax.

Copy link
Contributor

@knatsakis knatsakis Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must change anyway in my opinion, because '-token', '-url', '-rooms' are too generic as option names.

They should be '-claim-token', '-claim-url' and '-claim-rooms' instead

# --claim-url specify a URL to use for claiming the newly installed isntance
# --claim-rooms specify a list of rooms to claim the newly installed instance to
# --claim-proxy specify a proxy to use while claiming the newly installed instance
#
# Environment options:
#
Expand Down Expand Up @@ -224,56 +228,81 @@ NETDATA_INSTALLER_OPTIONS=""
NETDATA_UPDATES="--auto-update"
RELEASE_CHANNEL="nightly"
while [ -n "${1}" ]; do
if [ "${1}" = "--dont-wait" ] || [ "${1}" = "--non-interactive" ] || [ "${1}" = "--accept" ]; then
opts="${opts} --accept"
shift 1
elif [ "${1}" = "--dont-start-it" ]; then
NETDATA_INSTALLER_OPTIONS="${NETDATA_INSTALLER_OPTIONS:+${NETDATA_INSTALLER_OPTIONS} }${1}"
shift 1
elif [ "${1}" = "--no-updates" ]; then
NETDATA_UPDATES=""
shift 1
elif [ "${1}" = "--auto-update" ]; then
true # This is the default behaviour, so ignore it.
shift 1
elif [ "${1}" = "--stable-channel" ]; then
RELEASE_CHANNEL="stable"
NETDATA_INSTALLER_OPTIONS="${NETDATA_INSTALLER_OPTIONS:+${NETDATA_INSTALLER_OPTIONS} }${1}"
shift 1
elif [ "${1}" = "--disable-telemetry" ]; then
NETDATA_INSTALLER_OPTIONS="${NETDATA_INSTALLER_OPTIONS:+${NETDATA_INSTALLER_OPTIONS} }${1}"
shift 1
elif [ "${1}" = "--local-files" ]; then
NETDATA_UPDATES="" # Disable autoupdates if using pre-downloaded files.
shift 1
if [ -z "${1}" ]; then
fatal "Option --local-files requires extra information. The desired tarball full filename is needed"
fi
case "${1}" in
"--dont-wait") opts="${opts} --accept" ;;
"--non-interactive") opts="${opts} --accept" ;;
"--accept") opts="${opts} --accept" ;;
"--dont-start-it")
NETDATA_INSTALLER_OPTIONS="${NETDATA_INSTALLER_OPTIONS:+${NETDATA_INSTALLER_OPTIONS} }${1}"
NETDATA_CLAIM_EXTRA="${NETDATA_CLAIM_EXTRA} -daemon-not-running"
;;
"--no-updates") NETDATA_UPDATES="" ;;
"--stable-channel")
RELEASE_CHANNEL="stable"
NETDATA_INSTALLER_OPTIONS="${NETDATA_INSTALLER_OPTIONS:+${NETDATA_INSTALLER_OPTIONS} }${1}"
;;
"--disable-telemetry") NETDATA_INSTALLER_OPTIONS="${NETDATA_INSTALLER_OPTIONS:+${NETDATA_INSTALLER_OPTIONS} }${1}";;
"--local-files")
NETDATA_UPDATES="" # Disable autoupdates if using pre-downloaded files.
if [ -z "${2}" ]; then
fatal "Option --local-files requires extra information. The desired tarball full filename is needed"
fi

NETDATA_LOCAL_TARBALL_OVERRIDE="${1}"
shift 1
if [ -z "${1}" ]; then
fatal "Option --local-files requires a pair of the tarball source and the checksum file"
fi
NETDATA_LOCAL_TARBALL_OVERRIDE="${2}"

NETDATA_LOCAL_TARBALL_OVERRIDE_CHECKSUM="${1}"
shift 1
elif [ "${1}" = "--allow-duplicate-install" ]; then
NETDATA_ALLOW_DUPLICATE_INSTALL=1
shift 1
elif [ "${1}" = "--reinstall" ]; then
NETDATA_REINSTALL=1
shift 1
else
echo >&2 "Unknown option '${1}' or invalid number of arguments. Please check the README for the available arguments of ${0} and try again"
exit 1
fi
if [ -z "${3}" ]; then
fatal "Option --local-files requires a pair of the tarball source and the checksum file"
fi

NETDATA_LOCAL_TARBALL_OVERRIDE_CHECKSUM="${3}"
shift 2
;;
"--allow-duplicate-install") NETDATA_ALLOW_DUPLICATE_INSTALL=1 ;;
"--reinstall") NETDATA_REINSTALL=1 ;;
"--claim-token")
NETDATA_CLAIM_TOKEN="${2}"
shift 1
;;
"--claim-rooms")
NETDATA_CLAIM_ROOMS="${2}"
shift 1
;;
"--claim-url")
NETDATA_CLAIM_URL="${2}"
shift 1
;;
"--claim-proxy")
NETDATA_CLAIM_EXTRA="${NETDATA_CLAIM_EXTRA} -proxy ${2}"
shift 1
;;
*)
echo >&2 "Unknown option '${1}' or invalid number of arguments. Please check the README for the available arguments of ${0} and try again"
exit 1
esac
shift 1
done

if [ ! "${DO_NOT_TRACK:-0}" -eq 0 ] || [ -n "$DO_NOT_TRACK" ]; then
NETDATA_INSTALLER_OPTIONS="${NETDATA_INSTALLER_OPTIONS:+${NETDATA_INSTALLER_OPTIONS} }--disable-telemtry"
fi

if [ -n "${NETDATA_DISABLE_CLOUD}" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the checking of cloud/claiming options in a separate function to allow us to understand the high-level operations we are performing in the outer scope of the script.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m honestly not sure how much this would help. Everything from this up through the end of the next statement is all one logical block, so putting it in a function would just move the code and require following the function call to find the reference.

That said, we could (and probably should) merge the two if statements here. If the --disable-cloud option was set the next two condition checks are pointless, so they should probably just be elif statements on the end of the DISABLE_CLOUDconditional block.

if [ -n "${NETDATA_CLAIM_TOKEN}" ] || [ -n "${NETDATA_CLAIM_ROOMS}" ] || [ -n "${NETDATA_CLAIM_URL}" ]; then
run_failed "Cloud explicitly disabled but automatic claiming requested."
run_failed "Either enable Netdata Cloud, or remove the --claim-* options."
exit 1
fi
fi

# shellcheck disable=SC2235,SC2030
if ( [ -z "${NETDATA_CLAIM_TOKEN}" ] && [ -n "${NETDATA_CLAIM_URL}" ] ) || ( [ -n "${NETDATA_CLAIM_TOKEN}" ] && [ -z "${NETDATA_CLAIM_URL}" ] ); then
run_failed "Invalid claiming options, both a claiming token and URL must be specified."
exit 1
elif [ -z "${NETDATA_CLAIM_TOKEN}" ] && [ -n "${NETDATA_CLAIM_ROOMS}" ]; then
run_failed "Invalid claiming options, claim rooms may only be specified when a token and URL are specified."
exit 1
fi

# Netdata Tarball Base URL (defaults to our Google Storage Bucket)
[ -z "$NETDATA_TARBALL_BASEURL" ] && NETDATA_TARBALL_BASEURL=https://storage.googleapis.com/netdata-nightlies

Expand Down Expand Up @@ -365,4 +394,18 @@ if [ $? -eq 0 ]; then
fi
else
echo >&2 "NOTE: did not remove: ${TMPDIR}/netdata-latest.gz.run"
exit 1
fi

# --------------------------------------------------------------------------------------------------------------------

if [ -n "${NETDATA_CLAIM_TOKEN}" ]; then
progress "Attempting to claim agent to ${NETDATA_CLAIM_URL}"
NETDATA_CLAIM_PATH=/opt/netdata/bin/netdata-claim.sh

if "${NETDATA_CLAIM_PATH}" -token=${NETDATA_CLAIM_TOKEN} -rooms=${NETDATA_CLAIM_ROOMS} -url=${NETDATA_CLAIM_URL} ${NETDATA_CLAIM_EXTRA}; then
progress "Successfully claimed node"
else
run_failed "Unable to claim node, you must do so manually."
fi
fi