[Feat]: netdata-claim.sh is too cryptic, documentation missing #12186

Closed
ktsaou opened this issue Feb 19, 2022 · 13 comments
Labels
area/claim, feature request (New features), needs triage (Issues which need to be manually labelled)

Comments

@ktsaou
Member

ktsaou commented Feb 19, 2022

Problem

wget -O /tmp/netdata-kickstart.sh https://my-netdata.io/kickstart.sh && sh /tmp/netdata-kickstart.sh --claim-token HIDDEN --claim-rooms HIDDEN --claim-url https://app.netdata.cloud
--2022-02-19 23:04:14--  https://my-netdata.io/kickstart.sh
SSL_INIT
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving my-netdata.io (my-netdata.io)... 172.67.156.192, 104.21.13.159, 2606:4700:3036::ac43:9cc0, ...
Connecting to my-netdata.io (my-netdata.io)|172.67.156.192|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘/tmp/netdata-kickstart.sh’

/tmp/netdata-kickstart.sh                       [ <=>                                                                                     ]  45,22K  --.-KB/s    in 0,007s  

2022-02-19 23:04:15 (6,37 MB/s) - ‘/tmp/netdata-kickstart.sh’ saved [46309]


 --- Using /tmp/netdata-kickstart-qviAFz08Fu as a temporary directory. --- 
 ABORTED  Found an existing netdata install at /, but the install type is 'custom', which is not supported, refusing to proceed.

Now what? No URL on the above error message to find help. No link to discord either.

Let's try to claim it by hand:

netdata-claim.sh does not have any command line help:

# netdata-claim.sh -h
Unknown argument -h

# netdata-claim.sh --help
Unknown argument --help
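
A minimal sketch of what a help option could look like. The function names `claim_usage` and `wants_help` are hypothetical and are not part of the real script. The three options listed are only the ones that appear in this issue; the real script may accept more.

```shell
#!/bin/sh
# Hypothetical help text: lists only the options seen in this issue.
claim_usage() {
  cat <<'EOF'
Usage: netdata-claim.sh -token=TOKEN -rooms=ROOMS -url=URL

  -token=TOKEN   claim token
  -rooms=ROOMS   room ID(s) to add the node to
  -url=URL       Cloud base URL, e.g. https://app.netdata.cloud

Options take the form -key=value (one dash, equals sign).
EOF
}

# Return 0 if any argument is -h or --help, 1 otherwise.
wants_help() {
  for arg in "$@"; do
    case "$arg" in
      -h|--help) return 0 ;;
    esac
  done
  return 1
}
```

A script could call `wants_help "$@" && { claim_usage; exit 0; }` before parsing anything else, so that `-h` and `--help` never reach the "Unknown argument" branch.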

The "Add Nodes to War Room" modal on the cloud does not provide any hints on how to call netdata-claim.sh.

Let's hope netdata-claim.sh supports the same parameters as kickstart.sh does:

# netdata-claim.sh --claim-token HIDDEN --claim-rooms HIDDEN --claim-url https://app.netdata.cloud
Unknown argument --claim-token

No. It does not.

Hm... probably it should work by eliminating --claim from the above command.

# netdata-claim.sh -token HIDDEN -rooms HIDDEN -url https://app.netdata.cloud
Unknown argument -token

No it does not.

Let's check the documentation. The "Add node to war room" modal has a link to the documentation: https://learn.netdata.cloud/docs/agent/claim#connect-an-agent-running-in-linux

Great! Let's read that!

Hm... where is netdata-claim.sh mentioned?

Nowhere!

Wait! There is an example of netdata-claim.sh, as part of a netdata running under docker.

This is the solution. It also needs =:

# netdata-claim.sh -token=HIDDEN -rooms=HIDDEN -url=https://app.netdata.cloud

IT WORKED!

But -key=value is quite unusual for Linux.
The usual is --key=value (2x dashes).

Description

netdata-claim.sh should:

  1. be documented in learn
  2. offered in netdata.cloud modals
  3. support the same parameter names as kickstart.sh
  4. support the parameters in the usual Linux way
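
Points 3 and 4 could be approximated with a small translation layer, sketched below. The helper name `translate_claim_args` is hypothetical. It assumes the only options of interest are the token, rooms and url options shown in this issue, and it maps the kickstart.sh spellings and the double-dash spellings onto the `-key=value` form that worked above.

```shell
#!/bin/sh
# Hypothetical helper: print one netdata-claim.sh style argument per line.
translate_claim_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --claim-token|--claim-rooms|--claim-url)
        # kickstart.sh style: "--claim-token VALUE" -> "-token=VALUE"
        [ "$#" -ge 2 ] || { echo "Missing value for $1" >&2; return 1; }
        printf '%s\n' "-${1#--claim-}=$2"
        shift 2
        ;;
      --claim-token=*|--claim-rooms=*|--claim-url=*)
        # "--claim-token=VALUE" -> "-token=VALUE"
        printf '%s\n' "-${1#--claim-}"
        shift
        ;;
      --token=*|--rooms=*|--url=*)
        # usual double-dash style: "--token=VALUE" -> "-token=VALUE"
        printf '%s\n' "${1#-}"
        shift
        ;;
      -token=*|-rooms=*|-url=*)
        # already in the form the script accepts
        printf '%s\n' "$1"
        shift
        ;;
      *)
        echo "Unknown argument $1" >&2
        return 1
        ;;
    esac
  done
}
```

For example, `translate_claim_args --claim-token T --claim-rooms R` prints `-token=T` and `-rooms=R` on separate lines. Values containing newlines are not handled by this line-based output.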

kickstart.sh when it fails, should:

  1. provide a link to documentation for help
  2. provide a link to discord for help
  3. try to claim the existing installation
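
The first two points could be as small as the sketch below. The function name `abort_with_help` is hypothetical. The documentation URL is the installer update page referenced elsewhere in this issue; whether it is the right page to link is an assumption. The community link is a placeholder, because no invite URL is given here.

```shell
#!/bin/sh
# Assumption: this page is a reasonable place to send users of unsupported installs.
DOCS_URL="https://learn.netdata.cloud/docs/agent/packaging/installer/update"
# Placeholder: set COMMUNITY_URL to the real Discord or forum address.
COMMUNITY_URL="${COMMUNITY_URL:-<community chat URL>}"

# Print the abort reason followed by pointers to help, then fail.
abort_with_help() {
  printf ' ABORTED  %s\n' "$1" >&2
  printf ' For manual steps, see: %s\n' "$DOCS_URL" >&2
  printf ' To ask for help, visit: %s\n' "$COMMUNITY_URL" >&2
  return 1
}
```

The message is written to standard error and the function returns non-zero, so a caller can use `abort_with_help "..." || exit 1` in place of a bare abort.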

Importance

blocker

Value proposition

  1. All users have to go via the claiming process
  2. Ideally kickstart.sh should claim the existing installation
  3. Or at least provide links for help

Proposed implementation

As suggested above.

@ktsaou added the "feature request" and "needs triage" labels on Feb 19, 2022
@Ferroin
Member

Ferroin commented Feb 21, 2022

Possible partial fix here: #12179

95% of the issues around this are because it’s a shell script that was cobbled together in a hurry early on in the life of the cloud and then never got reimplemented in the agent code like it should have been in the first place.

@ilyam8
Member

ilyam8 commented Feb 21, 2022

@Ferroin the initial issue is we don't call claim() for existing custom installs even if claim-* parameters are passed.

Perhaps we just need to add custom here

case "${INSTALL_TYPE}" in
kickstart-*|legacy-*|binpkg-*|manual-static|unknown)
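
As a rough sketch of that suggestion: the function name `can_claim_install_type` is hypothetical, and only the pattern list comes from the snippet above, with `custom` appended. The surrounding logic in the real kickstart.sh may look different.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if claiming should be attempted for this install type.
can_claim_install_type() {
  case "$1" in
    # Existing pattern list, plus the proposed "custom" entry.
    kickstart-*|legacy-*|binpkg-*|manual-static|unknown|custom) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this in place, `can_claim_install_type "${INSTALL_TYPE}"` succeeds for a `custom` install, so claiming could proceed even when the updater is refused.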

@Ferroin
Member

Ferroin commented Feb 21, 2022

The particular case mentioned in the OP is indeed that, but the general information is that this is a problem regardless of why the claiming fails.

@ilyam8
Member

ilyam8 commented Feb 21, 2022

I think that is the main problem here. If you need to claim your node the Cloud suggests using kickstart

Screenshot 2022-02-21 at 21 20 27

And the issue is - we don't call claim().

I may be wrong, but I am under the impression that netdata-claim is not supposed to be called directly, but only via kickstart. Am I wrong? If not, then the discrepancy between kickstart and netdata-claim parameters is not a problem.

@ktsaou
Member Author

ktsaou commented Feb 23, 2022

There are several issues mentioned here:

  1. Kickstart fails with a custom install.
  2. Kickstart fails without a documentation link to help the user go through the manual steps required.
  3. Kickstart fails without a link to discord or any idea of how the user can ask for help.
  4. Kickstart does not call netdata-claim.sh on custom installs that cannot be updated.
  5. netdata-claim.sh does not respond to -h or --help.
  6. netdata-claim.sh does not accept the same claiming parameters kickstart does.
  7. netdata-claim.sh wants parameters in a very strange and unique way (one dash, equal sign)
  8. Page https://learn.netdata.cloud/docs/agent/claim#connect-an-agent-running-in-linux does not document netdata-claim.sh
  9. Page https://learn.netdata.cloud/docs/agent/packaging/installer/update says that custom type installs are probably managed by the system package manager, but not in my case.

So, because of all these, as a user of a custom install, I am totally lost! I can't connect the node to the cloud!

cc: @cpipilas

@cpipilas

cpipilas commented Feb 24, 2022

We need to decide if we still want to expose netdata-claim.sh for claiming or we will do claiming using kickstart. I'm in favour of the second option to keep things simple, especially due to the fact that claiming params/format is different between these two.

Currently, to reconnect a node we suggest the use of netdata-claim.sh in our documentation

If we choose to go with claiming using kickstart option, the documentation needs to be updated. BTW, netdata-claim.sh is also described here: https://learn.netdata.cloud/docs/agent/claim#claiming-script and affects #12228

@Ferroin
Member

Ferroin commented Feb 24, 2022

OK, so IOW we do want to support random unknown installation types such as users building by hand on their own system or installing through their system package manager?

Because I’ve been under the impression given pretty much everything discussed around any type of support that we do not care about such installs at all, which is why the kickstart script explicitly refuses to attempt to claim them.


In more detail as far as the specific list though:

  1. Will not be supported for any general case other than claiming. We cannot safely run the updater in these cases because we cannot determine for certain that it is a properly functioning install where the updater will work correctly (and because most of the time it will not be such an install, see my comment below about the final point).
  2. Will become irrelevant when point 4 is resolved.
  3. Will become irrelevant when point 4 is resolved.
  4. Will be fixed when #12064 ("Add handling for claiming non-standard install types with kickstart.") is merged.
  5. Is best solved by handling things as outlined in #12179 ("Move the claiming process into the core agent code.").
  6. Same as 5.
  7. Same as 5.
  8. Is intentional, because we decided (because of prompting from a large number of people internally) that it was too hard to walk the user through finding the claiming script on their system.

As far as the final point, the documentation is correct, for a vast majority of real users, a custom install type means a distro-supplied package. It is important to remember that because you manually ran the netdata-installer.sh script, which we do not officially support doing anymore, you are a special case here, and we simply cannot always support special cases.

@ktsaou
Member Author

ktsaou commented Feb 25, 2022

OK, so IOW we do want to support random unknown installation types such as users building by hand on their own system or installing through their system package manager?

If we can, yes. If we can't, no.
This only refers to the 1st bullet in my list.

Because I’ve been under the impression given pretty much everything discussed around any type of support that we do not care about such installs at all, which is why the kickstart script explicitly refuses to attempt to claim them.

Wait! "we do not care" ? No, we do care about every single netdata install.
But this "care" has many different weights.

I don't mind about automatically managing these installs, as long as there is clear help and documentation on what users should do.

So, points 1 and 4 may stay as they are if we believe that we can't do them right, or the amount of work required to do them is tremendous vs the number of users who need it.

  1. Will not be supported for any general case other than claiming. We cannot safely run the updater in these cases because we cannot determine for certain that it is a properly functioning install where the updater will work correctly (and because most of the time it will not be such an install, see my comment below about the final point).

ok

2. Will become irrelevant when point 4 is resolved.

No. This is not a problem of claiming. The problem here is that as a user I want to have a valid, supported install, but we don't provide any information on how to achieve that when you have a custom install.

3. Will become irrelevant when point 4 is resolved.

No. Same as above. I want a supported install. Claiming an existing install is a fallback. I will still run an unsupported install.

4. Will be fixed when #12064 ("Add handling for claiming non-standard install types with kickstart.") is merged.

ok, but it may or may not work. It depends on the agent version I have installed. If it is old, the user is still doomed.

8. Is intentional, because we decided (because of prompting from a large number of people internally) that it was too hard to walk the user through finding the claiming script on their system.

So, you are suggesting that for users who, for whatever reason, need a custom install, the only way to claim their agents to the cloud is via the kickstart script, which they have already decided not to use for the installation.
It is confusing...

As far as the final point, the documentation is correct, for a vast majority of real users, a custom install type means a distro-supplied package. It is important to remember that because you manually ran the netdata-installer.sh script, which we do not officially support doing anymore, you are a special case here, and we simply cannot always support special cases.

This used to be the default way. We have users out there using it. We should say something about it. Pretending this does not exist, does not help. Adding some documentation about it, is not that hard...

@Ferroin
Member

Ferroin commented Feb 25, 2022

Wait! "we do not care" ? No, we do care about every single netdata install. But this "care" has many different weights.

I don't mind about automatically managing these installs, as long as there is clear help and documentation on what users should do.

The stance you have effectively projected regarding any third-party install method since I started at the company is that we do not care, because they cannot be up to date enough for us to shove new features down users' throats.

No. This is not a problem of claiming. The problem here is that as a user I want to have a valid, supported install, but we don't provide any information on how to achieve that when you have a custom install.

No. Same as above. I want a supported install. Claiming an existing install is a fallback. I will still run an unsupported install.

Apologies about both of these then, my understanding was that they were specifically about the claiming issue and not general complaints.

ok, but it may or may not work. It depends on the agent version I have installed. If it is old, the user is still doomed.

Yes, and because we cannot safely update such an install, we cannot resolve this for them.

So, you are suggesting that for users who, for whatever reason, need a custom install, the only way to claim their agents to the cloud is via the kickstart script, which they have already decided not to use for the installation. It is confusing...

Our own policies have functionally made it such that third party installs cannot be supported for Cloud usage. See for example the hard cutover to the new architecture that we are only giving about six weeks worth of advance notice about, which will mean that pretty much anybody who is currently installed through a third-party mechanism cannot use the Cloud at all. If we want to support such users, then the first step needs to be acknowledging that we are handling such things completely wrong and change our policies around such breaking changes, and only then can we consider worrying about other aspects of support.

And yes, there are also people who built Netdata by hand to consider, but they constitute a tiny percentage of our user base compared to other installation methods at this point, and the smaller the group of users the less tenable it is to provide support for them.

This used to be the default way. We have users out there using it. We should say something about it. Pretending this does not exist, does not help. Adding some documentation about it, is not that hard...

It has not been the officially recommended install method for five years (assuming the original commit of the kickstart script in 2017 was when we decided that that was the preferred install method), and AFAIK we have no evidence of continued usage by actual users other than you (please correct me if I am wrong about this, but be prepared to provide demonstrable proof), and given the fact that we have no indication that any significant percentage of users is actually still using such installs, support for such installs has been categorized as the lowest possible priority.

Yes, adding documentation is not hard, but it takes time which could be spent more effectively by working on things that actually benefit a significant percentage of our users.

@ktsaou
Member Author

ktsaou commented Feb 25, 2022

I am just suggesting that adding some documentation for corner-case users is respectful to them, and easy for us...

@ktsaou
Member Author

ktsaou commented Feb 26, 2022

btw, netdata-claim.sh is needed also to claim static installs.

And if the node is old we have this:

# /opt/netdata/usr/bin/netdata-claim.sh -token=qmwHshM-34i6RZrblN1YXaaZJVfuIjkKN5iKohBAhR7wBgs4D5Epd3LTKi_4FoS1_tjEVyqJlEWGYFjySntO7nolTAqgpDvUZpAtE6b4GCkoggo073ILTPptVXihuJvLfM2hIu0 -rooms=02179f1f-9030-486a-a873-9b57f6420a53 -url=https://app.netdata.cloud
Token: ****************
Base URL: https://app.netdata.cloud
Id: e6e4dcc4-969f-11ec-82f2-0401d1165f01
Rooms: 02179f1f-9030-486a-a873-9b57f6420a53
Hostname: d1.firehol.org
Proxy: 
Netdata user: netdata
Generating private/public key for the first time.
Generating RSA private key, 2048 bit long modulus
.............................+++
..................................................+++
e is 65537 (0x10001)
Extracting public key from private key.
writing RSA key
Failed to connect to https://app.netdata.cloud, return code 60
Connection attempt 1 failed. Retry in 1s.
Failed to connect to https://app.netdata.cloud, return code 60
Connection attempt 2 failed. Retry in 2s.
Failed to connect to https://app.netdata.cloud, return code 60
Connection attempt 3 failed. Retry in 3s.
grep: /opt/netdata/var/lib/netdata/cloud.d/tmpout.txt: No such file or directory
grep: /opt/netdata/var/lib/netdata/cloud.d/tmpout.txt: No such file or directory
Failed to claim node with the following error message:"Unknown HTTP error message"

To work, you have to do this:

# export PATH="/opt/netdata/bin:${PATH}"
# /opt/netdata/usr/bin/netdata-claim.sh -token=qmwHshM-34i6RZrblN1YXaaZJVfuIjkKN5iKohBAhR7wBgs4D5Epd3LTKi_4FoS1_tjEVyqJlEWGYFjySntO7nolTAqgpDvUZpAtE6b4GCkoggo073ILTPptVXihuJvLfM2hIu0 -rooms=02179f1f-9030-486a-a873-9b57f6420a53 -url=https://app.netdata.cloud
Token: ****************
Base URL: https://app.netdata.cloud
Id: e6e4dcc4-969f-11ec-82f2-0401d1165f01
Rooms: 02179f1f-9030-486a-a873-9b57f6420a53
Hostname: d1.firehol.org
Proxy: 
Netdata user: netdata
Connection attempt 1 successful
Node was successfully claimed.
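
The workaround above could be wrapped so the `PATH` change applies only to the claim call. This is a sketch under assumptions: the function name `claim_static` and the `NETDATA_PREFIX` variable are hypothetical, and the default prefix `/opt/netdata` is taken from the transcript. Return code 60 is what curl reports when it cannot verify the server certificate, so presumably the tools under the static install's `bin` directory succeed where the system ones do not.

```shell
#!/bin/sh
# Hypothetical wrapper: run the static install's claim script with its own
# bin directory first in PATH, without changing PATH for the calling shell.
claim_static() {
  prefix="${NETDATA_PREFIX:-/opt/netdata}"
  PATH="${prefix}/bin:${PATH}" "${prefix}/usr/bin/netdata-claim.sh" "$@"
}
```

It would be called with the same arguments as before, for example `claim_static -token=HIDDEN -rooms=HIDDEN -url=https://app.netdata.cloud`.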

@netdata-community-bot

This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:

https://community.netdata.cloud/t/issues-installing-netdata-on-servers/2476/5

@ilyam8
Member

ilyam8 commented Dec 22, 2023

I think this issue is no longer needed. Using netdata-claim.sh is not something we recommend. It has been a while since I last used that script directly - connecting using the UI is the way!

@ilyam8 closed this as completed on Dec 22, 2023