[Feat]: ability to delete unseen, stale or live nodes (other than Offline) #690

Open
andrewm4894 opened this issue Dec 15, 2022 · 6 comments

Comments

@andrewm4894

andrewm4894 commented Dec 15, 2022

Problem

As a user, I can only delete "offline" nodes from NC. I should be able to delete any node I want.

We need to split the problem into cases with node status as a key:

  1. Online - We could either ban the node at the cloud level (not ideal) or instruct the agent to disconnect by dropping its cloud configuration. This is easy for directly connected nodes. The more complicated case is when the agent connects through a claimed parent, a set of parents, or more than one parent in line for the node. We could disable streaming in such a case, I think. Banning only at the parent level, from the cloud connection, would mean the parent still collects data from the node in question.
  2. Stale - Same as above, but display a warning that data for this node is going to be deleted too (we should instruct the parent(s) to do so, either by marking the data for removal and letting the garbage collector do its job or by enforcing the operation directly).
  3. Unseen - Just let me remove it and remove all the data that this particular node managed to imprint on the cloud, mostly the DB entry and the MQTT credentials. I do not know if it is even possible to have an Unseen node connected through a parent, so I have no idea how to handle this case.
  4. Offline - There is already an ability to remove the node.
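
To make the four cases above easier to compare, here is a rough sketch of them as a per-status plan. Everything in it (the `NodeStatus` enum, `deletion_plan`, the step strings) is hypothetical and only restates the list above; it is not how Netdata Cloud is implemented.

```python
from enum import Enum


class NodeStatus(Enum):
    ONLINE = "online"
    STALE = "stale"
    UNSEEN = "unseen"
    OFFLINE = "offline"


def deletion_plan(status: NodeStatus, via_parent: bool = False) -> list[str]:
    """Return the hypothetical steps for deleting a node in the given state."""
    if status is NodeStatus.OFFLINE:
        # Already supported today.
        return ["remove node from cloud"]
    if status is NodeStatus.UNSEEN:
        return ["remove cloud DB entry", "revoke MQTT credentials"]

    steps = []
    if status is NodeStatus.STALE:
        steps.append("warn user that stored data will be deleted")
    if via_parent:
        steps.append("disable streaming from the node to its parent(s)")
    else:
        steps.append("instruct agent to drop its cloud configuration")
    if status is NodeStatus.STALE:
        steps.append("ask parent(s) to delete the data or mark it for garbage collection")
    steps.append("remove node from cloud")
    return steps
```

For example, `deletion_plan(NodeStatus.STALE, via_parent=True)` starts with the warning step and ends with removing the node from the cloud.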

Example:
I had a group of 11 nodes streaming to my parent. I deleted these VMs since I no longer need them. However, I still see them in Netdata Cloud and am unable to delete them from NC.

Should I not be able to delete them? Unsure if this is a bug or feature request.

These nodes are gone and never coming back, so I would like to remove them from NC. I guess eventually the data for them might fall away on my parent, and then they might show as offline in NC and I could delete them. Unsure.

image

https://netdata-cloud.slack.com/archives/CS3PB0VJ7/p1671026396555759

Description

  1. Cleaner infra view.
  2. Control over the space without waiting X days for nodes to be marked as offline.
  3. More freedom in testing things without fear of injecting ghost nodes or the same node more than once (changing configuration by accident or on purpose might change the claimid).
  4. Probably fewer ghost spaces - I imagine that a user who is just starting with NDC and testing its capabilities might create a new space just to clean up the view.
  5. I believe some users were confused when they first tried NDC because they couldn't delete the nodes that were either set up incorrectly or already switched off. It could be a cause for dropping the offering entirely, especially when dealing with dynamic environments.

Importance

must have

Value proposition

  1. let me keep my space clean

Proposed implementation

No response

@andrewm4894 andrewm4894 changed the title [Feat]: ability to delete stale nodes that still have data on a parent [Feat]: ability to delete stale or live nodes Jan 9, 2023
@netdata-community-bot

This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:

https://community.netdata.cloud/t/cant-delete-stale-nodes/3909/2

@gdoermann

I found you can delete them if you delete the parent and re-install fresh on that parent machine. You have to remove the parent and all vnodes from the cloud dashboard and then when you re-claim the parent host it will set things up fresh.

@kousu

kousu commented Nov 23, 2023

I tried to erase my historical data directly to see if that would clear it up, as a workaround until netdata makes an official way to do this. I opened up the list of stale nodes:

2023-11-23-002049_529x427_scrot

and mouse-over'd the stale node to delete and copied a link like https://EXAMPLE.ORG/v2/spaces/DOMAINTLD/rooms/local/nodes/888586af-e5ab-47f2-8094-c4948fd1243a.

Then I extracted the UUID and deleted the folder that holds its data on my parent node:

systemctl stop netdata
cd /var/lib/netdata
rm -r 888586af-e5ab-47f2-8094-c4948fd1243a ... # deleting each of the folders
systemctl start netdata
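
If there are many stale nodes, copying the UUID out of each link by hand gets tedious. A minimal sketch of that extraction step, assuming the node link always ends in the node UUID as in the example URL above (`node_id_from_url` is a name made up for illustration):

```python
import uuid
from urllib.parse import urlparse


def node_id_from_url(url: str) -> str:
    """Return the trailing node UUID from a dashboard node link."""
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    # uuid.UUID raises ValueError if the segment is not a valid UUID,
    # which guards against passing an unrelated path to `rm -r` later.
    return str(uuid.UUID(last_segment))


url = "https://EXAMPLE.ORG/v2/spaces/DOMAINTLD/rooms/local/nodes/888586af-e5ab-47f2-8094-c4948fd1243a"
print(node_id_from_url(url))
```

The printed value is the folder name used in the `rm -r` step above.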

After restarting, the charts are gone, but the node itself is still listed as "stale":

2023-11-23-003007_1366x768_scrot

So that wasn't enough.

I poked around some more and found this sqlite database:

root@monitor:~# sqlite3 /var/cache/netdata/netdata-meta.db
SQLite version 3.42.0 2023-05-16 12:36:15
Enter ".help" for usage hints.
sqlite> .headers on
sqlite> .tables
alert_hash          dimension           host                metadata_migration
chart               health_log          host_info           node_instance     
chart_label         health_log_detail   host_label        
sqlite> select * from host where hostname='host1.example.org';
host_id|hostname|registry_hostname|update_every|os|timezone|tags|hops|memory_mode|abbrev_timezone|utc_offset|program_name|program_version|entries|health_enabled
<binary host_id>|host1.example.org|host1.example.org|15|linux|America/Toronto||1|5|EST|-18000|netdata|v1.33.1|0|1
<binary host_id>|host1.example.org|host1.example.org|15|linux|Etc/UTC||1|5|EST|-18000|netdata|v1.42.1|0|1

Annoyingly, host_id (presumably the UUID) is stored in binary, while the rest is stored as text, but I was able to remove the entry with:

sqlite> delete from host where hostname='host1.example.org' and program_version='v1.33.1';

After another

root@monitor:~# systemctl restart netdata

the stale node is now gone from my dashboard. 🎉
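
For anyone who wants to check which row is which before deleting, the binary host_id can be made readable. A small sketch, assuming host_id really is the raw 16-byte UUID (I have not confirmed this against the Netdata source) and using only the `host` columns shown in the output above; `list_hosts` is a hypothetical helper name:

```python
import sqlite3
import uuid


def list_hosts(conn: sqlite3.Connection, hostname: str) -> list[tuple[str, str]]:
    """Return (host_id rendered as UUID text, program_version) per matching row."""
    rows = conn.execute(
        "SELECT host_id, program_version FROM host WHERE hostname = ?",
        (hostname,),
    ).fetchall()
    # uuid.UUID(bytes=...) requires exactly 16 bytes and raises ValueError otherwise.
    return [(str(uuid.UUID(bytes=bytes(blob))), version) for blob, version in rows]


# Usage against the database from this comment (run as a user that can read it):
# conn = sqlite3.connect("/var/cache/netdata/netdata-meta.db")
# for host_id, version in list_hosts(conn, "host1.example.org"):
#     print(host_id, version)
```

Inside the sqlite3 CLI itself, `select hex(host_id), program_version from host;` gives the same bytes as hex text.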

Unfortunately this is not very clean. I believe there are still entries in the host_label, host_info and node_instance tables referencing the deleted host_id, but I don't know how to input binary data in the sqlite CLI and I don't feel like digging out python right now to do it, so the garbage is just going to sit around.
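
A possible way to clear that leftover garbage without typing any binary data: delete rows whose host_id no longer exists in `host`. This is only a sketch. It assumes each of the three tables has a column named host_id (I have not verified the schema), that Netdata is stopped while it runs, and that you work on a backup copy first. `purge_orphans` is a made-up name.

```python
import sqlite3

# Tables named in the comment above; the host_id column name is an assumption.
DEPENDENT_TABLES = ("host_label", "host_info", "node_instance")


def purge_orphans(conn: sqlite3.Connection) -> dict[str, int]:
    """Delete rows whose host_id has no matching row in host; return counts per table."""
    removed = {}
    with conn:  # commits on success, rolls back if any statement fails
        for table in DEPENDENT_TABLES:
            # Table names come from the fixed tuple above, never from user input.
            cursor = conn.execute(
                f"DELETE FROM {table} "
                "WHERE host_id NOT IN (SELECT host_id FROM host)"
            )
            removed[table] = cursor.rowcount
    return removed


# Usage (with netdata stopped):
# conn = sqlite3.connect("/var/cache/netdata/netdata-meta.db")
# print(purge_orphans(conn))
```

If a specific host_id does have to be typed in the sqlite CLI, SQLite accepts blob literals written as `X'...'` with the bytes in hex.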

@hugovalente-pm hugovalente-pm changed the title [Feat]: ability to delete stale or live nodes [Feat]: ability to delete unseen, stale or live nodes (other than Offline) May 2, 2024
@luckman212

luckman212 commented Jun 13, 2024

I had an installation problem with a node and now it's marked as "Stale" and "delete is disabled". The node is dead and will never be coming back. How do I get rid of this thing? Is there really no way to delete this??

@netdata-community-bot

This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:

https://community.netdata.cloud/t/impossible-to-delete-stale-node/5537/1

@luckman212

@netdata-community-bot funny. that's MY post.
