Unexpected data exposure by default value of "[registry].registry to announce" #2760

Closed
mikeweilgart opened this issue Sep 19, 2017 · 30 comments

Comments

@mikeweilgart

The default value in /opt/netdata/etc/netdata/netdata.conf is:

[registry]
        # registry to announce = https://registry.my-netdata.io

This is a public registry. What this means is, anytime you navigate with your web browser to the netdata console for a given host, your browser sends information about that host to the public registry. At least its hostname is exposed.

For some organizations hostnames represent sensitive information, so this should be documented better and probably should not be the default.

(A workaround is to set the value to the bogus registry http://localhost:19999, so the web browser will attempt to send the data to the host it resides on, which won't do anything.)
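To check which registry a given host would announce, something like the following Python sketch could be used. It is not part of netdata. It assumes, as described above, that a commented-out `registry to announce` line means the built-in public default applies; the function names are made up for illustration.

```python
DEFAULT_REGISTRY = "https://registry.my-netdata.io"


def registry_to_announce(conf_text: str) -> str:
    """Return the effective 'registry to announce' value from netdata.conf text."""
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            # Blank or commented-out line: assumed to leave the default in force.
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section == "registry" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "registry to announce":
                return value
    return DEFAULT_REGISTRY


def announces_public_registry(conf_text: str) -> bool:
    """True if browsers visiting this host would be pointed at the public registry."""
    return registry_to_announce(conf_text).rstrip("/") == DEFAULT_REGISTRY
```

Feeding it the contents of /opt/netdata/etc/netdata/netdata.conf (or /etc/netdata/netdata.conf, depending on the install) shows whether the workaround is actually in effect.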

@ktsaou
Member

ktsaou commented Sep 19, 2017

Hi,

The hostname sent does not include the domain (cannot be resolved), and the IP address of the host or the browser is not stored by the registry.

You can turn any netdata into a registry, so it is pretty simple once you decide to deploy it, to use your own (and most people do).
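As a rough sketch of what "use your own" could look like in practice, the Python helper below renders a `[registry]` section for netdata.conf. The option names `enabled` and `registry to announce` are the ones quoted elsewhere in this thread; the helper name and the example URL are hypothetical, and the idea that exactly one node gets `enabled = yes` while every node announces the same URL is an assumption to verify against the wiki page.

```python
def registry_section(registry_url: str, acts_as_registry: bool) -> str:
    """Render a [registry] section for netdata.conf (illustrative only)."""
    enabled = "yes" if acts_as_registry else "no"
    return (
        "[registry]\n"
        f"    enabled = {enabled}\n"
        f"    registry to announce = {registry_url}\n"
    )


# Hypothetical internal URL: one node acts as the registry, the rest only announce it.
URL = "http://registry.example.internal:19999"
print(registry_section(URL, acts_as_registry=True))
print(registry_section(URL, acts_as_registry=False))
```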

You are right, the wiki page did not document it tracks hostnames (although it is obvious it does it, since the menu shows hostnames). I added it now: https://github.com/firehol/netdata/wiki/mynetdata-menu-item#what-data-the-registry-maintains

@mikeweilgart
Author

The workaround is fairly simple, as I noted, but just imagine a hostname such as "whitehouse-rhel5.2-webhost-13". You wouldn't need a domain to glean some vital data. ;)

@Ferroin
Member

Ferroin commented Sep 19, 2017

There's also the argument that hostnames such as that shouldn't exist. If somebody is using such hostnames though, it's very likely that they will get leaked via some other software as well (email software for example). That particular example should be encoded as: '13.webhost.whitehouse.gov'. All the information it contains other than the OS and version (which should never be present in a hostname for a production system) is stuff that should be encoded in conventional DNS, for exactly this reason.
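A tiny Python illustration of the point, assuming that "hostname without the domain" means the first DNS label (that is an assumption, not something confirmed for the registry code), and using a hypothetical helper name:

```python
def short_hostname(name: str) -> str:
    """Return the part of a host name before the first dot."""
    return name.split(".", 1)[0]


# With the DNS-style naming, only the first label would travel with the hostname.
print(short_hostname("13.webhost.whitehouse.gov"))
```

With the DNS-style name, the organization and role live in the domain part, which is exactly the part that would not be sent under this assumption.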

@ktsaou
Member

ktsaou commented Sep 19, 2017

Well, I clearly understand the issue here. But netdata still requires a registry. Shipping netdata without a working default means that it would be impossible for people to use it and understand what it is and why they need it.

netdata already provides all the means for changing hostnames for the registry and setting up your registry. So, I don't think we should change the default.

Of course, I have identified this issue and I think the default registry should be covered by some "license". This is why I have opened issue #1919 to settle this. However, this requires some work (accept the terms, etc). I had planned it for v1.8.0, but due to the number of bugfixes, I decided to release v1.8.0 without it and plan it for v1.9.0.

Keep in mind that the global health service which is planned as the key new feature of v1.9.0 will increase the exposure. And such a license will then be required by all means.

@simonnagl

With the current default configuration, netdata exposes data to a global server. For some users this may send sensitive data to an external organization (us). It should be up to them whether to trust us. And it should be transparent to them that netdata exposes this data with the default configuration, and how to disable this feature.

To me, transparent does not mean somewhere in the wiki. Or would you agree every user should read the whole wiki before using netdata? To me it means BEFORE installing netdata. I would go so far as to propose adding a note to each installation method.

I do not think we should change the default. I agree with @ktsaou: without a working default, people will not use it and/or will keep asking how to configure it.

But I think we do not want to lose or frustrate users by exposing data they did not want to expose without mentioning it.

@ktsaou
Member

ktsaou commented Sep 23, 2017

hm... this again will not solve the problem. netdata is installed via many ways and several of them will not allow users to see this info.

It is probably better to change the way the registry works. So, before sending any information to the registry, a new call will be made from the browser to the registry, to check if the registry knows the user.

This call will return true or false and will not push any data to the registry. If the user is not known to the registry, the user will have to tick "I accept the terms", and then their browser will start pushing data. To avoid repeating the double call to the registry, the "user known to the registry" flag will be saved to the browser's local storage.

What do you think?
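The flow proposed here could be sketched roughly as follows, in Python rather than browser JavaScript so it can run stand-alone. Everything in it is hypothetical: `FakeRegistry` stands in for the registry, a plain dict stands in for browser local storage, and the flag name `registry_user_known` is invented. It is a model of the idea, not of any netdata code.

```python
class FakeRegistry:
    """In-memory stand-in for a registry server."""

    def __init__(self):
        self.known_users = set()
        self.received = []

    def knows(self, user_id: str) -> bool:
        # Read-only check: answers true/false and stores nothing.
        return user_id in self.known_users

    def push(self, user_id: str, url: str) -> None:
        self.known_users.add(user_id)
        self.received.append((user_id, url))


def visit(registry, local_storage, user_id, dashboard_url, accepts_terms) -> bool:
    """Simulate one dashboard load; return True if data was pushed."""
    if not local_storage.get("registry_user_known"):
        # First call: only ask whether the registry already knows this user.
        if not registry.knows(user_id) and not accepts_terms():
            return False  # unknown user who declined: nothing is sent
        # Cache the result so later loads skip the extra check.
        local_storage["registry_user_known"] = True
    registry.push(user_id, dashboard_url)
    return True
```

Note that this sketch stores nothing when the user declines, so the terms prompt would come back on every load.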

@simonnagl

This may be more work, but it will affect every installation and is a better solution!

@mikeweilgart
Author

@ktsaou I don't understand your suggestion. What is a "user" in the sense you are using here? Why does it matter if the "user" is known rather than just checking the "terms accepted" flag in the browser's local storage?

Also, without saving a "user rejects the terms" flag as well, you are setting up users to get a UX annoyance.

@tjohnston01

Does anyone know how to disable this feature?
I'm trying to update an internal fork of netdata to disable this feature.

Following this advice to change settings in /opt/netdata/etc/netdata or /etc/netdata/netdata.conf does not seem to work. I still see GET requests to the default registry URL.

In case it helps, I am building with the docker/makeself toolchain from this repo.

This feature should be more clearly advertised and easier to configure (regardless of the technical details or policy re: revealing hostnames).

@ktsaou
Member

ktsaou commented May 1, 2018

@tjohnston01 currently you can't disable the registry. You can set up your own registry though. Have you tried it?

@tjohnston01

So for example if I set

[registry]
    enabled = no
    registry to announce = http://localhost:19999

The netdata machine will send GETs to itself.

"enabled = no" in this case just means that this machine does not act as a registry (but it still sends these GET requests). Correct?

It would be good to have a way to disable both parts of this.

@Ferroin
Member

Ferroin commented May 1, 2018

I agree that having an option to disable this is a good idea, and I'd argue that it really should be opt-in, not opt-out. If the config option to enable/disable connecting to the registry is unset, show a pop-up on the dashboard prompting the user to opt in if they want or opt out if they don't, probably with a note that links in alerts will not work correctly without it, and that potentially identifying data gets sent to a central server run by the netdata project if it's enabled.

Also, as it currently stands, I'm pretty certain that this is not EU GDPR compliant, and in particular because the hostnames are functionally identifiable data and are stored, it probably needs to be adjusted to be compliant (IANAL though).

@ktsaou
Member

ktsaou commented May 2, 2018

Issue #1919 will turn this to opt-in.

Abstract from: https://www.eugdpr.org/gdpr-faqs.html

What constitutes personal data?
Any information related to a natural person or ‘Data Subject’, that can be used to directly or indirectly identify the person. It can be anything from a name, a photo, an email address, bank details, posts on social networking websites, medical information, or a computer IP address.

A URL can hardly be mapped to a person. However, the person cookie the registry uses can be used to identify a person (it tracks the browser the person uses). When we implement #1919, the registry will also have the person's email.

The only data processing done by netdata using these data is explained in detail at https://github.com/firehol/netdata/wiki/mynetdata-menu-item. We don't expose these data to 3rd parties and we don't process them in any other way.

So, I think we can close this and continue at #1919
Right?

@toastbrotch

i just found out today about the whole public registry and what it does hidden inside the ui. and well this is a showstopper for me and a huge loss of trust! so i came here..

i think its partly the wrong discussion here: its not about "a working default" but "a secure default", as netdata IS working even without the need of users to expose data to any third-party server. the only way to fix this is to run one node as registry. i've not seen it in the documentation that this is mandatory to prevent data exposure to third parties. and a tool you run on every machine should have a secure default setup.

i completely do not understand why its not really possible to turn off the need of a registry at all. after all it just shows me some urls of my netdata instances.. ah right, i have bookmarks (and yes we have other tools that orchestrate our infrastructure, and generating a link to each node to connect via http...19999 is trivial).

btw: under gdpr even your ip is a personal attribute. and the registry gives personal identifiers (aka tracking cookie) to visitors without telling them and without consent. its actually hidden in the ui and only visible if you watch traffic. its not a good or trustworthy habit to do so and possibly not even legal.

for the love of your otherwise such wonderful tool: please turn this off by default.

@ktsaou
Member

ktsaou commented Jul 7, 2018

The registry provides a lot more functionality than browser bookmarks. For example, pan or zoom the charts on netdata server A and then click on server B. The panning/zooming is maintained. Mark an area (with alt or control + area select) on a chart on netdata A and then click netdata B. It will be maintained too. Scroll at a mysql server on netdata A and then click netdata B. If B runs a mysql server, it will automatically scroll to it. And many more...

The registry is a key attribute in our roadmap. It is the entity that will eventually allow us to provide unified cross server dashboards. So that, no matter how many netdata you have, you will use them all as one integrated application.

So, we don't plan to remove it. We actually move towards the opposite direction: enhance it significantly. For example, the next version of the registry may provide OAUTH (for authentication), central health monitoring, cross server custom dashboards editor, storage and sharing, store personal settings for all your netdata, etc.

Keep in mind that GDPR is a rulebook for processing personal data. It does not forbid personal data processing. It only enforces a ruleset to do it. Personal data are all those that somehow are associated with a person's identity (everything you can use to identify a person). The current version of the registry does not associate any of the data it maintains with any person's identity. That is, no one can identify the person using a netdata registry cookie (except of course the owner of the cookie). So, the current version of the registry is not related to GDPR.

However, the next version will be, since it will require us to log in somehow (it will know our emails, and an email uniquely identifies a person).

@toastbrotch

dont get me wrong:
to run my own private registry is very cool and does add value.

but lets collect the facts about your public registry registry.my-netdata.io:

  • my browser communicates with it, so your registry gets my ip: which is in fact personal data under gdpr: https://eugdprcompliant.com/personal-data/ .
  • my browser sends at least the referer which is the url of my netdata (which is information leakage)
  • warning mails also contain links via your registry that contain some more information your server gets as well
  • your public registry is default configuration
  • as user of netdata webui i am not aware of this communication and information leakage (no opt-in)
  • as user of netdata webui i am not asked if i want to share personal data with your registry
  • as user i have no possibility to even opt out
  • as user i have no information what is done with this data you get at the registry. i have not seen any declaration in the ui or elsewhere
  • it does also not matter which data you process, it matters what i send without knowing or allowing.

i don't think i need to explain further that this is not good in terms of security.

if we dig deeper in that call to your registry we see there, your public registry responds back with:

  • machine_guid
  • person_guid
  • all urls of all my netdata setups
    so you collect all my netdata setups together with identifiers and also get my ip and more stuff

to sum it up:

  • the default configuration with the usage of the public registry is unsafe, as you leak lots of uncontrollable information, and it is very questionable in terms of data protection (possibly also illegal under gdpr).

as proposed i think good habit would be: do not enable your public registry by default, let user opt-in and opt-out and explain what you do with that data

@ktsaou
Member

ktsaou commented Jul 8, 2018

I like your sensitivity, but you mix up things.

All netdata dashboards have this entry as the last entry of the my-netdata menu:

image

Read it. It explains everything in detail.

None of the information is personal.

  • IP 1.2.3.4. Is this IP personal data of someone?
  • Cookie 1111111-222222-333333--444444. Is this cookie personal data of someone?
  • URL http://server1:19999/. Is this URL personal data of someone?

To become "personal data", we need a person's identity. Something like:

  • IP 1.2.3.4 is George Foo
  • Cookie 1111111-222222-333333--444444 is George Foo
  • URL http://server1:19999/ belongs to George Foo

Only in this case the IP or the cookie is personal information, under GDPR.

The registry does not currently have this extra information that personalizes the data.
So, the data are not personal. This is not my opinion. I have asked GDPR lawyers.

Netdata is distributed. This is the way it works. You can't stop it or change it. A registry is required.

The default public registry has been carefully designed to avoid any personal information, it is well documented and in case you are so sensitive, we have given you the option to install your own.

Think about it a bit: You publish a site and you add google analytics to it. Google analytics collects a lot more information compared to what the netdata registry collects. Do your users have to opt in to it? No. Why is opt-in not required in this case? Because it does not collect personal data.

The only requirement is to let your users know you use a cookie. This is what the "What is this?" menu entry on all netdata dashboards does.

@toastbrotch

toastbrotch commented Jul 8, 2018

i think i raised enough other problems. so please do not only respond/discuss if my ip is under the gdpr a personal data attribute which has to be treated equivalent to my name, emailaddress (or all others here: https://eugdprcompliant.com/personal-data/ ) or not. at the end even to falsify just this point does not matter if you understand and sum up all my other points.

no one ever integrates google-analytics into their own private dashboard that possibly holds secret information. thats a rather stupid point and has nothing to do with this discussion here! but you're right: whoever thinks its unsafe to have google analytics integrated into their own private dashboard should not use netdata without their own registry, as it de facto leaks the same amount and quality of data. besides, if you ever add it, you do it by your own decision. but netdata does this without telling you and directly at first load of the ui.

@Ferroin
Member

Ferroin commented Jul 9, 2018

@ktsaou This:

Netdata is distributed. This is the way it works. You can't stop it or change it. A registry is required.

Is not entirely accurate. A registry is only required for certain features. Not everybody needs the my-netdata menu, and nowhere near everyone needs links in the notifications. They may be useful features for how you use it, but the way you use it isn't the only way to use it.

I personally never use the my-netdata menu (I've got custom dashboards that scrape-together the info that I need to track across multiple systems into one place, so I don't need quick switching between dashboards), and I also never use the links in the notifications (when I get a notification, I don't go to the Netdata dashboard, I log in to the system in question and see what's wrong, because I usually have a pretty good idea what's going on based on the notifications I have). So, for me personally, the registry is essentially useless right now, I just let the dashboard talk to it because I have no particular reason not to.

@toastbrotch

toastbrotch commented Jul 10, 2018

in my opinion a fair workflow would be like this and it would also solve another "problem":

  • on first time pointing your browser to a netdata web-ui you are asked, if you have not defined your own registry
  • do you want to use the public registry?
    • yes: here are the terms and conditions of the registry and the privacy statements (explaining which data is sent and which data is processed/stored at the registry, as these are 2 different sets of data)
    • and also if yes: do you have already an identity you want to impersonate (explain how this works and what needs to be done to use this existing identity)
    • no: explain how to set up your own registry. (which will then make it "unusable" as long as feature request: completely turn off public and own registry #3937 is not possible)

this would solve those issues:

  • users know and are able to choose
  • you can place terms and conditions to be secured against having any responsibility if the registry ever is down, or gets hacked, or whatever
  • users instantly have their identities connected

@mikeweilgart
Author

@toastbrotch, I agree with you about how bad it is to have the public registry enabled by default—you are stating the case well.

There's a problem with allowing a visitor to the Web UI to control the setting, though. Shouldn't that be controlled within the netdata configuration itself, which is on the host, not in the browser? (I.e. I wouldn't assume that a mere visitor should have the rights to make that decision.)

So then we're back to—even if the admin who set up netdata did intend to use the public registry, the user of the netdata Web UI may not want to send data there.

I think this becomes a question of preferences within the Web UI. Regardless of what registry is or isn't configured on the host, remember that the name of the configuration value is registry to announce.

After the registry is announced, it's still up to the browser (the Web UI) to actually send data to that registry or not.

I propose that a prompt appear in the Web UI when first visiting a netdata instance using your browser:

The netdata instance you are accessing has announced the URL "_______" as the registry.

Using a registry allows for (list out the benefits here).

If you don't send data to the registry, you can still use netdata, but (list out what features/options won't work).

Would you like to allow your browser to send data to registry _________?

(Four options: No, never; no, not this time; yes, just this time; yes, always. Could be accomplished with a "remember this setting" checkbox more fluidly.)
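The four-option prompt could behave roughly like this Python model. It is only a sketch of the proposal: the names are invented, a dict stands in for browser storage, and keying the remembered answer by registry URL is an assumption based on the prompt naming the announced URL.

```python
from enum import Enum


class Answer(Enum):
    NO_NEVER = "no, never"
    NO_ONCE = "no, not this time"
    YES_ONCE = "yes, just this time"
    YES_ALWAYS = "yes, always"


# Only the "never" and "always" answers are persisted.
_REMEMBERED = {Answer.NO_NEVER: False, Answer.YES_ALWAYS: True}


def may_send(registry_url: str, storage: dict, ask) -> bool:
    """Decide whether this browser may send data to the announced registry."""
    key = f"registry-consent:{registry_url}"
    if key in storage:
        return storage[key]  # remembered decision, no prompt shown
    answer = ask(registry_url)  # show the prompt; returns an Answer
    if answer in _REMEMBERED:
        storage[key] = _REMEMBERED[answer]
    return answer in (Answer.YES_ONCE, Answer.YES_ALWAYS)
```

The two "this time" answers leave storage untouched, so the prompt would reappear on the next visit, while "never" and "always" silence it for that registry URL.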

@cakrit
Contributor

cakrit commented Nov 30, 2018

An update on what has happened on this.

Although issue #1919 (on opt-in to share server hostnames and URLs) was closed, the new registry will require opt-in. See the section "login to enable the registry" in #3990

We will be able to close this issue when #3990 is live.

@stale

stale bot commented Jan 25, 2019

Currently netdata team doesn't have enough capacity to work on this issue. We will be more than glad to accept a pull request with a solution to problem described here. This issue will be closed after another 60 days of inactivity.

@stale stale bot added the stale label Jan 25, 2019
@simonnagl

simonnagl commented Jan 26, 2019

#3990 is not done yet. This should not be closed.

@stale stale bot removed the stale label Jan 26, 2019
@cakrit added the feature request (New features) label and removed the discussion label Jan 26, 2019
@cakrit
Contributor

cakrit commented Jan 26, 2019

It was incorrectly labeled as discussion. We're close to merging #5095, which will resolve this.

@cakrit
Contributor

cakrit commented Feb 24, 2019

Opt-in via signing in was implemented in #5095

@jonathanmmm

I like your sensitivity, but you mix up things.

All netdata dashboards have this entry as the last entry of the my-netdata menu:

image

Read it. It explains everything in detail.

None of the information is personal.

  • IP 1.2.3.4. Is this IP personal data of someone?
  • Cookie 1111111-222222-333333--444444. Is this cookie personal data of someone?
  • URL http://server1:19999/. Is this URL personal data of someone?

To become "personal data", we need a person's identity. Something like:

  • IP 1.2.3.4 is George Foo
  • Cookie 1111111-222222-333333--444444 is George Foo
  • URL http://server1:19999/ belongs to George Foo

Only in this case the IP or the cookie is personal information, under GDPR.

The registry does not currently have this extra information that personalizes the data. So, the data are not personal. This is not my opinion. I have asked GDPR lawyers.

Netdata is distributed. This is the way it works. You can't stop it or change it. A registry is required.

The default public registry has been carefully designed to avoid any personal information, it is well documented and in case you are so sensitive, we have given you the option to install your own.

Think about it a bit: You publish a site and you add google analytics to it. Google analytics collects a lot more information compared to what the netdata registry collects. Do your users have to opt in to it? No. Why is opt-in not required in this case? Because it does not collect personal data.

The only requirement is to let your users know you use a cookie. This is what the "What is this?" menu entry on all netdata dashboards does.

I have to add:
An IP address is personal data, without anything else.
So netdata making a connection to e.g. googleapis.com (a 3rd party) without first asking is against at least the DSGVO (Germany) and probably against the GDPR. I will link a lawsuit that was filed against a website using Google Fonts, where a user's IP address was leaked to Google without the user first accepting this connection, as this was a 3rd party. Whether your own servers also count as a 3rd party, e.g. when you access localhost and nothing else, I don't know for sure, but Google or other services certainly do.

https://www.theregister.com/2022/01/31/website_fine_google_fonts_gdpr/
The ruling directs the website to stop providing IP addresses to Google and threatens the site operator with a fine of €250,000 for each violation, or up to six months in prison, for continued improper use of Google Fonts.

Even an IP address is enough personal information, so any connection to a third party needs explicit consent from a user.

@cakrit
Contributor

cakrit commented May 2, 2022

We are anonymizing IPs everywhere where we suspect they may be leaked. The agent hasn't been using GA for a while now, we just have PostHog there, which is an open source project. I haven't seen any IPs there either, though I did just ask them to double check and verify.

@jonathanmmm

It does not matter what you do. If I access example.com, of course my IP address gets leaked to them and the hoster they use, but it should not be leaked to any third party (e.g. Google) without explicit acceptance by the user.

@gmosx gmosx removed their assignment May 2, 2022
@cakrit
Contributor

cakrit commented May 3, 2022

We'll take the risk then. As long as the 3rd party provider (which I repeat isn't Google) is verifiably not storing personal data, we believe we'd win in court. The benefits of a sizeable percentage of anonymous statistics far outweigh the small risk of being ordered to do it in a different way.
