Can/should we add `privacy-respecting usage metrics`? #4456

CommanderStorm · 2024-02-05T14:55:59Z

Dear community,

I would like to discuss whether adding privacy-respecting usage metrics is something we want, so that we can learn from them and use them to inform our actions.

This discussion is prompted by two things:

During FOSDEM, I watched a talk about "Privacy-respecting usage metrics for free software projects" by @wjt (GNOME contributor, Endless OS).
In the talk, he goes into why free software projects might want to collect data on how the software is used, and how this can be done in a privacy-respecting fashion.
While moving the server (#4296), we discovered that

the /version endpoint receives about 1-2 requests per second, which makes us wonder how many Uptime Kuma instances are running all over the world.
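As a rough illustration of how that rate could translate into an instance count: if every instance polled /version exactly once per fixed interval, and nothing else called the endpoint, then requests per second times the interval length in seconds approximates the number of instances. The 24-hour interval below is an assumption for illustration, not a value taken from the Uptime Kuma source, and `estimateInstances` is a hypothetical helper.

```typescript
// Hypothetical back-of-the-envelope estimate. Assumes every running instance
// polls /version exactly once per `checkIntervalHours` and that nothing else
// (browsers, bots, proxies) hits the endpoint.
function estimateInstances(requestsPerSecond: number, checkIntervalHours: number): number {
  const secondsPerInterval = checkIntervalHours * 60 * 60;
  // Requests seen during one full polling interval ~ number of distinct pollers.
  return Math.round(requestsPerSecond * secondsPerInterval);
}

// Assumed polling interval, for illustration only.
const INTERVAL_HOURS = 24;

for (const rps of [1, 2]) {
  console.log(`${rps} req/s -> ~${estimateInstances(rps, INTERVAL_HOURS)} instances`);
}
```

With a 24-hour interval, 1-2 requests per second would correspond to roughly 86,400-172,800 instances. A longer real polling interval would scale the estimate up proportionally, while duplicate or non-instance requests would make it an overestimate.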

Core requirements:

adding a pop-up for our users to decide if they would like to contribute their data to these metrics (=> "Informed consent")
Only tracking user/system settings and system state, not user behaviour.
Only collecting metrics in a privacy-centric way, such as the one suggested by @wjt or the ISRG via divviup
Only collecting metrics for concrete experiments/questions.
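A minimal sketch of how the first and last requirements could be enforced in code, assuming a hypothetical `metricsConsent` setting that stays "unknown" until the admin answers the pop-up, and a hypothetical `Experiment` record that must name its question and the exact fields it collects. None of these names exist in Uptime Kuma today.

```typescript
// Hypothetical sketch: nothing here is existing Uptime Kuma or divviup code.
type Consent = "unknown" | "granted" | "denied";

interface Experiment {
  id: string;         // e.g. "monitor-count" (hypothetical)
  question: string;   // the concrete question this data is meant to answer
  collects: string[]; // exact fields that are reported, shown in the consent pop-up
}

interface Settings {
  metricsConsent: Consent; // "unknown" until the admin answers the pop-up
}

// Opt-in only: anything other than an explicit "granted" means no reporting.
function mayReport(settings: Settings): boolean {
  return settings.metricsConsent === "granted";
}

// Only experiments that name a concrete question and exact fields are eligible.
function activeExperiments(settings: Settings, experiments: Experiment[]): Experiment[] {
  if (!mayReport(settings)) {
    return [];
  }
  return experiments.filter((e) => e.question.trim() !== "" && e.collects.length > 0);
}
```

Treating "unknown" the same as "denied" keeps reporting off by default, so nothing is sent before the admin has explicitly agreed.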

Implementing such metrics would have these concrete benefits:

During the v2.0 release, we focused a lot on improving performance for large deployments (more than 500 monitors) and on reducing storage requirements.
It would be valuable to know whether we prioritised these features correctly ("Are we pandering to the 20% or the 80%?") to make sure that we are offering a good service to most of our users.
Going on wild tangents that only matter to a minority of our users might not be the best use of maintainer time (even though optimisation is fun and sometimes necessary ^^).
A large part of maintaining uptime-kuma is spent on monitors and notification providers. It would be helpful to know whether some of these have significantly more users than others, so that we can prioritise PRs better.
How many users use the proxy feature? Should implementing Notifications via proxy #616 be a priority?
How many users use the Prometheus metrics? Should metrics issues be a bigger priority?
Which languages do users use? (see the discussion on translation quality in Translations Update from Weblate #4394)
Do people actively use the maintenance system? If yes, how many use it actively vs. passively, and how often? (=> existing UX, priority of improvements in this area, need for UI - Remove Monitor Pause Confirmation #2359 and other shortcuts)
Is the current incident system used? (=> existing UX, Timeline-based incident system #1253)
How many users are using groups? (=> existing UX, Selection of dependent monitors #1236)
If we push an update, how quickly do users update to it?
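For the update-speed question, one privacy-friendly option would be to report only which coarse version bucket an instance falls into, rather than a free-form version string. The bucket prefixes below are hypothetical, and so are the helper names. Histogram-style measurements like this are, as far as I know, one of the measurement types offered by Prio3, the scheme Divvi Up builds on.

```typescript
// Hypothetical bucket prefixes, checked in order, most specific first.
const BUCKETS = ["2.0.0-beta", "2.0.", "1.23."];

// Client side: map a version string to a bucket index; the extra last index means "other".
function versionBucket(version: string): number {
  const i = BUCKETS.findIndex((prefix) => version.startsWith(prefix));
  return i === -1 ? BUCKETS.length : i;
}

// Aggregate side: count how many reports fell into each bucket.
function histogram(versions: string[]): number[] {
  const counts: number[] = new Array(BUCKETS.length + 1).fill(0);
  for (const v of versions) {
    counts[versionBucket(v)] += 1;
  }
  return counts;
}
```

Comparing the bucket counts from successive collection periods after a release would indicate how quickly instances move to the new version.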

This would also have downsides:

Instead of looking at where we might want to take the project, we might only look at how users are already using it. If we focus on this too much, features like DNS: check content #432 could be deprioritised.
Some people might be vehemently opposed to this idea, such as @ddevault in https://github.com/orgs/meilisearch/discussions/162

I would especially appreciate feedback from the regular contributors (I apologize for the ping) @louislam, @chakflying, @Zaid-maker, @marco-doerig, @Saibamen, @Computroniks, @MrEddX, @AnnAngela, @cyril59310, @apio-sys

PS: I know that privacy is a charged topic, but please let's keep the discussion civil ^^


AnnAngela · 2024-02-06T07:12:57Z

I personally fully support your idea, as long as it's effectively anonymized, and look forward to others' opinions.

cyril59310 · 2024-02-06T07:34:36Z

I think that collecting data anonymously on the usage of Uptime Kuma is a good idea.

MrEddX · 2024-02-06T10:19:53Z

Yes, the idea is undeniably good and would be of great benefit to the project's developers. From a privacy or ethical point of view, I think the end user should be given the right to choose whether this feature is active or inactive, even though the data is anonymous.

ddevault · 2024-02-06T10:31:13Z

Hard requirement: any data collection must be opt-in only.

adding a pop-up for our users to decide if they would like to contribute their data to these metrics (=> "Informed consent")

Perfect, but focus also on the "informed" bit: exactly what data is collected, for what purpose, and how is it used?

Only collecting metrics for concrete experiments/questions.

I agree with this requirement, but it's not well supported by your summary.

Implementing such metrics would have these concrete benefits:

For each of these supposed benefits, you need to make a clearer case in order to justify monitoring. "It would be interesting" is generally not good enough. For example:

Do people use the maintenance system? If yes, how many and how much?

Positing a particular kind of metric you want to track like this is a good start, but expand on it:

Why do you need to know this?
What answers to this question would you expect?
Are any or all of these answers actionable? What actions, specifically, will you take for each possible answer?
What is the minimum amount of metrics you need to collect in order to provide a useful answer to this question? Be specific, e.g. "to answer this question we will record every time someone visits /maintenance, then sum this figure and include it in the weekly metrics collection batch request to our collection server".
Are the minimum necessary metrics to answer this question possible to collect in an anonymous and reasonably privacy-respecting manner?
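The /maintenance example above (count visits, then send only the weekly sum) could look roughly like this. `WeeklyCounter` is a hypothetical helper, and how the sum is actually submitted is left out. The point is that only an integer is kept locally, with no per-visit timestamps, users, or IPs.

```typescript
// Hypothetical sketch: count visits locally and release only the sum per window.
class WeeklyCounter {
  private count = 0;
  private windowStart: number;
  private readonly windowMs: number;

  constructor(windowMs: number, now: number) {
    this.windowMs = windowMs;
    this.windowStart = now;
  }

  // Called whenever /maintenance is visited; only a number is kept.
  record(): void {
    this.count += 1;
  }

  // Returns the sum for the finished window (to go into the batch) and resets,
  // or null if the current window has not finished yet.
  flushIfDue(now: number): number | null {
    if (now - this.windowStart < this.windowMs) {
      return null;
    }
    const total = this.count;
    this.count = 0;
    this.windowStart = now;
    return total;
  }
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
```

A server could call `flushIfDue(Date.now())` periodically and only include a non-null result in the outgoing batch.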

ddevault · 2024-02-06T10:34:43Z

Also be aware that adding these features is going to subject you to the GDPR. You will have to comply with it, which means things like having a publicly accessible data protection officer.

CommanderStorm · 2024-02-06T18:32:14Z

@ddevault
I have reworded the two questions you had problems with.

You are correct; I think a public site with the following content will be necessary:

the collection methods and methodology
the running experiments (most of the time none, but each with an explanation of "What, Why, Duration of collection")
and past experiments (with the results, for transparency and for new users to make more informed decisions)
privacy policy + imprint

As for tooling, a tool like divviup by the ISRG might be a good choice.

adding these features is going to subject you to the GDPR

Actually, the GDPR only covers personally identifiable data. Since I never intend to store such data, the corresponding data protection policy is simple. I have written such policies before and can do so again.
I would recommend watching Will's talk linked above. He goes into plenty of detail on how this can be done in a privacy-respecting manner.
"Privacy-respecting usage metrics" are not the same as the "spyware" you referred to in https://github.com/orgs/meilisearch/discussions/162. When I talk about usage metrics, I mean something more nuanced and experiment-based.
See the article "Telemetry Is Not Your Enemy" for why "Not all data collection is the same, and not all of it is bad".

As for the duration of collection, I would say this depends entirely on our users' upgrade behaviour (likely different between major, minor, and patch updates), as I think new metrics can only be introduced or disabled client-side via updates.
Once an experiment has ended, no more data is collected, and the data is deleted after analysis.
=> start an experiment, collect results, finish the experiment, publish a new version without said experiment
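The lifecycle above (start, collect, finish, remove in a later release) could be backed by giving every bundled experiment a hard end date, so that instances that never upgrade still stop reporting once the experiment is over. The `ExperimentWindow` shape mirrors the "What, Why, Duration" fields proposed for the public site; it is a hypothetical sketch, not existing code.

```typescript
// Hypothetical: each release ships the experiments it runs, each with a hard end date.
interface ExperimentWindow {
  id: string;
  what: string; // what is collected
  why: string;  // which question it answers
  endsAt: Date; // after this moment the client sends nothing for this experiment
}

function isRunning(exp: ExperimentWindow, now: Date): boolean {
  return now.getTime() < exp.endsAt.getTime();
}

// Even a client that is never upgraded stops reporting once endsAt has passed.
function experimentsToReport(bundled: ExperimentWindow[], now: Date): ExperimentWindow[] {
  return bundled.filter((e) => isRunning(e, now));
}
```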

rezzorix · 2024-02-11T05:46:08Z

I am with @ddevault

If this is implemented, then at most as opt-in only, with a very clearly defined and limited scope.

mh166 · 2024-02-11T22:49:06Z

I'm also in favor of adding such telemetry as it will be very beneficial as you laid out nicely. Of course, I agree that it should be Opt-In only.

Thoughts on the implementation

When asking for the admin's consent, please keep the initial message short and concise. From personal experience: the more text there is, the more I suspect it is there for corporate legal reasons. Therefore I suspect nothing good, am too lazy to read on, and just decline.

To prevent this, I'd suggest just a short statement. Something like "We don't collect any personally identifiable data. Just general system parameters (like: version, number of monitors, type of monitors, number of notifications, ...) are collected. The dataset is anonymized and cannot be traced back to your system."
Below that I would like to see two links:
1. "Click here to learn more" – which might either link to a documentation page or reveal a detailed explanation in-app (preferably).
2. "View collected data" – which displays (in a user-friendly, human-readable way) the data that is being collected, to allow me to make an educated decision.
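A sketch of the "View collected data" idea: render exactly the payload that would be sent as human-readable lines. The payload fields and labels are hypothetical examples, not an existing format.

```typescript
// Hypothetical payload shape: flat key/value pairs only.
type Payload = Record<string, number | string>;

// Render the exact payload that would be sent, one "label: value" line per field,
// falling back to the raw key if no friendly label exists.
function describePayload(payload: Payload, labels: Record<string, string>): string {
  return Object.entries(payload)
    .map(([key, value]) => `${labels[key] ?? key}: ${value}`)
    .join("\n");
}

const example: Payload = { version: "2.0.0", monitorCount: 42, notificationCount: 3 };
const labels: Record<string, string> = {
  version: "Uptime Kuma version",
  monitorCount: "Number of monitors",
  notificationCount: "Number of notifications",
};
console.log(describePayload(example, labels));
```

For the example payload this prints three lines: "Uptime Kuma version: 2.0.0", "Number of monitors: 42" and "Number of notifications: 3". Rendering from the same object that would be sent keeps the preview and the real report from drifting apart.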

Thoughts on evaluating the results

Please keep in mind that while this data might help you to prioritize bug fixes or enhancements, new features should still be considered with a reasonably high priority: you cannot measure what is not there yet. 😉 To prioritize between several new features, the number of +1s, together with the number of duplicate issues, may be a better indicator.

Another thing to remember when looking at the data: an apparently low usage might not necessarily indicate little potential but might as well show opportunities for UX improvements.

An example from personal experience: when I first started using Uptime Kuma, it was not very intuitive for me to find that there is a simple incident system included. I stumbled across it by accident, then could not remember how I had found it and had to figure out again how to get there.

The reason: you can only add an incident if you have created a status page, are visiting that page, and are currently editing it. If you do not meet all of these conditions, you may never know about it, because not even the documentation mentions this feature.

Therefore, in this example, a change in UX might also increase the usage of the feature and consequently change any prioritization related to it.

rezzorix · 2024-02-18T14:00:11Z

@louislam I have been following the uptime-kuma journey since the very beginning and would be very interested to hear your take on all this.

Zaid-maker · 2024-04-06T00:32:55Z

I am 100% in favor of adding this feature ☺

rezzorix · 2024-05-25T09:56:05Z

So this topic here has been unpinned from the main page.
I guess we are in the clear now, or is someone implementing this quietly?

Lanhild · 2024-07-24T16:37:22Z

+1 for implementing such metrics. It would be useful to have some data from "extreme" users that push the limits of the app, that'd help for performance updates.

Zaid-maker · 2024-10-17T19:44:48Z

I suggest using OpenTelemetry on the Uptime Kuma demo server, or we could ship OpenTelemetry inside the Docker image and give users full control to opt in or out, for privacy reasons of course. It would only collect anonymous data and send it to services like Grafana or Prometheus, whichever Louis chooses, and that data would be used to shape Uptime Kuma's future. What are your thoughts on this, @CommanderStorm @chakflying?

CommanderStorm · 2024-10-17T21:01:12Z

OTel is

An Observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.

(https://opentelemetry.io/docs/what-is-opentelemetry/)

OTel tracing/logs are push-based, but not the right fit (too much data, not privacy-preserving).
OTel metrics are a better fit, but are pull-based, which is not a good fit either: we would have to know the URLs/IPs of our users, which is not great, and it would not work for users who block inbound connections (thus under-representing home labs, ...).

Metrics also allow re-identification => not privacy-preserving.

=> DivviUp still seems to be the best option as far as I can see, but I don't have the time to spend on such a project for the foreseeable future.

Zaid-maker · 2024-10-18T07:39:43Z

=> DivviUp still seems to be the best option as far as I can see, but I don't have the time to spend on such a project for the foreseeable future.

DivviUp only has client libraries for Rust and TypeScript. Even if we or someone else wants to do this, how can it be done?

CommanderStorm · 2024-10-18T08:32:17Z

The process would be:

Looking into what setting up two non-colluding servers takes (docs).
- Are there any roadblocks, or can this just be brought up with docker compose up? (minus adding a reverse proxy for certs)
- How operationally difficult is running a DAP server? (just a guess is fine)
Checking whether there are blockers for adding divviup-ts.
- Should we add this in the frontend or in the backend? (likely the backend)
Thinking about how to phrase the modal (shown to every user) that lets users decide whether they want to share the data with us. (likely best done via a Draft-PR or via sharing screenshots)
Adding an example task that reports, for example, the number of monitors at startup if the user allows this.
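A hedged sketch of the last step: an opt-in-gated report at startup that sends only a coarse monitor-count bucket, which would also speak to the "more than 500 monitors" question from the opening post. `Submitter` is a placeholder for whatever DAP client is used (for example divviup-ts); its signature here is an assumption and not that library's API. The task id and bucket bounds are hypothetical as well.

```typescript
// Placeholder for a real DAP client; this signature is an assumption, not the divviup-ts API.
type Submitter = (taskId: string, bucketIndex: number) => Promise<void>;

// Inclusive upper bounds of the monitor-count buckets; the extra last bucket means "more than 500".
const MONITOR_BUCKETS = [10, 50, 200, 500];

function monitorBucket(count: number): number {
  const i = MONITOR_BUCKETS.findIndex((upper) => count <= upper);
  return i === -1 ? MONITOR_BUCKETS.length : i;
}

// Run once at startup: report only if the admin opted in, and only the bucket, never the raw count.
async function reportMonitorCountAtStartup(
  optedIn: boolean,
  monitorCount: number,
  submit: Submitter,
): Promise<boolean> {
  if (!optedIn) {
    return false;
  }
  await submit("monitor-count", monitorBucket(monitorCount)); // hypothetical task id
  return true;
}
```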

Zaid-maker · 2024-10-18T09:20:42Z

So, we could first implement divviup on the demo website to check how many instances are running all around the world, and then start integrating it into this repo. Luckily, the demo website server is written in TypeScript, so I can give it a shot and start implementing it on the demo first.

I have started by initializing the task.

Now I have to add the measurement to send data (I need help with that).

rezzorix · 2024-10-18T14:52:43Z

I have questions:

Who is in the end collecting this data?
Where is this data stored?
Who has access to it?

Zaid-maker · 2024-10-18T14:57:42Z

I have questions:

Who is in the end collecting this data? Where is this data stored? Who has access to it?

Of course, the server that collects the data is owned and maintained by Louislam himself. Only Louis has access to that data, or, if he wants, he can allow Frank and Nelson (apologies for mentioning names; I don't want to ping and disturb anyone) to see the data and maintain the project more deeply.

rezzorix · 2024-10-18T15:25:53Z

I have questions:
Who is in the end collecting this data? Where is this data stored? Who has access to it?

Of course, the server that collects the data is owned and maintained by Louislam himself. Only Louis has access to that data, or, if he wants, he can allow Frank and Nelson to see the data and maintain the project more deeply.

Storing metrics on a private server, accessible to only 2...3 people, contradicts both open-source principles and the original motivation of this project as a self-hosted, transparent solution. In fact, it raises more concerns about the motivation of some people.
@louislam hasn’t commented yet, but as the founder of the project, he should give his input on this matter, which he hasn’t done at all so far.

Zaid-maker · 2024-10-18T15:32:56Z

Storing metrics on a private server, accessible to only 2...3 people, contradicts both open-source principles and the original motivation of this project as a self-hosted, transparent solution. In fact, it raises more concerns about the motivation of some people.

The reason I gave you earlier is just my opinion on how this could happen; things can change over time. Whether Louis keeps it totally transparent to his users or makes it private is his choice. We are not going to collect sensitive user data; we will only store data such as:

How many instances of Uptime Kuma are running all over the world (on the demo website).
Which versions of Uptime Kuma people are still using.
Machine type and OS
NO IP LOGGING OR ANYTHING

@louislam hasn’t commented yet, but as the founder of the project, he should give his input on this matter, which he hasn’t done at all so far.

He's busy completing the v2.0.0 release and other things; once he has time, we will know the future of this feature. Pinging is not good practice unless you have a very urgent query.

rezzorix · 2024-10-18T15:43:30Z

Zaid, I appreciate your input, but this is not about simply collecting "non-sensitive" data. It’s about ensuring the project remains transparent and aligned with its open-source principles. Whether the data is sensitive or not, storing it on a private server with access restricted to a few people goes against those values.

Regarding Louis, I don’t need to be told about pinging etiquette. The reason I brought him into this discussion is because this decision could fundamentally change the direction of the project. As the founder, his input is critical before moving forward.

Zaid-maker · 2024-10-18T16:06:02Z

Zaid, I appreciate your input, but this is not about simply collecting "non-sensitive" data. It’s about ensuring the project remains transparent and aligned with its open-source principles. Whether the data is sensitive or not, storing it on a private server with access restricted to a few people goes against those values.

Regarding Louis, I don’t need to be told about pinging etiquette. The reason I brought him into this discussion is because this decision could fundamentally change the direction of the project. As the founder, his input is critical before moving forward.

Alright, sounds good. P.S.

CommanderStorm · 2024-10-18T18:54:06Z

I don't care enough to fight about this.
I am going to close the issue, as this no longer seems to be a productive/healthy discussion.
The discussion was meant to see whether being a bit more data-driven could work, but since I currently don't have the energy to bring this to fruition (and I don't like how hostile this discussion has become), keeping it open is pointless.

Here is a bit of input:

DAPs need 2 NON-COLLUDING servers. That means, not operated by the same person.
No one operator can infer non-aggregated results.
Those metrics being public is not something anybody objected to. Admittedly, I don't know whether there is a nice frontend/dashboard for this.
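To illustrate why no single operator can infer individual results, here is a simplified sketch of additive secret sharing, the basic idea underlying Prio/DAP. Each client splits its value into two shares that are individually meaningless, each non-colluding server sums only its own shares, and only the two aggregates combined reveal the total. This is not the DAP protocol or divviup-ts: real deployments additionally validate reports with zero-knowledge proofs and draw shares from a cryptographically secure RNG (the random share is passed in here so the example is deterministic). The modulus 2^31 - 1 is an arbitrary prime chosen so plain JavaScript numbers stay exact.

```typescript
// Simplified illustration only: no proofs, no secure randomness.
const MODULUS = 2147483647; // 2^31 - 1, a prime; sums of two residues stay exact in doubles

function mod(x: number): number {
  return ((x % MODULUS) + MODULUS) % MODULUS;
}

// Client: split a value into two shares; either share alone looks random.
function split(value: number, randomShare: number): [number, number] {
  const a = mod(randomShare);
  const b = mod(value - a);
  return [a, b];
}

// Each server only ever sums the shares it received.
function sumShares(shares: number[]): number {
  return shares.reduce((acc, s) => mod(acc + s), 0);
}

// Only combining both servers' aggregates reveals the total, never an individual value.
function combine(aggA: number, aggB: number): number {
  return mod(aggA + aggB);
}

// Three clients report 3, 5 and 7 monitors; each pair is [value, random share].
const reports: Array<[number, number]> = [[3, 123], [5, 2000000000], [7, 2147483646]];
const serverA: number[] = [];
const serverB: number[] = [];
for (const [value, r] of reports) {
  const [a, b] = split(value, r);
  serverA.push(a);
  serverB.push(b);
}
console.log(combine(sumShares(serverA), sumShares(serverB))); // 15
```

Neither `serverA` nor `serverB` alone holds anything that reveals 3, 5 or 7; this is why the two servers must be run by operators who do not collude.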

If people are curious, this is how Firefox solves this (using the same technology):
https://divviup.org/blog/divvi-up-in-firefox/

Zaid-maker · 2024-10-19T07:17:42Z

those metrics being public is not something that anybody objected to.

Exactly, that's what I am trying to say. Big tech companies never make their metrics public; I don't know why he keeps saying the same thing anyway.

Good that you closed this, as right now the most important thing is the v2.0.0 release. Good luck with it!

CommanderStorm added the discussion label Feb 5, 2024

CommanderStorm pinned this issue Feb 5, 2024

CommanderStorm mentioned this issue Feb 8, 2024 in settings proxy import / export #4465 (closed)

CommanderStorm added the A:core label Feb 8, 2024

CommanderStorm changed the title ~~Discussion about privacy-respecting usage metrics~~ Can/should we add privacy-respecting usage metrics? Feb 9, 2024

This comment was marked as spam.

CommanderStorm mentioned this issue Feb 17, 2024 in v2.0.0-Release <- Read this for performance problems #4500 (closed)

CommanderStorm mentioned this issue Feb 26, 2024 in Running Uptime-Kuma on Kubernetes #4530 (closed)

This was referenced Mar 11, 2024 in [Discussion] App Layout TUM-Dev/campus_flutter#219 (closed) and Seeing a connection from Uptime Kuma container to AWS #4585 (closed)

CommanderStorm mentioned this issue Mar 23, 2024 in feat: show monitor descriptions on status page #4612 (draft)

This was referenced Apr 12, 2024 in Add Status Class for Monitor Container in Status Page #4674 (closed) and Is the dashboard loading the entire history of heartbeat? #4684 (closed)

CommanderStorm mentioned this issue May 8, 2024 in Uptime-Kuma does not respond #4747 (closed)

CommanderStorm mentioned this issue May 18, 2024 in Exec monitor #1117 (open)

CommanderStorm unpinned this issue May 22, 2024

CommanderStorm closed this as not planned Oct 18, 2024

spirillen mentioned this issue Apr 12, 2025 in divviup.org mypdns/matrix#126923 (closed)
