proposal: all: add opt-in transparent telemetry to Go toolchain #58894
Comments
This proposal has been added to the active column of the proposals project |
@rsc I didn't get a chance to ask this before the discussion was closed:
Why not assign every known string a number, and then use those numbers instead of sending strings? While sending the strings themselves is equivalent to the numbers, there's a "slippery slope" whiff I get from it. Publishing the known string index is another way to assure users that their privacy is being protected. There's an extra feeling of security knowing that only numbers are being sent. |
@willfaught I actually think it would be less transparent, not more. Because instead of seeing the actual human-readable information that is sent, you get a bunch of cryptic numbers that you have to interpret. Even if it was increasing a "feeling of privacy", there is a good argument to be made that giving such a feeling without actually increasing privacy (which this wouldn't do) is a bad thing. This would be privacy theater, if you will. Combine that with the fact that this would be significantly more churn and work - you either have the mapping be automatic, in which case it might completely re-number everything on every Go release, or you have to manually keep it up to date whenever something changes - and this seems an overall pretty bad idea. |
Thank you for allowing anyone to participate in this conversation. So far I see a thoughtful and polished proposal with user privacy as a fundamental guideline. I trust that the spirit of the proposal, which safeguards the balance between privacy and functionality, will be preserved and never broken. I'm making these comments as a simple user who has chosen to use the language for work and personal use for many years now, without any significant reason to doubt or question this choice. I'm grateful for the work being done and the direction in which the language is going. |
Most of the data sent will be counters/numbers anyway. The format is binary, if I remember correctly, so it can be mmap'ed. This wouldn't affect the human readability of the data either way.
Avoiding a slippery slope isn't theater.
In my opinion, there isn't much widespread sympathy for having to do more work to do the right, best thing in terms of privacy. This change in direction happened because the community insisted that the Go team do the harder thing. Russ acknowledged that this change will require more work:
So I don't think it's necessarily relevant how hard it is, not without a careful tradeoff analysis. And I doubt it would entail as much hassle as you make it out to be, once a process is in place. For instance, the index can be immutable, and added to over time and releases with an AST scan. There may indeed be a good reason not to do this, but I'm not convinced that this is one of them. |
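To make the idea concrete, here is a minimal sketch of an append-only string index of the kind suggested above. The `Index` type and its methods are hypothetical names invented for illustration; nothing like this is part of the proposal or the Go toolchain.

```go
package main

import "fmt"

// Index is a hypothetical append-only mapping from known strings to numbers.
// An ID, once assigned, is never changed or reused.
type Index struct {
	ids   map[string]int
	names []string
}

// NewIndex returns an empty index.
func NewIndex() *Index {
	return &Index{ids: make(map[string]int)}
}

// Add returns the existing ID for name, or appends name and assigns the next ID.
func (ix *Index) Add(name string) int {
	if id, ok := ix.ids[name]; ok {
		return id
	}
	id := len(ix.names)
	ix.ids[name] = id
	ix.names = append(ix.names, name)
	return id
}

// Name looks up the string for an ID, so a published index can be used
// to decode a report that contains only numbers.
func (ix *Index) Name(id int) (string, bool) {
	if id < 0 || id >= len(ix.names) {
		return "", false
	}
	return ix.names[id], true
}

func main() {
	ix := NewIndex()
	a := ix.Add("compile/invocations") // illustrative counter names only
	b := ix.Add("gopls/crashes")
	again := ix.Add("compile/invocations") // same ID as the first call
	fmt.Println(a, b, again)
}
```

Existing IDs stay stable across releases because entries are only ever appended; the open question in this thread is whether maintaining such an index is worth the effort.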
Hi @willfaught, there is a binary file used for performance reasons, but as I understand it, the uploaded data is purposefully in human-readable JSON. From the design post:
If the Go toolchain was to be underhanded about what it is doing, information could in theory be smuggled in a series of words or a series of numbers. To reduce the chance of the Go toolchain doing something underhanded here, including regarding how it handles strings, I suspect someone opting in will need to rely at least in part on the Go toolchain being open source (and either look at the code themselves, or rely on the likelihood of others doing so, the reproducibility of toolchain builds, the code review process, etc.). If someone crosses that threshold of trust and is willing to choose to opt in, I tend to agree with @Merovius that plain text strings are easier to understand and help with transparency more than a series of numbers. |
From https://research.swtch.com/telemetry-design:
I think this is saying the counter files are compact locally, but emitted as JSON. In the original thread someone made the point that plaintext gives anyone sniffing the traffic a good clue about what's going on, which made sense to me. In a wonky way, a set of plaintext strings is already a set of unique numbers in this kind of scenario. |
I see. It's hard to judge the readability of I think the bigger concern—bigger than the malicious sending of personal data strings—is the accidental sending of personal data strings. Nobody wants "home_video_of_[embarrassing thing].mov" or "[embarrassing thing]website.com/go-foo/bar" accidentally sent to the Go team. And perhaps no one will file an issue to fix it either, out of embarrassment, when the problem is discovered. Not sending any strings at all rules out that possibility entirely. |
By design only strings that are explicitly requested by the telemetry server will be sent to the telemetry server. Of course we can't stop anyone from modifying the Go tools to send other strings, but it hardly seems likely to happen by accident. |
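As a rough illustration of that allow-list behavior, the sketch below keeps only locally recorded counters whose names appear in a list published by the server. The function name `filterCounters` and the counter names are assumptions made for this example; this is not the actual toolchain code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// filterCounters returns only those local counters whose names appear in the
// allowed list. Any string not explicitly requested is dropped, so an
// unexpected counter name cannot end up in a report by accident.
func filterCounters(local map[string]int64, allowed []string) map[string]int64 {
	allow := make(map[string]bool, len(allowed))
	for _, name := range allowed {
		allow[name] = true
	}
	report := make(map[string]int64)
	for name, n := range local {
		if allow[name] {
			report[name] = n
		}
	}
	return report
}

func main() {
	// Hypothetical local counters, including one that should never be sent.
	local := map[string]int64{
		"compile/invocations": 42,
		"home_video.mov":      1,
	}
	// Hypothetical list of counter names requested by the server.
	allowed := []string{"compile/invocations", "gopls/crashes"}

	out, err := json.MarshalIndent(filterCounters(local, allowed), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // human-readable JSON, as in the design post
}
```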
@willfaught publishing numbers instead of strings gives a false sense of security. It accomplishes nothing. Whom does it protect, when the strings are public and can be consulted at any time by anyone? Users would be more comfortable reading and understanding right there what is being sent than cross-referencing an index, which is a burden for the user and no burden for any would-be malicious actor. |
Note that one class of strings are counter names and the counter names might be a full stack trace (function names and line number offsets). The space is large. So, no, I don't think I'm overstating the hassle. And, again, I think this actually worsens the privacy and transparency of the design and makes it easier to abuse, if anything. I agree that doing extra work might be worth it, if it improved privacy - but in this case, it's a lose-lose, you'd have to do extra work to make things worse. |
Ah, I see, then the string space might indeed be very large. Makes sense. |
For those who might be interested, if users will be able to inspect the reports:
(From https://research.swtch.com/telemetry-design#counting.)
|
It's not |
Have all concerns about this proposal been addressed? |
In February I posted a series of blog posts defining Transparent Telemetry, and we had a lively discussion on #58409. In the original posts, the design was opt-out. Based on that discussion as well as private discussions with long-time contributors and users, I revised the design to be opt-in.
I propose that we add opt-in transparent telemetry to the Go toolchain as described in those posts, specifically “The Design of Transparent Telemetry.” Transparent Telemetry has the following key properties:
Please note, as described in the Why Telemetry? section of the intro post, that telemetry addresses a different kind of problem than bug reports and surveys. In particular, relying on bug reports is not sufficient to identify problems that don't obviously impact functionality, including performance problems, and surveys are not sufficient to identify the variety of usage and contexts where Go is used and which would inform prioritization of effort.
There is good reason to believe that with even tens of thousands of users opted in, we should be able to get helpful data. It will not be as complete as the opt-out system, but it should be good enough. As described in the Can We Still Make Good Decisions? section of the opt-in post, there will be certain biases in the data based on who is more likely to opt in. Once we have data, it would make sense to compare “technical demographics” like operating system and editor against the annual Go survey and Stack Overflow surveys. If there is significant skew, we could look into reweighting the data as standard polls do (https://en.wikipedia.org/wiki/Iterative_proportional_fitting).
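For readers unfamiliar with reweighting, here is a minimal sketch of iterative proportional fitting on a two-dimensional table. The function name `rake`, the operating system by editor layout, and all numbers are made up for illustration; the proposal does not specify any implementation.

```go
package main

import "fmt"

// rake applies iterative proportional fitting to a 2-D table of counts,
// alternately scaling rows and columns until the row sums and column sums
// approach the target margins. The two sets of targets must share one total.
func rake(table [][]float64, rowTargets, colTargets []float64, iters int) [][]float64 {
	// Work on a copy so the caller's table is left unchanged.
	out := make([][]float64, len(table))
	for i := range table {
		out[i] = append([]float64(nil), table[i]...)
	}
	for it := 0; it < iters; it++ {
		// Scale each row to match its target sum.
		for i, row := range out {
			sum := 0.0
			for _, v := range row {
				sum += v
			}
			if sum == 0 {
				continue
			}
			for j := range row {
				row[j] *= rowTargets[i] / sum
			}
		}
		// Scale each column to match its target sum.
		for j := range colTargets {
			sum := 0.0
			for i := range out {
				sum += out[i][j]
			}
			if sum == 0 {
				continue
			}
			for i := range out {
				out[i][j] *= colTargets[j] / sum
			}
		}
	}
	return out
}

func main() {
	// Hypothetical opted-in sample: rows are operating systems, columns are editors.
	sample := [][]float64{{40, 30}, {20, 10}}
	// Hypothetical margins taken from an external survey (both sum to 100).
	rowTargets := []float64{50, 50}
	colTargets := []float64{60, 40}

	for _, row := range rake(sample, rowTargets, colTargets, 100) {
		fmt.Printf("%.2f\n", row)
	}
}
```

The reweighted cells keep the association structure of the opted-in sample while matching the survey's margins, which is the kind of adjustment standard polls make.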
For examples of the kinds of questions we'd use telemetry to answer and the kinds of decisions those answers would inform (but not decide directly), see “Use Cases for Transparent Telemetry”.