Add usage report to Mimir #1815

Closed
RichiH opened this issue May 4, 2022 · 11 comments
Labels: ease-of-use, enhancement

Comments

@RichiH
Member

RichiH commented May 4, 2022

Along the lines of grafana/loki#5361
NB, this took a few fixes, namely

@RichiH RichiH added the enhancement New feature or request label May 4, 2022
@RichiH
Member Author

RichiH commented May 4, 2022

@56quarters
Contributor

As a requirement for implementing this, I'd need to see as part of the PR:

  • Documentation about exactly what pieces of information would be collected and an example payload (JSON or similar).
  • How users can disable this beyond adding a CLI flag to the documentation of all CLI flags.
  • How the information collected is determined and what the process for changing it is.
    • Do the Mimir maintainers vote on this? Do they have any say?
    • Is this controlled by Grafana? If so, who is responsible for approving it?
      • Can anyone decide to increase the information collected or does it require approval from e.g. VP level, C-level, etc.

@RichiH
Member Author

RichiH commented Jun 29, 2022

As per governance, rough consensus within the Mimir team applies by default. Additionally, any Mimir team member can call a vote about any topic regarding the project at any time.

As a non-team member, I believe in following the principle of least surprise. As such, I would argue that the data sent, the syntax to disable sending, the commented-out section in the default configuration, and the documentation should mirror Tempo & Loki.

@pracucci pracucci self-assigned this Aug 2, 2022
@pracucci
Collaborator

pracucci commented Aug 2, 2022

I'm going to work on this. Loki and Tempo already have it, and the Mimir team wants anonymous statistics too, to better drive decisions when building features and supporting OSS users.

@pracucci
Collaborator

pracucci commented Aug 2, 2022

Requisites

  • We want to follow how Loki and Tempo work (to keep it consistent)
  • We want it to work out of the box with no additional config

Seed file

The seed file is a JSON file named mimir_cluster_seed.json, stored at the root of the blocks storage bucket (or under the configured -blocks-storage.storage-prefix).
This file stores the unique cluster ID in durable storage.

The content of the file is:

{
    # Random UUID uniquely identifying the Mimir cluster.
    UID: "xxx",

    # Timestamp of when the seed file was created.
    created_at: "2006-01-02T15:04:05.999999999Z",

    # Mimir version when the seed file was created.
    # IMPORTANT: Loki and Tempo named this field "version" but I think it's too generic and may cause misunderstanding.
    #            Also I want to keep the door open to version this file, and the field name would be called "version".
    created_version: {
        version: "",
        revision: "",
        branch: "",
        buildUser: "",
        buildDate: "",
        goVersion: "",
    },
}
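
For illustration only, here is a minimal Python sketch of producing and re-reading a document with the three top-level fields listed above. Mimir itself is written in Go, and the helper names `new_cluster_seed` and `parse_cluster_seed` are hypothetical, not part of Mimir. The example build information is a placeholder.

```python
import json
import uuid
from datetime import datetime, timezone


def new_cluster_seed(created_version: dict) -> dict:
    """Build a seed document with the three top-level fields proposed above."""
    now = datetime.now(timezone.utc)
    return {
        # Random (version 4) UUID identifying the cluster.
        "UID": str(uuid.uuid4()),
        # Creation time in UTC, written with a trailing "Z".
        "created_at": now.isoformat().replace("+00:00", "Z"),
        # Build information of the process that created the seed.
        "created_version": created_version,
    }


def parse_cluster_seed(raw: str) -> dict:
    """Parse a seed document; fail if the cluster ID is missing or malformed."""
    seed = json.loads(raw)
    uuid.UUID(seed["UID"])  # raises ValueError if the ID is not a well-formed UUID
    return seed


# Placeholder build information (assumption, for the example only).
build = {"version": "example", "revision": "", "branch": "",
         "buildUser": "", "buildDate": "", "goVersion": ""}
seed = new_cluster_seed(build)
encoded = json.dumps(seed, indent=4)
```

A reader that fails to parse the file could treat it as corrupted and write a fresh seed, which is the behaviour the action plan further down asks for.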

Report

The report is a JSON file periodically sent from each Mimir replica to a backend API.
The report contains only anonymous statistics, used to better drive decisions when building features for the OSS community.

{
    # The cluster ID.
    "clusterID": "",

    # When the cluster was created.
    "createdAt": "",

    # When the report was created (value is aligned across all replicas of the same Mimir cluster).
    "interval": "",

    # How frequently the report is sent, in seconds.
    "intervalPeriod": 0.0,

    # The "target" used to run Mimir.
    "target": "",

    # The current Mimir version.
    "version": {},

    # The current OS and architecture.
    "os": "",
    "arch": "",

    # The Mimir edition. Supported values are: "oss", "enterprise".
    "edition": "",

    # Custom metrics tracked by Mimir. Can contain nested objects.
    "metrics": {},
}
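
The thread does not spell out how "interval" is aligned across replicas. One possible approach, sketched below in Python purely as an assumption, is to round the current time down to a multiple of the reporting period so that every replica computes the same value. The function names, the 4-hour period, and the use of Python's `platform` module for "os" and "arch" are illustrative choices, not Mimir's implementation.

```python
import math
import platform
from datetime import datetime, timezone


def aligned_interval(now: datetime, period_seconds: float) -> datetime:
    """Round a timezone-aware `now` down to a multiple of the period since the Unix epoch."""
    buckets = math.floor(now.timestamp() / period_seconds)
    return datetime.fromtimestamp(buckets * period_seconds, tz=timezone.utc)


def build_report(cluster_id, created_at, now, period_seconds, target, version,
                 edition="oss", metrics=None):
    """Assemble a dict with the field names from the report proposal above."""
    return {
        "clusterID": cluster_id,
        "createdAt": created_at,
        "interval": aligned_interval(now, period_seconds).isoformat(),
        "intervalPeriod": float(period_seconds),
        "target": target,
        "version": version,
        # Approximations of the OS and architecture of the running process.
        "os": platform.system().lower(),
        "arch": platform.machine(),
        "edition": edition,
        "metrics": metrics or {},
    }


# Two replicas reporting at different moments within the same 4-hour window.
utc = timezone.utc
period = 4 * 60 * 60  # example value only
a = build_report("xxx", "2022-08-02T00:00:00Z",
                 datetime(2022, 8, 2, 9, 15, tzinfo=utc), period, "ingester", {})
b = build_report("xxx", "2022-08-02T00:00:00Z",
                 datetime(2022, 8, 2, 11, 59, 59, tzinfo=utc), period, "querier", {})
```

With this rounding, both replicas carry the same "interval" value, so a backend could group their reports by cluster ID and interval.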

Mimir components tracking usage stats

To get it working out of the box, in the initial implementation Mimir will support tracking of usage statistics only from components already using the blocks storage (so that it's already configured):

  • Ingesters
  • Queriers (and rulers when the querier component is running internally)
  • Store-gateway
  • Compactor

Action plan

Part of this action plan is outside the scope of Mimir (e.g. GEM), but I prefer to keep it as transparent as possible, given that our only intention with these anonymous reports is to better support the community.

Build support in Mimir

  • Create the seed file when it doesn't exist, or wait for a stable seed file otherwise (PR)
    • Ensure it doesn't cause any issues with bucket scanning, bucket index creation or the compactor
    • Document it as an invalid tenant ID
    • Re-create the seed file if corrupted
  • Vendor Mimir in GEM and fix changes to object store Middlewares
  • Periodically send report to backend API (PR)
    • See nextReport() logic in Loki
  • Vendor Mimir in GEM and set the edition
  • Track custom metrics (PR)
    • Type of backend storage used (Loki example)
    • Ingester replication factor
    • Number of in-memory series in the ingester
    • Number of samples received in the ingester
    • Number of queries executed
  • CHANGELOG (PR)
  • Documentation (PR)
    • Why we collect anonymous usage stats
    • Which information is collected
    • How to disable it
  • Fix reporter: if a report fails to send, we need to retry sending the exact same report, because counters are reset each time we build a new one (PR)
  • Track the configured out-of-order time window (PR)
  • Remove the experimental flag, enable it by default, update the CHANGELOG and doc accordingly (PR)
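
The "fix reporter" item above can be sketched as follows, in Python for illustration only. The `Reporter` class, its `tick` method, and the `samples_received` metric name are hypothetical. The point is that a report whose send failed is kept and re-sent unchanged, because building a new report resets the counters and would lose the values already captured.

```python
class Reporter:
    """Keeps an undelivered report and retries it unchanged on the next tick."""

    def __init__(self, send):
        self._send = send        # callable(report) -> bool, True when delivered
        self._samples = 0        # counter that is reset whenever a report is built
        self._pending = None     # report that was built but not yet delivered

    def add_samples(self, n: int) -> None:
        self._samples += n

    def _build(self) -> dict:
        report = {"metrics": {"samples_received": self._samples}}
        self._samples = 0        # building a report resets the counter
        return report

    def tick(self) -> bool:
        # Only build a new report when the previous one was delivered.
        if self._pending is None:
            self._pending = self._build()
        if self._send(self._pending):
            self._pending = None
            return True
        return False


# Simulated backend: the first send fails, the following ones succeed.
sent = []
outcomes = iter([False, True, True])


def fake_send(report):
    sent.append(dict(report["metrics"]))
    return next(outcomes)


reporter = Reporter(fake_send)
reporter.add_samples(10)
first = reporter.tick()    # send fails, the report is kept
reporter.add_samples(5)
second = reporter.tick()   # the same report is re-sent
third = reporter.tick()    # a new report carries the samples counted meanwhile
```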

Will follow up separately: Come up with a documented strict policy on how additional data collection should be reviewed and approved/rejected (and shared with Loki and Tempo too).

Build support in GEM

  • Set the edition to enterprise

Build backend API support

  • Build support in the backend API to collect anonymous usage stats

Build dashboard to query back anonymous usage stats

  • Build "Mimir Usage Report" dashboard

@colega
Contributor

colega commented Aug 2, 2022

One nit:

    created_version: {
        version: "",
        revision: "",
        branch: "",
        buildUser: "",
        buildDate: "",
        goVersion: "",
    },

The information about which Mimir version created the file seems to be ephemeral, and I don't see why we would need it (debugging purposes in case it's wrong?)


The rest of the plan looks good to me! 👍

@RichiH
Member Author

RichiH commented Aug 2, 2022

    # Random UUID uniquely identifying the Mimir cluster.
    UID: "xxx",

The comment says UUID, but the file says UID. UUIDs (v4) are generally better than UIDs.

I would argue that starting with a versioned, well, version would be better and that the other projects should also start versioning.

Nothing in the report explicitly tells me if it's Mimir or something else.

    # The current Mimir version.
    "version": {},

So maybe call this mimir_version and leave version free for versioning of the report itself?
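
A small Python sketch of that suggestion, as an assumption rather than an agreed format: "version" carries the format version of the report itself and "mimir_version" carries the build information. The constant `REPORT_FORMAT_VERSION` and both function names are hypothetical.

```python
REPORT_FORMAT_VERSION = 1  # hypothetical starting value for the report format


def wrap_report(mimir_version: dict, body: dict) -> dict:
    """Return a copy of `body` with the format version and the Mimir build info added."""
    report = dict(body)
    report["version"] = REPORT_FORMAT_VERSION
    report["mimir_version"] = mimir_version
    return report


def read_report(report: dict) -> dict:
    """Accept only reports written in the format version this reader understands."""
    if report.get("version") != REPORT_FORMAT_VERSION:
        raise ValueError(f"unsupported report format version: {report.get('version')!r}")
    return report


versioned = wrap_report({"version": "example"}, {"clusterID": "xxx", "edition": "oss"})
accepted = read_report(versioned)
```

A receiver written this way can reject or route reports in a format it does not know, instead of guessing what the fields mean.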

@56quarters
Contributor

Could a requirement of this feature please be documenting how the information collected will evolve over time, if at all? I ask because we're asking our OSS users to trust that we won't collect anything sensitive. My concern is that we inadvertently add some piece of information to the usage stats (because it would be useful to Grafana as a company) without a lot of scrutiny that causes privacy issues or similar. I know that Loki has documentation around how the feature works and we are planning to, but I'd like something that describes how the feature will work over time.

As an example we could document:

  • We will only change the information collected in a major release (or in a minor release with a two-version warning).
  • Any new information collected will be mentioned in the release notes in a dedicated section.
  • The documentation about how the feature works will always have the up-to-date list of information collected.
  • OR we commit to never changing the information collected once this is in a release.

@pracucci
Collaborator

pracucci commented Aug 8, 2022

I definitely commit to writing the doc and being as clear as possible. We can't commit to an overly strict policy like "we'll never change it" or "we'll change it on major releases only", but we'll definitely be very clear about what we collect and why.

@RichiH
Member Author

RichiH commented Aug 9, 2022

Strong +1 on being aggressively transparent on what's being collected.

@pracucci
Collaborator

Enabled by default, so consider this work done.
