Add usage report to Mimir #1815

RichiH · 2022-05-04T10:58:25Z

Along the lines of grafana/loki#5361
NB, this took a few fixes, namely

RichiH · 2022-05-04T12:40:28Z

Also see https://github.com/grafana/tempo-squad/issues/81

https://github.com/grafana/loki/blob/e15a03b5e5aa2828aeabfe24cfb3584ab88fcfda/cmd/loki/loki-local-config.yaml#L32-L43 gives a nice template for wording.

56quarters · 2022-06-29T13:28:02Z

As a requirement for implementing this, I'd need to see as part of the PR:

- Documentation about exactly what pieces of information would be collected and an example payload (JSON or similar).
- How users can disable this beyond adding a CLI flag to the documentation of all CLI flags.
- How the information collected is determined and what the process for changing it is.
  - Do the Mimir maintainers vote on this? Do they have any say?
  - Is this controlled by Grafana? If so, who is responsible for approving it?
    - Can anyone decide to increase the information collected or does it require approval from e.g. VP level, C-level, etc.?

RichiH · 2022-06-29T15:18:41Z

As per governance, rough consensus within the Mimir team applies by default. Additionally, any Mimir team member can call a vote about any topic regarding the project at any time.

As a non-team member, I believe in following the principle of least surprise. As such, I would argue that data sent, syntax to disable sending, commented out section in default configuration, and documentation should mirror Tempo & Loki.

pracucci · 2022-08-02T08:10:17Z

I'm going to work on this. Loki and Tempo already have it, and the Mimir team wants anonymous statistics too, to better drive decisions when building features and supporting OSS users.

pracucci · 2022-08-02T10:13:59Z

Requisites

We want to follow how Loki and Tempo work (to keep it consistent)
We want it to work out of the box with no additional config

Seed file

The seed file is a JSON file named mimir_cluster_seed.json and stored at the root of the blocks storage bucket (or under the configured -blocks-storage.storage-prefix).
This file is used to store the unique cluster ID in durable storage.

The content of the file is:

{
    # Random UUID uniquely identifying the Mimir cluster.
    UID: "xxx",

    # Timestamp of when the seed file was created.
    created_at: "2006-01-02T15:04:05.999999999Z",

    # Mimir version when the seed file was created.
    # IMPORTANT: Loki and Tempo named this field "version" but I think it's too generic and may cause misunderstanding.
    #            Also I want to keep the door open to version this file, and the field name would be called "version".
    created_version: {
        version: "",
        revision: "",
        branch: "",
        buildUser: "",
        buildDate: "",
        goVersion: "",
    },
}

Report

The report is a JSON file periodically sent from each Mimir replica to a backend API.
The report contains only anonymous statistics, used to better drive decisions when building features for the OSS community.

{
    # The cluster ID.
    "clusterID": "",

    # When the cluster was created.
    "createdAt": "",

    # When the report was created (value is aligned across all replicas of the same Mimir cluster).
    "interval": "",

    # How frequently the report is sent, in seconds.
    "intervalPeriod": 0.0,

    # The "target" used to run Mimir.
    "target": "",

    # The current Mimir version.
    "version": {},

    # The current OS and architecture.
    "os": "",
    "arch": "",

    # The Mimir edition. Supported values are: "oss", "enterprise".
    "edition": "",

    # Custom metrics tracked by Mimir. Can contain nested objects.
    "metrics": {},
}

Mimir components tracking usage stats

To get it working out of the box, in the initial implementation Mimir will support tracking of usage statistics only from components already using the blocks storage (so that it's already configured):

Ingesters
Queriers (and rulers when the querier component is running internally)
Store-gateway
Compactor
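The component list above can be read as a simple eligibility rule. The Go sketch below encodes it; the function `tracksUsageStats` and the `rulerUsesInternalQuerier` flag are hypothetical names introduced for illustration, and the real implementation may decide this differently.

```go
package main

import "fmt"

// blocksStorageTargets lists the components named in the proposal as already
// using the blocks storage. The map itself is a hypothetical helper.
var blocksStorageTargets = map[string]bool{
	"ingester":      true,
	"querier":       true,
	"store-gateway": true,
	"compactor":     true,
}

// tracksUsageStats reports whether a component would send usage reports in
// the initial implementation. Rulers qualify only when they run the querier
// component internally.
func tracksUsageStats(target string, rulerUsesInternalQuerier bool) bool {
	if target == "ruler" {
		return rulerUsesInternalQuerier
	}
	return blocksStorageTargets[target]
}

func main() {
	for _, t := range []string{"ingester", "distributor", "ruler"} {
		fmt.Printf("%s tracks usage stats: %v\n", t, tracksUsageStats(t, true))
	}
}
```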

Action plan

Part of this action plan is outside of Mimir's scope (e.g. GEM), but I prefer to keep it as transparent as possible, given that we have only good intentions for these anonymous reports (all in all, we want to better support the community).

Build support in Mimir

Will follow up separately: Come up with a documented strict policy on how additional data collection should be reviewed and approved/rejected (and shared with Loki and Tempo too).

Build support in GEM

Set the edition to enterprise

Build backend API support

Build support in the backend API to collect anonymous usage stats

Build dashboard to query back anonymous usage stats

Build "Mimir Usage Report" dashboard

colega · 2022-08-02T10:34:27Z

One nit:

    created_version: {
        version: "",
        revision: "",
        branch: "",
        buildUser: "",
        buildDate: "",
        goVersion: "",
    },

The information about which Mimir version created the file seems to be ephemeral, and I don't see why we would need it (debugging purposes in case it's wrong?)

The rest of the plan looks good to me! 👍

RichiH · 2022-08-02T11:55:29Z

    # Random UUID uniquely identifying the Mimir cluster.
    UID: "xxx",

The comment says UUID, but the file says UID. UUIDv4s are generally better than UIDs.

I would argue that starting with a versioned, well, version would be better and that the other projects should also start versioning.

Nothing in the report explicitly tells me if it's Mimir or something else.

    # The current Mimir version.
    "version": {},

So maybe call this mimir_version and leave version free for versioning of the report itself?

56quarters · 2022-08-08T16:27:05Z

Could a requirement of this feature please be documenting how the information collected will evolve over time, if at all? I ask because we're asking our OSS users to trust that we won't collect anything sensitive. My concern is that we inadvertently add some piece of information to the usage stats (because it would be useful to Grafana as a company) without a lot of scrutiny that causes privacy issues or similar. I know that Loki has documentation around how the feature works and we are planning to, but I'd like something that describes how the feature will work over time.

As an example we could document:

We will only change the information collected in a major release (or minor release with a 2 version warning).
Any new information collected will be mentioned in the release notes in a dedicated section.
The documentation about how the feature works will always have the up-to-date list of information collected.
OR we commit to never changing the information collected once this is in a release.

pracucci · 2022-08-08T16:40:19Z

I definitely commit to writing the doc and being as clear as possible. We can't commit to an overly strict policy like "we'll never change it" or "we'll change on major releases only", but we'll definitely be very clear about what we collect and why.

RichiH · 2022-08-09T10:44:29Z

Strong +1 on being aggressively transparent on what's being collected.

pracucci · 2022-09-15T15:47:21Z

Enabled by default, so consider this work done.

RichiH added the enhancement New feature or request label May 4, 2022

pracucci self-assigned this Aug 2, 2022

This was referenced Aug 3, 2022

Added cluster seed support in preparation of anonymous usage reporter #2643

Merged

Periodically send anonymous usage stats report #2662

Merged

pracucci added the ease-of-use label Aug 17, 2022

This was referenced Sep 12, 2022

Enable anonymous usage statistics tracking by default #2939

Merged

Track OOO setting via anonymous usage statistics #2940

Merged

pracucci closed this as completed Sep 15, 2022
