Analytics in Go-IPFS #980

Closed
Stebalien opened this issue Feb 26, 2019 · 13 comments

Comments

@Stebalien
Member

Go-IPFS analytics requirements:

  1. Opt-in. We can prompt the user on first setup, possibly even modify their config, maybe even skip the dialog if the user has do-not-track set. However, it needs to be opt-in.
  2. Minimal. We don't need things like City/Country. We should only collect information we need to improve our software and wouldn't in any way identify our users if it were to leak.
  3. We need to clearly define what we collect, ideally displaying a sample.
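
As a minimal sketch of what "opt-in" could mean in code, assuming a hypothetical `AnalyticsConsent` record stored locally (none of these names come from go-ipfs or the WebUI): a missing or unanswered consent is treated as "no", and do-not-track can only suppress the prompt, never turn collection on.

```typescript
// Hypothetical consent record; the field names are illustrative only.
interface AnalyticsConsent {
  // undefined means the user has never answered the prompt.
  optedIn?: boolean;
}

// Analytics may run only after an explicit "yes".
// A missing record or an unanswered prompt both count as "no".
function analyticsEnabled(consent: AnalyticsConsent | undefined): boolean {
  return consent !== undefined && consent.optedIn === true;
}

// The first-run prompt is shown only if the user has not answered yet
// and has not set do-not-track.
function shouldPrompt(
  consent: AnalyticsConsent | undefined,
  doNotTrack: boolean
): boolean {
  const answered = consent !== undefined && consent.optedIn !== undefined;
  return !answered && !doNotTrack;
}
```

With this shape, a fresh install reports nothing until the user says yes, and a user who has answered either way is not prompted again.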

If you'd like a good example of how to do this, please check out how syncthing handles analytics.

@Stebalien
Member Author

We also need to be careful about users using IPFS over a VPN, Tor, etc. Ideally, the webui would communicate with our analytics backend through IPFS/libp2p but we'll have to figure that out.

However, if it's opt-in, we can probably punt on that.

@magik6k
Member

magik6k commented Feb 26, 2019

@olizilla
Member

For context for people not following along: the current situation is described in mozilla/addons-frontend#930

Opt-in. We can prompt the user on first setup, possibly even modify their config, maybe even skip the dialog if the user has do-not-track set. However, it needs to be opt-in.

Agreed. It is simplest from where we are to just disable by default and ask the user.

Minimal. We don't need things like City/Country. We should only collect information we need to improve our software and wouldn't in any way identify our users if it were to leak.

Agreed.

We need to clearly define what we collect, ideally displaying a sample.

This is what we have in place so far:


#930 (comment)

@Stebalien
Member Author

Ideally, we'd also do as much event aggregation locally as possible instead of sending individual events to the server (e.g., "the user uses X/Y/Z pages" instead of "the user visited X/Y/Z pages at time T"). We don't need (or want) to know when a user is using the WebUI, we just want to know which features are useful, etc.
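
To make the difference concrete, here is a rough sketch of local aggregation. The `UsageAggregator` class and its method names are made up for illustration and are not something the WebUI has. It keeps per-section counters in memory and produces a report that carries no timestamps.

```typescript
// Hypothetical aggregator: counts how often each section is used,
// without recording when each use happened.
class UsageAggregator {
  private counts = new Map<string, number>();

  // Record one use of a section, e.g. "settings" or "peers".
  record(section: string): void {
    this.counts.set(section, (this.counts.get(section) ?? 0) + 1);
  }

  // Produce the batch report and reset the counters.
  flush(): { sections: Record<string, number> } {
    const sections: Record<string, number> = {};
    this.counts.forEach((n, sectionName) => {
      sections[sectionName] = n;
    });
    this.counts.clear();
    return { sections };
  }
}
```

Calling `flush()` on some coarse schedule (say, once a day) would give the server totals per section while leaving out when the user was active.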

For context, my syncthing instance reports:

{
  "alwaysLocalNets": false,
  "announce": {
    "defaultServersDNS": 1,
    "defaultServersIP": 0,
    "globalEnabled": true,
    "localEnabled": true,
    "otherServers": 0
  },
  "blockStats": {},
  "cacheIgnoredFiles": false,
  "customDefaultFolderPath": false,
  "customReleaseURL": true,
  "customTempIndexMinBlocks": false,
  "customTrafficClass": false,
  "deviceUses": {
    "compressAlways": 0,
    "compressMetadata": 3,
    "compressNever": 0,
    "customCertName": 0,
    "dynamicAddr": 3,
    "introducer": 0,
    "staticAddr": 0
  },
  "folderMaxFiles": 1263,
  "folderMaxMiB": 3352,
  "folderUses": {
    "autoNormalize": 5,
    "externalVersioning": 0,
    "ignoreDelete": 0,
    "ignorePerms": 0,
    "receiveonly": 0,
    "sendonly": 1,
    "sendreceive": 4,
    "simpleVersioning": 1,
    "staggeredVersioning": 1,
    "trashcanVersioning": 0
  },
  "folderUsesV3": {
    "alwaysWeakHash": 0,
    "conflictsDisabled": 0,
    "conflictsOther": 1,
    "conflictsUnlimited": 4,
    "customWeakHashThreshold": 0,
    "disableSparseFiles": 0,
    "disableTempIndexes": 0,
    "filesystemType": {
      "basic": 5
    },
    "fsWatcherDelays": [
      10,
      10,
      10,
      10,
      10
    ],
    "fsWatcherEnabled": 5,
    "pullOrder": {
      "newestFirst": 1,
      "random": 4
    },
    "scanProgressDisabled": 0
  },
  "guiStats": {
    "debugging": 0,
    "enabled": 1,
    "insecureAdminAccess": 0,
    "insecureAllowFrameLoading": 0,
    "insecureSkipHostCheck": 0,
    "listenLocal": 1,
    "listenUnspecified": 0,
    "theme": {
      "dark": 1
    },
    "useAuth": 1,
    "useTLS": 0
  },
  "hashPerf": 157.1,
  "ignoreStats": {
    "deletable": 0,
    "doubleStars": 0,
    "escapedIncludes": 0,
    "folded": 0,
    "includes": 0,
    "inverts": 0,
    "lines": 0,
    "rooted": 0,
    "stars": 0
  },
  "limitBandwidthInLan": false,
  "longVersion": "syncthing v1.0.1 \"Erbium Earthworm\" (go1.11.5 linux-amd64) builduser@svetlemodry 2019-02-05 18:18:07 UTC [noupgrade]",
  "memorySize": 15924,
  "memoryUsageMiB": 69,
  "natType": "unknown",
  "numCPU": 4,
  "numDevices": 3,
  "numFolders": 5,
  "overwriteRemoteDeviceNames": false,
  "platform": "linux-amd64",
  "progressEmitterEnabled": true,
  "relays": {
    "defaultServers": 1,
    "enabled": true,
    "otherServers": 0
  },
  "rescanIntvs": [
    3600,
    3600,
    3600,
    3600,
    3600
  ],
  "restartOnWakeup": true,
  "sha256Perf": 164.56,
  "temporariesCustom": false,
  "temporariesDisabled": false,
  "totFiles": 2655,
  "totMiB": 4471,
  "transportStats": {
    "tcp6": 1
  },
  "uniqueID": "<omitted>",
  "upgradeAllowedAuto": false,
  "upgradeAllowedManual": false,
  "upgradeAllowedPre": false,
  "uptime": 5902,
  "urVersion": 3,
  "usesRateLimit": false,
  "version": "v1.0.1"
}

@Stebalien
Member Author

Also, we need to audit errors. If we send back errors, we need to make sure they don't include sensitive information (ideally asking the user if they want to send an error report, telling the user exactly what's included).
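
One possible shape for this, as a sketch only: build the report as plain data first, scrub anything that looks like a path or a CID, and show the user exactly that text before sending. The `buildErrorReport` function and the regexes below are assumptions (the CID pattern only covers `Qm...`-style CIDv0 strings), so treat this as a starting point rather than a complete scrubber.

```typescript
// Hypothetical report shape: only what the user is shown gets sent.
interface ErrorReport {
  name: string;
  message: string;
}

// Rough patterns: base58 "Qm..." CIDv0 strings, and /ipfs/, /ipns/ or
// home-directory paths. Illustrative only; other identifying data can slip through.
const CID_V0 = /Qm[1-9A-HJ-NP-Za-km-z]{44}/g;
const PATHS = /\/(?:ipfs|ipns|home|Users)\/[^\s"']+/g;

function scrub(text: string): string {
  return text.replace(PATHS, "<path>").replace(CID_V0, "<cid>");
}

function buildErrorReport(err: Error): ErrorReport {
  // The stack trace is deliberately left out: it is hard to audit.
  return { name: err.name, message: scrub(err.message) };
}

// The exact JSON the user would be asked to approve.
function preview(report: ErrorReport): string {
  return JSON.stringify(report, null, 2);
}
```

Because `preview` serializes the same object that would be sent, what the user approves and what leaves the machine cannot drift apart.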

@olizilla
Member

Just to check we're on the same page, this is only counting which sections of the webapp are visited, and a handful of actions that tell us you added / removed / moved files or saved a config change in the abstract. It doesn't capture any information about go-ipfs or your config. It doesn't capture local MFS paths, nor does it capture CIDs you browse in the explore page. Only that you used the explore page.

I can disable city-level granularity for "where are our users", which is just a best-effort geolocation on the IPv4 address, and I can disable error reporting, as it's hard to verify that an error will always be free of identifiable info. The errors it would capture right now are only JS errors in the web app and the HTTP client. It won't capture go-ipfs errors.

@Stebalien other than opt-in what are the changes from what we have that must happen before we can release it with go-ipfs? I'm happy to investigate event aggregation, but I'd rather not make that a blocker.

@momack2
Contributor

momack2 commented Feb 27, 2019

IMHO there’s a difference between the metrics requirements for go-ipfs usage vs webUI interaction/usage. If I understand correctly, the metrics that @olizilla has implemented are only for the latter, and general usage of go-ipfs would in no way trigger any metrics collection until the user decided to interact with the webUI property to see stats and use features built into that UI, and even then the metrics would be only about usage of that web UI. I think that interaction path is less sensitive than the whole of go-ipfs (which we’d love more ways to understand, but 100% agree we need to be very careful/opt-in about that). I’d argue therefore that this should be able to proceed with the same metrics collection (for webUI included in go-ipfs) as all other instances of webUI/web properties.

@Stebalien - if you still strongly disagree I think the workaround is as @olizilla suggests - turning the metrics default to “opt-in” (and all the other suggestions are nice-to-have). Does that sound right? Would that also make sending back errors “opt in” by default, oli?

@Stebalien
Member Author

Stebalien commented Feb 27, 2019

I think that interaction path is less sensitive than the whole of go-ipfs

@momack2

From the user's perspective, the WebUI is a part of go-ipfs (it's the GUI). Interaction with the WebUI is just as sensitive as interaction through the CLI.

We need to think of this as an offline app, not a website.

other than opt-in what are the changes from what we have that must happen before we can release it with go-ipfs?

Other than the privacy implications, my primary concern is the message we're sending. We need to send the message "we care about your privacy, we've thought through this, and we're not trying to sneak anything past you". This is going to take a lot of careful thought (consider how much time Mozilla has spent on this).

However, we can take a short-cut. We can:

  1. Make it opt-in.
  2. Not ask the user to opt-in.
  3. Clearly label analytics as "beta". That is, "BETA: Help improve this app...".

That will allow us to iterate on this while clearly indicating that it's not finished.

To bring this out of beta, we need to:

  1. Clearly communicate what we collect (ideally with examples):
    1. "Browser information" doesn't mean anything to me. Is the WebUI sending my history? I know this probably just means my user agent but the user doesn't.
    2. Are we also collecting access times? I'm pretty sure we're reporting these errors in real-time which leaks when I'm using IPFS.
    3. What's included in these "app errors"? The fact that one can't easily answer this question is why most desktop apps ask before submitting error reports.
    4. Does "the information collected includes" mean there's more information being collected?
  2. Minimize what we collect.
    1. We don't need the city and I'm still not sure why we need the country.
    2. Do we need precise display resolution/density information? Could we round off to a few categories?
    3. Could we aggregate data over time and send batch updates with information like {sections: {settings: 10, peers: 2, ...}} instead of "user X accessed section Y at time T"?
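
For point 2.2, rounding could be as simple as mapping the screen width to a few coarse buckets. This is a sketch under assumptions: the bucket names and cut-offs below are arbitrary placeholders, not a proposal for the actual categories.

```typescript
// Hypothetical coarse buckets; the cut-offs are arbitrary examples.
type ScreenBucket = "small" | "medium" | "large" | "unknown";

function bucketResolution(resolution: string): ScreenBucket {
  // Expect strings like "800x600"; anything else is reported as "unknown".
  const match = /^(\d+)x(\d+)$/.exec(resolution);
  if (match === null) return "unknown";
  const width = parseInt(match[1], 10);
  if (width < 1024) return "small";
  if (width < 1920) return "medium";
  return "large";
}
```

Reporting only the bucket means two users with unusual monitors no longer stand out from everyone else in the same category.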

Once we've done that, we can go ahead and ask users if they'd be willing to opt-in (we can even present a sample of the information that would be reported). However, we can only ever ask users to opt-in ONCE. We need to get it right the first time.

olizilla pushed a commit that referenced this issue Feb 27, 2019
- remove doNotTrack detection in favour of always opt-in

wip on: #980


License: MIT
Signed-off-by: Oli Evans <oli@tableflip.io>
@olizilla
Member

olizilla commented Feb 27, 2019

For shared reference, here are the events that are tracked in the current implementation:

Begin Session

Sent once at the start of a new session to provide metrics on:

  • App version e.g. "2.4.4"
  • User Agent e.g. "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) ..."
  • Screen resolution e.g. "800x600"
  • Screen pixel density e.g. 1
  • Browser locale e.g. "en-GB"

Example Request

GET https://countly.ipfs.io/i?begin_session=1&metrics=%7B%22_app_version%22%3A%222.4.0%22%2C%22_ua%22%3A%22Mozilla%2F5.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2010_14_2)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F72.0.3626.119%20Safari%2F537.36%22%2C%22_resolution%22%3A%221920x1200%22%2C%22_density%22%3A1%2C%22_locale%22%3A%22en-GB%22%7D&app_key=8fa213e6049bff23b08e5f5fbac89e7c27397612&device_id=17de0839-d9b5-4800-9a3c-e07270e0b17b&sdk_name=javascript_native_web&sdk_version=19.02.1&timestamp=1551799890469&hour=15&dow=2

Query params

begin_session: 1
metrics: {"_app_version":"2.4.0","_ua":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36","_resolution":"1920x1200","_density":1,"_locale":"en-GB"}
app_key: 8fa213e6049bff23b08e5f5fbac89e7c27397612
device_id: 17de0839-d9b5-4800-9a3c-e07270e0b17b
sdk_name: javascript_native_web
sdk_version: 19.02.1
timestamp: 1551799890469
hour: 15
dow: 2
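
To check what a request like this carries, you can decode the query string yourself. A small sketch follows; the `parseQuery` helper is illustrative and not part of Countly or the WebUI. It percent-decodes each parameter and then parses `metrics` as JSON, using a shortened copy of the example request.

```typescript
// Split a URL's query string into decoded key/value pairs.
function parseQuery(url: string): Record<string, string> {
  const out: Record<string, string> = {};
  const q = url.indexOf("?");
  if (q === -1) return out;
  for (const pair of url.slice(q + 1).split("&")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;
    const key = decodeURIComponent(pair.slice(0, eq));
    out[key] = decodeURIComponent(pair.slice(eq + 1));
  }
  return out;
}

// Shortened version of the example request (most parameters omitted).
const exampleRequest =
  "https://countly.ipfs.io/i?begin_session=1" +
  "&metrics=%7B%22_resolution%22%3A%221920x1200%22%2C%22_locale%22%3A%22en-GB%22%7D" +
  "&hour=15&dow=2";

const sessionParams = parseQuery(exampleRequest);
// The metrics parameter is itself a JSON document.
const sessionMetrics = JSON.parse(sessionParams["metrics"]);
```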

Session duration

Tells us how long people used the Web UI for. Occurs on first enabling analytics and periodically after that.

Example Request

GET https://countly.ipfs.io/i?session_duration=9&app_key=8fa213e6049bff23b08e5f5fbac89e7c27397612&device_id=313e37ac-4d11-4280-8e39-7d2abeb0e9ad&sdk_name=javascript_native_web&sdk_version=18.11&timestamp=1551277005824&hour=14&dow=3

Query params

session_duration: 9
app_key: 8fa213e6049bff23b08e5f5fbac89e7c27397612
device_id: 313e37ac-4d11-4280-8e39-7d2abeb0e9ad
sdk_name: javascript_native_web
sdk_version: 18.11
timestamp: 1551277005824
hour: 14
dow: 3

Page view

Tells us when people load the initial page or navigate to a new section. Sections are defined as the items in the primary nav. The paths that are recorded are the regex patterns used to match the route rather than the full URL, see: https://github.com/ipfs-shipyard/ipfs-webui/blob/38743dc9201795bc292e6ef132600622eb461cb2/src/bundles/analytics.js#L46-L55

Example Request

GET https://countly.ipfs.io/i?events=%5B%7B%22key%22%3A%22%5BCLY%5D_view%22%2C%22count%22%3A1%2C%22dur%22%3A9%2C%22segmentation%22%3A%7B%22name%22%3A%22%2F%22%7D%2C%22timestamp%22%3A1551277005823%2C%22hour%22%3A14%2C%22dow%22%3A3%7D%2C%7B%22key%22%3A%22%5BCLY%5D_view%22%2C%22count%22%3A1%2C%22segmentation%22%3A%7B%22name%22%3A%22%2F%22%2C%22visit%22%3A1%2C%22domain%22%3A%22localhost%22%7D%2C%22timestamp%22%3A1551277006675%2C%22hour%22%3A14%2C%22dow%22%3A3%7D%5D&app_key=8fa213e6049bff23b08e5f5fbac89e7c27397612&device_id=313e37ac-4d11-4280-8e39-7d2abeb0e9ad&sdk_name=javascript_native_web&sdk_version=18.11&timestamp=1551277006676&hour=14&dow=3

Query params

events: [{"key":"[CLY]_view","count":1,"dur":9,"segmentation":{"name":"/"},"timestamp":1551277005823,"hour":14,"dow":3},{"key":"[CLY]_view","count":1,"segmentation":{"name":"/","visit":1,"domain":"localhost"},"timestamp":1551277006675,"hour":14,"dow":3}]
app_key: 8fa213e6049bff23b08e5f5fbac89e7c27397612
device_id: 313e37ac-4d11-4280-8e39-7d2abeb0e9ad
sdk_name: javascript_native_web
sdk_version: 18.11
timestamp: 1551277006676
hour: 14
dow: 3
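
As a rough illustration of recording the route pattern rather than the full URL: the patterns below are hypothetical stand-ins, not the ones in the linked analytics.js. Whatever path the user is on, only the text of the first matching pattern is reported, so anything after the section prefix (an MFS path, a CID) is never part of the recorded view name.

```typescript
// Hypothetical route patterns; the real ones live in the linked bundle.
const ROUTE_PATTERNS: RegExp[] = [
  /^\/files.*/,
  /^\/explore.*/,
  /^\/peers$/,
  /^\/settings$/,
  /^\/$/,
];

// Return the pattern text to record for a path, or null if nothing matches.
function viewName(path: string): string | null {
  for (const pattern of ROUTE_PATTERNS) {
    if (pattern.test(path)) return pattern.source;
  }
  return null;
}
```

Two different paths under the same section, such as `/files/a` and `/files/b`, produce the same view name.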

Custom Event

App-specific actions. We record that the action happened as the key and how long it took from start to finish as the dur property. There is also a count property that lets us record how many times the event occurred. This is always set to 1.

The recorded actions are:

  • CONFIG_SAVE - user updated their config. The config is not recorded
  • CONFIG_SAVE_FAILED - an error occurred while saving the config. The error is not recorded
  • FILES_MAKEDIR - user made a new directory. The dirname is not recorded
  • FILES_MAKEDIR_FAILED - an error occurred creating the dir. The error is not recorded
  • FILES_WRITE - user added files or directories. Info about the files is not recorded
  • FILES_WRITE_FAILED - files or directories could not be added. The error is not recorded
  • FILES_ADDBYPATH - user added a file by its ipfs address. The address is not recorded
  • FILES_ADDBYPATH_FAILED - an error occurred adding the address. The error is not recorded
  • FILES_MOVE - user moved files around in their MFS. The src and dest are not recorded
  • FILES_MOVE_FAILED - an error occurred moving the files. The error is not recorded
  • FILES_DELETE - user deleted 1 or more files. The files are not recorded
  • FILES_DELETE_FAILED - an error occurred deleting the files. The error is not recorded
  • FILES_DOWNLOADLINK - user downloaded 1 or more files. The files are not recorded

Example Request

GET https://countly.ipfs.io/i?events=%5B%7B%22key%22%3A%22CONFIG_SAVE%22%2C%22count%22%3A1%2C%22dur%22%3A0.03000499999686144%2C%22timestamp%22%3A1551277789854%2C%22hour%22%3A14%2C%22dow%22%3A3%7D%5D&app_key=8fa213e6049bff23b08e5f5fbac89e7c27397612&device_id=313e37ac-4d11-4280-8e39-7d2abeb0e9ad&sdk_name=javascript_native_web&sdk_version=18.11&timestamp=1551277789855&hour=14&dow=3

Query params

events: [{"key":"CONFIG_SAVE","count":1,"dur":0.03000499999686144,"timestamp":1551277789854,"hour":14,"dow":3}]
app_key: 8fa213e6049bff23b08e5f5fbac89e7c27397612
device_id: 313e37ac-4d11-4280-8e39-7d2abeb0e9ad
sdk_name: javascript_native_web
sdk_version: 18.11
timestamp: 1551277789855
hour: 14
dow: 3
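
For readers who prefer types, the custom event payload can be modeled roughly like this. The `TrackedAction` interface and the two helpers are a sketch inferred from the example request, not the Countly SDK's API. Whether the SDK derives `hour` and `dow` from local or UTC time is not stated here; UTC is assumed to keep the example deterministic.

```typescript
// Field names mirror the example request; the types are inferred from it.
interface TrackedAction {
  key: string; // e.g. "CONFIG_SAVE"
  count: number; // always 1 in the current implementation
  dur: number; // how long the action took, start to finish
  timestamp: number; // milliseconds since the Unix epoch
  hour: number; // hour of day (assumed UTC in this sketch)
  dow: number; // day of week, 0 = Sunday (assumed UTC in this sketch)
}

function makeAction(key: string, dur: number, when: Date): TrackedAction {
  return {
    key,
    count: 1,
    dur,
    timestamp: when.getTime(),
    hour: when.getUTCHours(),
    dow: when.getUTCDay(),
  };
}

// Encode a batch the way the `events` query parameter carries it:
// a JSON array, percent-encoded.
function encodeEvents(actions: TrackedAction[]): string {
  return "events=" + encodeURIComponent(JSON.stringify(actions));
}
```

Nothing in this shape has room for a config value, file name, or error message, which matches the "not recorded" notes in the list of actions.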

@mikeal

mikeal commented Feb 27, 2019

  1. We don't need the city and I'm still not sure why we need the country.

I’m not 100% sure we specifically need this in this project but we will certainly use this kind of data collected from our own websites and would potentially use it if collected from other resources.

In the past I’ve used this exact data from nodejs.org to figure out where to have the Node.js Foundation run events.

@olizilla
Member

olizilla commented Mar 5, 2019

This issue is a useful reference point: mozilla/addons#3145 - Firefox added Google Analytics to the add-ons store which, while a separate site, appears as part of the app UI when accessed via Firefox's add-ons menu.

The general feeling of the thread was that it should be opt-in, and it was weird that they were using Google. Their solution was to make it doNotTrack aware, and plead to a special deal they have with Google, which seems unconvincing to an end user who has little insight into the deal or whether it is meaningfully enforceable.

If we roll out opt-in analytics, and don't send the data to a third party like Google, and show clearly what is collected, I think we are doing well.

It may not do much for our actual goal of getting useful metrics on numbers of users across IPFS Web UI and IPFS Desktop though. Having some unrepresentative numbers may be better than none at all. At least we'd see any significant changes in user numbers that occur.

olizilla added a commit that referenced this issue Mar 6, 2019
see: #980

License: MIT
Signed-off-by: Oli Evans <oli@tableflip.io>
@olizilla
Member

olizilla commented Mar 7, 2019

@Stebalien work in progress here #985 (comment)

@Stebalien
Member Author

Those events look fine, we just need to find a way to communicate this to the user.

My one concern is that I'd much prefer to aggregate locally and avoid sending time-stamped events. It's the difference between turning in a report at the end of every day and having someone look over your shoulder. However, I understand that this just isn't how people do web analytics so I'm not going to block on this.


We don't need the city and I'm still not sure why we need the country.

I’m not 100% sure we specifically need this in this project but we will certainly use this kind of data collected from our own websites and would potentially use it if collected from other resources.

In the past I’ve used this exact data from nodejs.org to figure out where to have the Node.js Foundation run events.

That sounds like a great metric to collect from our websites and probably significantly more useful/reliable.


This issue is a useful reference point: mozilla/addons#3145 - Firefox added Google Analytics to the add-ons store which, while a separate site, appears as part of the app UI when accessed via Firefox's add-ons menu.

Yeah, Mozilla has made a bunch of questionable decisions. We should emulate their messaging and communication, not necessarily their execution.
