Milestone 2 #270

Merged
merged 248 commits into from Jan 27, 2020
m90 commented Jan 24, 2020

Milestone 2 has been titled "collecting data securely". This means we started looking at how we collect data and how we ensure users are aware of what we are doing and express their consent before we do so. We also looked at how we ensure all data is encrypted safely and instances can always use secure connections.


Achievements

Roughly two months after merging Milestone 1 into master and 82 Pull Requests later, we find ourselves ready to merge Milestone 2. This update gives you a brief overview of what we've been working on during that period. If you're interested in the details, dive into the individual Pull Requests we link and find out how we tackled the tasks at hand and which issues we faced.

Opt-In

One of our key features is that we require user consent before collecting any usage data. During this milestone we implemented a first lightweight approach for this requirement. Websites that embed the Offen script will now display an unobtrusive banner which visitors can use to express their consent decision. Once the decision has been taken, no banners will be shown anywhere. If the user has denied consent, no requests against the Offen instance other than loading the script will be made on subsequent visits. Server-side logging of these script-only requests contains neither IP addresses nor User-Agent strings (this also holds true for all server-side logging done by Offen).
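
The behavior described above can be pictured as a small decision function. The following is a minimal sketch and not the code from the linked Pull Requests; the names `ALLOW`, `DENY` and `nextAction`, as well as the way the decision is represented, are assumptions made for illustration.

```javascript
// Hypothetical consent states; the real application may store
// the decision differently.
var ALLOW = 'allow'
var DENY = 'deny'

// Given the stored decision (or undefined if none has been taken yet),
// decide what the embedded script does on a page view.
function nextAction (consentStatus) {
  if (consentStatus === ALLOW) {
    return { showBanner: false, sendEvents: true }
  }
  if (consentStatus === DENY) {
    // the script itself has been loaded, but nothing else is requested
    return { showBanner: false, sendEvents: false }
  }
  // no decision yet: ask, and do not collect anything until the
  // visitor has opted in
  return { showBanner: true, sendEvents: false }
}
```

The point of the sketch is that "no decision" and "deny" both result in no events being sent; they only differ in whether the banner is shown.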

Implemented in #155 and #251

Userland Crypto

Offen encrypts all event data before it leaves the browser. This makes sure accidental data leaks or hacks of an Offen instance cannot expose user data.

Ideally, the window.crypto.subtle API is used for this in browsers. Yet, it is limited to host pages that are running in a secure context, which implies the host page is using the https protocol. And while the situation has improved greatly over the last few years, this is still not possible for everyone.

To make sure everyone can use Offen, we added an alternative crypto implementation based on node-forge. The code for this is loaded lazily, so users on SSL-enabled sites are not affected by bigger bundle sizes, while users of http-only sites can still be sure their data is handled with respect.
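
A rough sketch of how such a selection could look is shown here. It is not the implementation from the linked Pull Requests: `pickCryptoImplementation`, `loadCrypto` and the `loaders` object are hypothetical names, and the global scope is passed in as an argument only to keep the example testable outside of a browser.

```javascript
// Decide which implementation to use. `scope` is the global object
// (window in a browser).
function pickCryptoImplementation (scope) {
  var hasSubtle = Boolean(scope && scope.crypto && scope.crypto.subtle)
  // also respect isSecureContext in case the property is present
  // and reports an insecure context
  if (hasSubtle && scope.isSecureContext !== false) {
    return 'native'
  }
  return 'forge'
}

// Load the chosen implementation. `loaders` maps the names above to
// functions that return a Promise, for example a dynamic import of a
// forge based module, so the heavier code is only requested when needed.
function loadCrypto (scope, loaders) {
  return loaders[pickCryptoImplementation(scope)]()
}
```

In a browser this could be called as `loadCrypto(window, { native: ..., forge: ... })`, where only the `forge` loader triggers an additional request.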

Implemented in #200 and #201

Extended stats

Without having to collect any sensitive information about users, we added an additional set of statistics so that operators can get a better overview of what happens on their websites.

  • Average Page Load times
  • A live view of activity that has happened in the last 15 minutes (this is only visible to operators)
  • Average Page Depth
  • Share of mobile users (we do not use the UA string for determining "mobileness")
  • Landing and Exit Pages
  • Referrers by UTM Source and UTM Campaign
  • Weekly User Retention

Implemented in #203, #210, #211, #204, #205, #206, #212, #216

Live Deployment

We shut down our very first deployment of Offen while working on Milestone 1, and have now reinstated a live deployment of the current state of Offen. It is running on the analytics.offen.dev domain and is used for collecting usage statistics for www.offen.dev. Head over to our homepage, opt in (if you feel like it) and use the Auditorium at https://analytics.offen.dev/auditorium to manage your data.

Some technical aspects of the deployment that illustrate the low footprint requirements of running your own instance of Offen:

  • The application is able to acquire and renew its own SSL certificate using LetsEncrypt. This means we can guarantee safe transmission of data without costs or additional effort.
  • The application is running smoothly on the cheapest available hardware. Hosted on AWS, a single free-tier eligible t2.micro instance is capable of hosting the service at times of load.
  • Data is persisted in a local SQLite database which performs well, is easy to back up and comes at no additional infrastructure cost.
  • Running off the offen/offen image we publish on Docker Hub, no setup other than installing Docker and configuring the application using the provided setup command is required to run a production-ready application.

Cross OS build

While Linux will still be our main deployment target, we think it's a valuable addition to support more operating systems, both for production deployments and for interested users that want to test Offen on their local machine. In this milestone, we added builds for Windows and Darwin/macOS which we cross compile into statically linked binaries in our standard build environment.

This means that just like with Linux, you can now download a single binary file on Windows or macOS and have an Offen demo up and running on your local system in no time.

Implemented in #234, #253

Release strategy

Offen is under active development and things are still changing at a rapid pace, yet with the introduction of the User Opt-In feature Offen has become usable for the public.

At the moment you'd probably still have to feel a little adventurous to start using it in a production setup, but we are actively working towards stabilizing a lot of things soon and onboarding our first users. As we definitely want to keep these users, we spent time thinking about our approach towards versioning and releasing Offen in a way that is transparent and user friendly without getting in the way of development. Our thoughts on this topic are collected in this article we published on our website.

In case you do feel adventurous, head over to the releases section and check out our very first alpha release.

New docs site

While working on this milestone, we kept adding more docs and noticed we were starting to outgrow a simple GitHub wiki. This is why we migrated our docs to a dedicated site. Source code is available under an MIT License at https://github.com/offen/docs


Known issues

Event data in Offen is encrypted. To encrypt data for users we need to generate a key and store it safely on the client side. To protect this key from third-party scripts that might be embedded on the host sites, this key is saved in the context of an iframe element. The way saving of this user key is implemented right now does not work in Apple's Safari browser.

Luckily, we can use the new and improved Opt-In flow to generate keys at opt-in time (instead of when creating the first event), so we will be able to add back Safari support in Milestone 3.


Up next

Milestone 3 is about "Displaying data". This implies three key areas we want to work on.

Enabling an informed Opt-In / Opt-Out decision

Offen collects usage data only with user consent. This is a great start, but we want to use the next milestone to further enable users to make an informed decision about this. This means we will revisit the different user journeys involved in making this decision, and also supply more information material for users about the topic.

Annotated statistics for users

One of the key aspects of Offen is making usage data available to users too. This is why we want to extend the Auditorium in a way that users can understand what the data actually means, why it is of interest for website operators and which privacy implications collecting it raises.

Data sync

Offen encrypts all usage data. This means clients that want to access usage data have to sync against the encrypted data on the server and decrypt it client side so it can be queried. This is a relatively heavy operation.

While our current approach works well enough, we want to look into making this more performant and robust so it can be used in scenarios that handle a lot of data.


Bonus: Getting your hands dirty

Building for ARM

In Milestone 2 we added cross-OS builds for Offen using xgo. By default our build covers Linux, which is our main deploy target, and also Windows and macOS, which are the most common operating systems for users that want to try Offen or participate in its development. All of these builds target the x86_64 architecture.

This does not mean we are limited to these operating systems or architectures though. If, for example, you are an ARM user and want to build Offen, you can build a version for your combination of OS and processor architecture.

After you have cloned the repository, ensure you have Docker installed. Once that is done, you can pass your target OS and architecture to make build:

TARGETS=linux/arm64 make build

Now, you're ready to use the binary. A good start is running a one-off demo version:

./bin/offen-linux-arm64 demo

Adding a new metric to the stats

Milestone 2 had us look into extending the basic set of stats we offered previously. If you're interested in how this works from a code perspective, let's have a look at how to add another metric: the share of new users in the selected timeframe.

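To calculate this metric, we need to know the following: how many of the user identifiers in the current timeframe are already contained in the set of events before the beginning of the timeframe. All remaining identifiers belong to new users, and their share of all unique identifiers in the timeframe is the value we are looking for.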

In vault/src/stats.js we could write a function that calculates this value:

exports.newUsers = consumeAsync(newUsers)

// please note that this function is optimized for obviousness
// instead of performance. We'd write it differently if it'd
// be included in the actual application.
function newUsers (events, allEvents) {
  // collect the ids of the events in the timeframe
  // so we can filter all events against it and get an array
  // of user ids from outside of our current timerange
  var eventIds = events.map(function (event) {
    return event.eventId
  })
  var previousEvents = allEvents.filter(function (event) {
    return eventIds.indexOf(event.eventId) < 0
  })
  var previouslySeenUsers = previousEvents.map(function (event) {
    return event.secretId
  })

  // next, we can count how many of the unique user ids in the
  // current timeframe are contained in the list of previously
  // seen users
  var secretIds = events.map(function (event) {
    return event.secretId
  })
  if (secretIds.length === 0) {
    // if no events are found we can return early instead
    // of dividing by zero
    return 0
  }

  var uniqueSecretIds = secretIds.filter(function (secretId, index) {
    return secretIds.indexOf(secretId) === index
  })
  var newUsers = uniqueSecretIds.filter(function (secretId) {
    return previouslySeenUsers.indexOf(secretId) < 0
  })

  return newUsers.length / uniqueSecretIds.length
}
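
As the comment above notes, this version favors obviousness over performance. For comparison, here is a standalone sketch of the same calculation using `Set` lookups instead of repeated `indexOf` calls. It is not part of the application; `newUsersShare` is a hypothetical name, the sample data is made up, and `consumeAsync` is left out on purpose so the example runs on its own.

```javascript
// Share of users in `events` (the selected timeframe) that do not
// appear in any event of `allEvents` outside of that timeframe.
function newUsersShare (events, allEvents) {
  var eventIds = new Set(events.map(function (event) {
    return event.eventId
  }))
  var previouslySeenUsers = new Set()
  allEvents.forEach(function (event) {
    if (!eventIds.has(event.eventId)) {
      previouslySeenUsers.add(event.secretId)
    }
  })
  var currentUsers = new Set(events.map(function (event) {
    return event.secretId
  }))
  if (currentUsers.size === 0) {
    // no events in the timeframe, avoid dividing by zero
    return 0
  }
  var newUsers = 0
  currentUsers.forEach(function (secretId) {
    if (!previouslySeenUsers.has(secretId)) {
      newUsers++
    }
  })
  return newUsers / currentUsers.size
}

// made-up sample data: user "a" has been seen before, user "b" has not
var sampleEvents = [
  { eventId: '1', secretId: 'a' },
  { eventId: '2', secretId: 'a' },
  { eventId: '3', secretId: 'b' }
]
var sampleEventsInBounds = sampleEvents.slice(1)
console.log(newUsersShare(sampleEventsInBounds, sampleEvents)) // 0.5
```

Both versions treat every event that is not part of the selected timeframe as "previous", which matches the function above.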

Next, we want our default stats query to actually calculate this value. In vault/src/queries.js add the following to the getDefaultStats function:

var allEvents = table.toArray()
// eventsInBounds already contains the events in the requested timeframe
var newUsers = stats.newUsers(eventsInBounds, allEvents)

When the function returns, ensure the new value is being passed through:

return Promise
  .all([
    // ...previous values
    newUsers
  ])
  .then(function (results) {
    return {
      // ...previousValues
      newUsers: results[/*pick the index where newUsers has been placed*/]
    }
  })
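
Picking results by position gets harder to follow as more values are added. One possible alternative, which is not what the repository does, is a small helper that resolves an object of promises by key. `resolveKeyed` is a hypothetical name used for this sketch only.

```javascript
// Resolve an object whose values are promises (or plain values) into
// a Promise for an object with the same keys and the resolved values.
function resolveKeyed (promisesByKey) {
  var keys = Object.keys(promisesByKey)
  return Promise
    .all(keys.map(function (key) {
      return promisesByKey[key]
    }))
    .then(function (values) {
      return keys.reduce(function (acc, key, index) {
        acc[key] = values[index]
        return acc
      }, {})
    })
}
```

With such a helper, a call like `resolveKeyed({ newUsers: newUsers })` would resolve to an object that has a `newUsers` property, without any index bookkeeping.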

Now that the value is calculated and included in the vault's query result, we can move on to the Auditorium where the metric can be displayed. In auditorium/views/main.js look for where the keyMetrics section is being defined. Add the new value by inserting a new chunk that looks like this:

var keyMetrics = html`
...
${keyMetric(__('New Users'), `${formatNumber(state.model.newUsers, 100)} %`)}
...
`

If you now run your local application and check the Auditorium, you can see the new metric in action.


Feedback? Found a bug?

If you have any feedback, comment or bug report on this milestone release, we'd love to hear from you. Open an issue, leave a comment on this PR or send us an email at hioffen@posteo.de.

m90 added 30 commits Nov 29, 2019
fix unlabeled input elements in forms
Use single router for serving entire app
Bind crypto implementation to execution context instead of importing
Do not use compound indexes for IndexedDB
Add flag to read account password from stdin
Store user keys as jwk, specify tagLength for Edge support
Hide native CryptoKey in native crypto implementation
Add alternative crypto implementation based on userland code
Load forge based crypto lazily only when needed
Do not expose account data during key exchange
Collect time to interactive, display average pageload time
Calculate average page depth
m90 added 28 commits Jan 22, 2020
Fix bad segmenting of retention periods
Allow 400 failure when purging non-existent user
Store cleanup
Fix bootstrap command failing on bad previous state
Define message for empty tables at view level
Improve semantics around users, secrets and account users
Add async attribute to embed instructions
Add test site for generating usage data during development
Adjust CI workflow to build tagged releases
Tag master releases as stable
Default to alpine friendly database location
@m90 m90 merged commit 75b365b into master Jan 27, 2020
7 checks passed, each reporting "Your tests passed on CircleCI!":

  • ci/circleci: auditorium
  • ci/circleci: build
  • ci/circleci: integration
  • ci/circleci: packages
  • ci/circleci: script
  • ci/circleci: server
  • ci/circleci: vault