Merge pull request #49 from Zsailer/fine-grained-data-collection
Fine grained data collection
Zsailer committed Sep 9, 2020
2 parents e76585d + 717f8ac commit d44e217
Showing 10 changed files with 528 additions and 122 deletions.
81 changes: 9 additions & 72 deletions docs/index.rst
Expand Up @@ -11,7 +11,11 @@ Jupyter Telemetry

Telemetry provides a configurable traitlets object, EventLog, for structured event-logging in Python. It leverages Python's standard logging library for filtering, handling, and recording events. All events are validated (using jsonschema) against registered JSON schemas.

If you're looking for telemetry in Jupyter frontend applications (like JupyterLab), check out the work happening in jupyterlab-telemetry_!
The most common way to use Jupyter's telemetry system is to configure the ``EventLog`` objects in Jupyter applications (e.g. JupyterLab, Jupyter Notebook, JupyterHub). See the page ":ref:`using-telemetry`".

If you're looking to add telemetry to an application that you're developing, check out the page ":ref:`adding-telemetry`".

If you're looking for client-side telemetry in Jupyter frontend applications (like JupyterLab), check out the work happening in jupyterlab-telemetry_!

.. _jupyterlab-telemetry: https://github.com/jupyterlab/jupyterlab-telemetry

Expand All @@ -25,81 +29,14 @@ Jupyter's Telemetry library can be installed from PyPI.
pip install jupyter_telemetry
Basic Usage
-----------

Here's a basic example of an EventLog.

.. code-block:: python

    import logging
    from jupyter_telemetry import EventLog

    eventlog = EventLog(
        # Use logging handlers to route where events
        # should be recorded.
        handlers=[
            logging.FileHandler('events.log')
        ],
        # List schemas of events that should be recorded.
        allowed_schemas=[
            'uri.to.event.schema'
        ]
    )

EventLog has two configurable traits:

- ``handlers``: a list of Python's logging handlers.
- ``allowed_schemas``: a list of event schemas to record.

Event schemas must be registered with the EventLog for events to be recorded. An event schema looks something like:

.. code-block:: json

    {
      "$id": "url.to.event.schema",
      "title": "My Event",
      "description": "All events must have a name property.",
      "type": "object",
      "properties": {
        "name": {
          "title": "Name",
          "description": "Name of event",
          "type": "string"
        }
      },
      "required": ["name"],
      "version": 1
    }

Two fields are required:

- ``$id``: a valid URI to identify the schema (and possibly fetch it from a remote address).
- ``version``: the version of the schema.

The other fields follow standard JSON schema structure.

Schemas can be registered from a Python dict object, a file, or a URL. This example loads the above example schema from file.

.. code-block:: python

    # Record an example event.
    event = {'name': 'example event'}
    eventlog.record_event(
        schema_id='url.to.event.schema',
        version=1,
        event=event
    )

.. toctree::
:maxdepth: 2
:maxdepth: 1
:caption: Table of Contents:

pages/schemas.rst

pages/configure
pages/application
pages/schemas

Indices and tables
------------------
44 changes: 44 additions & 0 deletions docs/pages/application.rst
@@ -0,0 +1,44 @@
.. _adding-telemetry:

Adding telemetry to an application
==================================

Jupyter Telemetry enables you to log events from your running application. (It is designed to work best with the traitlets ``Application`` object, which keeps configuration simple.) To use telemetry, begin by creating an instance of ``EventLog``:

.. code-block:: python

    from jupyter_telemetry import EventLog


    class MyApplication:

        def __init__(self):
            ...
            # Create the application's EventLog.
            self.eventlog = EventLog(
                ...
                # Either pass the traits (see below) here,
                # or enable users of your application to configure
                # the EventLog's traits.
            )

EventLog has two configurable traits:

- ``handlers``: a list of Python's logging handlers that handle the recording of incoming events.
- ``allowed_schemas``: a dictionary of options for each schema describing what data should be collected.
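
To make the role of ``handlers`` more concrete, here is a minimal sketch that uses only Python's standard ``logging`` and ``json`` modules. It is not how ``EventLog`` is implemented; the helpers ``make_event_logger`` and ``emit_event`` and the ``schema_id``/``version`` keys in the output are hypothetical names chosen for illustration. It only shows the general idea that an event can be serialized to JSON and routed to whichever handlers are configured.

```python
import io
import json
import logging


def make_event_logger(handlers):
    """Build a standard-library logger that writes to the given handlers.

    Hypothetical helper for illustration; not part of jupyter_telemetry.
    """
    logger = logging.getLogger("example.eventlog")
    logger.propagate = False          # keep events out of the root logger
    logger.setLevel(logging.INFO)
    logger.handlers.clear()           # start from a clean handler list
    for handler in handlers:
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def emit_event(logger, schema_id, version, event):
    """Serialize an event as one JSON line and hand it to the logger."""
    record = {"schema_id": schema_id, "version": version, **event}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


# Route events to an in-memory stream; a FileHandler would work the same way.
stream = io.StringIO()
logger = make_event_logger([logging.StreamHandler(stream)])
emit_event(logger, "url.to.event.schema", 1, {"name": "example event"})
recorded = json.loads(stream.getvalue())
```

Swapping ``logging.StreamHandler(stream)`` for ``logging.FileHandler('events.log')`` would send the same JSON line to a file instead.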

Next, you'll need to register event schemas for your application. You can register schemas using the ``register_schema_file`` (JSON or YAML format) or ``register_schema`` methods.


Once you have an instance of ``EventLog`` and have registered your schemas, use the ``record_event`` method to log an event.

.. code-block:: python

    # Record an example event.
    event = {'name': 'example event'}
    self.eventlog.record_event(
        schema_id='url.to.event.schema',
        version=1,
        event=event
    )
35 changes: 35 additions & 0 deletions docs/pages/configure.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
.. _using-telemetry:

Using telemetry in Jupyter applications
=======================================

Most people will use ``jupyter_telemetry`` to log event data from Jupyter applications (e.g. JupyterLab, Jupyter Server, JupyterHub).

In this case, you'll be able to record events provided by schemas within those applications. To start, you'll need to configure each application's ``EventLog`` object.

This usually means two things:

1. Define a set of ``logging`` handlers (from Python's standard library) to tell telemetry where to send your event data (e.g. file, remote storage, etc.).
2. List the names of events to collect and the properties/categories to collect from each of those events (see the example below for more details).

Here is an example of a Jupyter configuration file, e.g. ``jupyter_config.d``, that demonstrates how to configure an eventlog.

.. code-block:: python

    from logging import FileHandler

    # Log events to a local file on disk.
    handler = FileHandler('events.txt')

    # Explicitly list the types of events
    # to record and what properties or what categories
    # of data to begin collecting.
    allowed_schemas = {
        "uri.to.schema": {
            "allowed_properties": ["name", "email"],
            "allowed_categories": ["category.jupyter.org/user-identifier"]
        }
    }

    c.EventLog.handlers = [handler]
    c.EventLog.allowed_schemas = allowed_schemas
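
One way to read the ``allowed_properties`` and ``allowed_categories`` options is as a filter applied to each event before it is recorded. The sketch below illustrates that reading with plain dictionaries. It is an assumption about the intended behaviour, not the library's actual code: ``filter_event`` is a hypothetical helper, and whether the real ``EventLog`` drops disallowed properties or masks their values is not stated here.

```python
def filter_event(event, schema, policy):
    """Keep only the event properties permitted by a per-schema policy.

    Hypothetical helper: a property is kept if it is named in
    ``allowed_properties`` or if any of its schema categories appears in
    ``allowed_categories``. Everything else is dropped.
    """
    allowed_props = set(policy.get("allowed_properties", []))
    allowed_cats = set(policy.get("allowed_categories", []))
    kept = {}
    for name, value in event.items():
        prop_schema = schema["properties"].get(name, {})
        categories = set(prop_schema.get("categories", []))
        if name in allowed_props or categories & allowed_cats:
            kept[name] = value
    return kept


# Illustrative schema fragment: only the category labels matter here.
schema = {
    "properties": {
        "thing": {"categories": ["category.jupyter.org/unrestricted"]},
        "user": {"categories": ["category.jupyter.org/user-identifier"]},
        "email": {
            "categories": ["category.jupyter.org/user-identifiable-information"]
        },
    }
}
policy = {
    "allowed_properties": ["thing"],
    "allowed_categories": ["category.jupyter.org/user-identifier"],
}
event = {"thing": "a", "user": "jovyan", "email": "jovyan@example.org"}

filtered = filter_event(event, schema, policy)
```

Here ``thing`` is kept because it is listed by name, ``user`` is kept because of its category, and ``email`` is dropped because neither rule allows it.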
48 changes: 33 additions & 15 deletions docs/pages/schemas.rst
@@ -1,12 +1,11 @@
Writing a Schema
================
Writing a schema for telemetry
==============================

Every schema should be a valid `JSON schema`_ and can be written in YAML or JSON.

Schemas should follow valid `JSON schema`_. These schemas can be written in valid YAML or JSON.
At a minimum, a valid Jupyter Telemetry event schema must have the following keys:

At a minimum, valid schemas should have the following keys:

- ``$id`` : a valid URL where the schema lives.
- ``$id`` : a URI to identify (and possibly locate) the schema.
- ``version`` : schema version.
- ``title`` : name of the schema
- ``description`` : documentation for the schema
Expand All @@ -16,28 +15,47 @@ At a minimum, valid schemas should have the following keys:

+ ``title`` : name of the property
+ ``description``: documentation for this property.
+ ``pii``: (optional) boolean for whether this property is personally identifiable information or not.
+ ``categories``: list of types of data being collected

- ``required``: list of required properties.
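
As a rough illustration of the key list above, the following sketch checks a schema (already loaded into a Python dict) for missing top-level keys. ``REQUIRED_KEYS`` and ``missing_schema_keys`` are hypothetical names, and treating ``properties`` as one of the required keys is an assumption based on the sub-items listed above. This is not a substitute for full JSON schema validation.

```python
# Top-level keys a telemetry event schema is expected to carry.
# Assumption: ``properties`` belongs in this list alongside the others.
REQUIRED_KEYS = ("$id", "version", "title", "description", "properties", "required")


def missing_schema_keys(schema):
    """Return the expected top-level keys that are absent from the schema."""
    return [key for key in REQUIRED_KEYS if key not in schema]


complete = {
    "$id": "event.jupyter.org/example-event",
    "version": 1,
    "title": "My Event",
    "description": "An example event.",
    "type": "object",
    "properties": {
        "thing": {
            "title": "Thing",
            "categories": ["category.jupyter.org/unrestricted"],
            "description": "A random thing.",
        }
    },
    "required": ["thing"],
}
incomplete = {"$id": "event.jupyter.org/example-event", "title": "My Event"}

complete_missing = missing_schema_keys(complete)
incomplete_missing = missing_schema_keys(incomplete)
```

A check like this could run in a test suite before schemas are registered, so that an incomplete schema is caught early.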

Here is a minimal example of a valid JSON schema for an event.

.. code-block:: yaml

    $id: url.to.event.schema
    $id: event.jupyter.org/example-event
    version: 1
    title: My Event
    description: |
      All events must have a name property
    type: object
    properties:
      name:
        title: Name
        description: |
          Name of event
        type: string
      thing:
        title: Thing
        categories:
          - category.jupyter.org/unrestricted
        description: A random thing.
      user:
        title: User name
        categories:
          - category.jupyter.org/user-identifier
        description: Name of user who initiated event
    required:
      - name
      - thing
      - user

.. _JSON schema: https://json-schema.org/


Property Categories
-------------------

Each property can be labelled with a ``categories`` field. This makes it easier to filter properties based on a category. We recommend that schema authors use valid URIs for these labels, e.g. something like ``category.jupyter.org/unrestricted``.

Below is a list of common category labels that Jupyter Telemetry recommends using:

.. _JSON schema: https://json-schema.org/
* ``category.jupyter.org/unrestricted``
* ``category.jupyter.org/user-identifier``
* ``category.jupyter.org/user-identifiable-information``
* ``category.jupyter.org/action-timestamp``
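
To show how category labels make filtering easier, here is a small sketch that groups a schema's properties by category. ``properties_by_category`` is a hypothetical helper and the schema dict is an illustrative fragment; neither is part of Jupyter Telemetry.

```python
from collections import defaultdict


def properties_by_category(schema):
    """Map each category label to the property names that carry it."""
    grouped = defaultdict(list)
    for name, spec in schema.get("properties", {}).items():
        for category in spec.get("categories", []):
            grouped[category].append(name)
    return dict(grouped)


schema = {
    "properties": {
        "thing": {"categories": ["category.jupyter.org/unrestricted"]},
        "user": {"categories": ["category.jupyter.org/user-identifier"]},
        "started": {
            "categories": [
                "category.jupyter.org/unrestricted",
                "category.jupyter.org/action-timestamp",
            ]
        },
    }
}

grouped = properties_by_category(schema)
```

A property may carry more than one label, so the same name can appear under several categories.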
