Rest API endpoints for Events #851

Merged
merged 12 commits into from May 31, 2022

Contributor

3coins commented May 21, 2022

Part of #780

codecov-commenter commented May 21, 2022

Codecov Report

Merging #851 (f87107e) into main (f25fd33) will increase coverage by 0.02%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##             main     #851      +/-   ##
==========================================
+ Coverage   70.20%   70.22%   +0.02%     
==========================================
  Files          65       65              
  Lines        7621     7650      +29     
  Branches     1268     1273       +5     
==========================================
+ Hits         5350     5372      +22     
- Misses       1887     1895       +8     
+ Partials      384      383       -1     
Impacted Files                                      Coverage Δ
jupyter_server/services/events/handlers.py          89.55% <93.75%> (+2.71%) ⬆️
jupyter_server/services/kernels/kernelmanager.py    78.64% <0.00%> (-1.62%) ⬇️


Contributor Author

3coins commented May 21, 2022

cc @Zsailer @afshin

Member

Zsailer commented May 23, 2022

We may need to consider the issues raised by jupyter/telemetry#21 in this PR.

How do we know if the event source is "trusted"? We might want to consider adding some signing mechanism to verify that the event came from a trusted source. This is more critical in a telemetry use-case, but also seems relevant for generic events. Thoughts?

Member

Zsailer commented May 23, 2022

This looks great, @3coins! Thank you for working on this.

Contributor Author

3coins commented May 23, 2022

> We may need to consider the issues raised by jupyter/telemetry#21 in this PR.
>
> How do we know if the event source is "trusted"? We might want to consider adding some signing mechanism to verify that the event came from a trusted source. This is more critical in a telemetry use-case, but also seems relevant for generic events. Thoughts?

Thanks for bringing this up. A few questions come to mind:

  1. What is the definition of a trusted source? The linked issue seems to be about identifying a "trusted" component. Wouldn't anything a user installs count as a trusted component?
  2. There are extensions on the frontend. Does each of those constitute a component?
  3. There are extensions/modules on the back-end. What constitutes a component there?
  4. What are we trying to prevent here? Can we home in on this and come up with a few scenarios?

Contributor

afshin commented May 24, 2022

Reading the original issue about signing, it raises a few questions and observations.

Let's consider a case where we have a client-side piece of functionality (labextension, browser plugin, curl invocation, dedicated REST API client, etc.). For example, let's imagine a case where we want to record how often the user switches themes; the POST /api/events request body might look something like this:

{
  "schema_name": "ui:theme-change",
  "version":     "0.1",
  "timestamp":   1653391523679,
  "event": {
    "from": "Solarized",
    "to":   "Nord",
    "mode": "Light"
  } 
}

Here we assume the schema_name refers to something like https://schema.jupyter.org/ui:theme-change/0.1 ... more on this assumption later (spoiler alert: let's host a static site with our published, versioned schemas, cc: @bollwyvl).
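
For illustration, a client could assemble that request roughly as follows. This is only a sketch: the base URL, the token, and the helper name `build_event_request` are hypothetical, and the field names follow the example body above as proposed at this point in the discussion, not a finalized API. Nothing is sent; the function only builds the request object.

```python
import json
import urllib.request


def build_event_request(base_url, token, schema_name, version, timestamp, event):
    """Build (but do not send) a POST request for the proposed /api/events endpoint.

    Hypothetical helper: field names mirror the example body in this thread.
    """
    body = {
        "schema_name": schema_name,
        "version": version,
        "timestamp": timestamp,
        "event": event,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/events",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumes token authentication is enabled on the server.
            "Authorization": f"token {token}",
        },
    )


request = build_event_request(
    "http://localhost:8888",  # assumed local server address
    "my-token",               # placeholder token
    "ui:theme-change",
    "0.1",
    1653391523679,
    {"from": "Solarized", "to": "Nord", "mode": "Light"},
)
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual POST against a running server.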

I think the comments by @betatim and @jaipreet-s in the issue are right that we might not be able to validate a signature added by the client because if the client is going to use a private key to create a signature of the event contents, then the hashing algorithm and the private key need to exist at the point where the POST originates.

Of the examples I gave above, only a browser plugin or a curl request is likely to have a private key that it can use to create a signature, along with a public key registered with the server somewhere. Any other client would basically have to host its private key as a static asset on the web server that hosts the other HTML/CSS/JS/image assets.

I would suggest that for the first iteration, we leave signature out of the REST API entirely and merely have it in the record_event API if at all, as that is meant to be invoked server-side, where key management is a smaller hill to climb. In this vision, only server extensions can really sign events and truly be trusted. The primary downside of this is that they need to be written in Python (i.e. the curl example won't support it).

What do you think?
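
To make the server-side option concrete, here is a minimal sketch of signing an event before it is recorded. It is an illustration under stated assumptions: it uses an HMAC with a shared secret from Python's standard library rather than the public/private key pair discussed above, and `sign_event`, `verify_event`, and the `signature` field are hypothetical names, not part of `record_event` or any Jupyter API.

```python
import hashlib
import hmac
import json


def _canonical_bytes(event):
    # Serialize deterministically so signer and verifier hash the same bytes.
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_event(event, secret):
    """Return a copy of the event with a hex HMAC-SHA256 'signature' field added."""
    digest = hmac.new(secret, _canonical_bytes(event), hashlib.sha256).hexdigest()
    return {**event, "signature": digest}


def verify_event(signed_event, secret):
    """Check that the 'signature' field matches the rest of the event."""
    unsigned = {k: v for k, v in signed_event.items() if k != "signature"}
    expected = hmac.new(secret, _canonical_bytes(unsigned), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signed_event.get("signature", ""))


secret = b"server-side-secret"  # placeholder; must never be shipped to a browser
signed = sign_event({"from": "Solarized", "to": "Nord", "mode": "Light"}, secret)
```

The secret has to stay on the server, which is the same constraint described above: a browser-hosted client has nowhere safe to keep it.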

Contributor

afshin commented May 24, 2022

FYI @3coins, I rebased and pushed to your branch!

Contributor Author

3coins commented May 24, 2022

Thanks @afshin

@3coins 3coins marked this pull request as ready for review May 24, 2022 15:35
Member

Zsailer commented May 24, 2022

Thanks, @afshin. Great example and explanation. I agree, let's worry about this in future iterations. At the very least, it's helpful to have these thoughts documented here for future reference.

Member

Zsailer commented May 24, 2022

@3coins, what do you think about including a unit test that checks the whole flow? This test would register an event schema, add it to the allowed_schemas, add a handler that pipes the data to an IO stream, POST an instance of the event to the REST API, and verify that the event made it to the IO stream.

For reference, you can borrow work that @kiendang did in his previous telemetry PR:

import json

EVENT = {
    'schema': 'https://example.jupyter.org/client-event',
    'version': 1.0,
    'event': {
        'user': 'user',
        'thing': 'thing'
    }
}


async def test_client_eventlog(jp_eventlog_sink, jp_fetch):
    serverapp, sink = jp_eventlog_sink
    serverapp.eventlog.allowed_schemas = {
        EVENT['schema']: {
            'allowed_categories': [
                'category.jupyter.org/unrestricted',
                'category.jupyter.org/user-identifier'
            ]
        }
    }
    r = await jp_fetch(
        'api',
        'eventlog',
        method='POST',
        body=json.dumps(EVENT)
    )
    assert r.code == 204
    output = sink.getvalue()
    assert output
    data = json.loads(output)
    assert EVENT['event'].items() <= data.items()

Contributor Author

3coins commented May 24, 2022

> @3coins, what do you think about including a unit test that checks the whole flow? This test would register an event schema, add it to the allowed_schemas, add a handler that pipes the data to an IO stream, POST an instance of the event to the REST API, and verify that the event made it to the IO stream.

Good idea! Will add this soon.

@kiendang
Contributor

@3coins here's the relevant fixture

import io
import logging

import pytest
from traitlets.config import Config


@pytest.fixture
def jp_eventlog_sink(jp_configurable_serverapp):
    """Return serverapp and sink objects"""
    sink = io.StringIO()
    handler = logging.StreamHandler(sink)
    cfg = Config()
    cfg.EventLog.handlers = [handler]
    serverapp = jp_configurable_serverapp(config=cfg)
    yield serverapp, sink

Member

Zsailer commented May 25, 2022

@3coins, this is looking good! Just one more minor comment around error handling that I noticed.


try:
    if "timestamp" in payload:
        timestamp = datetime.strptime(payload["timestamp"], "%Y-%m-%d %H:%M:%S")
Member

I think it would be a good idea to include microseconds in this kind of timestamp since we should be producing sub-second events.

I would also recommend recording the UTC offset so that "rollup/analysis" applications can operate on data from different timezones. Using an offset is probably more user-friendly wrt today's Jupyter users than going with UTC-based timestamps directly (although more difficult for analysis).

Suggested change
- timestamp = datetime.strptime(payload["timestamp"], "%Y-%m-%d %H:%M:%S")
+ timestamp = datetime.strptime(payload["timestamp"], "%Y-%m-%d %H:%M:%S.%f %z")
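
As a quick illustration of what the suggested format accepts, the standard-library sketch below parses a timestamp that carries microseconds and a UTC offset. The sample string and the `TIMESTAMP_FORMAT` name are made up for the example.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f %z"

# Sample value invented for this example: 10:15:30.123456 at UTC-05:00.
parsed = datetime.strptime("2022-05-25 10:15:30.123456 -0500", TIMESTAMP_FORMAT)

# The result is timezone-aware, so timestamps from different zones can be compared.
print(parsed.microsecond)  # sub-second part of the timestamp
print(parsed.utcoffset())  # offset from UTC as a timedelta
```

A payload that omits the fractional seconds or the offset would raise `ValueError` with this format, so clients would need to send both.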

Contributor Author

Thanks for that suggestion. I will update in the next push.

Contributor Author

@kevin-bates I ended up removing the microseconds from the date format because EventLog is dropping this in the record_event method; isoformat by default drops the microseconds.
https://github.com/jupyter/telemetry/blob/master/jupyter_telemetry/eventlog.py#L237

Let me know if we can handle this in a better way.

Member

@kevin-bates makes a good point here. I think this is an issue with jupyter_telemetry. We should fix this there, then circle back to this line of code later.

Member

Zsailer commented May 27, 2022

@3coins, I'm working on getting jupyter_events packaged and released today. Then, we can drop the dependency on jupyter_telemetry. I'm fine if we do this in a separate PR though.

Contributor

afshin commented May 28, 2022

If schema_name is meant to hold a full URL (or local schema name) then can we just call it schema in the REST API? I assumed it was more like a name that somewhere else will be associated with a full URL but the mock events are not just a name, they're the full URL (minus protocol).

Contributor Author

3coins commented May 28, 2022

> If schema_name is meant to hold a full URL (or local schema name) then can we just call it schema in the REST API? I assumed it was more like a name that somewhere else will be associated with a full URL but the mock events are not just a name, they're the full URL (minus protocol).

Perhaps we should rename this to schema_id, since it is the unique ID that identifies the schema. We should also change it in the record_event method so that the naming is consistent.

Contributor

afshin commented May 28, 2022

Ah I see, thanks for the clarification. I'll defer to you on whichever field you decide 🚀

I was mostly hoping to avoid an _ (underscore) in the JSON but not at the cost of inconsistency.

Member

Zsailer commented May 31, 2022

This looks good to me. Thanks, @3coins!

I'll work on switching to jupyter_events in a follow-up PR.

@Zsailer Zsailer merged commit 98700d1 into jupyter-server:main May 31, 2022

welcome bot commented May 31, 2022

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

@3coins 3coins deleted the events-rest-endpoints branch November 9, 2022 00:58
6 participants