Give orchest api access to the analytics module #947
Adds an Analytics subscriber to the orchest-api. This subscriber is subscribed to all events, so every event gets sent to the analytics server.
This is to avoid subtle errors due to a wrongly named function.
This makes it possible to keep track of which subset of the pipeline nodes was used for the interactive run.
    if isinstance(sub, models.AnalyticsSubscriber):
        if app_utils.OrchestSettings()["TELEMETRY_DISABLED"]:
            _logger.info(
                "Telemetry is disabled, skipping event delivery to analytics."
            )
            continue
How about moving the check for `TELEMETRY_DISABLED` into the `AnalyticsSubscriber`? It feels like something the subscriber should own, e.g.: "Am I active?".
You mean at the database level? Or just under the umbrella of the `AnalyticsSubscriber` logic? Note that `models.py` won't be able to access `OrchestSettings()` because said settings use the `models.py` module, but we might be able to fetch the `TELEMETRY_DISABLED` value through `current_app`, although I am not too happy to have `models.py` access the `current_app`; it feels a bit peculiar.
> Or just under the umbrella of the AnalyticsSubscriber logic?

This. I agree that putting it inside `models.py` feels wrong.
I see. `AnalyticsSubscriber` is a model, so we are in a bit of a rut given the lack of extension methods in Python (and I'd rather not monkey patch). Nevertheless, I'll look for improvements.
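One possible direction for the thread above, as a minimal sketch only: keep the "Am I active?" decision in a small helper that lives next to the subscriber logic but outside `models.py`, and pass the setting in as an argument so the helper needs neither `OrchestSettings()` nor `current_app`. The classes below are stand-ins for the real models, and `is_subscriber_active` and `subscribers_to_notify` are hypothetical names, not part of the codebase.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Subscriber:
    """Stand-in for a generic subscriber model (assumption)."""

    uuid: str


@dataclass
class AnalyticsSubscriber(Subscriber):
    """Stand-in for models.AnalyticsSubscriber (assumption)."""


def is_subscriber_active(sub: Subscriber, telemetry_disabled: bool) -> bool:
    """Hypothetical helper answering "Am I active?" for a subscriber.

    The analytics subscriber is inactive while telemetry is disabled;
    every other subscriber is always considered active.
    """
    if isinstance(sub, AnalyticsSubscriber):
        return not telemetry_disabled
    return True


def subscribers_to_notify(
    subs: Iterable[Subscriber], telemetry_disabled: bool
) -> List[Subscriber]:
    """Keep only the subscribers that should receive the event."""
    return [s for s in subs if is_subscriber_active(s, telemetry_disabled)]
```

The caller would read `TELEMETRY_DISABLED` once (for example from the settings object it already has) and hand the value in, which keeps the model module free of application-level imports.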
👍 I think it looks great! I thought creating an Analytics subscriber was a clever way to implement things ;)
Apart from what comes out of the discussions on this PR, I think it can be merged.
Identified that the dictionary defined two duplicates:
- `ONE_OFF_JOB_PIPELINE_RUN_CREATED`
- `CRON_JOB_PIPELINE_RUN_CREATED`
I see. I think it's very reasonable from the application POV and we can leave it as it is; let's see what @astrojuanlu's opinion is.
About the two arguments, if we are to create ad hoc classes or
About the use of a class as the contract, I think one of the main "issues" (for lack of a better word) is that the
About point 2:
If I were to synthesize:
…-api Add environment created/delete events to `orchest-api` (notifications)
I think I like this best (although not completely convinced about it being an improvement). One of the main reasons I went with the current "weak interface" is that I didn't want to pollute a bunch of webserver proxy endpoints to do the anonymization. Given that this constraint is now no longer valid (although might translate to the orchest-api) the "bring your own" seems like a valid proposal. In the end, to share anonymization functions across endpoints it feels like pretty much the same logic is repeated as it is currently. Just in a different place. Ideally, the
After an internal chat we decided to go for a "bring your own anonymized event" approach; this will be done in another PR.
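As a rough illustration of what "bring your own anonymized event" could look like (a sketch under assumptions, not the implementation from the follow-up PR): the caller builds an event whose properties are already anonymized, and the sending function only forwards it. Everything here is hypothetical, including `AnonymizedEvent`, `environment_created_event`, the `"environment:created"` name, the choice of which field is sensitive, and the use of a SHA-256 digest as a stand-in for whatever anonymization the project actually applies.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


def anonymize(value: str) -> str:
    """Hypothetical anonymizer: a one-way SHA-256 hex digest of the value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AnonymizedEvent:
    """An event whose properties were anonymized by whoever created it."""

    event_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


def environment_created_event(
    project_uuid: str, environment_name: str
) -> AnonymizedEvent:
    """The caller decides what is sensitive and anonymizes it up front."""
    return AnonymizedEvent(
        event_type="environment:created",  # hypothetical event name
        properties={
            "project_uuid": project_uuid,
            "environment_name": anonymize(environment_name),
        },
    )


def send_event(
    event: AnonymizedEvent, deliver: Callable[[Dict[str, Any]], None]
) -> None:
    """Forward the event as-is; no anonymization happens at this layer."""
    deliver({"type": event.event_type, "properties": event.properties})
```

With this split, anonymization helpers can be shared by the event constructors, and the delivery code does not need to know which fields are sensitive.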
…orchest-api Move job_update event to the `orchest-api` (notifications)
…thub.com:orchest/orchest into improv/add-interactive-runs-event-to-orchest-api
…t-to-orchest-api Add interactive pipeline run events to `orchest-api` (notifications)
…active-session:pipeline-run:*
…hest into improv/add-builds-events-to-orchest-api
…est-api Add jupyter builds and interactive sessions events to `orchest-api` (notifications)
It's redundant, and having it in a synchronous API call can lead to issues with big projects.
👍🏽 to having
Description
This PR gives the `orchest-api` access to the `analytics` module by moving it into the shared library. Analytics events that were previously sent by the webserver and that exist in the `orchest-api` are now the responsibility of the `orchest-api`, since this makes it possible to leverage a number of things such as asynchronous delivery, retries, access to the `orchest-api` db, etc.

The `orchest-api` analytics back-end is implemented as a subscriber subscribed to all events.

Now that the `analytics` module is in the shared library, I think the interface/contract between the caller and the callee should be stronger. In particular, sending an event through the `analytics` module works by doing something akin to `analytics.send_event(event_type, event_properties)`, where `event_type` is an `Enum` and `event_properties` is a dictionary. It might be better to have a defined type (a class, data class, or typed dict) and make sending an event have the form of `analytics.send_event(my_event_instance)`. This would reduce coupling between the `analytics` module and its callers, along with providing better guarantees about what is being sent, since currently we rely on the caller to act correctly w.r.t. the content of `event_properties`. @yannickperrenet Keen to know your opinion.

This PR changes the names of the `analytics` events, bringing them in line with how events are named in the `orchest-api`. More details will be provided once other changesets to `improv/orchest-api-analytics-base` are done, in particular the following PRs:

- … `orchest-webserver` to the `orchest-api` for the aforementioned reasons
- … `orchest-api` events, thus expanding `analytics`
- … `send_event` like explained in the previous paragraph

The base of this PR was started pre-controller and I ended up not being able to split this into two smaller PRs for easier review, sorry about that :).
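To make the proposed stronger contract concrete, here is a minimal sketch of the `analytics.send_event(my_event_instance)` form using data classes. It is an illustration under assumptions, not the module's actual API: `EventType`, `Event`, `InteractiveRunStarted`, the event name string, the field names, and the `deliver` callable are all hypothetical.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Callable, ClassVar, Dict, Tuple


class EventType(Enum):
    # Hypothetical event name; the real names live in the analytics module.
    INTERACTIVE_RUN_STARTED = "interactive-pipeline-run:started"


@dataclass(frozen=True)
class Event:
    """Base class: every concrete event declares its own event_type."""

    event_type: ClassVar[EventType]

    def to_payload(self) -> Dict[str, Any]:
        # ClassVar attributes are not dataclass fields, so asdict() returns
        # only the event's properties.
        return {"type": self.event_type.value, "properties": asdict(self)}


@dataclass(frozen=True)
class InteractiveRunStarted(Event):
    event_type: ClassVar[EventType] = EventType.INTERACTIVE_RUN_STARTED
    pipeline_uuid: str
    step_uuids: Tuple[str, ...]


def send_event(event: Event, deliver: Callable[[Dict[str, Any]], None]) -> None:
    """Accept only typed events instead of (event_type, dict) pairs."""
    if not isinstance(event, Event):
        raise TypeError(f"expected an Event instance, got {type(event).__name__}")
    deliver(event.to_payload())
```

A caller would then write something like `send_event(InteractiveRunStarted("p1", ("s1", "s2")), deliver)`, and a missing or misspelled property fails at construction time rather than silently reaching the analytics back-end.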