Feature: Services by ricklamers · Pull Request #224 · orchest/orchest

ricklamers · 2021-05-02T20:53:17Z

Description

This PR adds the idea of "services" to Orchest. A service can be any container that can run as part of a session, e.g. a Streamlit app that the user can interact with.

Example service JSON configuration:

{
  "services": {
    "streamlit": {
      "command": [],
      "env_variables": {
        "BASE_PATH": "$BASE_PATH_PREFIX_80",
        "MY_ENV": "123"
      },
      "env_variables_inherit": [
        "ABC"
      ],
      "image": "httpd:2.4",
      "name": "streamlit",
      "ports": [
        80
      ],
      "preserve_base_path": false,
      "binds": {
        "/project-dir": "/my-project-dir-path-in-image",
        "/data": "/my-data-path-in-image"
      },
      "scope": [
        "interactive",
        "noninteractive"
      ]
    }
  }
}

The "image" field can also reference an Orchest environment as "environment@<uuid>".

Todo

Clean up the code
Add a section on services to the docs
Create a front-end for services so that users do not have to edit the raw JSON configuration

ricklamers · 2021-05-02T20:54:21Z

Todos:

It should be possible to reference BASE_PATH in every field of the services JSON in the pipeline definition.
Handle environment variables in a way consistent with step env vars
Make it easy and reliable to point to network addresses from pipeline code, and differentiate between the user-facing address and the container address (traffic from pipeline code shouldn't leave the container network).
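Making BASE_PATH referenceable in every field boils down to a recursive substitution over the service definition. A minimal sketch, assuming a `$BASE_PATH_PREFIX_<port>` placeholder (as in the example config) that expands to `/service-<name>-<session_uuid>_<port>` (the `_port` suffix from the commit list). The expansion format and the helper names `base_path_prefix` and `substitute` are hypothetical, not Orchest's actual implementation.

```python
import re
from typing import Any, Callable

# Matches e.g. "$BASE_PATH_PREFIX_80" and captures the port number.
PLACEHOLDER = re.compile(r"\$BASE_PATH_PREFIX_(\d+)")


def base_path_prefix(service_name: str, session_uuid: str, port: int) -> str:
    # Hypothetical layout: /service-<name>-<session_uuid>_<port>
    return f"/service-{service_name}-{session_uuid}_{port}"


def substitute(value: Any, expand: Callable[[int], str]) -> Any:
    """Recursively replace placeholders in every string of a JSON-like value."""
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: expand(int(m.group(1))), value)
    if isinstance(value, list):
        return [substitute(v, expand) for v in value]
    if isinstance(value, dict):
        # Only values are substituted; keys (e.g. bind sources) are kept as-is.
        return {k: substitute(v, expand) for k, v in value.items()}
    return value  # numbers, booleans and None are left untouched


service = {
    "name": "streamlit",
    "command": ["echo", "$BASE_PATH_PREFIX_80"],
    "env_variables": {"BASE_PATH": "$BASE_PATH_PREFIX_80", "MY_ENV": "123"},
    "ports": [80],
}
resolved = substitute(service, lambda port: base_path_prefix("streamlit", "1234", port))
print(resolved["env_variables"]["BASE_PATH"])  # /service-streamlit-1234_80
```

Because every branch builds new containers, the original definition is not mutated and can still be written back to the pipeline file with its placeholders intact.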

fruttasecca · 2021-05-03T08:33:18Z

Should the memory server and Jupyter server(s) be services? It would increase consistency and make for a nice, self-documenting feature, since "services" in the pipeline settings would already be populated with some entries. I am aware there might be some technical difficulties in making both of them into services, but it's worth reasoning about.

> Handle environment variables in a way consistent with step env vars

Assuming these are meant to be secrets (not versioned), the naive solution would be to have services inherit the pipeline environment variables, but then we risk name collisions, secrets leaking into services that don't need them, etc. The way services are currently declared might be at odds with what we want to do. We could use a slightly higher-level input form instead of having users write JSON in CodeMirror, and decide behind the scenes what goes into the pipeline definition JSON and what is treated as a secret and written to the DB.

ricklamers · 2021-05-03T13:49:40Z

Environment variables are now passed, but only when explicitly requested:

[{
    name: "webserver",
    image: "httpd:2.4",
    environment_inherit: ["ENV_VAR_NAME"]
}]
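The opt-in behaviour can be sketched as follows: only the names listed for inheritance are copied from the project/pipeline variables, and the service's own variables are layered on top. The helper name is hypothetical, and two choices are assumptions rather than what the PR specifies: explicitly declared service variables override inherited ones, and names that are requested but not defined are silently skipped.

```python
from typing import Dict, Iterable, Mapping


def build_service_env(
    inherited_source: Mapping[str, str],
    inherit: Iterable[str],
    explicit: Mapping[str, str],
) -> Dict[str, str]:
    """Compose a service environment from opt-in inherited variables
    plus the variables declared on the service itself."""
    # Only explicitly requested names are copied, so other secrets never leak.
    env = {name: inherited_source[name] for name in inherit if name in inherited_source}
    # Assumption: variables declared on the service override inherited ones.
    env.update(explicit)
    return env


project_and_pipeline_env = {"ENV_VAR_NAME": "s3cr3t", "OTHER_SECRET": "do-not-leak"}
env = build_service_env(project_and_pipeline_env, ["ENV_VAR_NAME"], {"MY_ENV": "123"})
print(env)  # {'ENV_VAR_NAME': 's3cr3t', 'MY_ENV': '123'}
```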

ricklamers · 2021-05-03T13:53:17Z

> Should the memory server and Jupyter server(s) be services? It would increase consistency and make for a nice, self-documenting feature, since "services" in the pipeline settings would already be populated with some entries. I am aware there might be some technical difficulties in making both of them into services, but it's worth reasoning about.

They are not optional, and their URL routing might differ from that of services, so I don't think they should be unified into services; I feel services should be limited to user-defined services. Internally, however, they are indeed very similar. We should reuse code between services and JupyterLab/the memory server as much as possible. They already share some parts of the implementation (e.g. they're both assigned to self._containers[resource]).

ricklamers · 2021-05-03T13:56:38Z

> We could use a slightly higher-level input form instead of having users write JSON in CodeMirror

I think this is something we should iterate toward, not necessarily for the v1 release. But I do agree it would be nicer.

Secrets can now be passed using environment variable inheritance.

ricklamers · 2021-05-03T13:59:32Z

> Make it easy and reliable to point to network addresses from pipeline code, and differentiate between the user-facing address and the container address (traffic from pipeline code shouldn't leave the container network).

This is now handled by the Orchest SDK.

It returns {"internal_urls": [...], "external_urls": [...], "ports": [80, 8081]}, i.e. one endpoint per port. At the moment we only support TCP port forwarding.
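The SDK function that produces this dictionary isn't named here, so the sketch below only consumes the returned value. It assumes, hypothetically, that `internal_urls`, `external_urls` and `ports` are index-aligned lists, and it defaults to the internal URL so that traffic from pipeline code stays on the container network. The helper name and the example URLs are made up.

```python
from typing import Any, Mapping


def service_url(service_info: Mapping[str, Any], port: int, internal: bool = True) -> str:
    """Return the URL for `port`, assuming the three lists are index-aligned."""
    ports = list(service_info["ports"])
    if port not in ports:
        raise ValueError(f"port {port} is not exposed by this service (ports: {ports})")
    key = "internal_urls" if internal else "external_urls"
    return service_info[key][ports.index(port)]


# Value shaped like the structure described above; the URLs are placeholders.
info = {
    "internal_urls": ["http://webserver-internal:80/", "http://webserver-internal:8081/"],
    "external_urls": [
        "http://127.0.0.1:8000/service-webserver-uuid_80/",
        "http://127.0.0.1:8000/service-webserver-uuid_8081/",
    ],
    "ports": [80, 8081],
}
print(service_url(info, 8081))  # http://webserver-internal:8081/
```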

ricklamers · 2021-05-03T14:04:03Z

One pet peeve is the necessity to tweak container services to incorporate a base path. This is due to how the nginx-proxy (in this implementation) forwards requests.

The structure is: <origin><base_path><port><application_path>
http://127.0.0.1:8000/service-webserver-uuiduuid-uuiduuid_80_/some-path.txt

Or broken into its components:
origin: http://127.0.0.1:8000
base_path: /service-webserver-uuiduuid-uuiduuid
port: _80_
application_path: /some-path.txt
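A small sketch of composing and decomposing a proxied URL that follows the <origin><base_path><port><application_path> layout above, using the `_<port>_` delimiter from the example. The function and type names are illustrative only, and the later switch from `_port_` to `_port` (see the commit list) is not covered.

```python
import re
from typing import NamedTuple

# base_path is matched lazily, so the first "_<digits>_" is taken as the port.
URL_PATTERN = re.compile(
    r"^(?P<origin>https?://[^/]+)"
    r"(?P<base_path>/service-[^/]+?)"
    r"_(?P<port>\d+)_"
    r"(?P<application_path>/.*)?$"
)


class ServiceURL(NamedTuple):
    origin: str
    base_path: str
    port: int
    application_path: str


def build_url(parts: ServiceURL) -> str:
    return f"{parts.origin}{parts.base_path}_{parts.port}_{parts.application_path}"


def parse_url(url: str) -> ServiceURL:
    match = URL_PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a service URL: {url}")
    return ServiceURL(
        origin=match["origin"],
        base_path=match["base_path"],
        port=int(match["port"]),
        application_path=match["application_path"] or "/",
    )


url = "http://127.0.0.1:8000/service-webserver-uuiduuid-uuiduuid_80_/some-path.txt"
parts = parse_url(url)
print(parts.port, parts.application_path)  # 80 /some-path.txt
```

The service itself only ever sees requests under `<base_path>_<port>_`, which is why it has to be configured with that prefix unless the proxy strips it.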

I would like to do this in a 'cleaner' way, but this works reliably and lets you flexibly run services on any TCP port in the service container without requiring any changes to how we currently host Orchest on AWS or how it runs locally.

ricklamers · 2021-05-03T14:07:23Z

Another potential issue is that we now pull the images for services from Docker Hub when we start the session. This could take a long time and cause the session start request to time out. Should we bite the bullet and make the session start (POST) a polling operation?
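A minimal client-side sketch of a polling session start, assuming the POST returns immediately and a separate status lookup reports progress while images are pulled. `get_status`, the `LAUNCHING`/`RUNNING` status strings and the timeouts are hypothetical; in practice `get_status` would issue a GET against the session endpoint.

```python
import time
from typing import Callable


def wait_for_session(
    get_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 2.0,
) -> str:
    """Poll until the session leaves the LAUNCHING state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status != "LAUNCHING":
            return status  # e.g. "RUNNING" or "FAILED"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session still launching after {timeout} s")
        time.sleep(interval)


# Simulated status endpoint: launching for two polls, then running.
responses = iter(["LAUNCHING", "LAUNCHING", "RUNNING"])
print(wait_for_session(lambda: next(responses), timeout=5, interval=0.01))  # RUNNING
```

Splitting start and readiness this way keeps each HTTP request short, so slow image pulls no longer run into request timeouts.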

fruttasecca · 2021-05-05T09:34:20Z

> Another potential issue is that we now pull the images for services from Docker Hub when we start the session. This could take a long time and cause the session start request to time out. Should we bite the bullet and make the session start (POST) a polling operation?

I think so

yannickperrenet · 2021-05-05T19:37:35Z

    "uuid": "pipeline-uuid",
    "settings": {},
    "parameters": {},
+    "services": [],


We shouldn't forget to also update the JSON schema in the docs.

Good point!

On another note, didn't we change it to be a dictionary instead?
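If `services` is stored as a dictionary keyed by service name (as in the example configuration at the top) rather than the list shown in the diff, pipeline definitions that still use the list form would need converting. A sketch of that conversion; the helper name and the duplicate-name check are illustrative, not part of the PR.

```python
from typing import Any, Dict, List


def services_list_to_dict(services: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Convert [{"name": ..., ...}, ...] into {name: {"name": ..., ...}}."""
    result: Dict[str, Dict[str, Any]] = {}
    for service in services:
        name = service["name"]
        if name in result:
            raise ValueError(f"duplicate service name: {name!r}")
        result[name] = dict(service)  # copy so the input list stays untouched
    return result


pipeline = {
    "uuid": "pipeline-uuid",
    "settings": {},
    "parameters": {},
    "services": [{"name": "webserver", "image": "httpd:2.4"}],
}
pipeline["services"] = services_list_to_dict(pipeline["services"])
print(pipeline["services"])  # {'webserver': {'name': 'webserver', 'image': 'httpd:2.4'}}
```

Keying by name makes duplicate service names impossible by construction, which is one argument for the dictionary form.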

ricklamers · 2021-05-18T11:06:40Z

Note: we should not access the userdir/ in the orchest-api. This requires a refactor of https://github.com/orchest/orchest/pull/224/files#diff-3c5dc561a3261d8aacd881b4cb16e43c99215b5a7966faf9aadc43734dd6302bR158

yannickperrenet · 2021-06-07T10:23:20Z

+  the inherited ones. Note that, while project and pipeline environment variables
+  are considered as `secrets`, services environment variables aren't and will
+  be persisted in the pipeline definition file.
+- **scope**: To specify if the service should be running in interactive mode, jobs, or both.


Naming the scope "jobs" instead of "non-interactive" makes a lot of sense to me, and I am sure it is much more obvious to users as well.

Should we define scope to be "pipeline editor" or "jobs" instead?

Makes sense to me, @ricklamers @joe-bell? (will need a small GUI tweak)
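A sketch of how a session could select which services to start based on `scope`. It uses the `interactive`/`noninteractive` values from the example configuration; if the scope is renamed as proposed, only the two string constants change. The function name is hypothetical, the image names are just examples, and treating a missing `scope` as "run everywhere" is an assumption.

```python
from typing import Any, Dict, Mapping


def services_for_session(
    services: Mapping[str, Mapping[str, Any]], interactive: bool
) -> Dict[str, Mapping[str, Any]]:
    """Return the services whose scope includes the current session type."""
    wanted = "interactive" if interactive else "noninteractive"
    return {
        name: spec
        for name, spec in services.items()
        # Assumption: a missing scope means the service runs in both modes.
        if wanted in spec.get("scope", ["interactive", "noninteractive"])
    }


services = {
    "streamlit": {"image": "httpd:2.4", "scope": ["interactive"]},
    "cache": {"image": "redis:6", "scope": ["interactive", "noninteractive"]},
}
print(sorted(services_for_session(services, interactive=False)))  # ['cache']
```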

yannickperrenet · 2021-06-08T08:38:30Z

Alignment of the buttons in the pipeline editor is off. (This is only on this branch and not on master)

First attempt at services feature.

71193f2

fruttasecca self-requested a review May 3, 2021 07:53

fruttasecca reviewed May 3, 2021

Comment thread on services/orchest-api/app/app/core/sessions.py (outdated)

ricklamers added 2 commits May 3, 2021 15:47

Further implementation of services, list endpoints in frontend, environment variable passing.

199e381

Make more explicit, catching all types of Exceptions

fe46b57

ricklamers added 2 commits May 3, 2021 17:00

Fail when pipeline definition cannot be loaded.

011d20c

Add /data mount to services

0efdecd

yannickperrenet reviewed May 5, 2021

ricklamers added 3 commits May 11, 2021 14:49

Merge branch 'master' into feature-services

f6ce897

Remove incorrectly commited files

5f393d2

Change port suffix from _port_ to _port. Rename BASE_PATH sub variable to BASE_PATH_PREFIX.

c9d7ed9