Skip to content
This repository was archived by the owner on Jan 10, 2026. It is now read-only.

Feature: Services#224

Merged
ricklamers merged 144 commits intomasterfrom
feature-services
Jun 8, 2021
Merged

Feature: Services#224
ricklamers merged 144 commits intomasterfrom
feature-services

Conversation

@ricklamers
Copy link
Copy Markdown
Member

@ricklamers ricklamers commented May 2, 2021

Description

This PR adds the idea of "services" to Orchest. A service can be any container that can run as part of a session, e.g. a streamlit app to be interacted with.

Example service JSON configuration:

{
  "services": {
    "streamlit": {
      "command": [],
      "env_variables": {
        "BASE_PATH": "$BASE_PATH_PREFIX_80",
        "MY_ENV": "123"
      },
      "env_variables_inherit": [
        "ABC"
      ],
      "image": "httpd:2.4", // or "environment@<uuid>"
      "name": "streamlit",
      "ports": [
        80
      ],
      "preserve_base_path": false,
      "binds": {
        "/project-dir": "/my-project-dir-path-in-image",
        "/data": "/my-data-path-in-image"
      },
      "scope": [
        "interactive",
        "noninteractive"
      ]
    }
  }
}

Todo

  • Cleaning up code
  • Adding section to the docs on services
  • Create a front-end for the services so that the user does not have to edit the raw JSON configuration for services

@ricklamers
Copy link
Copy Markdown
Member Author

ricklamers commented May 2, 2021

Todos:

  • BASE_PATH should be able to be referenced in all fields in the services JSON in the pipeline.
  • Handle environment variables in a way consistent with step env vars
  • Handle easy and reliable pointing to network addresses in pipeline code. Differentiate between user address and container address. (Traffic shouldn’t go out of the container network for pipeline code)

@fruttasecca fruttasecca self-requested a review May 3, 2021 07:53
Comment thread services/orchest-api/app/app/core/sessions.py Outdated
@fruttasecca
Copy link
Copy Markdown
Member

Should the memory and jupyter server(s) be services? It would increase consistency and would make for a nice, self-documenting feature, since "services" in the pipeline settings would already be populated with some entries. I am aware there might be some technical difficulties into making both of them into services, but it's worth reasoning about.

Handle environment variables in a way consistent with step env vars

Assuming those are to be secrets (non versioned) the naive solution would be to have services inherit the pipeline environment variables, but we run into the risk of collisions, secrets leaking into services that did not need them, etc. The way services are currently declared might be at odds with what we want to do. We could use a slightly higher level input form instead of having the user writing jsons in code mirror, and take care of what should go in the pipeline def json and what should be secret and written to the db behind the scenes.

@ricklamers
Copy link
Copy Markdown
Member Author

ricklamers commented May 3, 2021

Environment variables are now passed, but only when explicitly requested:

[{
    name: "webserver",
    image: "httpd:2.4",
    environment_inherit: ["ENV_VAR_NAME"]
}]

@ricklamers
Copy link
Copy Markdown
Member Author

ricklamers commented May 3, 2021

Should the memory and jupyter server(s) be services? It would increase consistency and would make for a nice, self-documenting feature, since "services" in the pipeline settings would already be populated with some entries. I am aware there might be some technical difficulties into making both of them into services, but it's worth reasoning about.

They are not optional and URL routing might be different to services. So I don't think they should be unified into services. As I feel like services should just be user defined services. But they are very similar internally indeed. We should re-use code between services and JupyterLab/the memory server as much as possible. They already share some parts of the implementation internally (e.g. they're both assigned to self._containers[resource]).

@ricklamers
Copy link
Copy Markdown
Member Author

We could use a slightly higher level input form instead of having the user writing jsons in code mirror

I think this is something we should iterate toward. Not necessarily release v1. But do agree it would be nicer.

Secrets can now be passed using environment variable inheritance.

@ricklamers
Copy link
Copy Markdown
Member Author

  • Handle easy and reliable pointing to network addresses in pipeline code. Differentiate between user address and container address. (Traffic shouldn’t go out of the container network for pipeline code)

This is now handled by the Orchest SDK.

It returns: {"internal_urls": [...], "external_urls": [...], "ports": [80, 8081]}. I.e. one endpoint for each port. At the moment we only support TCP port forwarding.

@ricklamers
Copy link
Copy Markdown
Member Author

ricklamers commented May 3, 2021

One pet peeve is the necessity to tweak container services to incorporate a base path. This is due to how the nginx-proxy (in this implementation) forwards requests.

The structure is: <origin><base_path><port><application_path>
http://127.0.0.1:8000/service-webserver-uuiduuid-uuiduuid_80_/some-path.txt

Or broken into its components:
origin: http://127.0.0.1:8000
base_path: /service-webserver-uuiduuid-uuiduuid
port: _80_
application_path: /some-path.txt

Would like to do it in a 'cleaner' way. But this works reliably and allows you to flexibily run services on any TCP port in the service container without requiring any changes to the way in which we currently host Orchest on AWS and how it runs locally.

@ricklamers
Copy link
Copy Markdown
Member Author

Another potential issue is that we now pull the containers for services from DockerHub when we start the session. This could take a long time and could make the session start request time out. Should we bite the bullet and make the session start (POST) a polling operation?

@fruttasecca
Copy link
Copy Markdown
Member

Another potential issue is that we now pull the containers for services from DockerHub when we start the session. This could take a long time and could make the session start request time out. Should we bite the bullet and make the session start (POST) a polling operation?

I think so

"uuid": "pipeline-uuid",
"settings": {},
"parameters": {},
"services": [],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not forget to also update the JSON schema in the docs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On another note, didn't we change it to be a dictionary instead?

@ricklamers
Copy link
Copy Markdown
Member Author

Note: we should not access the userdir/ in the orchest-api. This requires a refactor of https://github.com/orchest/orchest/pull/224/files#diff-3c5dc561a3261d8aacd881b4cb16e43c99215b5a7966faf9aadc43734dd6302bR158

Comment thread orchest-sdk/python/orchest/services.py Outdated
Comment thread services/orchest-api/app/app/core/sessions.py
Comment thread services/orchest-api/app/app/core/sessions.py Outdated
Comment thread services/orchest-api/app/app/core/sessions.py Outdated
Comment thread services/orchest-api/app/app/core/sessions.py Outdated
Comment thread services/orchest-api/app/app/core/sessions.py Outdated
Comment thread services/orchest-api/app/app/core/sessions.py Outdated
Comment thread services/orchest-api/app/app/core/sessions.py Outdated
Comment thread docs/source/user_guide/services.rst Outdated
Comment thread docs/source/user_guide/services.rst Outdated
the inherited ones. Note that, while project and pipeline environment variables
are considered as `secrets`, services environment variables aren't and will
be persisted in the pipeline definition file.
- **scope**: To specify if the service should be running in interactive mode, jobs, or both.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming the scope to be "jobs" instead of "non-interactive" makes a lot of sense to me and I am sure it is a lot more obvious for the users as well.

Should we define scope to be "pipeline editor" or "jobs" instead?

Copy link
Copy Markdown
Member

@fruttasecca fruttasecca Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, @ricklamers @joe-bell? (will need a small GUI tweak)

@joe-bell joe-bell mentioned this pull request Jun 7, 2021
3 tasks
@yannickperrenet
Copy link
Copy Markdown
Contributor

Alignment of the buttons in the pipeline editor is off. (This is only on this branch and not on master)

image

@ricklamers ricklamers merged commit 56e8f6e into master Jun 8, 2021
@yannickperrenet yannickperrenet deleted the feature-services branch June 8, 2021 15:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expand editor support with integrated code-server

4 participants