This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

[Proposal] openpai-runtime interface #12

Open
Binyang2014 opened this issue May 28, 2020 · 0 comments
Binyang2014 commented May 28, 2020

Current situation:

Currently, openpai-runtime is tightly coupled with PAI and Framework Controller.
The code has been split out, but some logic is still mixed in, so using the runtime requires both PAI and Framework Controller.
We need to decouple the runtime from these components to allow an independent release cycle and more efficient development.

Current Problem:

  • Adding a feature requires changes across many repos.
  • The runtime cannot be used by other projects; third-party users who want to customize it must modify the runtime code.
  • The runtime only works with PAI and Framework Controller.

Methods:

Treat all PAI-related logic as plugins. The openpai-runtime repo then keeps only the core logic, while PAI-related code becomes PAI-specific runtime plugins maintained in the PAI repo.

To implement this, we introduce two concepts: init-plugin and runtime-plugin. Init-plugins are maintained by developers and generate the executable code that is run at runtime. End users do not need to know anything about init-plugins.

Runtime-plugins are used by end users to run commands before or after their actual commands.

Workflow for openpai-runtime:

  1. Start the init container, read the init-plugin spec, and run the init-plugins sequentially.
  2. Read the runtime-plugin spec and generate the runtime executable file.
  3. Start the user container, run the runtime executable, and start the user commands.
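The ordering of the three steps above can be sketched in Python. This is an illustration, not openpai-runtime code: `run_workflow` and the three stage callables are hypothetical, and stopping at the first failed stage is an assumption (it mirrors Kubernetes, where app containers only start after every init container has succeeded).

```python
from typing import Callable, List, Tuple

# Hypothetical stage signature: a stage returns True on success, False on failure.
Stage = Callable[[], bool]


def run_workflow(init_stage: Stage,
                 render_stage: Stage,
                 user_stage: Stage) -> List[str]:
    """Run the three stages in order and stop at the first one that fails.

    Returns the names of the stages that were attempted, in order.
    """
    stages: List[Tuple[str, Stage]] = [
        ("init-plugins", init_stage),          # step 1: run init-plugins sequentially
        ("runtime-executable", render_stage),  # step 2: runtime-plugin spec -> executable
        ("user-commands", user_stage),         # step 3: run runtime executable + user commands
    ]
    attempted: List[str] = []
    for name, stage in stages:
        attempted.append(name)
        if not stage():
            break  # later stages never start after a failure
    return attempted
```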

Implementation

Init plugin config spec

Init-plugins run in the init container, with the init.d folder as their working directory. They perform preparation steps such as rendering user commands. Here is a sample init-plugin spec; the plugins run sequentially:

initPlugins:
- name: frameworkBarrier
  command: 'frameworkBarrier framework.json'
- name: frameworkParser
  command: 'python frameworkParser.py framework.json'
- name: imageChecker
  command: 'python imageChecker.py'
- name: portChecker
  command: 'python portChecker.py portListFile'
- name: userCommandRender
  command: 'python command_render.py'
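A minimal Python sketch of running the `initPlugins` list in order, assuming the spec has already been parsed into a list of `{name, command}` dicts. `run_init_plugins` is a hypothetical helper, and aborting on the first non-zero exit code is an assumption, since the proposal only says the plugins run sequentially. Commands are split with `shlex.split` and run without a shell, so a bare name such as `frameworkBarrier` must resolve on `PATH`.

```python
import shlex
import subprocess
from typing import Dict, List


def run_init_plugins(plugins: List[Dict[str, str]], workdir: str) -> List[str]:
    """Run init-plugins one after another in `workdir` (e.g. the init.d folder).

    Raises RuntimeError on the first plugin that exits non-zero (an assumption).
    Returns the names of plugins that completed successfully.
    """
    completed: List[str] = []
    for plugin in plugins:
        # Turn the command string into argv without invoking a shell.
        argv = shlex.split(plugin["command"])
        result = subprocess.run(argv, cwd=workdir)
        if result.returncode != 0:
            raise RuntimeError(
                f"init-plugin {plugin['name']!r} exited with code {result.returncode}")
        completed.append(plugin["name"])
    return completed
```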

This spec can be passed to the runtime through the INIT_CONFIG environment variable or as a file named init_plugins.yaml under PAI_CONFIG_DIR. The runtime parses INIT_CONFIG first; if it is empty, it reads init_plugins.yaml; if that file is absent, it falls back to the default config init_plugins_default.yaml.
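The lookup order described above can be sketched as follows. The sketch rests on assumptions: `resolve_init_config` is a hypothetical name, `PAI_CONFIG_DIR` is assumed to be read from the environment, a whitespace-only `INIT_CONFIG` is treated as empty, and the function returns the raw spec text plus where it came from, leaving YAML parsing to the caller.

```python
import os
from typing import Mapping, Tuple


def resolve_init_config(env: Mapping[str, str], default_path: str) -> Tuple[str, str]:
    """Return (origin, raw_spec_text) following the documented lookup order.

    1. The INIT_CONFIG environment variable, if non-empty.
    2. ${PAI_CONFIG_DIR}/init_plugins.yaml, if that file exists.
    3. The default config at `default_path` (init_plugins_default.yaml).
    """
    raw = env.get("INIT_CONFIG", "")
    if raw.strip():  # assumption: whitespace-only counts as empty
        return "INIT_CONFIG", raw
    config_dir = env.get("PAI_CONFIG_DIR")  # assumption: directory comes from the env
    if config_dir:
        candidate = os.path.join(config_dir, "init_plugins.yaml")
        if os.path.isfile(candidate):
            with open(candidate, encoding="utf-8") as f:
                return candidate, f.read()
    with open(default_path, encoding="utf-8") as f:
        return default_path, f.read()
```

In an init container this would be called as `resolve_init_config(os.environ, default_path)`, where the location of init_plugins_default.yaml inside the image is not specified by the proposal.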

Assumptions about init-plugins

We expect the init-plugin config to change rarely, and each cluster has only one init-plugin config, so we prefer to put it into the Docker image or a Kubernetes ConfigMap.

Runtime plugin & secret & exitSpec & env

The runtime spec must be passed through the RUNTIME_CONFIG environment variable or provided as a file at ${PAI_CONFIG_DIR}/runtime_plugin.yaml.

A sample runtime-plugin spec:

commands: ["ls -al"]
runtimePlugin:
- plugin: ssh
  parameters:
    jobssh: true
- plugin: teamwise_storage
  parameters:
    storageConfigNames:
      - confignfs
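One way to turn this spec into the runtime executable from workflow step 2 is to concatenate per-plugin "before" commands, the user `commands`, and per-plugin "after" commands into a shell script. The sketch assumes the spec is already parsed into a dict; `render_runtime_script` and the hook registry are hypothetical, with each plugin contributing shell lines through functions of its `parameters`.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical hook: maps a plugin's `parameters` to a list of shell lines.
Hook = Callable[[Dict[str, Any]], List[str]]


def render_runtime_script(spec: Dict[str, Any],
                          registry: Dict[str, Tuple[Hook, Hook]]) -> str:
    """Build a bash script: plugin pre-commands, user commands, plugin post-commands."""
    resolved = []
    for entry in spec.get("runtimePlugin", []):
        name = entry["plugin"]
        if name not in registry:
            raise KeyError(f"unknown runtime plugin: {name}")
        resolved.append((registry[name], entry.get("parameters", {})))

    lines = ["#!/bin/bash", "set -e"]  # stop at the first failing command
    for (pre, _post), params in resolved:
        lines.extend(pre(params))      # commands run before the user commands
    lines.extend(spec["commands"])     # user commands, verbatim and in order
    for (_pre, post), params in resolved:
        lines.extend(post(params))     # commands run after the user commands
    return "\n".join(lines) + "\n"
```

With `set -e`, the "after" commands are skipped when a user command fails; a real runtime might prefer a `trap` or explicit exit-code handling, which is a design choice this sketch leaves open.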

The secret file should be stored at ${PAI_CONFIG_DIR}/secret.yaml and the exit spec at ${PAI_CONFIG_DIR}/runtime-exit-spec.yaml. To pass environment variables to the user container, put them into ${PAI_RUNTIME_DIR}/env.
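The proposal does not define the format of `${PAI_RUNTIME_DIR}/env`. Assuming it is a shell file that the runtime sources before the user commands, a plugin could append variables as below. `append_env` is hypothetical; `shlex.quote` keeps values containing spaces or quotes intact when the file is sourced.

```python
import os
import re
import shlex
from typing import Mapping

# POSIX shell variable names: letters, digits and underscores, not starting with a digit.
_VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def append_env(runtime_dir: str, variables: Mapping[str, str]) -> str:
    """Append `export KEY=value` lines to <runtime_dir>/env and return the file path."""
    # Validate every name first so a bad name never leaves a half-written file.
    for key in variables:
        if not _VALID_NAME.match(key):
            raise ValueError(f"invalid environment variable name: {key!r}")
    path = os.path.join(runtime_dir, "env")
    # Append rather than overwrite, so several plugins can contribute variables.
    with open(path, "a", encoding="utf-8") as f:
        for key, value in variables.items():
            f.write(f"export {key}={shlex.quote(value)}\n")
    return path
```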

Development & Usage

Customize runtime

In the init container, we run the scripts under /usr/local/pai/init.d. To customize your init container, put your scripts in that folder.

To build a customized init container:

FROM openpairuntime/openpai-runtime:latest
# With multiple source files, the COPY destination must be a directory ending in '/'.
COPY src/* /usr/local/pai/init.d/

If an init-plugin writes to a file, it is the developer's responsibility to make sure the file is at the correct path and does not overwrite anything else. Developers are also responsible for maintaining any customized config file and making sure it works.
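A small helper can enforce both responsibilities for files written by an init-plugin: stay inside an agreed base directory, and never overwrite an existing file. `write_plugin_output` is a hypothetical helper, not part of openpai-runtime; it relies on `open(..., "x")`, which raises `FileExistsError` if the file already exists.

```python
import os


def write_plugin_output(base_dir: str, relative_path: str, content: str) -> str:
    """Create a new file under base_dir; refuse to escape base_dir or overwrite."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, relative_path))
    # Reject "../" tricks and absolute paths that resolve outside base_dir.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"{relative_path!r} resolves outside {base_dir!r}")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Mode "x" creates the file exclusively: FileExistsError instead of overwriting.
    with open(target, "x", encoding="utf-8") as f:
        f.write(content)
    return target
```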

Use openPAI runtime

apiVersion: v1
kind: Pod
metadata:
  name: job
  namespace: default
spec:
  initContainers:
  - name: init
    image: openpairuntime/openpai-runtime:latest
    env:
    - name: RUNTIME_CONFIG
      # Literal block (|) keeps line breaks; a folded block (>) would join lines and break the nested YAML.
      value: |
        commands: ["ls -al", "echo hi"]
        runtimePlugin:
        - plugin: ssh
          parameters:
            jobssh: true
        - plugin: teamwise_storage
          parameters:
            storageConfigNames:
            - confignfs
    - name: INIT_CONFIG
      # Block content must be indented deeper than `value:` to belong to it.
      value: |
        initPlugins:
        - name: frameworkBarrier
          command: pai/frameworkBarrier framework.json
        - name: frameworkParser
          command: python frameworkParser.py framework.json
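    volumeMounts:
    - name: pai-vol
      mountPath: '/usr/local/pai'
    # subPath mounts a single key as a file; without it, the mountPath becomes a directory.
    # Assumes the Secret and ConfigMap store the files under these key names.
    - name: 'job-secrets'
      mountPath: '/usr/local/pai/config/secrets.yaml'
      subPath: 'secrets.yaml'
    - name: 'job-exit-spec'
      mountPath: '/usr/local/pai/config/runtime-exit-spec.yaml'
      subPath: 'runtime-exit-spec.yaml'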
  containers:
  - name: app
    image: ubuntu:latest
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
    command: ['/usr/local/pai/runtime']
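    volumeMounts:
    - name: pai-vol
      mountPath: '/usr/local/pai'
    # subPath mounts a single key as a file; without it, the mountPath becomes a directory.
    # Assumes the Secret and ConfigMap store the files under these key names.
    - name: 'job-secrets'
      mountPath: '/usr/local/pai/config/secrets.yaml'
      subPath: 'secrets.yaml'
    - name: 'job-exit-spec'
      mountPath: '/usr/local/pai/config/runtime-exit-spec.yaml'
      subPath: 'runtime-exit-spec.yaml'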
  volumes:
  - name: pai-vol
    emptyDir: {}
  - name: 'job-secrets'
    secret:
      secretName: 'job-secrets'
  - name: 'job-exit-spec'
    configMap:
      name: runtime-exit-spec-configuration
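Hand-writing YAML inside env values is error-prone: a folded (`>`) block joins lines and breaks the nested spec, and indentation mistakes silently change it. Since JSON is, for practical purposes, a subset of YAML 1.2, a job submitter could instead serialize the specs with `json.dumps`. This sketch assumes the runtime parses `INIT_CONFIG` and `RUNTIME_CONFIG` with a YAML parser that accepts JSON-style flow collections; `runtime_env` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def runtime_env(runtime_spec: Dict[str, Any],
                init_spec: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build the init container `env` entries, serializing each spec to JSON."""
    return [
        # json.dumps without `indent` emits one line, so no block-scalar indentation is needed.
        {"name": "RUNTIME_CONFIG", "value": json.dumps(runtime_spec)},
        {"name": "INIT_CONFIG", "value": json.dumps(init_spec)},
    ]


if __name__ == "__main__":
    env = runtime_env(
        {"commands": ["ls -al", "echo hi"],
         "runtimePlugin": [{"plugin": "ssh", "parameters": {"jobssh": True}}]},
        {"initPlugins": [{"name": "frameworkBarrier",
                          "command": "pai/frameworkBarrier framework.json"}]},
    )
    print(json.dumps(env, indent=2))
```

The resulting list can be placed under `initContainers[].env` of a Pod manifest built in code; `kubectl apply -f` accepts JSON manifests as well as YAML.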

Result

After this change, the runtime repo will only keep common plugins:
imageChecker, userCommandRender, portConflictChecker, and envGenerator. Each plugin will have a clear interface so that developers can reuse it.
PAI-related plugins, such as frameworkBarrier and frameworkParser, will move to the PAI repo.

Interfaces:

ENV: INIT_CONFIG, RUNTIME_CONFIG
Files: ${PAI_CONFIG_DIR}/init_plugins.yaml, ${PAI_CONFIG_DIR}/secret.yaml, ${PAI_CONFIG_DIR}/runtime-exit-spec.yaml, ${PAI_CONFIG_DIR}/runtime_plugin.yaml

Pro:

  • New runtime requirements can be implemented as init-plugins or runtime-plugins; PAI-specific features do not require changes to the runtime code.
  • The runtime can be reused by other projects.

Con:

  • A new interface means a lot of work, and passing complex data/config through environment variables is not friendly for end users.
  • The new config may make the job spec larger than before. (Mitigation: let another plugin provide the task spec, for example by calling an API to fetch it and writing it to a known path.)

TBD

  • How to customize the image build: allowing users to customize the init container requires copying files into the Docker image, so we should provide a pattern for building a new runtime image.